When Perfect Correlation Breaks Attribution: Why Most MMMs Randomly Assign Credit
The Core Problem
When marketing channels move in lockstep (correlated spend patterns), standard linear regression models suffer from multicollinearity - they mathematically cannot distinguish between channels, so they arbitrarily assign credit based on which variable the solver encounters first.
This isn't a strategy problem. It's a linear algebra degeneracy that causes most MMMs to produce unstable, untrustworthy coefficients.
The Scenario
You're a marketing analyst on December 27th running a post-mortem on Q4 performance. Leadership wants to know: "Should we double down on Facebook or Google for Q1?
Your Clients data:
- Week 1: Facebook $10K, Google $10K → Revenue $50K
- Week 2: Facebook $20K, Google $20K → Revenue $100K
- Week 3: Facebook $5K, Google $5K → Revenue $25K
Perfect correlation (r = 1.0). Both channels scaled identically throughout the quarter.
What Standard Linear Regression Does
You run a simple OLS (Ordinary Least Squares) model:
from sklearn.linear_model import LinearRegression X = [[10, 10], [20, 20], [5, 5]] # [FB spend, Google spend] y = [50, 100, 25] # Revenue model = LinearRegression() model.fit(X, y) print(model.coef_) # Output: [2.5, 0.0] or [0.0, 2.5] or [1.25, 1.25] # Completely unstable - changes based on random solver initialization ``` The mathematical reality: The model is trying to solve this equation: ``` Revenue = (β₁ × Facebook) + (β₂ × Google) ```
But when Facebook and Google are perfectly correlated, there are infinite valid solutions:
Revenue = 2.5 × Facebook + 0 × Google ✓
Revenue = 0 × Facebook + 2.5 × Google ✓
Revenue = 1.25 × Facebook + 1.25 × Google ✓
The model picks one arbitrarily. Run it twice, get different coefficients. This is why CFOs lose faith in MMMs.
The Real-World Impact: A $180K Budget Misallocation
I encountered this exact problem working with a DTC furniture brand ($850 AOV, 18-day consideration cycle). They ran Facebook and Google in tandem for 6 months—both channels scaled up during peak season, both pulled back during slow periods.
Their initial MMM results (using standard OLS):
| Model Run | Facebook ROAS | Google ROAS | Recommendation |
|---|---|---|---|
| Run 1 (Monday) | 3.2x | 0.1x | Kill Google, go all-in Facebook |
| Run 2 (Tuesday) | 0.2x | 3.1x | Kill Facebook, go all-in Google |
| Run 3 (Wednesday) | 1.6x | 1.5x | Keep both roughly equal |
How Ridge Regression Solves This
Ridge regression solves multicollinearity by adding a penalty term that forces the model to distribute credit more evenly when it can't distinguish between variables.
The Modified Equation:
Minimize: Σ(actual - predicted)² + λ × Σ(coefficients²)
↑ ↑
Fit the data well But keep coefficients small & stable
The penalty term (λ) prevents any single coefficient from dominating. When two channels are correlated, Ridge forces them to share credit rather than arbitrarily giving it all to one.
from sklearn.linear_model import Ridge from sklearn.preprocessing import StandardScaler import numpy as np # Your spend data X = np.array([[10, 10], [20, 20], [5, 5]]) y = np.array([50, 100, 25]) # CRITICAL: Standardize features before Ridge # (Ridge penalty is scale-dependent) scaler = StandardScaler() X_scaled = scaler.fit_transform(X) # Ridge with lambda=1.0 ridge_model = Ridge(alpha=1.0) ridge_model.fit(X_scaled, y) print(ridge_model.coef_) # Output: [1.25, 1.25] (stable across multiple runs)
What changed:
- Standard OLS: Coefficients swing wildly ([2.5, 0] → [0, 2.5])
- Ridge Regression: Coefficients stabilize at [1.25, 1.25] - credit shared equally
Choosing Lambda (α): The Bias-Variance Tradeoff
The alpha parameter controls how aggressively you penalize large coefficients:
- α = 0: No penalty (equivalent to standard OLS - unstable)
- α = 0.1: Light penalty (slight stability improvement)
- α = 1.0: Moderate penalty (good starting point)
- α = 10: Heavy penalty (very stable, but might underfit)
from sklearn.model_selection import RidgeCV
# Test multiple alpha values via cross-validation
alphas = [0.01, 0.1, 1.0, 10, 100]
ridge_cv = RidgeCV(alphas=alphas, cv=5)
ridge_cv.fit(X_scaled, y)
print(f"Optimal alpha: {ridge_cv.alpha_}")
print(f"Stable coefficients: {ridge_cv.coef_}")
Cross-validation finds the α that best balances:
- Fit quality (how well the model predicts revenue)
- Coefficient stability (how consistent results are across data splits)
The Furniture Brand Fix: From Chaos to Clarity
After implementing Ridge regression with α = 2.5 (selected via CV), here's what we found:
Stabilized Attribution (10 model runs, consistent results):
- Facebook ROAS: 1.8x (±0.1x variance)
- Google ROAS: 1.6x (±0.1x variance)
- Interpretation: Both channels drive incremental value; Facebook slightly edges Google
Validation via Geo-Holdout Test: We ran a 4-week geo-based incrementality test to verify:
- Facebook true ROAS: 1.9x (Ridge estimate was 1.8x - 5% error)
- Google true ROAS: 1.7x (Ridge estimate was 1.6x - 6% error)
The saved $180K: Instead of killing Google, we maintained 60/40 Facebook/Google split. Over Q1:
- Total spend: $420K
- Incremental revenue: $738K
- Blended ROAS: 1.76x
If we'd followed the unstable OLS model and gone 100% Facebook:
- Projected revenue: $626K (based on incrementality test)
- Opportunity cost: $112K in lost revenue
When Ridge Regression Isn't Enough
Ridge handles multicollinearity, but it doesn't solve causality. It only stabilizes correlation-based attribution.
You still need incrementality testing when:
- Channels have different lag structures (Facebook converts Day 1, Google converts Day 7)
- Adstock effects differ meaningfully (TV has 6-week decay, Paid Social has 3-day)
- Severe selection bias exists (retargeting only reaches high-intent users)
The Hierarchy of Evidence:
- Geo-holdout experiments → Gold standard (true causality)
- Ridge/Bayesian MMM → Good directional guidance (stable correlation)
- Standard OLS MMM → Unreliable when multicollinearity exists
What I'd Do Differently Today
1. Always Check VIF (Variance Inflation Factor) First
Before running any MMM, diagnose multicollinearity:
from statsmodels.stats.outliers_influence import variance_inflation_factor import pandas as pd vif_data = pd.DataFrame() vif_data["feature"] = ["Facebook", "Google"] vif_data["VIF"] = [variance_inflation_factor(X_scaled, i) for i in range(2)] print(vif_data) # VIF > 10 = severe multicollinearity, Ridge is essential
2. Use Bayesian MMM (Robyn/Meridian) Instead of Rolling Your Own
Modern Bayesian frameworks handle this better:
- Meta's Robyn: Built-in regularization + hyperparameter tuning
- Google's Meridian: Hierarchical priors that encode channel skepticism
# Example Robyn implementation # Robyn automatically applies regularization robyn_model <- robyn_run( InputCollect=robyn_inputs, lambda="auto" , # Cross-validated penalty selection iterations=2000 ) ``` ### 3. Document Coefficient Stability in Every MMM Report Leadership doesn't trust "Facebook ROAS = 2.1x" when they've seen it swing to 0.3x the next week. Show them this instead: - Facebook ROAS: 1.8x (95% CI: 1.6x - 2.0x) - Coefficient stability: ±0.1x across 100 bootstrap samples - Validation: Geo-holdout test confirmed 1.9x (within 5% of model) ## The Broader Lesson Most MMMs fail not because of bad data, but because of unstable math. When channels are correlated (which they almost always are in real marketing programs), standard linear regression produces random, untrustworthy coefficients. You're essentially flipping a coin to decide which team to fire. Ridge regression doesn't give you causality - but it gives you consistent, defensible estimates that won't change when you re-run the model tomorrow. Code doesn't just calculate value; it defines it. Without regularization, your attribution strategy is at the mercy of a random solver initialization. With Ridge, you force the model to acknowledge uncertainty and share credit appropriately. The fix isn't better spend data. It's better linear algebra.