While reading recent work, I found that several methods in A/B testing and causal inference rely on the same theorem: the Frisch-Waugh-Lovell (FWL) theorem. Before digging into this topic, I want to briefly explain the relation between linear regression and the t-test.
T-test vs. Linear Regression
They are the same when the regressor in the linear regression is a dummy variable ({0, 1}).
In a t-test, we compute the t-statistic after the experiment from the treatment indicator $X$ and the outcome $Y$. We get the same result from the linear regression
$$ Y = \beta_0 + \beta_1 X + \varepsilon $$
where $X$ indicates whether the outcome $Y$ comes from the control group ($X=0$) or the treatment group ($X=1$): the t-statistic for testing $\beta_1 = 0$ equals the two-sample (pooled-variance) t-test statistic, and $\hat{\beta}_1$ is the difference in group means, as in the sketch below.
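To make this concrete, here is a minimal sketch with simulated data (assuming `scipy` and `statsmodels` are available): the t-statistic on the dummy coefficient matches the pooled-variance two-sample t-statistic.

```python
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=1000)          # treatment dummy: 0 = control, 1 = treatment
y = 1.0 + 0.3 * x + rng.normal(size=1000)  # outcome with a true effect of 0.3

# Two-sample t-test (equal variances, to match the OLS assumptions)
t_stat, _ = stats.ttest_ind(y[x == 1], y[x == 0], equal_var=True)

# OLS with an intercept and the treatment dummy
ols = sm.OLS(y, sm.add_constant(x)).fit()

print(t_stat)          # two-sample t-statistic
print(ols.tvalues[1])  # t-statistic on the dummy coefficient: identical up to rounding
```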
Frisch-Waugh-Lovell Theorem
In linear regression, when we want to know the relation between $X$ and $Y$ while controlling for (partialling out) $Z$, we can simply run the regression
$$
Y \sim X+Z
$$
However, there is another way to get the same coefficient on $X$, and that is the FWL theorem:
- $Y\sim Z$, and take the residual $Y^*$
- $X\sim Z$, and take the residual $X^*$
- $Y^*\sim X^*$, and the coefficient on $X^*$ is the same as the coefficient on $X$ in the full regression (see the sketch below).
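A minimal sketch of the FWL theorem on simulated data (again assuming `statsmodels`): the two routes give the same coefficient on $X$, up to floating-point error.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)            # X is correlated with Z
y = 2.0 * x + 1.5 * z + rng.normal(size=n)

# Full regression: Y ~ X + Z
full = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()

# FWL: partial Z out of both Y and X, then regress residual on residual
y_star = sm.OLS(y, sm.add_constant(z)).fit().resid
x_star = sm.OLS(x, sm.add_constant(z)).fit().resid
fwl = sm.OLS(y_star, x_star).fit()          # no constant needed: residuals are mean-zero

print(full.params[1])  # coefficient on X in the full regression
print(fwl.params[0])   # same coefficient from the residual-on-residual regression
```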
Double Machine Learning
Double Machine Learning is a causal inference method whose goal is to estimate the effect of $X$ on $Y$ with the influence of the other covariates $Z$ removed. The concept is simple.
In the FWL theorem, the model $f(\cdot)$ is linear regression:
- $Y=f(Z)+\varepsilon$, and we take the residual $Y^*$
- $X=f(Z)+\delta$, and we take the residual $X^*$
- $Y^*=f(X^*)+\eta$, and we still get the same coefficient.
In Double Machine Learning, we replace the linear regression $f(\cdot)$ with a machine learning model $D(\cdot)$ for the two residualizing steps:
- $Y=D(Z)+\varepsilon$, and we take the residual $Y^*$
- $X=D(Z)+\delta$, and we take the residual $X^*$
- $Y^*\sim X^*$: a simple linear regression of the residuals recovers the effect of $X$ on $Y$ (see the sketch after this list).
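Here is a minimal sketch under a partially linear model, not the full DML procedure from the references: scikit-learn's `GradientBoostingRegressor` stands in for the ML model $D(\cdot)$, and `cross_val_predict` is used as a simple stand-in for cross-fitting; the data and model choices are purely illustrative.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n = 3000
z = rng.normal(size=(n, 3))
x = np.sin(z[:, 0]) + 0.5 * z[:, 1] + rng.normal(size=n)      # X depends nonlinearly on Z
y = 2.0 * x + np.cos(z[:, 0]) + z[:, 2] ** 2 + rng.normal(size=n)

# Steps 1-2: predict Y and X from Z with a flexible model (out-of-fold predictions)
y_hat = cross_val_predict(GradientBoostingRegressor(), z, y, cv=5)
x_hat = cross_val_predict(GradientBoostingRegressor(), z, x, cv=5)
y_star, x_star = y - y_hat, x - x_hat

# Step 3: a simple linear regression of the residuals recovers the effect of X on Y
final = sm.OLS(y_star, x_star).fit()
print(final.params[0])  # should be close to the true effect of 2.0
```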
Control Variates
Control variates, also called regression adjustment, is a way to decrease the variance of the estimate.
To do variance reduction with this method, we first need to find a covariate $Z$ other than $X$ and $Y$. $Z$ needs to follow two rules:
- $Z$ is correlated with $Y$; the stronger the correlation, the better.
- $Z$ is independent of $X$.
Because $Z$ is independent of $X$, we do not need the second step of the FWL theorem. In the linear regression view, after finding $Z$, the steps are:
- $Y\sim Z$, and we take the residual $Y^*$
- $Y^*\sim X$, and we get the relation between $X$ and $Y$.
This is called regression adjustment. But if we switch to the hypothesis-testing view, the process becomes adjusting $Y$ into $Y_{adj}$:
$$
Y_{adj}=Y-f(Z)=Y-\theta Z
$$
However, we want the adjusted metric to stay unbiased, that is, $E(Y_{adj})=E(Y)$, so we change the above equation into the one below:
$$
Y_{adj}=Y-\theta Z+\theta \bar{Z}
$$
where $\bar{Z}$ is the sample mean of $Z$. Since our goal is to decrease the variance, a short calculation gives the optimal $\theta$:
$$
\theta = \cfrac{cov(Y,Z)}{var(Z)}
$$
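To see where this comes from, note that the $\theta\bar{Z}$ term is (approximately) a constant, so
$$
var(Y_{adj}) = var(Y) - 2\theta\, cov(Y,Z) + \theta^2\, var(Z)
$$
which is a quadratic in $\theta$ and is minimized exactly at the $\theta$ above.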
This is exactly the slope from the regression $Y\sim Z$. Therefore, we just need to find a suitable $Z$, and the variance drops from $Y$ to $Y_{adj}$. This method is called control variates.
CUPED (Controlled-experiment Using Pre-Experiment Data)
How do we find a perfect $Z$? The most direct choice is:
The pre-experiment outcome is a perfect $Z$.
The pre-experiment outcome is correlated with $Y$, since they are the same metric. Moreover, the pre-experiment outcome is independent of $X$, since the treatment had not started yet when it was measured.
Also, the pre-experiment outcome is convenient because we usually run an A/A test before the A/B test, so we can directly use the data from the A/A test.
The method using pre-experiment data is called CUPED.
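A minimal CUPED sketch with simulated data (only `numpy` assumed): the pre-experiment metric plays the role of $Z$, and the adjusted metric $Y_{adj}$ has a much smaller variance while the treatment-effect estimate stays centered on the true effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
z = rng.normal(10.0, 2.0, size=n)                 # pre-experiment outcome
x = rng.integers(0, 2, size=n)                    # random treatment assignment
y = z + 0.1 * x + rng.normal(0.0, 1.0, size=n)    # experiment outcome, correlated with z

theta = np.cov(y, z)[0, 1] / np.var(z, ddof=1)
y_adj = y - theta * (z - z.mean())                # same as y - theta*z + theta*z_bar

print(np.var(y), np.var(y_adj))                         # variance should drop a lot
print(y[x == 1].mean() - y[x == 0].mean())              # raw effect estimate (noisier)
print(y_adj[x == 1].mean() - y_adj[x == 0].mean())      # adjusted estimate, lower variance
```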
Reference
https://zhuanlan.zhihu.com/p/604335170
https://ai.stanford.edu/~ronnyk/2013-02CUPEDImprovingSensitivityOfControlledExperiments.pdf
https://www.evanmiller.org/you-cant-spell-cuped-without-frisch-waugh-lovell.html
https://en.wikipedia.org/wiki/Frisch%E2%80%93Waugh%E2%80%93Lovell_theorem