While reading recent work, I found that several methods in A/B testing and causal inference rely on the same theorem: the Frisch-Waugh-Lovell (FWL) theorem. Before digging into this topic, I want to briefly explain the relation between linear regression and the t-test.
T-test vs. Linear Regression
They are the same when the regressor in the linear regression is a dummy variable ({0, 1}).
In a t-test, we compute the t-statistic after the experiment from the treatment indicator $X$ and the outcome $Y$. We get the same result from the linear regression
$$ Y = \beta_0 + \beta_1 X + \varepsilon $$
where $X$ indicates whether the outcome $Y$ comes from the control group ($X=0$) or the treatment group ($X=1$): the t-statistic for testing $\beta_1 = 0$ equals the two-sample (pooled-variance) t-test statistic, and $\hat{\beta}_1$ is the difference in group means, as in the sketch below.
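To make this concrete, here is a minimal sketch with simulated data (assuming `scipy` and `statsmodels` are available): the t-statistic on the dummy coefficient matches the pooled-variance two-sample t-statistic.

```python
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=1000)          # treatment dummy: 0 = control, 1 = treatment
y = 1.0 + 0.3 * x + rng.normal(size=1000)  # outcome with a true effect of 0.3

# Two-sample t-test (equal variances, to match the OLS assumptions)
t_stat, _ = stats.ttest_ind(y[x == 1], y[x == 0], equal_var=True)

# OLS with an intercept and the treatment dummy
ols = sm.OLS(y, sm.add_constant(x)).fit()

print(t_stat)          # two-sample t-statistic
print(ols.tvalues[1])  # t-statistic on the dummy coefficient: identical up to rounding
```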
Frisch-Waugh-Lovell Theorem
In linear regression, when we want to know the relation between $X$ and $Y$ while controlling for (partialling out) $Z$, we can simply run the regression
$$
Y \sim X+Z
$$
However, there is another way to get the same coefficient on $X$, and that is the FWL theorem:
- $Y\sim Z$, and take the residual $Y^*$
- $X\sim Z$, and take the residual $X^*$
- $Y^*\sim X^*$, and the coefficient on $X^*$ is the same as the coefficient on $X$ in the full regression (see the sketch below).
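A minimal sketch of the FWL theorem on simulated data (again assuming `statsmodels`): the two routes give the same coefficient on $X$, up to floating-point error.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)            # X is correlated with Z
y = 2.0 * x + 1.5 * z + rng.normal(size=n)

# Full regression: Y ~ X + Z
full = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()

# FWL: partial Z out of both Y and X, then regress residual on residual
y_star = sm.OLS(y, sm.add_constant(z)).fit().resid
x_star = sm.OLS(x, sm.add_constant(z)).fit().resid
fwl = sm.OLS(y_star, x_star).fit()          # no constant needed: residuals are mean-zero

print(full.params[1])  # coefficient on X in the full regression
print(fwl.params[0])   # same coefficient from the residual-on-residual regression
```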
Double Machine Learning
Double Machine Learning is a causal inference method whose goal is to estimate the effect of $X$ on $Y$ with the influence of the other covariates $Z$ removed. The concept is simple.
In the FWL theorem, the model $f(\cdot)$ is linear regression:
- $Y=f(Z)+\varepsilon$, and we take the residual $Y^*$
- $X=f(Z)+\delta$, and we take the residual $X^*$
- $Y^*=f(X^*)+\eta$, and we still get the same coefficient.
In Double Machine Learning, we replace the linear regression $f(\cdot)$ with a machine learning model $D(\cdot)$ for the two residualizing steps:
- $Y=D(Z)+\varepsilon$, and we take the residual $Y^*$
- $X=D(Z)+\delta$, and we take the residual $X^*$
- $Y^*\sim X^*$: a simple linear regression of the residuals recovers the effect of $X$ on $Y$ (see the sketch after this list).
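Here is a minimal sketch under a partially linear model, not the full DML procedure from the references: scikit-learn's `GradientBoostingRegressor` stands in for the ML model $D(\cdot)$, and `cross_val_predict` is used as a simple stand-in for cross-fitting; the data and model choices are purely illustrative.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n = 3000
z = rng.normal(size=(n, 3))
x = np.sin(z[:, 0]) + 0.5 * z[:, 1] + rng.normal(size=n)      # X depends nonlinearly on Z
y = 2.0 * x + np.cos(z[:, 0]) + z[:, 2] ** 2 + rng.normal(size=n)

# Steps 1-2: predict Y and X from Z with a flexible model (out-of-fold predictions)
y_hat = cross_val_predict(GradientBoostingRegressor(), z, y, cv=5)
x_hat = cross_val_predict(GradientBoostingRegressor(), z, x, cv=5)
y_star, x_star = y - y_hat, x - x_hat

# Step 3: a simple linear regression of the residuals recovers the effect of X on Y
final = sm.OLS(y_star, x_star).fit()
print(final.params[0])  # should be close to the true effect of 2.0
```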
Control Variates
Control variates, also called regression adjustment, is a way to decrease the variance of the estimate.
To do variance reduction with this method, we first need to find a covariate $Z$ other than $X$ and $Y$. $Z$ needs to follow two rules:
- $Z$ is correlated with $Y$; the stronger the correlation, the better.
- $Z$ is independent of $X$.
Because $Z$ is independent of $X$, we do not need the second step of the FWL theorem. In the linear regression view, after finding $Z$, the steps are:
- $Y\sim Z$, and we take the residual $Y^*$
- $Y^*\sim X$, and we get the relation between $X$ and $Y$.
This is called regression adjustment. But if we switch to the hypothesis-testing view, the process becomes adjusting $Y$ into $Y_{adj}$:
$$
Y_{adj}=Y-f(Z)=Y-\theta Z
$$
However, we want the adjusted metric to stay unbiased, that is, $E(Y_{adj})=E(Y)$, so we change the above equation into the one below:
$$
Y_{adj}=Y-\theta Z+\theta \bar{Z}
$$
where $\bar{Z}$ is the sample mean of $Z$. Since our goal is to decrease the variance, a short calculation gives the optimal $\theta$:
$$
\theta = \cfrac{cov(Y,Z)}{var(Z)}
$$
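To see where this comes from, note that the $\theta\bar{Z}$ term is (approximately) a constant, so
$$
var(Y_{adj}) = var(Y) - 2\theta\, cov(Y,Z) + \theta^2\, var(Z)
$$
which is a quadratic in $\theta$ and is minimized exactly at the $\theta$ above.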
This is exactly the slope from the regression $Y\sim Z$. Therefore, we just need to find a suitable $Z$, and the variance drops from $Y$ to $Y_{adj}$. This method is called control variates.
CUPED (Controlled-experiment Using Pre-Experiment Data)
How do we find a perfect $Z$? The most direct choice is:
The pre-experiment outcome is a perfect $Z$.
The pre-experiment outcome is correlated with $Y$, since they are the same metric. Moreover, the pre-experiment outcome is independent of $X$, since the treatment had not started yet when it was measured.
Also, the pre-experiment outcome is convenient because we usually run an A/A test before the A/B test, so we can directly use the data from the A/A test.
The method using pre-experiment data is called CUPED.
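A minimal CUPED sketch with simulated data (only `numpy` assumed): the pre-experiment metric plays the role of $Z$, and the adjusted metric $Y_{adj}$ has a much smaller variance while the treatment-effect estimate stays centered on the true effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
z = rng.normal(10.0, 2.0, size=n)                 # pre-experiment outcome
x = rng.integers(0, 2, size=n)                    # random treatment assignment
y = z + 0.1 * x + rng.normal(0.0, 1.0, size=n)    # experiment outcome, correlated with z

theta = np.cov(y, z)[0, 1] / np.var(z, ddof=1)
y_adj = y - theta * (z - z.mean())                # same as y - theta*z + theta*z_bar

print(np.var(y), np.var(y_adj))                         # variance should drop a lot
print(y[x == 1].mean() - y[x == 0].mean())              # raw effect estimate (noisier)
print(y_adj[x == 1].mean() - y_adj[x == 0].mean())      # adjusted estimate, lower variance
```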
Reference
https://zhuanlan.zhihu.com/p/604335170
https://ai.stanford.edu/~ronnyk/2013-02CUPEDImprovingSensitivityOfControlledExperiments.pdf
https://www.evanmiller.org/you-cant-spell-cuped-without-frisch-waugh-lovell.html
https://en.wikipedia.org/wiki/Frisch%E2%80%93Waugh%E2%80%93Lovell_theorem