Find the Gap - Regression Discontinuity Design

Today, we venture into another fascinating topic in the realm of econometrics and statistics: the Regression Discontinuity Design (RDD). This quasi-experimental pretest-posttest design is used to identify causal effects and is particularly useful when random assignment of treatment is not feasible.

Regression Discontinuity Design: An Overview

The Regression Discontinuity Design (RDD) is a research design that exploits a known cutoff on a continuous score (the assignment, or running, variable) to assign subjects to a treatment or control group. Subjects on one side of the cutoff are assigned to the treatment group, and those on the other side to the control group. The key idea is that subjects just above and just below the cutoff are likely to be very similar in everything except treatment status, so comparing them approximates a randomized experiment.
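To make this precise, here is one standard way to write the sharp design. It assumes the treated side is at or above the cutoff, as in the scholarship example below. Let $X_i$ be subject $i$'s assignment variable, $c$ the cutoff, $D_i$ the treatment indicator, and $Y_i(1)$, $Y_i(0)$ the potential outcomes with and without treatment (this notation is introduced here for illustration):

$$D_i = \mathbf{1}\{X_i \ge c\}$$

$$\tau_{\text{RD}} = \lim_{x \downarrow c} \mathbb{E}[Y_i \mid X_i = x] \;-\; \lim_{x \uparrow c} \mathbb{E}[Y_i \mid X_i = x]$$

If the continuity assumption below holds, $\tau_{\text{RD}} = \mathbb{E}[Y_i(1) - Y_i(0) \mid X_i = c]$, the average treatment effect for subjects exactly at the cutoff.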

Assumptions

RDD relies on several key assumptions:

Continuity of the Regression Function: The expected outcomes with and without treatment should be continuous functions of the assignment variable at the cutoff, so that any jump in observed outcomes there can be attributed to the treatment.
No Manipulation of the Assignment Variable: Subjects cannot precisely manipulate their value of the assignment variable to place themselves on the side of the cutoff that determines whether they receive the treatment.
Local Randomization: Around the cutoff, the assignment to the treatment can be considered as good as random.
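The no-manipulation assumption has a testable implication: if subjects were nudging their scores past the cutoff, there would be suspiciously few observations just below it and too many just above. Formal tests such as McCrary's density test exist; the sketch below is only a crude stand-in that counts observations in a window of width h on each side. It assumes the score density is roughly flat inside that window, and the function name `bunching_check` and the simulated scores are hypothetical.

```python
import numpy as np


def bunching_check(scores, cutoff, h):
    """Count observations just below and just above the cutoff and return a z-score.

    Assumes the density of the assignment variable is roughly flat within h of the
    cutoff, so without manipulation each side should hold about half of the nearby
    observations (a Binomial(n, 0.5) count).
    """
    scores = np.asarray(scores, dtype=float)
    n_below = int(np.sum((scores >= cutoff - h) & (scores < cutoff)))
    n_above = int(np.sum((scores >= cutoff) & (scores < cutoff + h)))
    n = n_below + n_above
    if n == 0:
        raise ValueError("no observations within h of the cutoff")
    # Normal approximation: under the null, n_above has mean n/2 and sd sqrt(n)/2
    z = (n_above - n / 2) / (np.sqrt(n) / 2)
    return n_below, n_above, float(z)


rng = np.random.default_rng(1)
# Hypothetical test scores on [0, 100] with no manipulation
honest = rng.uniform(0, 100, 2000)
# Hypothetical manipulation: everyone scoring in [57, 60) nudges their score just past 60
manipulated = honest.copy()
nudged = (manipulated >= 57) & (manipulated < 60)
manipulated[nudged] = 60 + rng.uniform(0, 1, nudged.sum())

honest_result = bunching_check(honest, cutoff=60, h=3)
manipulated_result = bunching_check(manipulated, cutoff=60, h=3)
print(honest_result)       # roughly balanced counts, small |z|
print(manipulated_result)  # no one just below the cutoff, large positive z
```

A large positive z-score is a warning sign rather than proof of manipulation; a genuine kink in the score distribution at the cutoff would produce the same pattern.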

Method

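The RDD effect is typically estimated using local linear regression (a separate line fitted on each side of the cutoff, using only observations within some bandwidth of it) or polynomial regression. The estimate is the difference between the fitted outcomes at the cutoff: the value approached from the right minus the value approached from the left.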
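Here is a minimal sketch of the local linear approach using only NumPy. It uses a triangular kernel, a common choice that gives observations closer to the cutoff more weight, and a bandwidth `h` picked by hand; data-driven bandwidth selectors exist but are not shown. The function name `local_linear_rd` and the simulated data are hypothetical.

```python
import numpy as np


def local_linear_rd(x, y, cutoff, h):
    """Sharp RD estimate via local linear regression with a triangular kernel.

    Fits a weighted line on each side of the cutoff using only points within
    bandwidth h, then returns the gap between the two fitted values at the cutoff.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def fitted_value_at_cutoff(mask):
        xc = x[mask] - cutoff                       # center the running variable at the cutoff
        w = np.clip(1 - np.abs(xc) / h, 0, None)    # triangular kernel weights
        X = np.column_stack([np.ones_like(xc), xc])
        sw = np.sqrt(w)
        # Weighted least squares: scale rows by sqrt(weight), then solve ordinary least squares
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[mask] * sw, rcond=None)
        return beta[0]                              # intercept = fitted value at the cutoff

    below = (x < cutoff) & (x >= cutoff - h)
    above = (x >= cutoff) & (x <= cutoff + h)
    return fitted_value_at_cutoff(above) - fitted_value_at_cutoff(below)


# Hypothetical data: outcome rises with x and jumps by 20 at x = 0
rng = np.random.default_rng(0)
x = rng.uniform(-50, 50, 2000)
y = 2 * x + 20 * (x >= 0) + rng.normal(0, 10, x.size)
estimate = local_linear_rd(x, y, cutoff=0, h=10)
print(f"Local linear RD estimate: {estimate:.2f}")  # should land near 20
```

Centering the running variable at the cutoff is what makes each intercept equal the fitted outcome at the cutoff, so the treatment effect is simply the difference of the two intercepts.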

Example

Consider an education policy where students scoring above a certain cutoff on a test receive a scholarship for higher education. We want to measure the effect of this scholarship on students' future earnings. Using RDD, we compare the future earnings of students who scored just above the cutoff (and received the scholarship) with those who scored just below (and did not receive the scholarship).
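The scholarship example can be sketched with simulated data. Everything here is hypothetical: the cutoff of 70, the earnings formula, and the effect size of 3 (in thousands of dollars) are made up for illustration. The estimator is a single pooled regression of earnings on a scholarship dummy, the centered score, and their interaction, fitted only to students within 10 points of the cutoff; this is equivalent to fitting separate unweighted lines on each side.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 5,000 students, test scores 0-100, scholarship cutoff at 70
n, cutoff, true_effect = 5000, 70.0, 3.0
score = rng.uniform(0, 100, n)
scholarship = (score >= cutoff).astype(float)
# Hypothetical future earnings (in $1,000s): rise with score, plus a jump for recipients
earnings = 30 + 0.2 * score + true_effect * scholarship + rng.normal(0, 5, n)

# Keep students within 10 points of the cutoff and center the score there
h = 10.0
near = np.abs(score - cutoff) <= h
s = score[near] - cutoff
d = scholarship[near]

# Pooled regression: earnings ~ 1 + d + s + d*s (separate slopes on each side)
X = np.column_stack([np.ones_like(s), d, s, d * s])
beta, *_ = np.linalg.lstsq(X, earnings[near], rcond=None)
print(f"Estimated scholarship effect: {beta[1]:.2f} (simulated true effect: {true_effect})")
```

Because the score is centered at the cutoff, the coefficient on the scholarship dummy `d` is exactly the jump between the two fitted lines at a score of 70.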

Limitations and Advantages

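RDD has its limitations. It estimates a local treatment effect, which is valid only for subjects near the cutoff and may not generalize to those far from it. The estimate is also sensitive to the specification of the regression function (for example, linear versus higher-order polynomial) and to the choice of bandwidth, that is, how far from the cutoff observations are allowed to be.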
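The bandwidth sensitivity is easy to see in a simulation where the true relationship is curved. In the hypothetical data below, the outcome bends upward to the right of the cutoff and downward to the left, with a true jump of 5. Fitting straight lines within a narrow bandwidth recovers the jump well, while widening the bandwidth lets the curvature leak into the estimate. The helper `rd_estimate` is hypothetical and uses equal weights within the bandwidth.

```python
import numpy as np


def rd_estimate(x, y, cutoff, h):
    """Local linear RD estimate with equal weights: separate lines within h of the cutoff."""
    near = np.abs(x - cutoff) <= h
    s = x[near] - cutoff
    d = (s >= 0).astype(float)                  # treated side: at or above the cutoff
    X = np.column_stack([np.ones_like(s), d, s, d * s])
    beta, *_ = np.linalg.lstsq(X, y[near], rcond=None)
    return beta[1]                              # coefficient on d = jump at the cutoff


rng = np.random.default_rng(7)
x = rng.uniform(-50, 50, 20000)
# Hypothetical curved relationship: continuous at 0, bending differently on each side
y = 0.01 * x * np.abs(x) + 5.0 * (x >= 0) + rng.normal(0, 2, x.size)

estimates = {h: rd_estimate(x, y, 0.0, h) for h in (5, 10, 25, 50)}
for h, est in estimates.items():
    print(f"bandwidth {h:>2}: estimated jump = {est:.2f}")
```

Narrow bandwidths reduce this bias but use fewer observations, so the estimate becomes noisier; that bias-variance trade-off is why reporting results across several bandwidths is a common robustness check.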

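However, RDD also offers significant advantages. It does not rely on the parallel trends assumption required by difference-in-differences (DiD); its key identifying assumption, continuity at the cutoff, is local and partly testable, for example by checking for bunching of the assignment variable at the cutoff. It therefore provides a credible way of estimating causal effects when random assignment of treatment is not feasible.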

Regression Discontinuity Design in Practice

To illustrate RDD, let's create a plot using a hypothetical example. We'll generate some data around a cutoff, apply a treatment effect, and demonstrate how RDD works.

Please note that in a real-world scenario, data would be collected from actual observations rather than being generated.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Create an array of 500 evenly spaced values of the assignment variable from -50 to 50
x = np.linspace(-50, 50, 500)

# Set the cutoff at x = 0
cutoff = 0

# Generate normally distributed noise (mean 0, standard deviation 10)
np.random.seed(0)
noise = np.random.normal(0, 10, 500)

# Generate y values: for x < 0, y = 2x + noise; for x >= 0, y = 2x + 20 (treatment effect) + noise
y = np.where(x < cutoff, 2*x + noise, 2*x + 20 + noise)

# Boolean masks for the control (left) and treated (right) sides of the cutoff
left = x < cutoff
right = x >= cutoff

# Fit separate linear regression models on either side of the cutoff
model_left = LinearRegression().fit(x[left].reshape(-1, 1), y[left])
model_right = LinearRegression().fit(x[right].reshape(-1, 1), y[right])

# Generate predicted y values for the fitted models
y_pred_left = model_left.predict(x[left].reshape(-1, 1))
y_pred_right = model_right.predict(x[right].reshape(-1, 1))

# Estimated treatment effect: the gap between the two fitted lines at the cutoff
at_cutoff = np.array([[cutoff]])
effect = model_right.predict(at_cutoff)[0] - model_left.predict(at_cutoff)[0]
print(f"Estimated discontinuity at the cutoff: {effect:.2f} (true effect: 20)")

# Plotting
plt.figure(figsize=(10, 6))
plt.scatter(x, y, s=10, label="Data Points")
plt.plot(x[left], y_pred_left, color='r', label="Regression Line (Left of Cutoff)")
plt.plot(x[right], y_pred_right, color='g', label="Regression Line (Right of Cutoff)")
plt.axvline(x=cutoff, color='b', linestyle='--', label="Cutoff")
plt.xlabel("Assignment Variable")
plt.ylabel("Outcome Variable")
plt.legend()
plt.title("Regression Discontinuity Design Plot")
plt.show()

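In the plot, the two fitted lines jump apart at the cutoff. The size of that vertical gap, the fitted outcome just to the right of the cutoff minus the fitted outcome just to the left, is the RDD estimate of the causal effect of the treatment; in this simulated data it should be close to the true effect of 20 built into y.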

Conclusion

The Regression Discontinuity Design provides a credible way of estimating causal effects when random assignment is not possible but treatment is determined by a cutoff rule. It's widely used in economics, political science, education, and other fields. While it comes with its own set of assumptions and limitations, when used correctly, RDD can provide valuable insights into the causal impact of interventions.