Introduction: pre-post and diff-in-diff
Causal impact assessment workshop
This is the first practical, in which we introduce the dataset and use it to create pre-post and diff-in-diff estimates of the causal effect of the California Proposition 99 policy intervention.
You can use your preferred way of working in R to do the practicals. Our preferred way is this:
- Create a new folder with a good name, e.g., practicals_causal_impact
- Open RStudio
- Create a new project from RStudio, which you associate with the folder
- Create a raw_data subfolder
- Create an R script for the current practical, e.g., introduction.R
- Create your well-documented and well-styled code in this R script
The answers to each exercise are available as a collapsed code block. Try to work out the answer yourself before looking at this code block!
In all practicals in this workshop, we make extensive use of the tidyverse set of packages. You can load these packages like so:
library(tidyverse)
In this practical, we will also use the following two packages:
library(sandwich)
library(lmtest)
The data
We will be using the proposition99 dataset that we introduced in the lecture. We have prepared the dataset for you to download here. It is an rds file, which is a convenient, portable, and fast binary file format for R.
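As a minimal sketch of loading the data, the snippet below reads the rds file into an object called prop99, the name used in the plotting code of this practical. The file name proposition99.rds and its location in the raw_data subfolder are assumptions; adjust them to wherever you saved the download.

```r
library(tidyverse)

# Assumed path: both the file name and the raw_data/ location are placeholders
data_path <- file.path("raw_data", "proposition99.rds")

# Read the file only if it is actually there, so the script fails gently
if (file.exists(data_path)) {
  prop99 <- read_rds(data_path)
  glimpse(prop99)  # quick overview of columns and types
} else {
  message("File not found: ", data_path)
}
```

An rds file stores a single R object, so column types such as factors come back exactly as they were saved.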
Pre-post estimator
In this section, you will estimate the causal effect of the policy using the pre-post estimator. For this, you need to select only California from the data, then create a factor variable for the pre and post period, and then use linear regression to estimate the causal effect.
In the lecture, we chose to include 12 years before and after the intervention. In this practical, we will use only 5 years before and after the intervention for our effect estimate.
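The steps described above can be sketched as follows. To keep the example runnable on its own, it uses a simulated stand-in for the California series (the numbers are made up, not the real data), with column names taken from the plotting code in this practical. It also assumes that 1988 is the last pre-intervention year, so the 5-year window runs from 1984 to 1993; check this against the definition used in the lecture.

```r
library(tidyverse)

# Simulated stand-in for the California series: the numbers are made up
# purely so that this sketch runs; they are not the real data
set.seed(1)
toy_ca <- tibble(
  state   = "California",
  year    = 1970:2000,
  cigsale = 125 - 2 * (year - 1970) + rnorm(length(year), sd = 3)
)

# Assumption: 1988 is the last pre-intervention year, so a window of
# 5 years on each side runs from 1984 to 1993
prepost_data <- toy_ca |>
  filter(state == "California", year >= 1984, year <= 1993) |>
  mutate(prepost = factor(if_else(year <= 1988, "Pre", "Post"),
                          levels = c("Pre", "Post")))

# The coefficient prepostPost is the post-period mean minus the pre-period mean
fit_prepost <- lm(cigsale ~ prepost, data = prepost_data)
summary(fit_prepost)
```

With the real data, replace toy_ca by the loaded dataset. Setting "Pre" as the first factor level makes the pre period the reference category, so the slope is read directly as the pre-post estimate.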
In the lecture, we did not correct the inference (p-value) for potential autocorrelation. We can do this with the function coeftest() from the lmtest package, applied to our fitted model object.
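One possible way to apply this correction is sketched below: coeftest() accepts a covariance estimator through its vcov. argument, and here we pass vcovHAC() from the sandwich package. The choice of vcovHAC() is an assumption (other estimators such as NeweyWest() exist), and the data are a simulated, autocorrelated stand-in rather than the real series.

```r
library(sandwich)
library(lmtest)

# Simulated, autocorrelated stand-in series (made-up numbers, not the real data)
set.seed(2)
toy <- data.frame(
  year    = 1984:1993,
  cigsale = 100 - 2 * (0:9) + as.numeric(arima.sim(list(ar = 0.6), n = 10))
)
# Assumption: 1988 is the last pre-intervention year
toy$prepost <- factor(ifelse(toy$year <= 1988, "Pre", "Post"),
                      levels = c("Pre", "Post"))

fit <- lm(cigsale ~ prepost, data = toy)

# Default inference: assumes independent errors with constant variance
coeftest(fit)

# Same coefficients, but with heteroskedasticity and autocorrelation
# consistent (HAC) standard errors computed by sandwich::vcovHAC()
coeftest(fit, vcov. = vcovHAC)
```

The point estimates are identical in both tables; only the standard errors, test statistics, and p-values change.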
Difference-in-differences estimator
In this section, we select a suitable control state to perform a diff-in-diff estimate of the causal effect of the policy intervention. Unlike in the lecture, you will not use Utah as the control state; instead, choose one of the following states:
- Nevada
- Montana
- Colorado
Here are the data plots for these three states:
Code
# Diff-in-diff time series figure
prop99 |>
  filter(state %in% c("California", "Nevada", "Montana", "Colorado")) |>
  ggplot(aes(x = year, y = cigsale, colour = state)) +
  geom_line(linewidth = 1) +
  geom_vline(xintercept = 1988, lty = 2) +
  theme_minimal() +
  scale_colour_manual(values = c("orange", "#AA8888", "#88AA88", "#8888AA")) +
  annotate("label", x = 1988, y = 150, label = "Intervention") +
  labs(title = "Panel data for California and three potential control states",
       y = "Cigarette sales", x = "Year", colour = "")
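The diff-in-diff estimate itself can be sketched as a linear regression with an interaction between state and period. The example below uses a simulated two-state panel (made-up numbers, for illustration only), takes Nevada as the control state, and again assumes that 1988 is the last pre-intervention year; all three are assumptions, not choices prescribed by this practical.

```r
library(tidyverse)

# Simulated two-state panel (made-up numbers, for illustration only)
set.seed(3)
toy_panel <- expand_grid(state = c("California", "Nevada"), year = 1984:1993) |>
  mutate(
    cigsale = 110 - 2 * (year - 1984) +
      if_else(state == "Nevada", 20, 0) + rnorm(n(), sd = 3)
  )

# Assumptions: Nevada is the chosen control and 1988 is the last pre year.
# The control state and the pre period are set as reference levels.
did_data <- toy_panel |>
  mutate(
    state   = factor(state, levels = c("Nevada", "California")),
    prepost = factor(if_else(year <= 1988, "Pre", "Post"),
                     levels = c("Pre", "Post"))
  )

# The interaction term is the diff-in-diff estimate: the pre-post change
# in California minus the pre-post change in the control state
fit_did <- lm(cigsale ~ state * prepost, data = did_data)
coef(fit_did)["stateCalifornia:prepostPost"]
```

Because both factors use the control state and the pre period as reference levels, the interaction coefficient equals the difference between the two pre-post changes in group means. The same coeftest() correction used for the pre-post estimator can be applied to this fitted model.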
Conclusion
You have created causal effect estimates using a pre-post design and a diff-in-diff design, and you have corrected the inferences using heteroskedasticity and autocorrelation consistent standard errors. You have seen that the conclusions depend heavily on the choices made, for example which period to consider and which control unit to choose.