Synthetic control: tidysynth, inference, robustness

Causal impact assessment workshop

Author

Erik-Jan van Kesteren & Oisín Ryan

In this practical, we will use the following two packages:

library(tidyverse)
library(tidysynth)

We will again be using the proposition 99 dataset:

prop99 <- read_rds("raw_data/proposition99.rds")

In the following sections, you will create an causal effect estimate using the synthetic control method, you will perform inference for this estimate, and you will do some robustness checks.

Synthetic control in tidysynth

The first step in the tidysynth package framework is to create an object from the dataset that will provide the basis of the estimation method.

Exercise 1

Use the the synthetic_control() function to create a synthetic control object from the prop99 data called prop99_syn. Read the help file to (?synthetic_control) if you need to know more about the arguments needed. Set the argument generate_placebos = TRUE (we will need this later).

Code

# create a synthetic control object
prop99_syn <- 
  prop99 |> 
  synthetic_control(
    outcome = cigsale,
    unit = state,
    time = year,
    i_unit = "California",
    i_time = 1988,
    generate_placebos = TRUE
  )

In tidysynth, the grab_*() functions can be used to inspect the object in detail. For example, you can use grab_outcome() we inspect the outcome (cigsale) for the treated unit and the potential controls.

Exercise 2

Inspect the outcome variable for the treated and the control units, to check that everything worked as you expected.

Code

# Treated unit (california)
grab_outcome(prop99_syn)

# control units
grab_outcome(prop99_syn, type = "controls")

The next step is to determine and create the variables that will be used for matching and estimating weights. These are called “predictors” in tidysynth.

Exercise 3

Generate the following predictors using the generate_predictor() function. Assign the result to the prop99_syn object (think of this function as editing the object). You will need to run this function multiple times, once for each time period considered.

Mean (log-)income in 1980-1988
Mean retail price of cigarettes in 1980-1988
Mean proportion of people aged 15 to 24 in 1980-1988
Mean beer consumption in 1984-1988
Cigarette sales in 1975
Cigarette sales in 1980
Cigarette sales in 1988

NB: there are some missing values in this data, so use na.rm = TRUE inside your mean() function.

Code

# create predictors
prop99_syn <- 
  prop99_syn |> 
  # The first three predictors
  generate_predictor(
    time_window = 1980:1988,
    lnincome = mean(lnincome, na.rm = TRUE),
    retprice = mean(retprice, na.rm = TRUE),
    age15to24 = mean(age15to24, na.rm = TRUE)
  ) |> 
  # Beer consumption in 1984-1988
  generate_predictor(
    time_window = 1984:1988,
    beer = mean(beer, na.rm = TRUE)
  ) |> 
  # Cigarette sales in 1975
  generate_predictor(
    time_window = 1975,
    cigsale_1975 = cigsale
  ) |> 
  # Cigarette sales in 1980
  generate_predictor(
    time_window = 1980,
    cigsale_1980 = cigsale
  ) |> 
  # Cigarette sales in 1988
  generate_predictor(
    time_window = 1988,
    cigsale_1988 = cigsale
  )

Now you have created a synthetic control object that includes both the target variable and covariates (predictors) for the treated unit and the units in the donor pool. The next step is to add the weights that define the synthetic control unit.

Exercise 4

Estimate synthetic control weights using the generate_weights() function. Just like the predictors, you should add these weights to the prop99_syn object. Inspect the unit and variable weights using the plot_weights() function.

Code

# generate the weights using the pre-intervention
# time period as the optimization window
prop99_syn <- 
  prop99_syn |> 
  generate_weights(optimization_window = 1970:1988)

# inspect the unit and variable weights
plot_weights(prop99_syn)

Now, everything is in place to create the synthetic control time-series.

Exercise 5

Create the synthetic control time-series for California cigarette sales using the function generate_control(). Then, inspect the result using grab_synthetic_control() and plot_trends().

Code

# generate and inspect the dataset
prop99_syn <- generate_control(prop99_syn)
grab_synthetic_control(prop99_syn)

# plot the synthetic and observed cigsales
plot_trends(prop99_syn)

Inference using permutation test

Now we have our synthetic control timeseries \(\hat{Y}^0_t\), we can estimate the average causal effect in the post-intervention period 1989-2000.

Exercise 6

Estimate the average causal effect in the post-intervention time period:

\[\bar{CE} = \frac{1}{T} \sum_{t = 1}^{T} Y^1_t - Y^0_t\]

Code

# estimating the average causal effect
grab_synthetic_control(prop99_syn) |> 
  filter(time_unit > 1988) |>
  mutate(dif = real_y - synth_y) |> 
  summarize(CE = mean(dif))

With tidysynth, it’s easy to perform a permutation test. In fact, you have already done this by specifying generate_placebos = TRUE.

Exercise 7

Use the function plot_placebos() to compare the counterfactual estimate to the reference distribution obtained via a permutation test.

Code

# Placebo plot
plot_placebos(prop99_syn)

Exercise 8 (OPTIONAL!)

If you have time, use the grab_synthetic_control() function to create a similar permutation test plot for the average causal effect you computed before (comparing California’s effect to the reference distribution).

Code

# Create a dataset with average causal effect
# for each state, also in the placebo group
ce_data <- 
  prop99_syn |> 
  grab_synthetic_control(placebo = TRUE) |> 
  filter(time_unit > 1988) |>
  mutate(dif = real_y - synth_y) |> 
  group_by(.id, .placebo) |> 
  summarize(average_causal_effect = mean(dif), .groups = "drop") 

# Create density plot with vline for California
ce_data |> 
  filter(.placebo == 1) |> 
  ggplot(aes(x = average_causal_effect)) +
  geom_density(fill = "grey") +
  geom_rug() +
  geom_vline(
    mapping = aes(xintercept = average_causal_effect), 
    data = ce_data |> filter(.placebo == 0)
  ) +
  geom_label(
    aes(label = .id),
    y = 0.02, 
    data = ce_data |> filter(.placebo == 0)
  ) + 
  theme_minimal()

Robustness checks for units and variables

As mentioned in the lecture, this whole procedure hinges on a lot of choices. Through a robustness check (or sensitivity analysis) you can find out if your conclusions would have been differen if you had made a different choice somewhere in the study.

Exercise 9

Change one of the choices, rerun the analysis, and compare the results to the results you just created. For example:

Change the donor pool by taking out Utah and Nevada
Change the covariates by adding more, removing some, or changing the time window
Set the variable weights to the inverse of the covariate’s variances instead of RMSPE estimation

Code

# do all three of the above.
prop99_syn_robust <- 
  prop99 |> 
  # remove utah and nevada
  filter(state != "Utah", state != "Nevada") |> 
  # create synthetic control object
  synthetic_control(
    outcome = cigsale,
    unit = state,
    time = year,
    i_unit = "California",
    i_time = 1988,
    generate_placebos = TRUE
  ) |> 
  # The first three predictors
  generate_predictor(
    time_window = 1976:1988, # Changed time window
    lnincome = mean(lnincome, na.rm = TRUE),
    retprice = mean(retprice, na.rm = TRUE),
    age15to24 = mean(age15to24, na.rm = TRUE)
  ) |> 
  # Removed beer consumption in 1984-1988
  # Cigarette sales in 1975
  generate_predictor(
    time_window = 1975,
    cigsale_1975 = cigsale
  ) |> 
  # Cigarette sales in 1980
  generate_predictor(
    time_window = 1980,
    cigsale_1980 = cigsale
  ) |> 
  # Added cigarette sales in 1984
  generate_predictor(
    time_window = 1984,
    cigsale_1984 = cigsale
  ) |> 
  # Cigarette sales in 1988
  generate_predictor(
    time_window = 1988,
    cigsale_1988 = cigsale
  )

# compute variance for each predictor
v_weights <- 
  grab_predictors(prop99_syn_robust, type = "controls") |> 
  pivot_longer(-variable) |> 
  group_by(variable) |> 
  summarize(inverse_var = 1/var(value)) |> 
  mutate(v_weight = inverse_var/sum(inverse_var)) |> 
  pull(v_weight)

# create weights and synthetic control
prop99_syn_robust <- 
  prop99_syn_robust |> 
  generate_weights(custom_variable_weights = v_weights) |> 
  generate_control()

# plot the trend and permutation test
plot_trends(prop99_syn_robust)
plot_placebos(prop99_syn_robust)

Conclusion

In this practical, you have created a causal effect estimate using the synthetic control method in the tidysynth package, you have performed inferences for this method, and you have created a single robustness check for this inference to see whether the conclusions change if a different decision were made earlier in the study design.