Week 8 - Shocks

Shriya Yarlagadda

2024/10/26

Although we discussed the impacts of electoral shocks in class this week, my blog post this week will focus on refining my model from last week. As I did last week, I will continue to compare two models, one using data from after 1984 and the other using data from after 1996.

I planned to make three initial changes this week. First, upon the recommendation of Matt Dardet, I took the log of grant allocation in order to see if my model predicted a significant effect. Second, after being inspired by my classmate Alex Heuss, I decided to measure the outcome of Democratic vote share, rather than two-party vote share, in order to better account for the disruptive ability of third-party candidates. Finally, I added confidence intervals to my measures, seeking to better estimate the precision of my predictions.
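These three changes can be sketched together in one specification. Below is a minimal illustration with simulated data; the variable names mirror my regression tables, but the data (and the object names) here are made up purely for illustration.

```r
# Simulated stand-in for my training data; names match the regression tables
set.seed(1)
n <- 200
sim <- data.frame(
  GDP_growth_quarterly   = rnorm(n),
  mean_5_wk_poll_support = runif(n, 40, 60),
  turnout_lag1           = runif(n, 50, 70),
  unemployment           = runif(n, 3, 10),
  total_grant            = exp(rnorm(n, 10))  # positive, so log() is defined
)
sim$D_pv <- 0.9 * sim$mean_5_wk_poll_support + 1.2 * log(sim$total_grant) + rnorm(n)

# Change 1 & 2: log of grant allocation, Democratic (single-party) vote share outcome
mod <- lm(D_pv ~ GDP_growth_quarterly + mean_5_wk_poll_support +
            turnout_lag1 + unemployment + log(total_grant), data = sim)

# Change 3: prediction intervals supply the Low Pred / High Pred columns below
preds <- predict(mod, newdata = sim[1:5, ], interval = "prediction", level = 0.95)
```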

Update 1: Grant Allocation + Single Party Vote Share + Confidence Intervals

|                        | Post 84   | Post 96   |
|------------------------|-----------|-----------|
| (Intercept)            | −2.390    | −0.836    |
|                        | (2.434)   | (2.173)   |
| GDP_growth_quarterly   | −0.327*** | −0.356*** |
|                        | (0.044)   | (0.039)   |
| mean_5_wk_poll_support | 0.962***  | 1.092***  |
|                        | (0.030)   | (0.031)   |
| turnout_lag1           | 0.008     | 0.019     |
|                        | (0.029)   | (0.027)   |
| unemployment           | −0.576*** | −0.979*** |
|                        | (0.159)   | (0.156)   |
| log(total_grant)       | 1.353***  | 0.694**   |
|                        | (0.229)   | (0.243)   |
| Num.Obs.               | 322       | 230       |
| R2                     | 0.885     | 0.937     |
| R2 Adj.                | 0.883     | 0.935     |
| AIC                    | 1666.9    | 1083.0    |
| BIC                    | 1693.3    | 1107.0    |
| Log.Lik.               | −826.442  | −534.485  |
| F                      | 485.880   | 661.851   |
| RMSE                   | 3.15      | 2.47      |
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Post 1984 Model:

| State          | Predicted Winner | Democratic Vote Share | Low Pred | High Pred |
|----------------|------------------|-----------------------|----------|-----------|
| Arizona        | DEM              | 52.22                 | 51.17    | 53.26     |
| Georgia        | DEM              | 50.06                 | 49.10    | 51.02     |
| Michigan       | DEM              | 52.23                 | 51.13    | 53.34     |
| Nevada         | DEM              | 51.33                 | 50.47    | 52.19     |
| North Carolina | DEM              | 51.05                 | 49.99    | 52.11     |
| Pennsylvania   | DEM              | 52.80                 | 51.79    | 53.82     |
| Wisconsin      | DEM              | 51.98                 | 50.81    | 53.16     |

Post 1996 Model:

| State          | Predicted Winner | Democratic Vote Share | Low Pred | High Pred |
|----------------|------------------|-----------------------|----------|-----------|
| Arizona        | DEM              | 52.87                 | 51.77    | 53.97     |
| Georgia        | DEM              | 52.02                 | 51.11    | 52.93     |
| Michigan       | DEM              | 53.72                 | 52.63    | 54.82     |
| Nevada         | DEM              | 53.01                 | 52.19    | 53.83     |
| North Carolina | DEM              | 52.72                 | 51.68    | 53.76     |
| Pennsylvania   | DEM              | 53.90                 | 52.86    | 54.94     |
| Wisconsin      | DEM              | 53.88                 | 52.73    | 55.04     |

After taking the log of grant allocation, we find that it is a highly significant predictor of the ultimate outcome. Furthermore, although the models still overwhelmingly predict that the Democrats will win all of the battleground states, which seems relatively unlikely, they now show additional uncertainty, with the prediction intervals for both Georgia and North Carolina dipping below 50%. To get an additional benchmark for the performance of both of these models, I also run out-of-sample cross-validation.
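The cross-validation I run follows a standard hold-out scheme: repeatedly set aside a random subset of observations, refit the model on the rest, and collect the out-of-sample prediction errors. A minimal sketch with simulated data (my real code uses the training data frames saved at the end of this post):

```r
# Toy data standing in for the real training set
set.seed(2)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 2 * sim$x1 - sim$x2 + rnorm(n)

# 1000 rounds: hold out 30 rows, refit, record the mean prediction error
cv_errors <- replicate(1000, {
  holdout <- sample(n, size = 30)
  fit <- lm(y ~ x1 + x2, data = sim[-holdout, ])
  mean(sim$y[holdout] - predict(fit, newdata = sim[holdout, ]))
})

# Errors clustered tightly around zero suggest good out-of-sample performance
hist(cv_errors)
```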

Though these results are slightly left-skewed, we find a relatively small range of out-of-sample errors, especially for the post-84 model, and a high clustering around 0! This suggests that the post-84 model, especially, may be a strong predictor of out-of-sample outcomes.

To further improve my model, I was interested in including several additional metrics that we currently have access to, namely 1) Democratic vote share in the last election and 2) Democratic vote share in the second-to-last election. I anticipated that these metrics would add an estimation baseline to the predictions.
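Constructing these lags just requires shifting each state's vote-share series by one and two elections. A base-R sketch with toy data (my real data frame has one row per state-year; the vote-share figures below are made up):

```r
# Toy state-year panel; D_pv values are illustrative only
toy <- data.frame(
  state = rep(c("Arizona", "Georgia"), each = 3),
  year  = rep(c(2012, 2016, 2020), 2),
  D_pv  = c(45.4, 45.1, 49.4, 45.5, 45.9, 49.5)
)

# Shift a vector down by k positions, padding with NA at the start
lag_by_state <- function(x, k) c(rep(NA, k), head(x, -k))

# Sort so lags line up chronologically within each state, then lag within state
toy <- toy[order(toy$state, toy$year), ]
toy$D_pv_lag1 <- ave(toy$D_pv, toy$state, FUN = function(x) lag_by_state(x, 1))
toy$D_pv_lag2 <- ave(toy$D_pv, toy$state, FUN = function(x) lag_by_state(x, 2))
```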

Update 2: Adding Democratic Vote Share in Last Two Elections

|                        | Post 84   | Post 96   |
|------------------------|-----------|-----------|
| (Intercept)            | −3.550+   | −1.680    |
|                        | (2.101)   | (1.885)   |
| GDP_growth_quarterly   | −0.271*** | −0.348*** |
|                        | (0.038)   | (0.035)   |
| mean_5_wk_poll_support | 0.744***  | 0.843***  |
|                        | (0.034)   | (0.039)   |
| turnout_lag1           | 0.008     | −0.006    |
|                        | (0.025)   | (0.024)   |
| unemployment           | −0.477*** | −0.562*** |
|                        | (0.139)   | (0.144)   |
| log(total_grant)       | 0.883***  | 0.290     |
|                        | (0.202)   | (0.215)   |
| D_pv_lag1              | 0.261***  | 0.241***  |
|                        | (0.035)   | (0.037)   |
| D_pv_lag2              | 0.040     | 0.052     |
|                        | (0.033)   | (0.033)   |
| Num.Obs.               | 322       | 230       |
| R2                     | 0.915     | 0.953     |
| R2 Adj.                | 0.913     | 0.951     |
| AIC                    | 1572.2    | 1018.8    |
| BIC                    | 1606.2    | 1049.8    |
| Log.Lik.               | −777.103  | −500.417  |
| F                      | 484.614   | 641.010   |
| RMSE                   | 2.70      | 2.13      |
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

When adding in these variables, we interestingly find that the R^2 value of both models is still incredibly high, suggesting that these models have high in-sample prediction ability. To see how well these models perform on out-of-sample data, I again conduct out-of-sample cross-validation.

Unfortunately, it appears that the out-of-sample errors are still left-skewed, with more values falling further from zero. In general, these models appear to underpredict Democratic vote share relative to the actual outcome. For reference, I also wanted to test how this would translate into 2024 prediction outcomes.

Post 1984 Model:

| State          | Predicted Winner | Democratic Vote Share | Low Pred | High Pred |
|----------------|------------------|-----------------------|----------|-----------|
| Arizona        | DEM              | 51.84                 | 50.89    | 52.79     |
| Georgia        | DEM              | 50.53                 | 49.63    | 51.42     |
| Michigan       | DEM              | 52.43                 | 51.43    | 53.42     |
| Nevada         | DEM              | 51.67                 | 50.90    | 52.44     |
| North Carolina | DEM              | 51.00                 | 50.05    | 51.95     |
| Pennsylvania   | DEM              | 52.64                 | 51.74    | 53.54     |
| Wisconsin      | DEM              | 51.98                 | 50.93    | 53.04     |

Post 1996 Model:

| State          | Predicted Winner | Democratic Vote Share | Low Pred | High Pred |
|----------------|------------------|-----------------------|----------|-----------|
| Arizona        | DEM              | 50.96                 | 49.88    | 52.04     |
| Georgia        | DEM              | 50.83                 | 49.97    | 51.69     |
| Michigan       | DEM              | 52.16                 | 51.13    | 53.20     |
| Nevada         | DEM              | 51.78                 | 51.01    | 52.54     |
| North Carolina | DEM              | 50.96                 | 49.96    | 51.96     |
| Pennsylvania   | DEM              | 52.13                 | 51.13    | 53.12     |
| Wisconsin      | DEM              | 52.05                 | 50.95    | 53.15     |

Despite this greater skew, we do not see a substantive shift in outcomes: a Democratic sweep is still predicted. However, it is interesting to note that the post-96 model predicts a more conservative Democratic vote share under these specifications, with its prediction intervals for Arizona, Georgia, and North Carolina all dipping below 50%.

In addition to altering the variables in my model, I wanted to explore whether changing their functional form would add precision. To inform this decision, I plotted the relationship between Democratic vote share and each of the explanatory variables that I had included in my prediction, aside from federal funding allocation.

From these graphs, we can confirm that lagged vote share from the prior two elections and mean poll support are strongly linearly related to our predicted outcome. This suggests that it might not be necessary to alter the functional form of these variables in our model. However, GDP growth, unemployment rate, and turnout in the last election appear to have rather ambiguous relationships with the outcome.
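These diagnostic plots are simple scatterplots with a linear fit overlaid, one per explanatory variable. A sketch with simulated data (the relationships here are fabricated to show one clearly linear and one ambiguous case):

```r
# Simulated data: poll support built to track vote share, turnout built as noise
set.seed(3)
sim <- data.frame(D_pv = rnorm(100, 50, 5))
sim$mean_5_wk_poll_support <- sim$D_pv + rnorm(100, 0, 2)  # strongly linear
sim$turnout_lag1           <- runif(100, 50, 70)           # ambiguous

# One panel per variable, each with the fitted regression line
par(mfrow = c(1, 2))
for (v in c("mean_5_wk_poll_support", "turnout_lag1")) {
  plot(sim[[v]], sim$D_pv, xlab = v, ylab = "Democratic vote share")
  abline(lm(sim$D_pv ~ sim[[v]]), col = "blue")
}
```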

Out of curiosity, I wanted to see how much the results would vary if I included only the three highly predictive variables identified here in my prediction models. I run this analysis below.

Update 3: Only Poll Support and Vote Share in Last Two Elections

|                        | Post 84   | Post 96   |
|------------------------|-----------|-----------|
| (Intercept)            | −2.185*   | −3.538*** |
|                        | (0.889)   | (0.883)   |
| mean_5_wk_poll_support | 0.745***  | 0.820***  |
|                        | (0.030)   | (0.035)   |
| D_pv_lag1              | 0.324***  | 0.358***  |
|                        | (0.032)   | (0.037)   |
| D_pv_lag2              | 0.036     | −0.044    |
|                        | (0.031)   | (0.035)   |
| Num.Obs.               | 434       | 290       |
| R2                     | 0.878     | 0.928     |
| R2 Adj.                | 0.878     | 0.927     |
| AIC                    | 2208.5    | 1379.3    |
| BIC                    | 2228.8    | 1397.6    |
| Log.Lik.               | −1099.240 | −684.649  |
| F                      | 1035.486  | 1221.381  |
| RMSE                   | 3.05      | 2.56      |
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

While both of these models have relatively high in-sample fit, they interestingly have a much smaller range of out-of-sample errors than our previous models. The post-84 model appears especially strong on this metric, with a histogram of out-of-sample errors centered around zero. Though the spread of the errors looks wide on the histogram, the errors themselves are relatively small, so this does not detract from the model's strength. I again explored how this would translate to predictions.

Post 1984 Model:

| State          | Predicted Winner | Democratic Vote Share | Low Pred | High Pred |
|----------------|------------------|-----------------------|----------|-----------|
| Arizona        | DEM              | 51.84                 | 49.95    | 50.71     |
| Georgia        | DEM              | 50.53                 | 50.33    | 51.08     |
| Michigan       | DEM              | 52.43                 | 51.40    | 52.15     |
| Nevada         | DEM              | 51.67                 | 51.09    | 51.79     |
| North Carolina | DEM              | 51.00                 | 50.24    | 50.95     |
| Pennsylvania   | DEM              | 52.64                 | 51.16    | 51.87     |
| Wisconsin      | DEM              | 51.98                 | 51.31    | 52.08     |
Interestingly, with this additional precision, we find greater certainty of a Democratic sweep, with only Arizona having an uncertain result.

Next, given the argument made by Shaw and Petrocik (5), and also addressed by Matt Dardet, that turnout does not predict party-specific outcomes, I was interested in seeing how my models would perform without the lagged turnout variable.

Update 4: Removing Lagged Turnout

|                        | Post 84   | Post 96   |
|------------------------|-----------|-----------|
| (Intercept)            | −2.836*   | −2.561*   |
|                        | (1.350)   | (1.221)   |
| GDP_growth_quarterly   | −0.236*** | −0.330*** |
|                        | (0.036)   | (0.033)   |
| mean_5_wk_poll_support | 0.726***  | 0.830***  |
|                        | (0.030)   | (0.036)   |
| unemployment           | −0.396**  | −0.538*** |
|                        | (0.121)   | (0.122)   |
| log(total_grant)       | 0.869***  | 0.419*    |
|                        | (0.180)   | (0.192)   |
| D_pv_lag1              | 0.273***  | 0.276***  |
|                        | (0.031)   | (0.035)   |
| D_pv_lag2              | 0.032     | 0.016     |
|                        | (0.029)   | (0.030)   |
| Num.Obs.               | 434       | 290       |
| R2                     | 0.894     | 0.948     |
| R2 Adj.                | 0.893     | 0.946     |
| AIC                    | 2153.3    | 1291.6    |
| BIC                    | 2185.9    | 1321.0    |
| Log.Lik.               | −1068.664 | −637.809  |
| F                      | 602.696   | 852.690   |
| RMSE                   | 2.84      | 2.18      |
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Somewhat aligning with Shaw and Petrocik’s argument, the out-of-sample error range appears to be slightly smaller than when turnout was included. However, these errors are still significantly larger than when only the three strongly predictive variables were included. Again, I include election predictions for reference.

Post 1984 Model:

| State          | Predicted Winner | Democratic Vote Share | Low Pred | High Pred |
|----------------|------------------|-----------------------|----------|-----------|
| Arizona        | DEM              | 50.33                 | 50.54    | 52.06     |
| Georgia        | DEM              | 50.70                 | 50.27    | 51.53     |
| Michigan       | DEM              | 51.78                 | 51.77    | 52.94     |
| Nevada         | DEM              | 51.44                 | 51.25    | 52.40     |
| North Carolina | DEM              | 50.60                 | 50.47    | 51.67     |
| Pennsylvania   | DEM              | 51.51                 | 51.68    | 52.97     |
| Wisconsin      | DEM              | 51.69                 | 51.54    | 52.80     |

Post 1996 Model:

| State          | Predicted Winner | Democratic Vote Share | Low Pred | High Pred |
|----------------|------------------|-----------------------|----------|-----------|
| Arizona        | DEM              | 51.30                 | 50.54    | 52.06     |
| Georgia        | DEM              | 50.90                 | 50.27    | 51.53     |
| Michigan       | DEM              | 52.36                 | 51.77    | 52.94     |
| Nevada         | DEM              | 51.82                 | 51.25    | 52.40     |
| North Carolina | DEM              | 51.07                 | 50.47    | 51.67     |
| Pennsylvania   | DEM              | 52.32                 | 51.68    | 52.97     |
| Wisconsin      | DEM              | 52.17                 | 51.54    | 52.80     |

These results continue to show a Democratic sweep, with more certainty than most of the previous models (other than the first post-96 model).

Moving forward, there are a few additional steps I would like to take. First, I would like to incorporate the Supreme Court precedent data that I modified to include whether or not the precedent shifts were accompanied by a conservative shift in the “median justice” of the court, as described by Rubin in Axios (5). In particular, I would like to evaluate the effect of an interaction term between precedent changes and the court’s ideological shifts. Following this, I am interested in running these models, especially those that include more than the three highly predictive variables, through an Elastic Net in order to perform feature selection and choose my final model.
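The Elastic Net step could look something like the sketch below, using the glmnet package (alpha mixes the ridge and lasso penalties: 0 is pure ridge, 1 is pure lasso). The data here are simulated; only the variable names come from my models above.

```r
library(glmnet)

# Simulated predictor matrix with the same names as my model variables
set.seed(4)
n <- 200
X <- matrix(rnorm(n * 6), n, 6,
            dimnames = list(NULL, c("GDP_growth_quarterly", "mean_5_wk_poll_support",
                                    "turnout_lag1", "unemployment",
                                    "D_pv_lag1", "D_pv_lag2")))
y <- 0.8 * X[, "mean_5_wk_poll_support"] + 0.3 * X[, "D_pv_lag1"] + rnorm(n)

# Cross-validated elastic net with an even ridge/lasso mix
cv_fit <- cv.glmnet(X, y, alpha = 0.5)

# Coefficients shrunk exactly to zero drop out of the final model
coef(cv_fit, s = "lambda.min")
```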

## Additional code added after Week 8 submission to save final data

```r
# Save the test set and both training sets for reuse in later weeks
write.csv(test, "test_1.csv", row.names = FALSE)
write.csv(train_post84, "train84_1.csv", row.names = FALSE)
write.csv(train_post96, "train96_1.csv", row.names = FALSE)
```

References

  1. Julia. 2020. “Answer to ‘Export R Data to Csv.’” Stack Overflow. https://stackoverflow.com/a/62017887; Bobbitt, Zach. 2023. “How to Convert Datetime to Date in R.” Statology. January 25, 2023. https://www.statology.org/r-convert-datetime-to-date/; GKi. 2023. “Answer to ‘Extract Month and Year From Date in R.’” Stack Overflow. https://stackoverflow.com/a/76709941; rafa.pereira. 2016. “Answer to ‘Extract Month and Year From Date in R.’” Stack Overflow. https://stackoverflow.com/a/37704385; “Extract the Last N Characters from String in R - Spark By {Examples}.” n.d. Accessed October 26, 2024. https://sparkbyexamples.com/r-programming/extract-the-last-n-characters-from-string-in-r/; Rubin, April. 2023. “Supreme Court Ideology Continues to Lean Conservative, New Data Shows.” Axios. July 3, 2023. https://www.axios.com/2023/07/03/supreme-court-justices-political-ideology-chart.
  2. Andina, Matias. 2016. “Answer to ‘How to Remove $ and % from Columns in R?’” Stack Overflow. https://stackoverflow.com/a/35757945; camille. 2022. “Answer to ‘Convert Column Names to Title Case.’” Stack Overflow. https://stackoverflow.com/a/70804865.
  3. Andina, Matias. 2016. “Answer to ‘How to Remove $ and % from Columns in R?’” Stack Overflow. https://stackoverflow.com/a/35757945; “RPubs - Linear Regression Confidence and Prediction Intervals.” n.d. Accessed October 26, 2024. https://rpubs.com/aaronsc32/regression-confidence-prediction-intervals.
  4. “How to Replace Values in R with Examples - Spark By {Examples}.” n.d. Accessed October 26, 2024. https://sparkbyexamples.com/r-programming/replace-values-in-r/; camille. 2022. “Answer to ‘Convert Column Names to Title Case.’” Stack Overflow. https://stackoverflow.com/a/70804865.
  5. Rubin, April. 2023. “Supreme Court Ideology Continues to Lean Conservative, New Data Shows.” Axios. July 3, 2023. https://www.axios.com/2023/07/03/supreme-court-justices-political-ideology-chart.

Data Sources

All data sources were provided by the GOV 1372 course staff.

Popular Vote Datasets

Economic Data

Polling Data

Data sources for state and county-level turnout, protests, and Supreme Court cases are unknown, but were generously provided by the GOV 1372 course staff. However, I append the Supreme Court data with a metric of how the Supreme Court’s leaning has shifted over time, using data from Axios (Rubin, April. 2023. “Supreme Court Ideology Continues to Lean Conservative, New Data Shows.” Axios. July 3, 2023. https://www.axios.com/2023/07/03/supreme-court-justices-political-ideology-chart).