Introduction

Study Design

A study on congressional candidate fundraising and expenditures in the 2018 midterm elections is considered. The fundraising variables are: Candidate Contribution (CAND_CONTRIB), Candidate Loans (CAND_LOANS), Other Loans (OTHER_LOANS), Total Contributions from Individuals (TTL_INDIV_CONTRIB), Total Contributions from Political Party Organizations (POL_PTY_CONTRIB), and Contributions from Other Political Organizations (OTHER_POL_CMTE_CONTRIB), all recorded in dollars. Spending was controlled for using each candidate’s spending as a proportion of the combined spending of the top two candidates in the race (PROPORTION_SPENT). The candidate’s incumbency status and party were also included (incumbent, party).

Fundraising data were collected from the Federal Election Commission’s Candidate Financial Summary file for the 2017-2018 election cycle. Final vote shares were collected from the New York Times’ “U.S. House Election Results 2018.” Only candidates in relatively competitive races were included, in order to exclude extremely one-sided races as well as uncontested races. Candidates were selected using the FiveThirtyEight Classic House Forecast predictions as of 11 AM on November 6, 2018 (the morning of Election Day): any candidate whose estimated probability of winning was at least 5% and at most 95% was included in the dataset. To control for low-performing third-party candidates, each candidate’s voteshare is calculated as a proportion of the votes that went to the two most competitive candidates in the race.
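The selection and voteshare rules described above can be sketched as follows. This is a minimal illustration on made-up data, not the code used for the study: the data frame, its column names (race, candidate, votes, win_prob), and the choice to treat the two highest vote-getters as the "two most competitive candidates" are all assumptions.

```r
# Hypothetical example data: three candidates in race A, two in race B.
# All names and numbers here are illustrative, not taken from the study data.
races <- data.frame(
  race      = c("A", "A", "A", "B", "B"),
  candidate = c("a1", "a2", "a3", "b1", "b2"),
  votes     = c(5200, 4300, 500, 7000, 3000),
  win_prob  = c(0.60, 0.40, 0.00, 0.97, 0.03)
)

# Keep the two highest vote-getters in each race (assumed definition of
# "most competitive").
rank_in_race <- ave(-races$votes, races$race, FUN = rank)
top_two <- races[rank_in_race <= 2, ]

# Voteshare as a proportion of the votes won by those two candidates.
top_two$voteshare_two <- top_two$votes / ave(top_two$votes, top_two$race, FUN = sum)

# Keep candidates whose forecast win probability is between 5% and 95%, inclusive.
competitive <- top_two[top_two$win_prob >= 0.05 & top_two$win_prob <= 0.95, ]
print(competitive)
```

In this toy example both candidates in race B are dropped, because one is forecast above 95% and the other below 5%.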

A data dictionary for variables included in the dataset but not analyzed in this study can be found here.

Study Aims

The purpose of this study is to investigate the possible relationship between different sources of fundraising and candidate success while controlling for candidate spending, party, and incumbency.

Statistical Model

A multiple linear regression model is considered. Let

\(Y_i =\) the candidate’s voteshare as a proportion of votes allocated to the top two candidates in their race
\(X_{i1} =\) the dollar amount contributed to the campaign by the candidate
\(X_{i2} =\) the dollar amount loaned to the campaign by the candidate
\(X_{i3} =\) the dollar amount loaned to the campaign by entities other than the candidate
\(X_{i4} =\) the dollar amount donated to the campaign by individual donors
\(X_{i5} =\) the dollar amount donated to the campaign by non-party political organizations
\(X_{i6} =\) the dollar amount donated to the campaign by party political organizations
\(X_{i7} =\) the candidate committee’s spending as a proportion of total spending by the top two candidate committees in their race
\(X_{i8} =\) the candidate’s incumbency status
\(X_{i9} =\) the candidate’s party

The initial model is given by \(Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} + \beta_4X_{i4} + \beta_5X_{i5} + \beta_6X_{i6} + \beta_7X_{i7} + \epsilon_i\) where \(\epsilon_i \overset{iid}{\sim} N(0,\sigma^2),\ i = 1, 2, \ldots, 214\), and \(\beta_0, \ldots, \beta_7\) and \(\sigma^2\) are the unknown model parameters.

In order to account for an unusual relationship between Candidate Vote Share and non-party political donations, subsequent models also included \(\beta_8X_{i8} + \beta_9X_{i9} + \beta_{10}X_{i5}^2\).

mdl.1 <- lm(VOTESHARE_TWOPARTY ~ CAND_CONTRIB + CAND_LOANS + OTHER_LOANS + TTL_INDIV_CONTRIB + POL_PTY_CONTRIB + PROPORTION_SPENT + incumbent * OTHER_POL_CMTE_CONTRIB * I(OTHER_POL_CMTE_CONTRIB^2))

Preliminary Analyses

Bivariate Associations

pairs(VOTESHARE_TWOPARTY ~ CAND_CONTRIB + CAND_LOANS + OTHER_LOANS)

pairs (VOTESHARE_TWOPARTY ~ TTL_INDIV_CONTRIB + OTHER_POL_CMTE_CONTRIB + POL_PTY_CONTRIB + PROPORTION_SPENT)

cor(data[c(38,18,19,20,24,32,33,37)])

##                        VOTESHARE_TWOPARTY CAND_CONTRIB   CAND_LOANS
## VOTESHARE_TWOPARTY            1.000000000  -0.09054081  0.012030279
## CAND_CONTRIB                 -0.090540814   1.00000000  0.150723168
## CAND_LOANS                    0.012030279   0.15072317  1.000000000
## OTHER_LOANS                   0.027166717  -0.01595215  0.010391833
## TTL_INDIV_CONTRIB             0.009426731   0.04867836  0.002234779
## OTHER_POL_CMTE_CONTRIB        0.244601407  -0.15661335 -0.102594288
## POL_PTY_CONTRIB              -0.020479217  -0.02317839 -0.013671286
## PROPORTION_SPENT              0.312053117   0.12447793  0.179332972
##                         OTHER_LOANS TTL_INDIV_CONTRIB OTHER_POL_CMTE_CONTRIB
## VOTESHARE_TWOPARTY      0.027166717       0.009426731             0.24460141
## CAND_CONTRIB           -0.015952151       0.048678357            -0.15661335
## CAND_LOANS              0.010391833       0.002234779            -0.10259429
## OTHER_LOANS             1.000000000      -0.080761057            -0.07485761
## TTL_INDIV_CONTRIB      -0.080761057       1.000000000            -0.05572635
## OTHER_POL_CMTE_CONTRIB -0.074857605      -0.055726350             1.00000000
## POL_PTY_CONTRIB        -0.004049138       0.018830526            -0.06550671
## PROPORTION_SPENT       -0.085267520       0.382284595             0.16818905
##                        POL_PTY_CONTRIB PROPORTION_SPENT
## VOTESHARE_TWOPARTY        -0.020479217       0.31205312
## CAND_CONTRIB              -0.023178394       0.12447793
## CAND_LOANS                -0.013671286       0.17933297
## OTHER_LOANS               -0.004049138      -0.08526752
## TTL_INDIV_CONTRIB          0.018830526       0.38228459
## OTHER_POL_CMTE_CONTRIB    -0.065506708       0.16818905
## POL_PTY_CONTRIB            1.000000000      -0.05329044
## PROPORTION_SPENT          -0.053290442       1.00000000

The strongest correlations between VOTESHARE_TWOPARTY and the predictors are with PROPORTION_SPENT (r = 0.31) and OTHER_POL_CMTE_CONTRIB (r = 0.24). There is also a weak negative correlation with CAND_CONTRIB (r = -0.09).

Screening of Covariates and Verification of Assumptions

The relationship between VOTESHARE_TWOPARTY and OTHER_POL_CMTE_CONTRIB does not appear to be linear. Log transformations did not result in a more linear relationship. Introducing a higher order term for OTHER_POL_CMTE_CONTRIB did result in a better model, but ultimately the decision was made to instead introduce the categorical covariate of incumbency into the model and to include an interaction effect.

Based on automatic variable selection methods in combination with criterion-based statistics, the only variables included in the model were PROPORTION_SPENT, incumbent, and OTHER_POL_CMTE_CONTRIB. Partial residual plots, residual-versus-fitted plots, and measures of influence were investigated and no issues with high influence points, linearity, constant variance, or normality were identified.

Final Model

The final model is given by \(Y_i = \beta_0 + \beta_1X_{i5} + \beta_2X_{i7} + \beta_3X_{i8} + \beta_4X_{i5}X_{i8} + \epsilon_i\)

where \(\epsilon_i \overset{iid}{\sim} N(0,\sigma^2),\ i = 1, 2, \ldots, 214\), and \(\beta_0, \beta_1, \ldots, \beta_4\) and \(\sigma^2\) are the unknown model parameters.
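Written out by incumbency status, the final model implies two regression lines. The sketch below assumes \(X_{i8}\) is an indicator equal to 1 for nonincumbents and 0 for incumbents, which matches the `incumbentN` term in the fitted output but is not stated explicitly in the variable definitions.

```latex
\begin{aligned}
\text{Incumbent } (X_{i8} = 0):\quad
  & E[Y_i] = \beta_0 + \beta_1 X_{i5} + \beta_2 X_{i7} \\
\text{Nonincumbent } (X_{i8} = 1):\quad
  & E[Y_i] = (\beta_0 + \beta_3) + (\beta_1 + \beta_4) X_{i5} + \beta_2 X_{i7}
\end{aligned}
```

Under this coding, \(\beta_3\) shifts the intercept for nonincumbents and \(\beta_4\) is the difference between the nonincumbent and incumbent slopes on \(X_{i5}\).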

Statistical Analysis

The fitted model is displayed below. Because voteshare and PROPORTION_SPENT are both recorded as proportions, effects are also expressed here in percentage points. A candidate’s voteshare increases, on average, by 0.045 percentage points for every one-percentage-point increase in their share of the top-two spending in the race (95% CI 0.0175, 0.0722). When OTHER_POL_CMTE_CONTRIB is fixed at 0, nonincumbents receive on average 8.2 percentage points less voteshare than incumbents (coefficient -0.0823 on the proportion scale; 95% CI -0.101, -0.0633). For incumbents, every additional 10,000 dollars raised from Other Political Committees is associated with a decrease in voteshare of 0.000245 on the proportion scale, or about 0.024 percentage points (95% CI -0.000369, -0.000121). The interaction term indicates that the slope for nonincumbents is higher by 0.001278 per 10,000 dollars (95% CI 0.000887, 0.001668), so nonincumbents gain a net 0.001033, or about 0.10 percentage points, of voteshare for every 10,000 dollars they raise from Other Political Committees.

Proportional spending accounts for 4.77% of the variation in voteshare left unexplained by the other terms (a partial \(R^2\)), compared to 16.57% for the interaction term Incumbency*Other Political Committee Contributions. Incumbency and Other Political Committee Contributions are both negligible in their explanatory power when considered independently of one another. Because the interaction term \(\beta_4X_{i5}X_{i8}\) vanishes for incumbent candidates, this suggests that Other Political Committee Contributions are only a good indicator of voteshare for nonincumbents, which makes sense when the relationship between Voteshare and Other Political Committee Contributions is plotted with incumbents and nonincumbents differentiated.

## 
## Call:
## lm(formula = VOTESHARE_TWOPARTY ~ PROPORTION_SPENT + incumbent * 
##     OTHER_POL_CMTE_CONTRIB)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.09216 -0.01978 -0.00124  0.02165  0.09173 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        5.245e-01  1.049e-02  49.986  < 2e-16 ***
## PROPORTION_SPENT                   4.486e-02  1.387e-02   3.234 0.001418 ** 
## incumbentN                        -8.235e-02  9.678e-03  -8.509 3.39e-15 ***
## OTHER_POL_CMTE_CONTRIB            -2.448e-08  6.285e-09  -3.895 0.000132 ***
## incumbentN:OTHER_POL_CMTE_CONTRIB  1.278e-07  1.983e-08   6.444 7.91e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03318 on 209 degrees of freedom
## Multiple R-squared:  0.3607, Adjusted R-squared:  0.3484 
## F-statistic: 29.48 on 4 and 209 DF,  p-value: < 2.2e-16

##                                           2.5 %        97.5 %
## (Intercept)                        5.037822e-01  5.451503e-01
## PROPORTION_SPENT                   1.751742e-02  7.220843e-02
## incumbentN                        -1.014314e-01 -6.327339e-02
## OTHER_POL_CMTE_CONTRIB            -3.687355e-08 -1.209207e-08
## incumbentN:OTHER_POL_CMTE_CONTRIB  8.866936e-08  1.668430e-07
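Because the dollar coefficients are reported per dollar on the proportion scale, a small rescaling helps when reading them. The sketch below copies the two relevant estimates from the fitted-model output above and converts them to percentage points of voteshare per 10,000 dollars; the helper name per_10k_pp is introduced here only for illustration.

```r
# Estimates copied from the fitted-model output (per-dollar, proportion scale).
b_other       <- -2.448e-08  # OTHER_POL_CMTE_CONTRIB: slope for incumbents
b_interaction <-  1.278e-07  # incumbentN:OTHER_POL_CMTE_CONTRIB: slope difference

# Convert a per-dollar change in voteshare proportion to
# percentage points per 10,000 dollars (hypothetical helper).
per_10k_pp <- function(b) b * 10000 * 100

incumbent_slope    <- per_10k_pp(b_other)
nonincumbent_slope <- per_10k_pp(b_other + b_interaction)

round(c(incumbent = incumbent_slope, nonincumbent = nonincumbent_slope), 4)
```

The nonincumbent slope is the sum of the main effect and the interaction, not the interaction coefficient alone.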

## [1] 4.766371

## [1] 16.5731
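The code that produced these two percentages is not shown. They are consistent with partial \(R^2\) values, which for a single term can be recovered from its t statistic as \(t^2 / (t^2 + df)\). The sketch below applies that identity to the rounded t values and the 209 residual degrees of freedom from the output above; the function name is hypothetical and small discrepancies come from rounding.

```r
# Partial R^2 for a single coefficient, from its t statistic and the
# residual degrees of freedom of the full model (hypothetical helper name).
partial_r2_from_t <- function(t_value, df_resid) {
  t_value^2 / (t_value^2 + df_resid)
}

pr2_spent       <- 100 * partial_r2_from_t(3.234, 209)  # PROPORTION_SPENT
pr2_interaction <- 100 * partial_r2_from_t(6.444, 209)  # incumbentN:OTHER_POL_CMTE_CONTRIB

round(c(PROPORTION_SPENT = pr2_spent, interaction = pr2_interaction), 2)
```

Both values agree with the reported 4.766371 and 16.5731 to within rounding of the t statistics.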

Summary of Findings

In general, the sources of a candidate’s funding are not particularly important to how well they perform in an election. A more important predictor of success is whether or not they outspent their opponent, and by how much. That said, there does seem to be a positive correlation between how much a nonincumbent raises from Other Political Committees and their ultimate voteshare.

Appendix

Diagnostics for Predictors

The purpose of this section is to examine the distribution of all potential predictors, identify any values that might have disproportionate influence on the model parameters, and to examine the bivariate associations to identify multicollinearity.

Strip plots for all predictors are shown next to boxplots of the same data. Individual Contributions, Contributions from Other Political Committees, and Contributions from Party Committees are all skewed. However, transformations were not applied. Initially a natural log transformation was applied to Contributions from Other Political Committees, but ultimately introducing the variable of incumbency proved a better way to deal with the skew in those data.

Screening of predictors

Based on automatic variable selection methods in combination with criterion-based statistics, only Proportional Spending, Other Political Committee Contributions, and Incumbency were selected. Partial residual plots, residual-versus-fitted plots, and measures of influence were investigated and no issues with high influence points, linearity, constant variance, independence, or normality were identified.

pairs(VOTESHARE_TWOPARTY ~ CAND_CONTRIB + CAND_LOANS + OTHER_LOANS)

pairs (VOTESHARE_TWOPARTY ~ TTL_INDIV_CONTRIB + OTHER_POL_CMTE_CONTRIB + POL_PTY_CONTRIB + PROPORTION_SPENT)

# Appears to be a positive correlation with TTL_INDIV_CONTRIB, OTHER_POL_CMTE_CONTRIB, PROPORTION_SPENT, and possibly POL_PTY_CONTRIB. 
cor(data[c(38,18,19,20,24,32,33,37)])

##                        VOTESHARE_TWOPARTY CAND_CONTRIB   CAND_LOANS
## VOTESHARE_TWOPARTY            1.000000000  -0.09054081  0.012030279
## CAND_CONTRIB                 -0.090540814   1.00000000  0.150723168
## CAND_LOANS                    0.012030279   0.15072317  1.000000000
## OTHER_LOANS                   0.027166717  -0.01595215  0.010391833
## TTL_INDIV_CONTRIB             0.009426731   0.04867836  0.002234779
## OTHER_POL_CMTE_CONTRIB        0.244601407  -0.15661335 -0.102594288
## POL_PTY_CONTRIB              -0.020479217  -0.02317839 -0.013671286
## PROPORTION_SPENT              0.312053117   0.12447793  0.179332972
##                         OTHER_LOANS TTL_INDIV_CONTRIB OTHER_POL_CMTE_CONTRIB
## VOTESHARE_TWOPARTY      0.027166717       0.009426731             0.24460141
## CAND_CONTRIB           -0.015952151       0.048678357            -0.15661335
## CAND_LOANS              0.010391833       0.002234779            -0.10259429
## OTHER_LOANS             1.000000000      -0.080761057            -0.07485761
## TTL_INDIV_CONTRIB      -0.080761057       1.000000000            -0.05572635
## OTHER_POL_CMTE_CONTRIB -0.074857605      -0.055726350             1.00000000
## POL_PTY_CONTRIB        -0.004049138       0.018830526            -0.06550671
## PROPORTION_SPENT       -0.085267520       0.382284595             0.16818905
##                        POL_PTY_CONTRIB PROPORTION_SPENT
## VOTESHARE_TWOPARTY        -0.020479217       0.31205312
## CAND_CONTRIB              -0.023178394       0.12447793
## CAND_LOANS                -0.013671286       0.17933297
## OTHER_LOANS               -0.004049138      -0.08526752
## TTL_INDIV_CONTRIB          0.018830526       0.38228459
## OTHER_POL_CMTE_CONTRIB    -0.065506708       0.16818905
## POL_PTY_CONTRIB            1.000000000      -0.05329044
## PROPORTION_SPENT          -0.053290442       1.00000000

# Strongest correlations with OTHER_POL_CMTE_CONTRIB and PROPORTION_SPENT. Maybe a negative correlation with candidate contribution.
plot(VOTESHARE_TWOPARTY ~ OTHER_POL_CMTE_CONTRIB) # Seems like a transformation would be a good idea

plot(VOTESHARE_TWOPARTY ~ log(OTHER_POL_CMTE_CONTRIB)) # Log transformation is not that much better

Automatic Variable Selection

Automatic Variable Selection methods were used to eliminate redundant variables. There were three main iterations of this process performed to find a suitable model: First, continuous variables without second-order, interaction, or categorical terms were considered. Second, a second-order term involving the predictor OTHER_POL_CMTE_CONTRIB was considered. Third, interaction terms and the categorical variables of incumbency and party were considered.

#Automatic variable selection - without second-order of OTHER_POL_CMTE_CONTRIB, categorical variables, or interaction terms.
library(leaps)
m1 <- lm(VOTESHARE_TWOPARTY ~ CAND_CONTRIB + CAND_LOANS + OTHER_LOANS + TTL_INDIV_CONTRIB + OTHER_POL_CMTE_CONTRIB + POL_PTY_CONTRIB + PROPORTION_SPENT)
ma <- regsubsets(VOTESHARE_TWOPARTY ~ CAND_CONTRIB + CAND_LOANS + OTHER_LOANS + TTL_INDIV_CONTRIB + OTHER_POL_CMTE_CONTRIB + POL_PTY_CONTRIB + PROPORTION_SPENT, data = data)
sma <- summary(ma)
sma$adj # Seems like best model is 4: VOTESHARE_TWOPARTY ~ CAND_CONTRIB + TTL_INDIV_CONTRIB + OTHER_POL_CMTE_CONTRIB + PROPORTION_SPENT
plot(1:7,sma$adj,xlab = "Subset", ylab = expression(R^2[adj]))
sma$bic # This method prefers the second: VOTESHARE_TWOPARTY ~ OTHER_POL_CMTE_CONTRIB + PROPORTION_SPENT
plot(1:7,sma$bic,xlab = "Subset",ylab = expression(BIC))
sma$cp # Mallows' Cp agrees with adjusted R^2
plot(1:7,sma$cp,xlab = "Subset",ylab = "Mallows' Cp")

# Automatic variable selection - with second-order of OTHER_POL_CMTE_CONTRIB to try to account for the odd relationship between this predictor and voteshare
m2 <- lm(VOTESHARE_TWOPARTY ~ CAND_CONTRIB + CAND_LOANS + OTHER_LOANS + TTL_INDIV_CONTRIB + OTHER_POL_CMTE_CONTRIB + POL_PTY_CONTRIB + PROPORTION_SPENT + I(OTHER_POL_CMTE_CONTRIB^2))
ma2 <- regsubsets(VOTESHARE_TWOPARTY ~ CAND_CONTRIB + CAND_LOANS + OTHER_LOANS + TTL_INDIV_CONTRIB + OTHER_POL_CMTE_CONTRIB + POL_PTY_CONTRIB + PROPORTION_SPENT + I(OTHER_POL_CMTE_CONTRIB^2), data = data)
(sma2 <- summary(ma2))
sma2$adj # Seems like best model is 5: OTHER_LOANS + TTL_INDIV_CONTRIB + OTHER_POL_CMTE_CONTRIB + PROPORTION_SPENT + I(OTHER_POL_CMTE_CONTRIB^2). But 4 is almost as good, and has one less variable.
plot(1:8,sma2$adj,xlab = "Subset", ylab = expression(R^2[adj]))
sma2$bic # This method prefers the first: VOTESHARE_TWOPARTY ~ PROPORTION_SPENT + I(OTHER_POL_CMTE_CONTRIB^2)
plot(1:8,sma2$bic,xlab = "Subset",ylab = expression(BIC))
sma2$cp # Mallows' Cp says model 3. OTHER_POL_CMTE_CONTRIB + PROPORTION_SPENT + I(OTHER_POL_CMTE_CONTRIB^2). Model 4 is almost as good.
plot(1:8,sma2$cp,xlab = "Subset",ylab = "Mallows' Cp")

# Let's also look at incumbency as a possible way to deal with the odd relationship between voteshare and OTHER_POL_CMTE_CONTRIB
plot(VOTESHARE_TWOPARTY~incumbent)
plot(VOTESHARE_TWOPARTY~PROPORTION_SPENT,pch=as.character(incumbent),data)
plot(VOTESHARE_TWOPARTY~OTHER_POL_CMTE_CONTRIB,pch=as.character(incumbent),data)
incumbents <- data[ which(data$incumbent=="I"),]
plot(VOTESHARE_TWOPARTY~OTHER_POL_CMTE_CONTRIB,pch=as.character(incumbent),incumbents)
plot(VOTESHARE_TWOPARTY~PROPORTION_SPENT,pch=as.character(incumbent),incumbents)

# Almost looks like there's a negative (or nonexistent) relationship between OTHER_POL_CMTE_CONTRIB and voteshare for incumbents, but a positive one for challengers/non-incumbents

#Automatic variable selection - with second-order of OTHER_POL_CMTE_CONTRIB and incumbent
m3 <- lm(VOTESHARE_TWOPARTY ~ CAND_CONTRIB + CAND_LOANS + OTHER_LOANS + TTL_INDIV_CONTRIB + POL_PTY_CONTRIB + PROPORTION_SPENT + incumbent * OTHER_POL_CMTE_CONTRIB * I(OTHER_POL_CMTE_CONTRIB^2))
ma3 <- regsubsets(VOTESHARE_TWOPARTY ~ CAND_CONTRIB + CAND_LOANS + OTHER_LOANS + TTL_INDIV_CONTRIB + POL_PTY_CONTRIB + PROPORTION_SPENT + incumbent * OTHER_POL_CMTE_CONTRIB * I(OTHER_POL_CMTE_CONTRIB^2), data = data)
(sma3 <- summary(ma3))

sma3$adj # This says the 7th is the best. OTHER_LOANS + TTL_INDIV_CONTRIB + PROPORTION_SPENT + incumbent + I(OTHER_POL_CMTE_CONTRIB^2) + incumbentN:OTHER_POL_CMTE_CONTRIB + incumbentN:OTHER_POL_CMTE_CONTRIB:I(OTHER_POL_CMTE_CONTRIB^2). Seems way too complicated, and not much better than 4th
plot(1:8,sma3$adj,xlab = "Subset", ylab = expression(R^2[adj]))
sma3$bic # Likes the fourth one best.
plot(1:8,sma3$bic,xlab = "Subset",ylab = expression(BIC))
sma3$cp # Mallows' Cp says 6th one.
plot(1:8,sma3$cp,xlab = "Subset",ylab = "Mallows' Cp")

The final summary output includes a matrix indicating which predictors were included in each of the candidate models. Several criteria for selecting the best model were used, including \(R^2_{adj}\) (larger values are better), the Bayesian Information Criterion (BIC) (smaller values are better), and Mallows’ \(C_p\) statistic (values of \(C_p\) close to the number of beta coefficients are better). The BIC and Mallows’ \(C_p\) statistics both indicate that the fourth subset is the best. The \(R^2_{adj}\) statistic favors the 5th-7th subsets over the 4th, but they are only minimally superior at the expense of significant model complication, so the fourth subset was chosen.
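The three criteria can also be computed by hand for a short list of candidate models, which makes their definitions concrete. The sketch below uses simulated data and base R only; the variable names and the three candidate models are assumptions for illustration and are unrelated to the study data.

```r
set.seed(42)
n  <- 150
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 0.8 * x1 + 0.5 * x2 + rnorm(n)   # x3 is pure noise by construction

# Nested candidate models; the largest one serves as the "full" model.
candidates <- list(
  m1 = lm(y ~ x1),
  m2 = lm(y ~ x1 + x2),
  m3 = lm(y ~ x1 + x2 + x3)
)
sigma2 <- summary(candidates$m3)$sigma^2   # error variance estimate from the full model

criteria <- data.frame(
  p     = sapply(candidates, function(m) length(coef(m))),
  adjr2 = sapply(candidates, function(m) summary(m)$adj.r.squared),
  bic   = sapply(candidates, BIC),
  # Mallows' Cp = RSS_p / sigma2_full - n + 2p, with p counting the intercept
  cp    = sapply(candidates, function(m) {
    sum(residuals(m)^2) / sigma2 - n + 2 * length(coef(m))
  })
)
print(criteria)
```

By construction, \(C_p\) for the full model equals its number of coefficients, so the statistic is only informative for the smaller candidates.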

m4 <- lm(VOTESHARE_TWOPARTY~PROPORTION_SPENT + incumbent + I(OTHER_POL_CMTE_CONTRIB^2) + incumbent*OTHER_POL_CMTE_CONTRIB)
summary(m4)

## 
## Call:
## lm(formula = VOTESHARE_TWOPARTY ~ PROPORTION_SPENT + incumbent + 
##     I(OTHER_POL_CMTE_CONTRIB^2) + incumbent * OTHER_POL_CMTE_CONTRIB)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.092169 -0.020791  0.000143  0.020376  0.092512 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        5.096e-01  1.547e-02  32.933  < 2e-16 ***
## PROPORTION_SPENT                   4.300e-02  1.392e-02   3.088  0.00229 ** 
## incumbentN                        -6.732e-02  1.504e-02  -4.476 1.25e-05 ***
## I(OTHER_POL_CMTE_CONTRIB^2)       -1.063e-14  8.151e-15  -1.304  0.19365    
## OTHER_POL_CMTE_CONTRIB             4.498e-09  2.309e-08   0.195  0.84576    
## incumbentN:OTHER_POL_CMTE_CONTRIB  1.057e-07  2.605e-08   4.056 7.07e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03312 on 208 degrees of freedom
## Multiple R-squared:  0.3659, Adjusted R-squared:  0.3506 
## F-statistic:    24 on 5 and 208 DF,  p-value: < 2.2e-16

m5 <- lm(VOTESHARE_TWOPARTY~PROPORTION_SPENT + incumbent * I(OTHER_POL_CMTE_CONTRIB^2))
summary(m5)

## 
## Call:
## lm(formula = VOTESHARE_TWOPARTY ~ PROPORTION_SPENT + incumbent * 
##     I(OTHER_POL_CMTE_CONTRIB^2))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.088253 -0.021334 -0.000435  0.020434  0.086305 
## 
## Coefficients:
##                                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)                             5.086e-01  8.710e-03  58.392  < 2e-16
## PROPORTION_SPENT                        5.020e-02  1.379e-02   3.641 0.000343
## incumbentN                             -5.726e-02  6.580e-03  -8.702 9.77e-16
## I(OTHER_POL_CMTE_CONTRIB^2)            -9.110e-15  2.249e-15  -4.051 7.18e-05
## incumbentN:I(OTHER_POL_CMTE_CONTRIB^2)  1.583e-13  3.155e-14   5.019 1.11e-06
##                                           
## (Intercept)                            ***
## PROPORTION_SPENT                       ***
## incumbentN                             ***
## I(OTHER_POL_CMTE_CONTRIB^2)            ***
## incumbentN:I(OTHER_POL_CMTE_CONTRIB^2) ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03361 on 209 degrees of freedom
## Multiple R-squared:  0.3438, Adjusted R-squared:  0.3312 
## F-statistic: 27.37 on 4 and 209 DF,  p-value: < 2.2e-16

m6 <- lm(VOTESHARE_TWOPARTY~PROPORTION_SPENT + incumbent * OTHER_POL_CMTE_CONTRIB)
summary(m6)

## 
## Call:
## lm(formula = VOTESHARE_TWOPARTY ~ PROPORTION_SPENT + incumbent * 
##     OTHER_POL_CMTE_CONTRIB)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.09216 -0.01978 -0.00124  0.02165  0.09173 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        5.245e-01  1.049e-02  49.986  < 2e-16 ***
## PROPORTION_SPENT                   4.486e-02  1.387e-02   3.234 0.001418 ** 
## incumbentN                        -8.235e-02  9.678e-03  -8.509 3.39e-15 ***
## OTHER_POL_CMTE_CONTRIB            -2.448e-08  6.285e-09  -3.895 0.000132 ***
## incumbentN:OTHER_POL_CMTE_CONTRIB  1.278e-07  1.983e-08   6.444 7.91e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03318 on 209 degrees of freedom
## Multiple R-squared:  0.3607, Adjusted R-squared:  0.3484 
## F-statistic: 29.48 on 4 and 209 DF,  p-value: < 2.2e-16

Additionally, Party was viewed as a possible predictor, but the model that instead used incumbency had a higher \(R^2_{adj}\) and was therefore selected.

plot(VOTESHARE_TWOPARTY~OTHER_POL_CMTE_CONTRIB,pch=as.character(party),data)

republicans <- data[ which(data$party=="R"),]
plot(VOTESHARE_TWOPARTY~OTHER_POL_CMTE_CONTRIB,pch=as.character(party),republicans)

m7 <- lm(VOTESHARE_TWOPARTY~PROPORTION_SPENT + party * OTHER_POL_CMTE_CONTRIB)
summary(m7) # R^2 is higher for m6, so let's stick with that one.

## 
## Call:
## lm(formula = VOTESHARE_TWOPARTY ~ PROPORTION_SPENT + party * 
##     OTHER_POL_CMTE_CONTRIB)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.089825 -0.020180  0.000655  0.022962  0.093288 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    4.325e-01  8.174e-03  52.912  < 2e-16 ***
## PROPORTION_SPENT               7.065e-02  1.510e-02   4.679 5.17e-06 ***
## partyR                         5.641e-02  7.478e-03   7.543 1.38e-12 ***
## OTHER_POL_CMTE_CONTRIB         6.498e-08  1.388e-08   4.681 5.12e-06 ***
## partyR:OTHER_POL_CMTE_CONTRIB -7.530e-08  1.436e-08  -5.244 3.84e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03415 on 209 degrees of freedom
## Multiple R-squared:  0.3225, Adjusted R-squared:  0.3095 
## F-statistic: 24.87 on 4 and 209 DF,  p-value: < 2.2e-16

The preliminary model fit is shown below.

m6 <- lm(VOTESHARE_TWOPARTY~PROPORTION_SPENT + incumbent * OTHER_POL_CMTE_CONTRIB)
summary(m6)

## 
## Call:
## lm(formula = VOTESHARE_TWOPARTY ~ PROPORTION_SPENT + incumbent * 
##     OTHER_POL_CMTE_CONTRIB)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.09216 -0.01978 -0.00124  0.02165  0.09173 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        5.245e-01  1.049e-02  49.986  < 2e-16 ***
## PROPORTION_SPENT                   4.486e-02  1.387e-02   3.234 0.001418 ** 
## incumbentN                        -8.235e-02  9.678e-03  -8.509 3.39e-15 ***
## OTHER_POL_CMTE_CONTRIB            -2.448e-08  6.285e-09  -3.895 0.000132 ***
## incumbentN:OTHER_POL_CMTE_CONTRIB  1.278e-07  1.983e-08   6.444 7.91e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03318 on 209 degrees of freedom
## Multiple R-squared:  0.3607, Adjusted R-squared:  0.3484 
## F-statistic: 29.48 on 4 and 209 DF,  p-value: < 2.2e-16

Added Variable Plots

Added variable plots for each of the covariates are shown. These plots provide evidence of the importance of each covariate given the other covariates already in the model. The plots all indicate no need for transformations because linear relationships are apparent.
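An added-variable plot can be built by hand from two auxiliary regressions, which shows what the plot measures. The sketch below uses simulated data with hypothetical variable names. It is not the code used to produce the plots for this report.

```r
set.seed(7)
n  <- 120
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 2 + 1.5 * x1 - 0.7 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# Added-variable plot for x2: remove the part of y and of x2 explained by x1.
res_y  <- residuals(lm(y ~ x1))
res_x2 <- residuals(lm(x2 ~ x1))

plot(res_x2, res_y, xlab = "x2 | x1", ylab = "y | x1")
av_fit <- lm(res_y ~ res_x2)
abline(av_fit)

# The slope of this line equals the x2 coefficient in the full model.
c(av_slope = unname(coef(av_fit)[2]), full_coef = unname(coef(fit)["x2"]))
```

A roughly linear cloud around the fitted line suggests the covariate enters the model linearly given the others.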

Test for multicollinearity

Formal testing for multicollinearity can now be done using variance inflation factors (VIF). A VIF measures how much the variance of an estimated regression coefficient is inflated compared to when the predictor variables are not linearly related. As a rule of thumb, a maximum VIF in excess of 10 indicates a multicollinearity problem. Based on the maximum VIF, 4.342462, there do not appear to be any issues that need addressing.

vif(m6)

##                  PROPORTION_SPENT                        incumbentN 
##                          1.151237                          4.342462 
##            OTHER_POL_CMTE_CONTRIB incumbentN:OTHER_POL_CMTE_CONTRIB 
##                          2.755262                          2.202938
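The definition behind these numbers is \(VIF_j = 1 / (1 - R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on the remaining predictors. The sketch below computes it by hand on simulated data; the helper vif_by_hand and the variables are hypothetical, and the values are unrelated to those reported above.

```r
set.seed(3)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)   # deliberately correlated with x1
x3 <- rnorm(n)                        # unrelated to the others

# VIF for one predictor: regress it on the other predictors,
# then apply 1 / (1 - R^2).
vif_by_hand <- function(target, others) {
  r2 <- summary(lm(target ~ others))$r.squared
  1 / (1 - r2)
}

vifs <- c(
  x1 = vif_by_hand(x1, cbind(x2, x3)),
  x2 = vif_by_hand(x2, cbind(x1, x3)),
  x3 = vif_by_hand(x3, cbind(x1, x2))
)
round(vifs, 3)
```

A VIF of 1 means the predictor is uncorrelated with the others, and the correlated pair here shows the larger values.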

Residual Diagnostics

To identify potentially influential points, Cook’s distances are extracted from the model and plotted using a half-normal plot to emphasize unusually large values.

m6i <- influence(m6)
halfnorm(cooks.distance(m6))

Observations 91 and 209 look like they might have high influence, but a model fit without those two observations does not substantially change the model, so the model can be considered robust to those data points.
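A robustness check of this kind can be sketched as follows: rank observations by Cook's distance, refit without the largest ones, and compare coefficients. The example uses simulated data and base R only, and dropping the top two points is an illustrative choice rather than a rule.

```r
set.seed(11)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
y[n] <- y[n] + 8                   # perturb one observation for illustration

fit  <- lm(y ~ x)
cd   <- cooks.distance(fit)
flag <- order(cd, decreasing = TRUE)[1:2]   # indices of the two largest distances

# Refit without the flagged observations.
refit <- lm(y ~ x, subset = -flag)

# Relative change in each coefficient after dropping the flagged points.
rel_change <- (coef(refit) - coef(fit)) / coef(fit)
round(cbind(full = coef(fit), dropped = coef(refit), rel_change), 4)
```

Small relative changes support treating the fit as robust to the flagged points.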

summary(lm(VOTESHARE_TWOPARTY ~ PROPORTION_SPENT + incumbent * OTHER_POL_CMTE_CONTRIB, subset = -c(91, 209)))

## 
## Call:
## lm(formula = VOTESHARE_TWOPARTY ~ PROPORTION_SPENT + incumbent * 
##     OTHER_POL_CMTE_CONTRIB, subset = -c(91, 209))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.091389 -0.019658 -0.000248  0.021216  0.088909 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        5.230e-01  1.025e-02  51.017  < 2e-16 ***
## PROPORTION_SPENT                   4.789e-02  1.367e-02   3.503 0.000563 ***
## incumbentN                        -8.424e-02  9.441e-03  -8.922 2.46e-16 ***
## OTHER_POL_CMTE_CONTRIB            -2.459e-08  6.108e-09  -4.026 7.97e-05 ***
## incumbentN:OTHER_POL_CMTE_CONTRIB  1.300e-07  1.953e-08   6.657 2.47e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03224 on 207 degrees of freedom
## Multiple R-squared:  0.3897, Adjusted R-squared:  0.3779 
## F-statistic: 33.04 on 4 and 207 DF,  p-value: < 2.2e-16

summary(m6)

## 
## Call:
## lm(formula = VOTESHARE_TWOPARTY ~ PROPORTION_SPENT + incumbent * 
##     OTHER_POL_CMTE_CONTRIB)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.09216 -0.01978 -0.00124  0.02165  0.09173 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        5.245e-01  1.049e-02  49.986  < 2e-16 ***
## PROPORTION_SPENT                   4.486e-02  1.387e-02   3.234 0.001418 ** 
## incumbentN                        -8.235e-02  9.678e-03  -8.509 3.39e-15 ***
## OTHER_POL_CMTE_CONTRIB            -2.448e-08  6.285e-09  -3.895 0.000132 ***
## incumbentN:OTHER_POL_CMTE_CONTRIB  1.278e-07  1.983e-08   6.444 7.91e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03318 on 209 degrees of freedom
## Multiple R-squared:  0.3607, Adjusted R-squared:  0.3484 
## F-statistic: 29.48 on 4 and 209 DF,  p-value: < 2.2e-16

A plot of the fitted versus residual values looks like noise, so this plot supports constant variance of residuals. That said, there are a number of observations greater than two standard deviations from the expected mean of 0: 26, 32, 33, 43, 44, 65, 66, 91, 202, 209, 210, 213, and 214. However, when the model is fitted without these data the coefficient estimates do not change drastically.
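One way to flag such observations is to standardize the residuals and keep those beyond two in absolute value. The sketch below uses simulated data and `rstandard()` (internally studentized residuals). Whether the list above was produced this way or from raw residuals divided by the residual standard error is not stated, so this is an assumption.

```r
set.seed(5)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

# Internally studentized residuals; flag those beyond +/- 2.
std_res <- rstandard(fit)
flagged <- which(abs(std_res) > 2)
flagged
```

With normally distributed errors, roughly 5% of observations are expected to fall beyond two standard deviations, so a handful of flagged points is not by itself a sign of trouble.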

summary(lm(VOTESHARE_TWOPARTY ~ PROPORTION_SPENT + incumbent * OTHER_POL_CMTE_CONTRIB, subset = -c(26, 32, 33, 43, 44, 65, 66, 91, 202, 209, 210, 213, 214)))

## 
## Call:
## lm(formula = VOTESHARE_TWOPARTY ~ PROPORTION_SPENT + incumbent * 
##     OTHER_POL_CMTE_CONTRIB, subset = -c(26, 32, 33, 43, 44, 65, 
##     66, 91, 202, 209, 210, 213, 214))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.059396 -0.018781 -0.001086  0.019203  0.060761 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        5.190e-01  9.006e-03  57.627  < 2e-16 ***
## PROPORTION_SPENT                   5.812e-02  1.184e-02   4.907 1.94e-06 ***
## incumbentN                        -8.353e-02  8.215e-03 -10.168  < 2e-16 ***
## OTHER_POL_CMTE_CONTRIB            -2.531e-08  5.290e-09  -4.785 3.36e-06 ***
## incumbentN:OTHER_POL_CMTE_CONTRIB  1.290e-07  1.679e-08   7.682 7.29e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02742 on 196 degrees of freedom
## Multiple R-squared:  0.4884, Adjusted R-squared:  0.478 
## F-statistic: 46.78 on 4 and 196 DF,  p-value: < 2.2e-16

Finally, a Q-Q plot supports normality, though there is some minor deviation at the tails.

qqnorm(residuals(m6))
qqline(residuals(m6))

Document Information

All of the statistical analyses in this document were performed using R version 4.0.2 (2020-06-22). R packages were managed using the packrat dependency management system.

sessionInfo()

## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19041)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] tcltk     stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] faraway_1.0.7 asbio_1.6-7  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.6           nloptr_1.2.2.2       bslib_0.2.4         
##  [4] compiler_4.0.2       jquerylib_0.1.3      gWidgets2_1.0-8     
##  [7] highr_0.8            tools_4.0.2          boot_1.3-25         
## [10] statmod_1.4.35       digest_0.6.27        lme4_1.1-26         
## [13] nlme_3.1-148         jsonlite_1.7.2       evaluate_0.14       
## [16] memoise_2.0.0        lattice_0.20-41      rlang_0.4.10        
## [19] Matrix_1.2-18        crosstalk_1.1.1      yaml_2.2.1          
## [22] mvtnorm_1.1-1        xfun_0.21            fastmap_1.1.0       
## [25] pixmap_0.4-12        stringr_1.4.0        knitr_1.31          
## [28] htmlwidgets_1.5.3    sass_0.3.1           combinat_0.0-8      
## [31] grid_4.0.2           DT_0.17              scatterplot3d_0.3-41
## [34] deSolve_1.28         R6_2.5.0             plotrix_3.8-1       
## [37] rmarkdown_2.7        minqa_1.2.4          magrittr_2.0.1      
## [40] MASS_7.3-51.6        htmltools_0.5.1.1    splines_4.0.2       
## [43] multcompView_0.1-8   stringi_1.5.3        gWidgets2tcltk_1.0-6
## [46] cachem_1.0.4

Relationships between Fundraising Types and Candidate Voteshare in the 2018 Congressional Midterms

Jon Crenshaw