Our 2024 Forecast Methodology

It’s been a while since we’ve written anything here at CNalysis about our forecasting methodology; the last article on the subject was back in 2021, written by myself and Jackson. Since 2022, Jackson has stepped into a more minor role at CNalysis, as he has taken on a full-time job and is no longer in the forecasting department. A lot has changed in our forecasts since Jack took on the role of lead oddsmaker at CNalysis, and with a big year ahead of us we felt it was timely to go into depth about our 2024 forecasting methodology.

 

Mixed-Method: Reasoning for our Approach (Chaz)

 

CNalysis uses a mixed methodology in its forecasts because we believe this is the best way to create a high-quality model featuring the strengths of both approaches. In psephology, qualitative and quantitative election forecasters have usually kept to their own respective camps. However, certain observations can slip through the cracks of either camp. For example, if a forecaster uses solely qualitative analysis, the conclusions can devolve into sheer opinion and become easily susceptible to biases, cognitive or otherwise. If a forecaster uses only quantitative analysis, the methodology can be so data-driven that it cannot accurately estimate the effects of non-numerical factors in an election. The best case of this I like to cite is FiveThirtyEight’s 2018 US House forecast. In Kansas’s 2nd Congressional District, Republican nominee Steve Watkins was plagued by sexual harassment allegations shortly before the election. The forecast had the district as roughly a 50-50 race. Once the scandal broke, the numbers turned heavily in favor of Democrat Paul Davis because the fundamentals in the model contained a “scandal factor”: Davis jumped to an 82% chance of winning, and by Election Day still held a 62% chance; Watkins ended up winning by a single point.

 

Therein lies the problem with quantitative-only approaches: some things cannot be quantified in a forecast model. A simple “scandal factor” doesn’t account for the idiosyncrasies of each and every scandal. Take the 1998 midterms, held amid the Lewinsky scandal that would lead to Bill Clinton’s impeachment for perjury later that year. Republicans suffered at the polls with an underwhelming midterm performance, as Clinton remained popular despite the scandal. In a far more extreme case than an extramarital affair, Rick Roeber won a competitive state legislative district in the Missouri House in 2020 despite coverage of his children’s allegations of sexual abuse against him, with little ticket-splitting in the process. Scandals, like many unquantified factors in elections, exist on a spectrum of how likely they are to impact the outcome, and seasoned qualitative analysis is the best tool for judging where on that spectrum a given race falls.

 

Qualitative Analysis (Chaz)

 

In our qualitative analysis, Chaz creates ratings for each state legislative district and its respective seats, with a rough estimate of the chances the district will go to either party. In the process, a “base rating” is created that qualitatively takes into account recent election data, election trends since 2016, expected incumbency or lack thereof, previous and current non-numerical factors (e.g., a weak candidate having been nominated last cycle while neither candidate in the district this cycle is particularly weak), and the expected environment across the state.

 

Afterwards, Chaz keeps the ratings up to date and tweaks them when needed. Keeping up with the latest news in the districts as best as possible helps with making timely adjustments; given the sheer number of state legislative districts to follow, there will more than likely be a few stories on state legislative elections that don’t land on our radar. Speaking with politicos on both sides of the aisle in the campaign world (formally and informally), gathering their useful data points, and learning what strategies they’re using is also part of our qualitative analysis. Campaign finance data across districts is likewise examined qualitatively rather than quantitatively for the time being: quantifying quarterly campaign finance reports for every candidate in every state’s legislative elections takes a boatload of resources and time that our staff does not currently possess.

 

Quantitative Analysis (Jack)

 

New to this year is the quantitative method of forecasting. While there has always been significant number crunching behind our qualitative forecast, it has never been directly implemented into the model. After the success of the live model for the 2023 Virginia state legislative elections, we decided to implement it into a new model this year, the Expanded model, which combines qualitative ratings with quantitative data. In this section I’ll be going over the quantitative side of the Expanded model. To create it, we use polling averages and previous elections to forecast these races. The model adjusts all of the data points back to a mean of zero, as if the state were going to vote dead even, and then shifts the districts by the forecasted state margin for the US presidential election. This gives the model a lot of flexibility: because it is a presidential year, changes in the presidential race will affect the model, while other factors, such as incumbents who outperform their benchmarks and other quirks, are still taken into consideration.
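
To make the mean-zero idea concrete, here is a minimal sketch in Python; the district names, margins, and variable names below are hypothetical stand-ins meant only to show the centering-and-shifting step, not the model’s actual code or data.

    # Illustrative sketch of the mean-zero adjustment (hypothetical numbers).

    # Past district results (D margin, in points) and the statewide margin
    # of the election those results came from.
    district_results = {"HD-01": 4.0, "HD-02": -7.5, "HD-03": 12.0}
    past_statewide_margin = 2.0          # the state voted D+2 in that election

    # Step 1: center every district on a "dead even" state (mean zero).
    mean_zero = {d: m - past_statewide_margin for d, m in district_results.items()}

    # Step 2: shift by this year's forecasted presidential margin in the state.
    forecast_state_margin = -1.0         # e.g. the model expects the state at R+1
    projected = {d: m + forecast_state_margin for d, m in mean_zero.items()}

    print(projected)   # {'HD-01': 1.0, 'HD-02': -10.5, 'HD-03': 9.0}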

 

Factoring & Weighting (Jack)

 

The factoring is split into two parts. The first part is predicting how each state will vote this year. The model meshes together three different polling data points to create a forecast: first, the state’s polling average (all polling averages are pulled from the jhkforecasts.com presidential polling average, which uses an in-depth weighting function); second, national polls; and finally, the national environment average. These averages are then combined with CNalysis’ presidential ratings, which translate to a margin based on the rating. To simulate the national environment, we combine national and state polls, with state polls regressed against partisanship. This is how the states move away from mean zero.
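
As a rough illustration of how these inputs could be blended into a single projected state margin, here is a short sketch; the weights, the project_state_margin function, and the example numbers are illustrative stand-ins rather than the model’s actual parameters.

    # Illustrative blend of state-level inputs into one projected margin
    # (positive = D advantage, negative = R advantage; weights are made up).

    def project_state_margin(state_poll_avg, national_env, partisan_lean,
                             rating_margin, w_state=0.5, w_env=0.3, w_rating=0.2):
        """Blend state polls, a national-environment estimate shifted by the
        state's partisan lean, and the margin implied by the CNalysis rating."""
        env_based = national_env + partisan_lean   # project the national numbers onto the state
        return (w_state * state_poll_avg
                + w_env * env_based
                + w_rating * rating_margin)

    # Example: state polls at D+1, national environment D+2, state leans R+3,
    # and a rating implying roughly R+1.
    print(project_state_margin(1.0, 2.0, -3.0, -1.0))   # prints 0.0, a toss-up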

The second part is the district-level projections. District-level results are weighted based on their correlation with state legislative races; more recent elections get more weight, especially previous elections for that state legislative seat. Each election result is shifted back by that election’s statewide margin so that it sits at mean zero. The last district-level data point is the qualitative rating from CNalysis. This data point holds significant weight in the model, second only to the most recent legislative election. For this data point we revert back to mean zero using the projected margin in the state, to offset the regression later on. These data points are combined to create a projected margin at mean zero, which we then adjust by the projected state margin to create a district-level projection. From there we predict a win percentage based on a normal distribution that matches each rating’s win percentage to its rating margin.
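
The sketch below shows the shape of that district-level math: a weighted average of mean-zero data points, shifted by the projected state margin, with a win probability read off a normal distribution. The weights, the sigma value, and the function names are hypothetical examples, not the calibrated values the model actually uses.

    # Rough sketch of the district-level projection (hypothetical parameters).
    from math import erf, sqrt

    def district_projection(mean_zero_margins, weights, projected_state_margin):
        """Weighted average of mean-zero data points, shifted to the projected
        state margin."""
        blended = sum(w * m for w, m in zip(weights, mean_zero_margins)) / sum(weights)
        return blended + projected_state_margin

    def win_probability(margin, sigma=7.0):
        """Chance the margin lands above zero under a normal distribution;
        sigma would be tuned so rating margins line up with rating win chances."""
        return 0.5 * (1 + erf(margin / (sigma * sqrt(2))))

    # Most recent legislative race, an older race, the presidential result, and
    # the CNalysis rating, all centered at mean zero, with recency weighted more.
    margin = district_projection([3.0, 1.5, 2.0, 4.0], [0.4, 0.1, 0.2, 0.3],
                                 projected_state_margin=-1.0)
    print(round(margin, 2), round(win_probability(margin), 3))   # 1.95 0.61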

 

There are a few small quirks in the data we had to account for. One is uncontested races: for these, we pick the election most similar to the race and use that margin in place of the missing result. The other is the solid ‘D’ or ‘R’ races that sit well above the projected 20% margin; for those, we do the same thing as with the uncontested races. It is a somewhat crude method, but very few, if any, of these races are competitive enough to warrant a more sophisticated approach.
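
A toy example of that substitution is below; the districts, the use of presidential margin as the similarity measure, and the numbers are all invented for illustration and are not how the model actually judges similarity.

    # Fill in missing margins for uncontested races by borrowing from the most
    # similar contested race (here "similar" = closest presidential margin).
    districts = {
        "HD-10": {"pres_margin": 25.0, "leg_margin": None},   # uncontested
        "HD-11": {"pres_margin": 24.0, "leg_margin": 30.0},
        "HD-12": {"pres_margin": 5.0,  "leg_margin": 2.0},
    }

    for name, d in districts.items():
        if d["leg_margin"] is None:
            proxy = min((x for x in districts.values() if x["leg_margin"] is not None),
                        key=lambda x: abs(x["pres_margin"] - d["pres_margin"]))
            d["leg_margin"] = proxy["leg_margin"]

    print(districts["HD-10"]["leg_margin"])   # 30.0, borrowed from HD-11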

 

Classic vs. Expanded (Jack)

 

The Classic approach is the CNalysis forecast you’ve seen for years, while the Expanded is our new model incorporating the quantitative methodology described above. Both forecasts predict a win percentage for each race, but the Classic is constrained to intervals of 10%, while the Expanded assigns a continuous probability from 0 to 100%. The same applies to margins: in the Classic, every seat with a given rating has the same margin, while the Expanded is on that continuous scale.

 

Both models use the same simulation method, running 35,000 simulations of the forecast. However, this year there is a new way of simulating. Previously, the simulation consisted of two factors, state and district: it drew a random number for the state and one for each district and combined them, using adjusted win probabilities because combining two random numbers reduces the variability of the result. If the combined number was less than the win percentage, the party would win that seat, and the opposite if it was greater. This year, while still using the state and district random variables, we are simulating margins: the state gets its own normally distributed shift, and each district gets its own normally distributed shift on top of it. This gives the model more flexibility and realism when it comes to the movement of the state and the districts.
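
To show the general shape of the margin-based simulation, here is a compact sketch; the sigmas, the five-seat chamber, and the function name are hypothetical, and the real model’s shifts and correlations are more involved.

    # Sketch of a margin-based simulation: one statewide normal shift per run,
    # plus an independent normal shift per district, added to each projected margin.
    import random

    def simulate_chamber(projected_margins, n_sims=35_000,
                         state_sigma=3.0, district_sigma=5.0):
        """Return the share of simulations in which Democrats win a majority."""
        seats_needed = len(projected_margins) // 2 + 1
        d_majorities = 0
        for _ in range(n_sims):
            state_shift = random.gauss(0, state_sigma)
            d_seats = sum(
                1 for m in projected_margins
                if m + state_shift + random.gauss(0, district_sigma) > 0
            )
            if d_seats >= seats_needed:
                d_majorities += 1
        return d_majorities / n_sims

    # Example: a tiny five-seat chamber with projected D margins (in points).
    print(simulate_chamber([6.0, 2.5, -1.0, -4.0, -12.0]))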