There is one group of prediction professionals with far more experience than political pollsters: the weather forecasting community. No one else has more day-to-day experience and success in using large amounts of data and in predicting the future.
Thus, the natural question is: can the insights of the highly experienced and technically sophisticated weather prediction community assist our embattled colleagues in political polling and forecasting? Perhaps.
As an example of how the political polls were challenged, below is a plot of the probabilities of Clinton winning over time. Most of the polls and combinations of polls during the last week were giving her an 80-95% chance of winning, with the exception of the 538 multi-poll guidance, which gave her only about a 65% chance of winning (hats off to Nate Silver). This was clearly a suggestion of considerable uncertainty (I certainly wouldn't get on a plane whose chance of crashing was 35%!).
Weather forecasting versus political polling
Weather prediction and political polling have some similarities and differences.
Both attempt to use limited real-world data to determine the "truth" about the current situation. Meteorologists use a range of observational assets (e.g., satellites, surface observations, aircraft data) to determine the current three-dimensional characteristics (e.g., temperature, humidity, winds) of the atmosphere. Pollsters' "observations" are the current opinions of the voting population, predominantly determined through telephone polling.
Meteorologists forecast the future by using complex equations that describe the evolution of the atmosphere if they are provided an accurate description of the current atmospheric state. Pollsters really can't predict the future but try to accurately estimate the current opinions of potential voters and elucidate short-term trends.
Both meteorologists and pollsters are heavy users of statistics. Meteorologists use statistics to assist in filling in the gaps between observations and to compensate for biases in the atmospheric models. Pollsters use statistics to combine and filter the polling information they gather and to use a limited sample of voters to provide an accurate picture of the larger voting population.
The Achilles' Heels of Political Pollsters
Nearly all modern political prediction is based on one source of information: polling, which attempts to ask a representative sample of folks about their political intentions-- how they plan on voting. But there is a minefield of problems pollsters have to deal with, such as:
- Telephone calling is the main approach to polling, but technological changes are a big issue. More and more folks have moved from landlines to cell phones. Many people have caller ID and deliberately don't answer unknown callers. The person who answers a phone may not be a voter. The problems are endless and growing.
- Then there are the sampling problems: the number of calls is relatively small, but they have to provide sufficient information about the actual voter cohort.
- Changing demographics and communication technologies can result in the behavior of previous polling being less relevant to the current election.
- Because of the above and other issues, pollsters really can't provide useful uncertainty information about their polls.
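To put a number on the sampling issue in the list above, here is a minimal sketch of the textbook margin of error for a polled proportion. It assumes a truly random sample and a 95% confidence level (z = 1.96); the function name and defaults are my own illustrative choices. It deliberately ignores nonresponse, unreachable cell phones, and likely-voter screening, which is exactly why this idealized number tends to understate the real uncertainty.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Idealized margin of error for a proportion from a simple random sample.

    Assumes every respondent is drawn at random from the voting population,
    which real telephone polls cannot guarantee.
    """
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)


# Illustrative sample sizes only (not taken from any actual poll).
for n in (500, 1000, 4000):
    print(n, "respondents -> +/-", round(100 * margin_of_error(n), 1), "points")
```

Note that quadrupling the number of calls only halves this margin, so making more calls is an expensive way to reduce even the idealized part of the error.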
Can the Meteorological Community Aid Political Pollsters?
Both the meteorological and political polling communities deal with a central problem: using incomplete observational information to provide a complete picture of the current situation. For meteorologists, that picture is the 3D distribution of all weather variables. For pollsters, it is the intentions of the folks who will actually go to the polls.
I would suggest that meteorologists are far more sophisticated in estimating the current situation based on observations. Pollsters use only one type of information to estimate the election results: the expressed intentions of potential voters. Meteorologists go one step further: they use the correlations with other types of information to inform their estimates of different parameters.
Let me give you an example. Meteorologists need to describe the three-dimensional distributions of temperature, wind, humidity, and other parameters to serve as the initial state of weather prediction models. A difficult parameter is humidity (the amount of water vapor). It is critical, but we don't have many observations of it aloft, in contrast to wind and temperature. But weather folks have a powerful new tool to deal with that problem: we make use of the correlations between humidity and parameters for which there are a lot more observations (e.g., wind). How we do this is a bit technical, but one approach is to run ensembles of many forecasts and use all of those forecasts to determine correlations between various parameters and locations.
So by using all kinds of parameters at various locations, we can get a much better analysis of a poorly observed parameter, in this case humidity. I use this technology extensively for weather forecasting through something called Ensemble Kalman Filter data assimilation.
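For readers who want to see the idea in miniature, here is a toy sketch of an ensemble Kalman filter update for just two scalar variables: a well-observed one (wind) and a poorly observed one (humidity). Everything in it is an assumption made for illustration: the synthetic ensemble, the made-up relationship between wind and humidity, the observation value, and the function names. It uses the perturbed-observation form of the filter and is nowhere near an operational system, but it shows how a wind observation alone can correct humidity through the ensemble correlation.

```python
import random
import statistics


def sample_covariance(xs, ys):
    """Unbiased sample covariance between two equal-length lists."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def enkf_update(wind_ens, humidity_ens, wind_obs, obs_error_var, rng):
    """Update both ensembles from one wind observation (perturbed-obs EnKF)."""
    wind_var = statistics.variance(wind_ens)
    cross_cov = sample_covariance(humidity_ens, wind_ens)
    denom = wind_var + obs_error_var
    gain_wind = wind_var / denom        # how strongly wind follows the observation
    gain_humidity = cross_cov / denom   # humidity is corrected via its covariance with wind
    obs_sd = obs_error_var ** 0.5
    new_wind, new_humidity = [], []
    for wind, humidity in zip(wind_ens, humidity_ens):
        # Each member sees the observation plus its own draw of observation noise.
        innovation = wind_obs + rng.gauss(0.0, obs_sd) - wind
        new_wind.append(wind + gain_wind * innovation)
        new_humidity.append(humidity + gain_humidity * innovation)
    return new_wind, new_humidity


def make_synthetic_ensemble(n_members, rng):
    """Made-up ensemble in which humidity rises with wind (illustration only)."""
    wind = [rng.gauss(10.0, 2.0) for _ in range(n_members)]
    humidity = [50.0 + 3.0 * (w - 10.0) + rng.gauss(0.0, 1.0) for w in wind]
    return wind, humidity


rng = random.Random(0)
wind, humidity = make_synthetic_ensemble(200, rng)
new_wind, new_humidity = enkf_update(wind, humidity, wind_obs=13.0,
                                     obs_error_var=0.25, rng=rng)
print("humidity mean before:", round(statistics.fmean(humidity), 2))
print("humidity mean after: ", round(statistics.fmean(new_humidity), 2))
print("humidity spread before/after:",
      round(statistics.stdev(humidity), 2), round(statistics.stdev(new_humidity), 2))
```

No humidity observation enters the update, yet the humidity ensemble shifts and tightens because the ensemble itself tells us how humidity and wind vary together.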
So How Can Pollsters Use Weather Technology?
From what I have read, most political pollsters work the same way meteorologists did in the old days, considering only one parameter in their analysis of data (also called data assimilation). Pollsters use telephone information (asking people how they will vote) to estimate the voting of the entire voter contingent.
But what if they worked more like meteorologists? They could use other information to estimate what everyone cares about: who will win the election. Pollsters could select from a long list of potential "predictors," parameters that could be combined statistically to estimate who would win the election. Examples might include:
1. Demographics (ancestry, age, etc.)
2. Unemployment rate (long term, short term, trend)
3. Education level
4. Facebook activity for each candidate
5. Trend in economic activity
6. Crime rate and trend in crime rate.
...and many more.
You get the idea: parameters that can be estimated accurately can be used to predict who will win the election. There are a variety of statistical techniques (e.g., multilinear regression, neural nets, AI approaches) that can be used to select the most relevant "predictors" and find the relationship to who will win.
Musing about this, I did a search and found that someone already tried something like this and claims he has gotten every presidential election correct for which he made a prediction: Professor Allan Lichtman of American University.
Professor Lichtman uses a series of parameters, or "keys," to decide the winner and does not depend on polling information at all. I suspect an artificial intelligence (AI) engine like IBM's Watson could have a lot of potential.
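As a rough illustration of the simplest of these techniques, here is a sketch of multilinear regression fitted by ordinary least squares in plain Python. The two predictors (an unemployment trend and an economic growth number), the vote-margin target, and every value in the table are invented for this example; the target was generated from a made-up rule so that the fit can be checked, and none of it is real election data. The function names are my own.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coef_1, coef_2, ...].
    """
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    k = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(x[i] * x[j] for x in design) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def predict(coeffs, row):
    """Apply fitted coefficients to one row of predictor values."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))


# Synthetic, made-up "past elections": (economic_growth, unemployment_trend).
predictors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
# Invented margins built from the rule: 2 + 3*growth - 1*unemployment_trend.
margins = [2.0, 5.0, 1.0, 4.0, 7.0]

coeffs = fit_linear(predictors, margins)
print("fitted coefficients:", [round(c, 3) for c in coeffs])
print("predicted margin for (1.5, 0.5):", round(predict(coeffs, (1.5, 0.5)), 3))
```

With real predictors the fit would be noisy and there would be far fewer elections than one would like, so selecting which predictors to keep matters as much as the fitting itself.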
A Final Note: Probabilistic Prediction
One lesson both meteorologists and political pollsters have learned is that the only way to forecast is to do so probabilistically. All forecasts are uncertain and we can't simply give our users a number: the high will be 65F, or Trump will win by 2%. We need to give the probabilities of various outcomes. But we have a problem: we still need to develop reliable approaches for calculating realistic probabilities, something that will keep all of us busy for the immediate future.
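To make that a little more concrete, here is a small sketch of two pieces of the puzzle: turning a set of possible outcomes (for example, an ensemble of simulated vote margins) into a probability, and scoring past probability forecasts against what actually happened using the Brier score, a standard measure in weather forecast verification. The function names and all of the numbers are assumptions chosen for illustration, not real forecasts.

```python
def win_probability(margin_samples):
    """Fraction of simulated margins in which the candidate wins (margin > 0)."""
    return sum(1 for m in margin_samples if m > 0) / len(margin_samples)


def brier_score(forecast_probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect score; always forecasting 50% scores 0.25.
    """
    return sum((p - o) ** 2 for p, o in zip(forecast_probs, outcomes)) / len(forecast_probs)


# Made-up ensemble of simulated vote margins (percentage points).
simulated_margins = [3.1, -0.4, 1.8, 2.2, -1.5, 0.9, 4.0, -0.2, 1.1, 2.7]
print("probability of winning:", win_probability(simulated_margins))

# Made-up history: forecast probabilities and whether the event occurred (1) or not (0).
past_forecasts = [0.9, 0.7, 0.65, 0.2]
past_outcomes = [1, 1, 0, 0]
print("Brier score:", round(brier_score(past_forecasts, past_outcomes), 4))
```

A forecaster whose 65% calls come true about 65% of the time is reliable; scores like this one only become meaningful over many forecasts, which is much easier to accumulate for daily weather than for elections.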