This is the third in a series of GUEST POSTS by Matthew Knee, a Ph.D. candidate at Yale University specializing in campaigns and elections, ethnic voting patterns, public opinion, and quantitative and experimental approaches to political science.
1. Pollsters Are, Generally Speaking, Competent Professionals, & Polls Are Relatively Accurate
Conspiracy theories about polling are usually wrong. Despite its shortcomings, polling is a relatively mature science, and its practitioners do their jobs reasonably well.
This does not mean every poll deserves trust. Outlier polls put out by campaigns are often wrong; sometimes the pollster’s job is to persuade voters that the client has a shot at winning. Usually the error comes from overly optimistic assumptions about who will turn out to vote and similar subjective matters, but sometimes campaigns do something shadier, such as not rotating candidate order (there is an advantage to being listed first), priming respondents with earlier questions, or taking multiple samples and releasing the one in which their candidate does best.
Public polls, on the other hand, are conducted and released by people who are trying to get things right. Issue polling is inherently tricky, as I’ve explained above, but pollsters generally get elections about right on average. Why are media polls sometimes liberally biased in their wording? I can’t speculate. These mistakes can even occur subconsciously or accidentally. But as much as conservatives are naturally skeptical of the media, we should examine polls on their own terms using solid methods, rather than discarding useful information just because we don’t like the source.
2. A Bias Is Not Always A Problem At All – Let Alone A Fatal One
Say you find a flaw that could bias a poll result, but do not know how large an impact it would have. Sometimes you do not need to know. Such a bias is about as likely to improve your argument as weaken it, depending on which way it biases the results. For instance, if you are arguing that a poll on unionization indicates more support for the anti-union side, and you find a question is biased for the unions, then you can say that your theory is probably even more true than the numbers indicate.
3. Compare That Which Is Mostly Similar
Most real, full-data-based polling analysis relies in large part on some kind of multiple regression analysis. Multiple regression estimates the relationship between the variable you are interested in and the outcome you care about in a hypothetical world in which other factors you think might get in the way (“controls”) are held equal. Without a data set, you can’t do that, but you might be able to use similar principles. If a pollster has conducted multiple surveys on the same question over time, you might be able to compare the results without worrying that different question wordings or methodologies will get in the way. For repeated polls, you pretty much just have sampling error and change over time. Similarly, comparing the same wording asked by different pollsters over time can be instructive, as long as you keep an eye out for differences between pollsters.
Less ideally, polls that ask about previous views or votes can be interesting, as long as one considers in what direction and to what degree people might lie about previous views.
4. You Can Estimate The Impact of Some Biases
What if you have some information about the impact of a bias, but not enough for exact results? In that case, you can estimate the maximum impact by testing possible but extreme scenarios.
Yesterday I laid out the general template for something like this. If we know what the over-sampled group thinks, and we know approximately what a representative group might think, the number espousing any particular view can be adjusted by the difference between those two percentages multiplied by the number of percentage points by which the sample over-represents the group.
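As a rough illustration of that template, here is a minimal Python sketch. The function name and the numbers in the usage line are hypothetical placeholders of mine, not figures from any poll; the only thing taken from the description above is the rule itself (difference between the two groups' percentages, scaled by how many points the sample is off).

```python
def adjust_for_oversample(observed_pct, group_pct, representative_pct, oversample_points):
    """Adjust a topline percentage for an over-sampled group.

    observed_pct:        share of the whole sample holding the view (0-100)
    group_pct:           share of the over-sampled group holding the view (0-100)
    representative_pct:  share a representative replacement group would hold (0-100)
    oversample_points:   percentage points by which the group is over-represented
    """
    # The surplus respondents contribute (group_pct - representative_pct) extra
    # points of support per 100 surplus respondents; scale by the surplus size.
    bias = (group_pct - representative_pct) * oversample_points / 100.0
    return observed_pct - bias


# Hypothetical numbers for illustration only: a 52% topline, an over-sampled
# group at 63%, a representative group at 48%, and a 6-point oversample.
adjusted = adjust_for_oversample(52, 63, 48, 6)
print(f"adjusted topline: {adjusted:.1f}%")
```

Note how small the correction is even with a 15-point gap between the groups: a 6-point oversample moves the topline by less than one point.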
Now let’s try something more complicated, which will also incorporate the previous two items. Some have argued that the PPP “do-over” poll sampled too many union members and Barrett voters.
Say someone wishes to know how many people abandoned Walker in hindsight for Barrett, and wants to adjust for this oversampling. This makes sense because the poll asks who previously voted for Walker, so we can compare changes within the same group. This means that the slight difference in voter composition is unlikely to matter, since we are comparing the same people. It is unlikely that a slightly different voter composition would have a significantly different amount of change. In the PPP poll, the group sampled went from claiming they voted 47-47 to splitting 45-52 in Barrett’s favor, creating a 7-point deficit for Walker.
We have a 6-point surplus of union members. We know that 37% of union members voted for Walker. Multiplying those numbers gives the share of the sample made up of potential Walker abandoners within that union surplus (2.2% of the sample). Thus, even if every single Walker voter in that 6% abandoned him, it would account for only 2.2 percentage points’ worth of abandoners. Realistically, it is very unlikely that even half of these union Walker voters changed their minds. Assuming this half all went over to Barrett (which is a more negative scenario than the data support), that is about 1.1 points of the sample switching sides, and because each switcher both subtracts from Walker and adds to Barrett, it moves the margin by about 2.2 points. We have still accounted for only 2.2 points of the 7-point deficit that was created.
We are assuming that the demographic voting breakdowns are equivalent to the exit polls. This is not ideal, but it is a pretty good approximation. All of these numbers are subject to sampling error anyway.
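The worst-case arithmetic in this paragraph can be sketched in a few lines of Python. The 6-point surplus and the 37% Walker share are the figures quoted above; the function name is hypothetical, and the doubling step is my reading of how a half-switch scenario reaches 2.2 points of margin (each switcher costs Walker a point and hands Barrett a point), so treat it as an assumption rather than the author's stated method.

```python
def worst_case_margin_shift(surplus_points, prior_support_share, switch_fraction):
    """Bound the margin shift attributable to an over-sampled group.

    surplus_points:       points by which the group is over-represented
    prior_support_share:  fraction of the group that previously backed the candidate
    switch_fraction:      assumed fraction of those supporters who switched sides
    Returns (switchers_in_points, margin_shift_in_points).
    """
    switchers = surplus_points * prior_support_share * switch_fraction
    # Assumption: a switcher leaves one candidate and joins the other,
    # so the gap between the two moves by twice the number of switchers.
    return switchers, 2 * switchers


# Figures from the text: 6-point union surplus, 37% of union members voted Walker.
all_switch, _ = worst_case_margin_shift(6, 0.37, 1.0)             # every one abandons
half_switch, half_margin = worst_case_margin_shift(6, 0.37, 0.5)  # half abandon

print(f"abandoners if all switch:  {all_switch:.1f} points")
print(f"abandoners if half switch: {half_switch:.1f} points")
print(f"margin shift if half switch: {half_margin:.1f} of a 7-point deficit")
```

Even the extreme inputs leave most of the 7-point deficit unexplained, which is the point of testing extreme scenarios rather than guessing at a precise correction.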
If we want to be more specific, as in yesterday’s example, we then need to adjust for plausible numbers of abandoners in an equally sized non-union group. In other words, what would the people who would otherwise have been sampled have done? We do not know exactly what this number is, but since the overall sample changed only slightly, it is likely not large. It might slightly enhance the effect of oversampling union members if non-union members became more supportive of Walker (it would blunt the effect if non-union members also became less supportive of Walker). All in all, this will probably not make much of a difference: even if non-union folks became more supportive of Walker, it is unlikely that they would be anywhere near as skewed as our already unreasonably skewed union group. Even if a quarter of them left Barrett for Walker and none moved the other way (which is, again, completely unreasonable), this would net only 1.3 more points.
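To see how far these extreme scenarios go, the two worst-case figures so far (2.2 points from the union surplus and 1.3 points from the non-union group that would have replaced it) can be tallied against the 7-point deficit. This is only a bookkeeping sketch with a hypothetical function name; the three numbers are the ones quoted in the text.

```python
def share_of_deficit_explained(bounds, deficit):
    """Sum worst-case adjustments and express them as a share of the deficit.

    bounds:  worst-case margin adjustments, in percentage points
    deficit: the observed margin deficit, in percentage points
    Returns (total_points, fraction_of_deficit).
    """
    total = sum(bounds)
    return total, total / deficit


# Figures from the text: 2.2 (union surplus) and 1.3 (non-union replacements)
# against Walker's 7-point deficit.
total, fraction = share_of_deficit_explained([2.2, 1.3], 7)
print(f"explained: {total:.1f} of 7 points ({fraction:.0%} of the deficit)")
```

Under both deliberately unreasonable assumptions at once, roughly half of the deficit is accounted for, and the rest still has to come from somewhere else.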
To make matters worse, the poll relies on asking for whom people voted in November. Since it appears that more people oppose Walker now than then, it is likely that to the degree this number is off, it is because there are people who did vote for Walker but do not want to admit it, rather than the other way around. Thus, this bias, while unmeasurable, militates in favor of Walker losing more support, rather than less.
From what we know, despite the somewhat unrepresentative sample, it looks like Walker has probably lost a bit of support between November and the PPP poll, since even ridiculously optimistic assumptions only account for half of his losses. Is it possible that even after adjusting for union support and people lying about whom they voted for, there is some other factor that contributed to the retrospective tie that can explain away Walker’s losses? In theory, yes, but it is extremely unlikely. Is it possible that this is due to sampling error? Yes, but again, the difference is large enough that that is also unlikely. Hopefully I have shown how little impact small differences tend to have on final numbers.
* * *
Tomorrow, I will examine more polls on the conflict with public employee unions.