This is the second in a series of GUEST POSTS by Matthew Knee, a Ph.D. candidate at Yale University specializing in campaigns and elections, ethnic voting patterns, public opinion, and quantitative and experimental approaches to political science.
Samples & Margin of Error
1. Sampling Error Will Always Be There – No Matter What The Sample Breakdown Looks Like
Bloggers love to look at sample breakdowns. They say that there are too many of this party, not enough of that ethnicity, etc. in a polling sample. However, samples will always vary somewhat compared to the general population. An unbiased sampling procedure will match the population if you perform it an infinite number of times and average the results, but any given sample will always vary in one direction or another in a variety of ways. Even if a pollster were to weight their results so that all reported subgroups were “perfectly” represented, they would merely be creating the illusion of a perfect sample, since the actual results will still contain a degree of error inherent in the random sampling process.
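A quick simulation can make this concrete. The sketch below is an illustration, not data from any real poll: it assumes a hypothetical population in which exactly 35% of people belong to some party, draws repeated simple random samples of 1,000, and records that party's share in each sample. The 35% figure, the sample size, the number of repetitions, and the fixed seed are all assumptions chosen for illustration.

```python
import random

TRUE_SHARE = 0.35    # hypothetical population share of one party (assumption)
SAMPLE_SIZE = 1000   # respondents per simulated poll
NUM_SAMPLES = 200    # number of simulated polls

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_share(rng, true_share, sample_size):
    """Share of party members in one simple random sample."""
    hits = sum(1 for _ in range(sample_size) if rng.random() < true_share)
    return hits / sample_size


shares = [sample_share(rng, TRUE_SHARE, SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]
average_share = sum(shares) / len(shares)

print(f"lowest sample share:  {min(shares):.1%}")
print(f"highest sample share: {max(shares):.1%}")
print(f"average over samples: {average_share:.1%}")
```

Individual samples land on both sides of the true share even though the sampling procedure is unbiased. Only the average across many samples settles near 35%.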
2. The Margin Of Error Is Bigger – And Smaller – Than You Think
Most people do not realize two vital facts about the margin of error. First, it applies to each individual number. When you are looking at the difference between support for two candidates or issue positions, the actual margin of error is roughly twice what is reported. (This is approximate: certain complicated mathematical considerations can throw it off slightly, and it does not hold in many-way races.)
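For readers who want to check the "twice" rule, here is a minimal sketch using the standard textbook formulas for a simple random sample. It assumes a 95% confidence level (z = 1.96) and treats both shares as coming from the same sample. The function names and the example numbers are hypothetical, and real polls with weighting will differ somewhat.

```python
import math

Z_95 = 1.96  # z-score for a two-sided 95% confidence level


def margin_of_error(p, n, z=Z_95):
    """Margin of error for one share p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


def margin_of_error_of_lead(p1, p2, n, z=Z_95):
    """Margin of error for the lead p1 - p2 when both shares come from the same sample."""
    # The two shares are negatively correlated, which adds the 2*p1*p2 term.
    variance = (p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n
    return z * math.sqrt(variance)


n = 1000
single = margin_of_error(0.5, n)                      # the usual reported figure
lead_two_way = margin_of_error_of_lead(0.5, 0.5, n)   # pure two-way race
lead_with_undecided = margin_of_error_of_lead(0.45, 0.40, n)  # 15% undecided/other

print(f"reported margin of error:        {single:.1%}")
print(f"lead, two-way race:              {lead_two_way:.1%}")
print(f"lead, with undecided voters:     {lead_with_undecided:.1%}")
```

In the pure two-way case the margin on the lead is exactly double the reported figure. Once some respondents are undecided or choose someone else, it comes out a little under double, which is why the rule is only approximate.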
Second, there is not an even chance of reality falling anywhere within the margin of error. The margin of error describes part of a bell curve representing possible outcomes, typically the range in which there is a 95% chance of containing the “real” number. The number given is the center of the curve. Here is a chart showing what these curves look like at various sample sizes:
Why 95%? It’s a tradition, but an arbitrary one. Cutting the margin of error in half lowers the 95% to just over 67%. If you are only interested in error in one direction, that number rises to about 84%. Thus, the probability that a candidate’s actual current position is more than half the margin of error better than their polling number is only about 16%.
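These percentages can be reproduced from the bell curve directly. The sketch below assumes the usual normal approximation, in which the 95% margin of error corresponds to 1.96 standard errors. The helper `normal_cdf` is a hypothetical name defined here, built on Python's standard `math.erf`.

```python
import math


def normal_cdf(z):
    """Probability that a standard normal variable falls below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


Z_95 = 1.96            # full margin of error, in standard errors
half_moe = Z_95 / 2    # half the margin of error

# Chance the real number is within half the margin of error on either side.
within_half = normal_cdf(half_moe) - normal_cdf(-half_moe)
# Chance the real number is no more than half the margin of error above the poll.
not_above_half = normal_cdf(half_moe)
# Chance the real number is more than half the margin of error above the poll.
above_half = 1 - not_above_half

print(f"within half the margin, either side: {within_half:.1%}")
print(f"not more than half the margin above: {not_above_half:.1%}")
print(f"more than half the margin above:     {above_half:.1%}")
```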
Calculating the probability that one candidate is ahead of another is a bit more complicated. Fortunately, the polling firm American Research Group has a handy web calculator that does the math for you.
This calculation will also show that polls which many journalists and campaign spin doctors portray as indicating a statistical tie show nothing of the sort. A poll in which the candidates are separated by exactly twice the margin of error actually indicates a near certainty that the candidate who appears to be ahead is actually ahead.
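A rough version of this calculation can be sketched in a few lines. This is a simplified normal approximation for a simple random sample, not necessarily the method any particular calculator uses. The function names and the example poll numbers (1,000 respondents, a two-way race) are hypothetical.

```python
import math


def normal_cdf(z):
    """Probability that a standard normal variable falls below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def probability_leader_ahead(p1, p2, n):
    """Approximate chance that the candidate polling at p1 truly leads the one at p2."""
    lead = p1 - p2
    # Standard error of the lead when both shares come from the same sample.
    std_error = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n)
    return normal_cdf(lead / std_error)


n = 1000  # reported margin of error is about 3.1 points at this size

# Gap equal to twice the reported margin of error (6.2 points).
prob_gap_twice_moe = probability_leader_ahead(0.531, 0.469, n)
# Gap equal to the reported margin of error (3.1 points), often called a "tie".
prob_gap_one_moe = probability_leader_ahead(0.5155, 0.4845, n)

print(f"gap of twice the margin: {prob_gap_twice_moe:.1%}")
print(f"gap of one margin:       {prob_gap_one_moe:.1%}")
```

Under these assumptions, a gap of twice the margin of error puts the leader ahead with a probability in the high 90s, and even a gap of just one margin of error leaves the leader a clear favorite rather than in a coin flip.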
3. The Margin of Error of Subsamples Is Usually Quite High – Often Unusably High
The stated margin of error for a poll is the margin of error for the entire poll, and is primarily driven by sample size. However, if you are only looking at a subsample (women, Hispanics, Evangelical Christians, etc.), the margin of error is the same as that of a poll whose sample size equals the number of respondents in that group. This means that there is typically inadequate data about many or most subgroups to say very much about what they think. Even larger subgroups such as the two parties and independents are often in the ballpark of 300 respondents each (which is still acceptable, but only barely) in a large 1000-person national poll. Most polls are smaller than that.
The views of most minority groups such as African-Americans, Hispanics, Jews, and Asians cannot be measured accurately by a standard national poll, although there do exist very large polls or polls specifically designed to properly sample these groups.
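To put rough numbers on the subsample problem, the sketch below computes the 95% margin of error at several group sizes using the standard formula for a simple random sample with a 50-50 split (the worst case). The sizes of 100 and 50 are hypothetical stand-ins for small subgroups, not figures from any actual poll.

```python
import math

Z_95 = 1.96  # z-score for a two-sided 95% confidence level


def worst_case_margin(n, z=Z_95):
    """95% margin of error for a group of n respondents, assuming a 50-50 split."""
    return z * math.sqrt(0.25 / n)


# Full poll, a party-sized subgroup, and two hypothetical small subgroups.
group_sizes = [1000, 300, 100, 50]
margins = {n: worst_case_margin(n) for n in group_sizes}

for n, moe in margins.items():
    print(f"n = {n:>4}: margin of error about {moe:.1%}")
```

Because the margin shrinks only with the square root of the group size, cutting the group to a tenth of the full sample roughly triples the margin of error.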
4. Adjusting The Composition Of The Sample Often Helps Less Than Many Expect
Those who re-weight based on selected demographics tend to overestimate the impact of doing so. Many assume an approximately 1-to-1 relationship between party and some particular view, but moving a percentage point of the sample from one demographic to another will often get you a lot less since few groups are near unanimity in either direction. Each percentage point of the sample is allocated to one side or another (or undecided/other/declined/etc.). Even replacing members of a highly skewed demographic with members of another (say, a 75-25 group and a 25-75 group) only moves half a point from one side to the other per point moved from one demographic to the other – and most demographic changes are a lot less extreme than that 3-to-1 ratio. On a close question, replacing a relatively representative group (say, 50-50) with a highly skewed group (say 75-25) only shifts a quarter of a point. Even if you double all this when discussing differences between sides, small biases in a sample often will not make an enormous difference.
Less importantly, adjusting for selected demographics in this relatively crude fashion creates a small risk of introducing more inaccuracies by falsely assuming that other demographic balances are not inadvertently altered in the process.
This is of course a simple approximation, and as such there are a number of factors that can throw the results off somewhat, but it is quite useful as a back-of-the-napkin estimate of the effects of a skewed sample.
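The back-of-the-napkin arithmetic from this section can be written out as follows. This is only a sketch of the simple approximation described above: the function name is hypothetical, and it assumes that respondents moved between groups simply take on the new group's level of support.

```python
def shift_in_support(points_moved, support_in_old_group, support_in_new_group):
    """Points of overall support gained by one side when part of the sample is reassigned.

    points_moved: percentage points of the sample moved between groups.
    support_in_*: that side's share of support within each group (0 to 1).
    """
    return points_moved * (support_in_new_group - support_in_old_group)


# Move one point of the sample from a 25-75 group to a 75-25 group.
skewed_to_skewed = shift_in_support(1.0, 0.25, 0.75)
# Move one point of the sample from a 50-50 group to a 75-25 group.
even_to_skewed = shift_in_support(1.0, 0.50, 0.75)

print(f"25-75 group -> 75-25 group: {skewed_to_skewed:.2f} points")
print(f"50-50 group -> 75-25 group: {even_to_skewed:.2f} points")
# The gap between the two sides moves by twice as much as one side's support.
print(f"change in the gap, second case: {2 * even_to_skewed:.2f} points")
```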
Tomorrow, I will further discuss how to deal with bias in polls.