November 3, 2006

More On Polling

Hatched by Dafydd

Commenter Big D believes that polling is usually totally wrong, strongly biased towards Democrats:

I stand by my comment - the polls are almost always biased toward the Democrats and are almost always wrong.

Big D is right in a small way; polls are consistently biased left, but it's a very, very small bias (around 1%, maybe 2%). In a larger sense, however, the idea that polls are wildly and deliberately biased left is an electoral myth: the polls are not always wrong; generally, they're quite accurate. But there are some cycles where they're totally off... why is that?

First, a short detour. The 2004 polling was actually very good, very close to the reality of the vote. But the "poll" most often cited is the botched 2004 exit polling, conducted early in the day, which "predicted" that Kerry would win in a landslide -- and the way the media hyped it contributed to the false belief that the election was "stolen."

In fact, properly conducted exit polls are particularly useful... but not for predicting the electoral outcome! That is a singularly stupid idea: why do you need a prediction, on Election Day, of the election results? You'll know the actual outcome that same night in nearly all cases.

Rather, the purpose of exit polls is to understand why the election fell out the way it did, not to guess how it will fall out. The proper procedure is to conduct exit polling throughout the voting period, then compare it to the actual voting in each of those precincts and adjust the exit poll to match. That is, you have poll watchers who stay outside and count the number of men and women, the number of blacks and whites, and so forth. (You can't count the number of Republicans and Democrats in most cases, alas.) And of course, the total number of voters in each precinct is public information. Then you weight your exit-polling sample by the number of people in each weighting category who actually voted.
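To make that adjustment concrete, here is a minimal sketch of the reweighting in Python, with invented numbers for a single precinct and a single countable category (sex); a real pollster would weight on every observable category simultaneously.

    # A minimal post-stratification sketch -- all numbers invented.
    # Scale each exit-poll respondent so the observed category (here
    # just sex) matches the count of actual voters at the precinct.
    from collections import Counter

    exit_poll = [
        {"sex": "M", "vote": "R"}, {"sex": "M", "vote": "D"},
        {"sex": "F", "vote": "D"}, {"sex": "F", "vote": "R"},
    ]
    observed_turnout = {"M": 480, "F": 520}   # counted by watchers at the door

    sample_counts = Counter(r["sex"] for r in exit_poll)

    # weight = (actual voters in category) / (respondents in category)
    weights = {cat: observed_turnout[cat] / sample_counts[cat]
               for cat in sample_counts}

    tally = Counter()
    for r in exit_poll:
        tally[r["vote"]] += weights[r["sex"]]

    print(dict(tally))   # weighted tallies matching the precinct's real demography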

For numbers you cannot observe, you compare the exit-polling demography with the known demography of the district, and you try to decide how much of the difference is due to polling methodology and how much to actual turnout differences among groups: black males turn out at a far smaller rate than white females, for example; but did they turn out more than expected in that district on that day?

What you have left is a very, very good snapshot of why various groups of people voted the way they did, who turned out, and how those two points affected the vote. The exit polling in 2004 that was conducted correctly -- which means, among other things, continuing to poll right up until the polls closed -- was, as always, very illuminating.

Back to regular public polling. I rise in its defense: given their assumptions, the well-known public polls (Gallup, Rasmussen, Pew, SurveyUSA, Field, Mason-Dixon, Quinnipiac, etc.) are by and large sound. The problem comes when the conventional-wisdom assumptions turn out to be wrong -- as in 1994, and perhaps 2006; we won't know that until November 8th.

There are three times I can think of when properly conducted public polling can be significantly wrong:

  • If the turnout assumptions are very wrong;
  • If the pollsters fail to take into account technological changes that may affect how people vote or how respondents are polled;
  • And the "Douglas Wilder effect."

Those turnout blues

First, there is turnout; we've discussed this a lot, as has everybody else. In very, very brief, pollsters must guess the turnout percentage of each of a large number of demographic groups, from blacks to Catholics to married women to blue-collar workers and many others. Then the demography of those who actually end up being polled must be adjusted, or "weighted," to correct it to the demography the pollster expects.

For example, if the pollster believes (based upon past history) that 15% of the electorate in a given district will be black, but his respondent pool ended up being only 9% black, then he weights the black vote, increasing its value in each response to the 15% level. Suppose he polls 1,500 people, of whom only 135 (9%) are black; he expects 15% of the voters to be black, so he should have gotten 225 in the pool.

If those 135 black respondents contributed 122 responses (90.37%) for the Democratic candidate (pretty normal), then the pollster would up that to 203 responses... which is 90.37% (rounding off) of 225. But note: you must do this simultaneously for every demographic group you weight for. And naturally, all of this depends upon the pollster's prediction of demography: if he is badly wrong about that, his poll will be off correspondingly.
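For the curious, here is that arithmetic as a short Python sketch, using exactly the numbers from the example above:

    # The weighting arithmetic from the example above.
    poll_size       = 1500
    black_in_sample = 135      # 9% of respondents
    expected_share  = 0.15     # turnout model says 15% of voters will be black
    dem_responses   = 122      # black respondents choosing the Democrat (90.37%)

    expected_count = expected_share * poll_size        # 225 "should" have been polled
    weight         = expected_count / black_in_sample  # 225 / 135 = 1.667

    print(round(dem_responses * weight))   # 203 -- i.e., 90.37% of 225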

The times they are a-changing

The infamous "Dewey Defeats Truman" poll of 1948 was an anomaly, but one that can also bite us in 2006.

That was probably the first year when presidential polls were conducted by telephone... and it turns out -- and should have been suspected in advance -- that people who had home telephones in 1948 were not representative of all voters; they represented more affluent voters.

Given that limitation, the poll was actually correct: if only voters affluent enough to own a home phone were allowed to vote, then Thomas Dewey would indeed have beaten President Harry S. Truman*. Unfortunately for the pollsters' reputations, even the poor and lower middle class were allowed to vote, and Truman did very well among those groups.

* While it is true that Harry S. Truman had no middle name, only an initial, and that he sometimes said it shouldn't have a period, he must have been joking... because in all official documents he produced or signed, there is in fact a period after the initial.

That same error, which I (and nobody else, of course) call the New Technology Deceit, can jump up and bite us again this year (and for the next couple of elections), as more and more people vote absentee or vote early via touchscreen or other early-voting systems. Those who vote early might be unrepresentative of voters as a whole; and polls conducted while early voting is underway may be skewed, because voters who have already voted may change their minds when polled a few days later.
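To see the mechanism, here is a toy illustration in Python, with invented numbers: a quarter of the ballots are cast early at a 50-50 split, then a late swing moves the remaining voters, and a poll taken after the swing overshoots the final tally.

    # Toy illustration of early-voting skew -- all numbers invented.
    early_share = 0.25   # fraction of ballots already cast early
    early_dem   = 0.50   # Democratic share among those locked-in ballots
    late_dem    = 0.55   # Democratic share after a late swing of opinion

    # A late poll samples current preferences, so it shows 55%; but the
    # early ballots cannot move, so the actual result is lower.
    actual_dem = early_share * early_dem + (1 - early_share) * late_dem
    print(f"late poll shows: {late_dem:.1%}")    # 55.0%
    print(f"actual result:   {actual_dem:.1%}")  # about 53.8% -- the poll overshoots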

Again, we won't know that until the 8th. (Even if some contests are not yet settled, and even if those contests determine who will control the Congress, we'll still know on November 8th the outcome of virtually every race.)

The Wilder effect

And there is a third circumstance in which polling falls down, sometimes catastrophically: when it runs into the buzzsaw that is now called (by me and everybody else) the Douglas Wilder effect. When a polling question has a strong component of "cognitive dissonance" -- that is, when answering a certain way would put the respondent at odds with how he knows, culturally or philosophically, he is "supposed" to answer -- then you cannot trust the answer. Respondents will often answer one way, but then, in the privacy of the booth, vote the opposite.

This was discussed yesterday by John McIntyre on Real Clear Politics.

Douglas Wilder ran for governor of Virginia in 1989, trying (successfully) to become the first black person elected governor of any American state. All the polls said he would cruise to victory by at least 10-12 points. But when the votes were counted, although he did win, he defeated Republican Marshall Coleman by less than one half of one percent.

Did his supporters desert him? An "election-eve rally" by Coleman? No; pollsters now believe that many, many Democrats, who never really intended to vote for Wilder, nevertheless said they would... because they knew it was the politically correct thing to do to vote for the black guy. But when they got into the voting booth, they pulled the lever for the white guy.

How do they know it was Democrats who betrayed Wilder? Exit polls! The Republicans straightforwardly said they had voted for the Republican; it was the Democrats who said they voted for Wilder... in precincts where he got nowhere near that many votes.

There is one race, as John pointed out, where this effect may affect the election: a recent poll in the Maryland Senate race showed Lt. Gov. Michael Steele only 6 points behind Rep. Ben Cardin (D, 95%); that's in the "leans Democrat" range.

But the same poll also showed Steele's black support at only 12%, and most watchers believe he will in fact get higher than that -- at least 20%, perhaps as much as 25%. John McIntyre speculates that there may be a reverse Wilder effect at play here:

The derision of black Republicans among many in the black political class is some of the harshest and meanest in politics. If you don't believe me, just go and look at what many liberal African-American leaders have said about Colin Powell, Clarence Thomas and Condoleezza Rice. Is it possible that some African-American voters are giving pollsters the politically correct answer that they are voting for the Democrat, because they know that is what they are "supposed" to say, but might do something different in the voting booth?

Well, yes; of course it's possible; but is it correct? We'll find out in just a few days.

Things fall apart, the center cannot hold

Actually -- yes it can hold. All three of these special circumstances together account for only a small number of actual election years; the circumstances in which polls can fail catastrophically are rare. Normally, properly conducted polls are good indicators of the view of the electorate at that moment (which can change over time -- look at George W. Bush's poll numbers in March 2001, November 2002, and July 2006).

However, 2006 happens to be a year when, for interesting and unrelated reasons, all three anomalous circumstances may be in play:

  1. Pollsters have decided that Democrats are elated and Republicans depressed; thus they appear to believe that black turnout will be high (everywhere except in Maryland), married white turnout will be low, and evangelicals and other religious voters will stay home and sulk. If they're wrong about this -- and I think they are -- that will cause actual turnout to differ dramatically from projected turnout, changing results.

    (Also in the same category, people who choose to respond to pollsters may not be representative of voters as a whole; this is sometimes called the "Republican walk-away effect," mostly by people I don't know.)

  2. Because of early voting, pollsters may detect "changes" in the electorate that do not show up as strongly in the ballot box, because many of the respondents may already have voted before they changed their minds.
  3. And the Wilder effect might be in play in the Maryland U.S. Senate race.

On the other hand, I could be all wet, and conventional wisdom may rule. We'll probably find that out by November 8th, too, or at least before the end of the year.

Hatched by Dafydd on this day, November 3, 2006, at the time of 4:58 PM


Comments

The following hissed in response by: Terrye

I have already voted. I voted R. Surprise!

The above hissed in response by: Terrye [TypeKey Profile Page] at November 4, 2006 3:28 AM

The following hissed in response by: yetanotherjohn

You also have polls with 1,200 respondents, of whom they determine 1,000 are likely voters -- greater than 80% voting. Compare that to the 2004 election, when 60% of eligible voters voted, and to 35% in the 2002 midterms.

If they are wrong about seeing 80% of eligible voters voting, the next question is the makeup of the subset who actually do vote. For a simple version of this, cut the 80% from the poll in half to generate a 40% overall voting percentage. Now assume that the half who dropped out all came from the Democratic or the Republican side. Obviously, in that somewhat far-fetched version, even Harris has a chance of winning in FL.

Now assume instead that voters drop out in proportion to the poll (so 40% vote, but the overall percentages are the same). But further assume the GOP's GOTV effort adds a further 10% of voters, plus one, all of whom are Republican. The result: polls showing comfortable Democratic leads of 60% to 40% would find the Republican winning with 50% of the vote plus one. Look at anyone's "most vulnerable" House or Senate race projections with the assumption that the GOP could make up deficits of 20% or less. We are then looking at the GOP gaining seats in the House and Senate.
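To put numbers on one reading of that scenario (assuming the GOTV boost means ten additional points of eligible-voter turnout, all of it Republican), a quick sketch:

    # One reading of the scenario above -- all numbers hypothetical.
    turnout    = 0.40            # assumed share of eligible voters who vote
    dem_votes  = 0.60 * turnout  # 24 points of the eligible pool
    rep_votes  = 0.40 * turnout  # 16 points
    gotv_boost = 0.10            # ten more points of turnout, all Republican

    rep_total = rep_votes + gotv_boost   # 26 points
    total     = dem_votes + rep_total    # 50 points
    print(f"Republican share: {rep_total / total:.0%}")   # 52% -- the "lead" flips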

Come Wednesday, either I or the pollsters are going to be very sad: me, because the GOP GOTV effort couldn't pull the election out; the pollsters, because they will have giant lizard egg on their faces.

The above hissed in response by: yetanotherjohn [TypeKey Profile Page] at November 4, 2006 1:53 PM

The following hissed in response by: Rick Ballard

Jay Cost buried the lede big time in the linked piece. What the cited report actually says ties in very well with the probability that polling companies are using incorrect weights for their samples. The proposition is quite simple: "If registration data for 2006 shows a continuation of the historical trend then how can the current generic poll results be considered to be accurate?" The generic poll is used to weight turnout probabilities and is about 5-7% negative in relation to the "typical" past generics. If there has been such a large shift, why isn't it reflected in actual registration numbers?

If you look at the CA-50 special election that was held in conjunction with the California primary, the thesis of Republican disenchantment due to upickem was proven incorrect. Both parties sank a ton of money into the race, and the Reps turned on the GOTV as if it were the general; the result was that the district voted within 1% of its '04 results. That's with Cunningham having been convicted of the most outrageous type of vote selling that I can remember.

Tuesday is going to be an interesting day.

The above hissed in response by: Rick Ballard [TypeKey Profile Page] at November 4, 2006 8:38 PM

The following hissed in response by: Dick E

Dafydd-

I fully appreciate the power and usefulness of statistics and polling. You seem quite conversant with the mechanics, so I have a couple of questions (well, just one question, but several sub issues):

I can see how pollsters adjust for race, gender, etc. among survey respondents, but how do they correct for people like me who don’t answer the pollsters’ questions?

I suppose one possibility is that the number of nonresponders is always statistically insignificant. This seems to me to be unlikely, so …

Do they just ignore the nonresponders, assuming they’re randomly distributed among the candidates? This too seems unlikely, so is there any way to know the probable voting patterns of nonresponders?

If a poll is supposed to cover 600 likely voters, and 5 people answer, “I’m going to vote, but I won’t answer your questions,” do they then call a total of 605 people who say they plan to vote? Either way, what do they do about the recalcitrant 5?

If a pollster encountered a significant increase or decrease in nonresponders, what would they do with this information? In an election like the impending one, could a large number of reluctant poll respondents actually be Republican voters who are too ashamed -- er, intimidated -- to admit they still support Bush? If the information were ignored, mightn’t we have something akin to “Dewey Defeats Truman?”

The above hissed in response by: Dick E [TypeKey Profile Page] at November 4, 2006 10:23 PM

The following hissed in response by: Rick Ballard

The WaPo just reported that the generic lead has shrunk to 6 while Newsweek reports that it has opened to 16.

Six is the historic spread that Reps win on. One of those polls is merde du jour - or they both are.

The above hissed in response by: Rick Ballard [TypeKey Profile Page] at November 5, 2006 12:10 AM

The following hissed in response by: Dafydd ab Hugh

Dick E:

I can see how pollsters adjust for race, gender, etc. among survey respondents, but how do they correct for people like me who don’t answer the pollsters’ questions?

They don't.

They have assumed for some time that equal proportions of left and right hang up on the pollsters (ignoring the undistributed middle for purposes of discussion).

But it has become increasingly clear that this is an untenable assumption: very often, conservatives are less willing to answer pollsters' questions than liberals are.

Whenever this is true, the poll will, of course, be skewed to the left. But so far as I know, no pollster attempts to correct for the "Shun Effect," as I just now decided to start calling it.

If you're asking how one could correct for it, I do have some suggestions:

  1. Block your caller ID before calling (you want to distinguish between "shunning" and "not home");
  2. Keep careful records of how many people actually answer the phone but refuse to participate in the poll;
  3. Ask the person why... he might volunteer something that will tell you his political philosophy (or he might just hang up on you).

Now comes the tricky part: the pollster must correlate the number of shunnings with the vote of that district across a number of elections. If conservatives are more unwilling to participate in polling, it would show up here: you would see a higher concentration of shunnings in conservative districts, and the rate should positively correlate with the GOP advantage.

If it does, and if it's consistent (which in the real world it won't be), then you can easily calculate the Shun Effect using linear algebra. If it's not consistent, you can still approximate it by a numerical solution.

Of course, pollsters must do this over and over in many different districts; they would use numerical analysis to create a solvable function that approximates the actual Shun Effect: the rate of Republican shunners vs. the rate of Democratic shunners. Then they would use this rate in future elections to weight responses from Democrats and Republicans, depending upon the political rating of the district or state in question.

This would be in addition to all other weightings (race, sex, age, education, marital status, religion, etc).

Oh, and for the record: note that each kind of district (based upon political ranking) would typically have its own associated total percent of shunners; a GOP +20 would generate a different total percent of shunners than a GOP +10 or a Democrat +15.

But the actual number for any district is measurable: it's the percent of respondents who refuse to participate. If the measured number of shunners in a district differs from the number it should get (based upon its political ranking), then the pollster should use the Shun Effect numbers for the kind of district associated with the actual number, not the expected number. Observation trumps theory.

(Example: if the district is ranked as a GOP +20, but the pollster only gets the number of shunners associated with a GOP +10 district, he should use the weighting constants associated with the latter, not the former.)

In any event, while not perfect, this would go at least part of the way towards making polling more accurate by taking the Shun Effect into account.
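For the mathematically inclined, here is a minimal sketch of that correlation step, with invented data, using an ordinary least-squares fit; a real pollster would fit across many districts and many election cycles.

    # Sketch of the Shun Effect correction -- all data invented.
    # Each pair is (district lean in GOP points, observed refusal rate).
    history = [(-15, 0.22), (-5, 0.25), (0, 0.27), (10, 0.31), (20, 0.34)]

    n = len(history)
    mean_x = sum(x for x, _ in history) / n
    mean_y = sum(y for _, y in history) / n

    # ordinary least squares: refusal rate as a linear function of lean
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in history)
             / sum((x - mean_x) ** 2 for x, _ in history))
    intercept = mean_y - slope * mean_x

    def expected_shun_rate(lean):
        """Refusal rate the fit predicts for a district with this GOP lean."""
        return intercept + slope * lean

    def implied_lean(observed_shun):
        """Invert the fit: what lean does an observed refusal rate suggest?"""
        return (observed_shun - intercept) / slope

    # Observation trumps theory: a district ranked GOP +20 that shuns like
    # a GOP +10 district gets the weighting constants of the latter.
    print(round(expected_shun_rate(20), 3))   # ~0.341: what a GOP +20 "should" shun
    print(round(implied_lean(0.31), 1))       # ~11.1: this rate looks like GOP +10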

Dafydd

The above hissed in response by: Dafydd ab Hugh [TypeKey Profile Page] at November 5, 2006 1:19 AM

The following hissed in response by: Dick E

Dafydd-

Thanks very much.

You really answered my question -- with the answer I feared you would give -- in the first paragraph.

I'm sure adjusting for "shunners" (are they the guys who dress up funny for a parade in Philadelphia?) would involve complicated gyrations such as you describe. Sounds like it would be expensive for pollsters and might give poll results that a few of them wouldn't like.

I wonder if polls conducted for internal purposes only -- e.g. those conducted by the Dems and GOP themselves -- try to be more accurate.

The above hissed in response by: Dick E [TypeKey Profile Page] at November 5, 2006 6:44 PM

The following hissed in response by: Big D

Yipes - quoted by the lizard himself! So I must respond.

er...yeah...but!

No wait, here is where I will make my stand. By your own admission pollsters lean Democrat by 1 to 2%... But this is the margin of victory in many of the races!

Now, I would describe a pollster as being wrong when he predicts the wrong winner of a race, not when he fails to accurately predict the margin of victory. After all, the margin of victory (or defeat) is just a meaningless number -- predicting the wrong winner is for keeps!

Therefore, what is my argument? That in evenly divided America even a small bias contributes to big errors in polling.

And what of the concept of polls becoming increasingly less reliable -- more shunners (such as myself), more Wilders (ditto, for snarky reasons)? I would argue, given the poisonous way in which politics is now played, that more people lie to pollsters than ever before.

I'd be very interested in the statistical increase in shunners over the decades - I'll bet it has increased significantly.

The above hissed in response by: Big D [TypeKey Profile Page] at November 6, 2006 10:36 AM

The following hissed in response by: Big D

Powerline itself has an article on - why polls are so frequently wrong! In fact, why they are biased towards Democrats! So, what? Am I crazy here?

"Based on experience, what a pollster really should do is weight his poll results in favor of Republicans. But I doubt whether any of them do that."

Okay, so that was my best pull quote, but the entirety of the article does seem to support my conclusion - polls consistently overplay Democrats.

The above hissed in response by: Big D [TypeKey Profile Page] at November 6, 2006 11:47 AM


© 2005-2009 by Dafydd ab Hugh - All Rights Reserved