Your usual reminder about missing data in surveys

Tuesday, February 24, 2015

Political science is no longer a stranger to the Spanish general public, and specialized blogs are clearly leading the way in opening the discipline to a general audience. It seems to me that the public understanding of what political scientists do is now more accurate than, say, ten years ago. Excellent researchers are now part of the roster of experts on radio and television, and their in-depth, academically grounded perspective is helping raise the level of the political debate.

But I also feel that sometimes the urge to produce analysis makes us sideline the standards of excellence that our profession requires. We are so eager to build stories, to produce insights, that we sometimes forget the raw material we work with. Thankfully, the credibility revolution has brought causality into our vocabulary, and no one dares anymore to present the traditional kitchen-sink regression without the all too familiar cautionary note about correlation and causation. But we approach survey data with far less vigilance, and we tend to forget everything we have learned about how responses to surveys are formed, about measurement error, about the theory of public opinion, and sometimes even about statistical inference. In this regard, I have my own particular fixation: missing data.

Consider study MD3045 from the CIS (the latest Barómetro). 24% of the respondents did not answer the ideology question, which makes “Don’t know” the largest category by far. It is astonishing: one quarter of the sample is either unwilling or unable to translate its political preferences into an ideological scale. And yet all the analyses I have seen so far ignore this figure, despite all the discussion about the current transformation of the political landscape. Ignoring that 24% of the sample would be the right choice only if we were willing to assume that the people who did report their ideology are similar to the people who did not, i.e., that the data are missing completely at random. But it does not take much to show that the assumption is false: a quick look at the data reveals that the likelihood of not answering the ideology question is larger for women, for middle-aged respondents, and for respondents with lower education.
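The kind of quick check described above can be sketched in a few lines. The records and rates below are synthetic, for illustration only, and do not come from the CIS microdata; the idea is simply that, if responders and non-responders were interchangeable, nonresponse rates would be roughly equal across groups.

```python
# Sketch: is ideology nonresponse associated with respondent characteristics?
# The respondent records here are hypothetical, not the actual CIS data.
from collections import defaultdict

# (education group, answered the ideology question?)
respondents = [
    ("primary", False), ("primary", True), ("primary", False),
    ("secondary", True), ("secondary", True), ("secondary", False),
    ("university", True), ("university", True), ("university", True),
    ("primary", False), ("secondary", True), ("university", True),
]

def nonresponse_rate_by(records):
    """Share of 'Don't know' answers within each group."""
    totals, missing = defaultdict(int), defaultdict(int)
    for group, answered in records:
        totals[group] += 1
        if not answered:
            missing[group] += 1
    return {g: missing[g] / totals[g] for g in totals}

rates = nonresponse_rate_by(respondents)
# Under missingness-completely-at-random these rates would be roughly
# equal; large gaps like the ones below contradict that assumption.
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} did not place themselves on the scale")
```

With real microdata one would go further (a logistic regression of the nonresponse indicator on sex, age, and education), but even this simple cross-tabulation is enough to reject the convenient assumption.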


There is no need to tear our hair out and question everything we know just because we have forgotten to account for a large constituency that is unlikely to be uniformly distributed across parties, or even across political preferences. The problem may or may not be consequential for our analysis, and perhaps the stories we have put together still hold after we make the appropriate corrections, or after we rethink how to conceptualize nonresponse. Checking is all I ask for. The public deserves no less attention to the details of our analysis than any peer reviewer would demand.
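One of the simplest corrections alluded to above is reweighting: responders are weighted so that their demographic mix matches the full sample, which removes bias along the observed characteristics (though not bias driven by unobservables). The group shares below are hypothetical, not CIS figures.

```python
# Minimal reweighting sketch: make the responders' education mix match
# the full sample's. All shares are illustrative assumptions.

# Share of each education group in the full sample (responders + DKs)
full_sample_share = {"primary": 0.30, "secondary": 0.40, "university": 0.30}
# Share of each group among those who answered the ideology question
responder_share = {"primary": 0.20, "secondary": 0.40, "university": 0.40}

# Each responder is weighted by how under- or over-represented
# their group is among responders relative to the full sample.
weights = {g: full_sample_share[g] / responder_share[g]
           for g in full_sample_share}
# Under-represented primary-educated responders get weight 1.5;
# over-represented university-educated responders get weight 0.75.
```

This only repairs the damage if nonresponse is explained by the variables we weight on (missing at random); if the “Don’t knows” differ in ways we do not observe, no reweighting scheme will save the analysis, and the conceptual rethinking mentioned above becomes unavoidable.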
