January 2020

This workshop introduces students to methods from survey statistics for the design and processing of modern public opinion polls. Although we emphasize methods designed for non-probability surveys, in each session we will also discuss how each technique is used in traditional (probability) survey research, because that comparison is the best way to understand the challenges that are specific to public opinion research.

The workshop relies heavily on R, and we will spend most of our time working with the packages survey, mice, and caret for data processing. We will also rely on synthetic data to present the main concepts.

Contents

Sampling methods

Interviewing the entire adult population is impractical. How do we select a sample that lets us make inferences about the opinions of those who will not receive our questionnaire? In this session, we will review some basic sampling designs in order to better understand the challenges of non-probability methods.

  1. The sampling frame
  2. Sampling with known probabilities
  3. Sampling with unknown probabilities
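
As a rough illustration of why known selection probabilities matter, the following sketch draws a stratified sample from a synthetic population and compares the unweighted mean with an estimate weighted by the inverse of the inclusion probabilities (a Horvitz-Thompson style estimator). The population sizes, support rates, and variable names are all assumptions made up for this example, and the code uses base R only.

```r
set.seed(2020)

# Synthetic population (assumed figures): support differs between two strata.
N <- c(urban = 8000, rural = 2000)
population <- data.frame(
  stratum = rep(names(N), times = N),
  support = c(rbinom(N[["urban"]], 1, 0.6), rbinom(N[["rural"]], 1, 0.3)),
  stringsAsFactors = FALSE
)

# Stratified design: 500 respondents per stratum, so rural units are oversampled.
n <- c(urban = 500, rural = 500)
rows <- unlist(lapply(names(N), function(s) {
  sample(which(population$stratum == s), n[[s]])
}))
smp <- population[rows, ]

# Known inclusion probability of each sampled unit: n_h / N_h.
smp$pi <- unname(n[smp$stratum] / N[smp$stratum])
smp$weight <- 1 / smp$pi

# Unweighted mean versus the inverse-probability weighted estimate.
naive_mean <- mean(smp$support)
ht_mean <- sum(smp$weight * smp$support) / sum(N)
true_mean <- mean(population$support)

round(c(truth = true_mean, naive = naive_mean, weighted = ht_mean), 3)
```

Because both strata contribute the same number of respondents, the unweighted mean treats them as equally large, while the weighted estimate restores their population sizes. With a non-probability sample the values in `pi` are unknown, so this correction is not directly available.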

Weighting and estimation

The sample we plan is often not the sample we collect, and we need to make corrections before we can make inferences about the population of interest. In this session, we will see two common methods for probability samples (raking and poststratification) and two others that have become popular for the analysis of non-probability samples (pseudo-inclusion methods and MRP).

  1. The Total Survey Error framework
  2. Non-response bias
  3. Classic weighting methods
  4. Methods from the causal inference literature
  5. Multilevel regression with poststratification
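
The two classic weighting methods mentioned above can be sketched in a few lines of base R. The example below is a minimal illustration under assumed population margins (the shares in `pop_age` and `pop_edu` are invented, not census figures), and `rake_weights` is a hypothetical helper written for this sketch. The survey package provides `postStratify()` and `rake()` for real analyses.

```r
set.seed(2020)

# Assumed population margins (hypothetical shares).
pop_age <- c(young = 0.3, old = 0.7)
pop_edu <- c(low = 0.6, high = 0.4)

# Synthetic sample that over-represents young and highly educated respondents.
n <- 1000
smp <- data.frame(
  age = sample(names(pop_age), n, replace = TRUE, prob = c(0.5, 0.5)),
  edu = sample(names(pop_edu), n, replace = TRUE, prob = c(0.3, 0.7)),
  stringsAsFactors = FALSE
)
smp$support <- rbinom(n, 1, 0.3 + 0.2 * (smp$age == "young") + 0.2 * (smp$edu == "high"))

# Poststratification on age: population share divided by sample share.
samp_age <- sapply(names(pop_age), function(a) mean(smp$age == a))
smp$w_ps <- unname(pop_age[smp$age] / samp_age[smp$age])

# Raking (iterative proportional fitting): adjust the weights to one margin
# at a time and repeat until the adjustments are negligible.
rake_weights <- function(data, margins, max_iter = 100, tol = 1e-8) {
  w <- rep(1, nrow(data))
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (v in names(margins)) {
      target <- margins[[v]]
      current <- vapply(names(target),
                        function(l) sum(w[data[[v]] == l]), numeric(1)) / sum(w)
      adj <- unname((target / current)[data[[v]]])
      max_change <- max(max_change, abs(adj - 1))
      w <- w * adj
    }
    if (max_change < tol) break
  }
  w / mean(w)  # normalize so the weights average to one
}
smp$w_rake <- rake_weights(smp, list(age = pop_age, edu = pop_edu))

# Compare the unweighted and weighted estimates of support.
c(unweighted = mean(smp$support),
  poststratified = weighted.mean(smp$support, smp$w_ps),
  raked = weighted.mean(smp$support, smp$w_rake))
```

Poststratification needs the population share of every cell, whereas raking only needs the margins of each variable, which is why raking is often used when the joint distribution is not available.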

Addressing non-response

Not all respondents answer all questions. In some cases, these missing values are problematic because they affect our ability to make inferences. In others, the problem is that respondents did not answer the questions of direct interest to us. In this session, we will review a common theoretical framework for understanding the effect of missing values, differentiate between imputation and prediction, and discuss several methods to fill in incomplete datasets.

  1. Non-response mechanisms in surveys
  2. Single/simple imputation methods
  3. Multiple imputation
  4. Predictive methods
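
To make the contrast between simple imputation methods concrete, the sketch below simulates item non-response that depends on an observed covariate and then fills in the gaps in two ways: with the observed mean, and with a regression prediction plus random noise. All variables and parameter values are invented for illustration, and both approaches are single imputation. Multiple imputation, as implemented in the mice package, repeats the stochastic step several times and pools the results.

```r
set.seed(2020)

# Synthetic data (assumed model): an attitude score that increases with age.
n <- 2000
age <- runif(n, 18, 80)
y <- 2 + 0.05 * age + rnorm(n)

# Non-response depends only on age, which is always observed.
p_miss <- plogis(-3 + 0.05 * age)
miss <- rbinom(n, 1, p_miss) == 1
y_obs <- ifelse(miss, NA, y)

# Mean imputation: every missing value gets the mean of the observed values.
y_mean <- ifelse(miss, mean(y_obs, na.rm = TRUE), y_obs)

# Stochastic regression imputation: predict from age and add residual noise.
fit <- lm(y_obs ~ age)  # lm() drops the incomplete rows by default
pred <- predict(fit, newdata = data.frame(age = age))
y_reg <- ifelse(miss, pred + rnorm(n, sd = summary(fit)$sigma), y_obs)

# Compare means and variances against the complete synthetic data.
rbind(
  complete   = c(mean = mean(y), var = var(y)),
  mean_imp   = c(mean = mean(y_mean), var = var(y_mean)),
  regression = c(mean = mean(y_reg), var = var(y_reg))
)
```

In this setup older respondents are less likely to answer, so mean imputation inherits the bias of the observed cases and also shrinks the variance, while the regression-based version uses age to correct for both.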

Bibliography

  • Rivero, G. (2011). Análisis de datos incompletos en Ciencias Sociales. Centro de Investigaciones Sociológicas.
  • Rivero, G. (2019). Predictive Likely Voter and Vote Choice Models for Party Support Estimation in Spanish Election Polls. Unpublished Manuscript.
  • Lumley, T. (2010). Complex Surveys: A Guide to Analysis Using R. John Wiley & Sons.
  • Elliott, M. R., & Valliant, R. (2017). Inference for nonprobability samples. Statistical Science, 32(2), 249-264.
  • Lax, J. R., & Phillips, J. H. (2009). How should we estimate public opinion in the states? American Journal of Political Science, 53(1), 107-121.

Credits

This website is adapted from an original design by Jeffrey Arnold and Pablo Barberá.