Friday, August 5, 2016

A description of a Bayesian near-ignorance model for USA election polls

Election Poll for a single state

In this and the following posts, I'll present a way to compute Bayesian predictions for the result of the USA 2016 election based on election poll data and near-ignorance prior models. The model is described in detail here:
A. Benavoli and M. Zaffalon. "Prior near ignorance for inferences in the k-parameter exponential family". Statistics, 49:1104-1140, 2014. (http://www.idsia.ch/~alessio/benavoli2014b.pdf)
In the USA 2016 election, we have two main candidates, Trump and Clinton. We denote by $\theta_{t}$ the chance that a voter supports Trump, by $\theta_{c}$ the chance that a voter supports Clinton, and by $\theta_{u}$ the chance of the remaining case (undecided voters, other candidates, etc.). The goal of the inference is to estimate these parameters and, in particular, to compare the proportions of voters for $Trump$ and $Clinton$, i.e., $\theta_{t}-\theta_{c}$.

Likelihood model

In the election poll, a total of $n$ adults are polled to indicate their preference for the candidates $Trump$ and $Clinton$. Let $\hat{y}_{nt}$ denote the proportion of the sample that supports $Trump$, $\hat{y}_{nc}$ the proportion that supports $Clinton$, and $\hat{y}_{nu}=1-\hat{y}_{nt}-\hat{y}_{nc}$ the proportion that is either undecided or votes for someone else. The counts $n\hat{y}_{nt}$ (number of votes for Trump), $n\hat{y}_{nc}$ (number of votes for Clinton) and $n\hat{y}_{nu}$ (undecided) are assumed to have a multinomial distribution with sample size $n$ and parameters $\theta_{t}$ (Trump), $\theta_{c}$ (Clinton) and $\theta_{u}$ (undecided), respectively. Thus, the likelihood model is:
$$ p(data|\theta)\propto\theta_{t}^{n\hat{y}_{nt}} \theta_{c}^{n\hat{y}_{nc}} \theta_{u}^{n\hat{y}_{nu}}, $$where $\theta_{t}$, $\theta_{c}$ and $\theta_{u}$, with $\theta_{t}+\theta_{c}+\theta_{u}=1$, are the unknown non-negative chances to be estimated.
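As a minimal sketch, the log of this likelihood kernel can be evaluated directly from the poll proportions. The poll numbers below (1000 respondents, 42% Trump, 45% Clinton) are hypothetical, and the function name `log_likelihood` is just an illustrative choice, not something taken from the paper.

```python
import numpy as np

def log_likelihood(theta, n, y_hat):
    """Log of the multinomial kernel: sum_i n * y_hat_i * log(theta_i).

    theta : chances (theta_t, theta_c, theta_u), non-negative, summing to 1
    n     : number of people polled
    y_hat : sample proportions (y_t, y_c, y_u), summing to 1
    """
    theta = np.asarray(theta, dtype=float)
    counts = n * np.asarray(y_hat, dtype=float)
    if not np.isclose(theta.sum(), 1.0) or np.any(theta < 0):
        raise ValueError("theta must lie on the probability simplex")
    # Categories with zero count contribute nothing, even when theta_i = 0.
    mask = counts > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(counts[mask] * np.log(theta[mask])))

# Hypothetical poll: 1000 respondents, 42% Trump, 45% Clinton, 13% undecided/other.
n, y_hat = 1000, (0.42, 0.45, 0.13)
ll_at_sample = log_likelihood(y_hat, n, y_hat)               # theta equal to the sample proportions
ll_elsewhere = log_likelihood((0.45, 0.42, 0.13), n, y_hat)  # Trump and Clinton swapped
```

The kernel is largest when $\theta$ equals the observed proportions, so `ll_at_sample` exceeds `ll_elsewhere`.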

Noninformative Prior model

In the standard Bayesian approach, we need to choose a prior on the unknown $\theta$ parameters. A Dirichlet conjugate prior is a natural prior for $\theta_{t}$, $\theta_{c}$ and $\theta_{u}$:
$$ p(\theta)\propto \theta_{t}^{\alpha_{t}-1} \theta_{c}^{\alpha_{c}-1} \theta_{u}^{\alpha_{u}-1}, $$where, in the case of lack of prior information, the prior parameters are commonly selected as follows: Haldane's prior $\alpha_{t}=\alpha_{c}=\alpha_{u}=0$ (which is an improper prior); Jeffreys' prior $\alpha_{t}=\alpha_{c}=\alpha_{u}=\tfrac{1}{2}$; uniform prior $\alpha_{t}=\alpha_{c}=\alpha_{u}=1$. The prior expected value $E[\theta_{t}-\theta_{c}]$ is equal to $0$ and the prior probability $P(\theta_{t}>\theta_{c})=0.5$ for both Jeffreys' and the uniform prior (for Haldane's prior they are not defined). See the paper for details on how such quantities, and the lower and upper bounds introduced below, are computed.
These are commonly called noninformative priors, but they are not noninformative. These priors express indifference between $Trump$ and $Clinton$, but not prior ignorance.
To see this, consider $P[\theta_{t}+0.5 \theta_{u}>\theta_{c}+0.4\theta_{u}]$. This is the probability that the proportion of votes for $Trump$ exceeds the proportion of votes for $Clinton$ under a "swing" scenario in which $50\%$ of the undecideds vote for $Trump$ and $40\%$ of the undecideds vote for $Clinton$. This probability is equal to $0.76$ in the case of the uniform prior and $0.66$ in the case of Jeffreys' prior. It depends on the choice of the prior, which shows that the uniform and Jeffreys' priors are not really uninformative for this kind of poll.
Combining likelihood and prior, the resulting posterior is
$$ p(\theta|n,\hat{y}_n)\propto \theta_{t}^{n\hat{y}_{nt}+\alpha_{t}-1} \theta_{c}^{n\hat{y}_{nc}+\alpha_{c}-1} \theta_{u}^{n\hat{y}_{nu}+\alpha_{u}-1}, $$which is always proper for Jeffreys' prior and the uniform prior, and is proper for Haldane's prior provided that $\hat{y}_{nt},\hat{y}_{nc},\hat{y}_{nu}>0$.
The posterior expected value of $\theta_{t}-\theta_{c}$ (Trump-Clinton difference) is: $$ E[\theta_{t}-\theta_{c}|n,\hat{y}_n]=\dfrac{n\hat{y}_{nt}+\alpha_{t}}{n+\alpha_{t}+\alpha_{c}+\alpha_{u}}-\dfrac{n\hat{y}_{nc}+\alpha_{c}}{n+\alpha_{t}+\alpha_{c}+\alpha_{u}}, $$ while the posterior probability of the event $\theta_{t}-\theta_{c}>0$ is $$ P[\theta_{t}>\theta_{c}|n,\hat{y}_n]=\dfrac{\int\limits_{\{\theta_{t}>\theta_{c}\}} \theta_{t}^{n\hat{y}_{nt}+\alpha_{t}-1} \theta_{c}^{n\hat{y}_{nc}+\alpha_{c}-1} \theta_{u}^{n\hat{y}_{nu}+\alpha_{u}-1} ~d\theta}{\int \theta_{t}^{n\hat{y}_{nt}+\alpha_{t}-1} \theta_{c}^{n\hat{y}_{nc}+\alpha_{c}-1} \theta_{u}^{n\hat{y}_{nu}+\alpha_{u}-1} ~d\theta}, $$ which can be computed numerically by sampling from the Dirichlet distribution.
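The two quantities above can be sketched in a few lines of Python: the posterior mean comes from the closed form, and the posterior probability from Dirichlet samples. The poll numbers, the uniform prior as default, the number of draws and the function name `posterior_summary` are all assumptions made for illustration.

```python
import numpy as np

def posterior_summary(n, y_t, y_c, alpha=(1.0, 1.0, 1.0), draws=200_000, seed=0):
    """Posterior mean of theta_t - theta_c and Monte Carlo P(theta_t > theta_c)."""
    y_u = 1.0 - y_t - y_c
    # Dirichlet posterior parameters: observed counts plus prior parameters.
    post = n * np.array([y_t, y_c, y_u]) + np.asarray(alpha, dtype=float)
    # Closed-form posterior mean of the Trump-Clinton difference.
    mean_diff = (post[0] - post[1]) / post.sum()
    # Monte Carlo estimate of the posterior probability that Trump is ahead.
    rng = np.random.default_rng(seed)
    theta = rng.dirichlet(post, size=draws)  # shape (draws, 3)
    prob_t_ahead = float(np.mean(theta[:, 0] > theta[:, 1]))
    return float(mean_diff), prob_t_ahead

# Hypothetical poll: 1000 respondents, 42% Trump, 45% Clinton, uniform prior.
mean_diff, prob_t_ahead = posterior_summary(n=1000, y_t=0.42, y_c=0.45)
```

Passing `alpha=(0.5, 0.5, 0.5)` gives the Jeffreys' prior version; with a poll of this size the two priors should give very similar answers.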

Near-ignorance Prior model

We now present a weaker prior model (a near-ignorance model) that automatically allows us to take possible swing scenarios into account.
A near-ignorance model is a set of priors. In particular, we consider this set:
$$ \mathcal{M}=\left\{\theta_{t}^{\ell_{t}-1}\theta_{c}^{\ell_{c}-1}\theta_{u}^{-\ell_{t}-\ell_{c}-1}, ~|\ell_{t}| \leq v, ~|\ell_{c}| \leq v, ~-\ell_{t}-\ell_{c}\in [-v,v] \right\}, $$where $v$ is a parameter that represents pseudo-votes; it determines how strongly the prior influences the posterior inferences.
When we have a set of probability distributions, inference is obtained by computing lower and upper bounds of the expectations and probabilities of interest. From this model, a priori we have $\underline{E}[\theta_{t}-\theta_{c}]=-1$ (lower expectation) and $\overline{E}[\theta_{t}-\theta_{c}]=1$ (upper expectation). Moreover, we have that $\underline{P}(\theta_{t}>\theta_{c})=0$, $\overline{P}(\theta_{t}>\theta_{c})=1$ and that $\underline{P}[\theta_{t}+0.5 \theta_{u}>\theta_{c}+0.4\theta_{u}]=0$ and $\overline{P}[\theta_{t}+0.5 \theta_{u}>\theta_{c}+0.4\theta_{u}]=1$. This is really a model of prior ignorance. Before seeing the data, all the probabilities of interest can assume any value between 0 and 1, which means we are not biasing our inference towards either Trump or Clinton. This is a more faithful expression of the lack of prior information on the election result.
The resulting set of posteriors is
$$ \mathcal{M}_p=\left\{p(\theta|data)\propto \theta_{t}^{n\hat{y}_{nt}+\ell_{t}-1} \theta_{c}^{n\hat{y}_{nc}+\ell_{c}-1} \theta_{u}^{n\hat{y}_{nu}+\ell_{u}-1}, ~|\ell_i| \leq v, ~\ell_{u}=-\ell_{t}-\ell_{c}\in [-v,v] \right\}. $$From this set we can extract posteriors that are useful for analysing possible swing scenarios:
  1. $v$ votes move from Trump to Clinton; the resulting posterior is $$ p(\theta|n,\hat{y}_n)\propto\theta_{t}^{n\hat{y}_{nt}-v-1} \theta_{c}^{n\hat{y}_{nc}+v-1} \theta_{u}^{n\hat{y}_{nu}-1} $$
  2. $v$ votes move from Clinton to Trump; the resulting posterior is $$ p(\theta|n,\hat{y}_n)\propto \theta_{t}^{n\hat{y}_{nt}+v-1} \theta_{c}^{n\hat{y}_{nc}-v-1} \theta_{u}^{n\hat{y}_{nu}-1} $$
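The two swing-scenario posteriors are ordinary Dirichlet distributions, so they can be explored with the same sampling approach used for a single prior. The sketch below is only illustrative: the poll numbers, the value $v=2$ and the function name `swing_scenarios` are assumptions, and I make no claim here that the two probabilities it returns are the exact lower and upper bounds over the whole set of posteriors (see the paper for that).

```python
import numpy as np

def swing_scenarios(n, y_t, y_c, v, draws=200_000, seed=0):
    """Mean difference and P(theta_t > theta_c) under the two swing posteriors."""
    counts = n * np.array([y_t, y_c, 1.0 - y_t - y_c])
    shifts = {
        "trump_to_clinton": np.array([-v, +v, 0.0]),  # scenario 1
        "clinton_to_trump": np.array([+v, -v, 0.0]),  # scenario 2
    }
    rng = np.random.default_rng(seed)
    out = {}
    for name, ell in shifts.items():
        params = counts + ell  # Dirichlet parameters of this posterior
        if np.any(params <= 0):
            raise ValueError("posterior is improper: need n*y_hat_i + ell_i > 0")
        theta = rng.dirichlet(params, size=draws)
        out[name] = {
            # Closed-form posterior mean of theta_t - theta_c.
            "mean_diff": float((params[0] - params[1]) / params.sum()),
            # Monte Carlo estimate of the probability that Trump is ahead.
            "prob_t_ahead": float(np.mean(theta[:, 0] > theta[:, 1])),
        }
    return out

# Hypothetical poll: 1000 respondents, 42% Trump, 45% Clinton, v = 2 pseudo-votes.
res = swing_scenarios(n=1000, y_t=0.42, y_c=0.45, v=2)
```

Under these two posteriors the mean of $\theta_{t}-\theta_{c}$ is $\hat{y}_{nt}-\hat{y}_{nc}-2v/n$ and $\hat{y}_{nt}-\hat{y}_{nc}+2v/n$ respectively, so the gap between the scenarios shrinks as the poll size $n$ grows.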
In the next post, we will use this model to make inferences on single-state polls.
