
Monday, August 29, 2016

Combining poll data from different sources using covariance intersection

In a previous post, we saw how to work with polls for a single state using poll data from KTNV/Rasmussen.
Here we are going to see how to combine polls from different sources.

Let us consider the Nevada polls again.

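|   | Poll | Date | Sample | MoE | Clinton (D) | Trump (R) | Johnson (L) | Spread |
|---|------|------|--------|-----|-------------|-----------|-------------|--------|
| 0 | RCP Average | 7/7 - 8/5 | -- | -- | 43 | 40.7 | 6.3 | Clinton +2.3 |
| 1 | CBS News/YouGov* | 8/2 - 8/5 | 993 LV | 4.6 | 43 | 41.0 | 4.0 | Clinton +2 |
| 2 | KTNV/Rasmussen | 7/29 - 7/31 | 750 LV | 4.0 | 41 | 40.0 | 10.0 | Clinton +1 |
| 3 | Monmouth | 7/7 - 7/10 | 408 LV | 4.9 | 45 | 41.0 | 5.0 | Clinton +4 |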


Instead of averaging the polls, as RCP (RealClearPolitics) does, we use covariance intersection. Covariance intersection is an algorithm for combining two or more data sources when the correlation between them is unknown.

Let us denote by $\hat{a}$ a vector of observations (e.g., $[43, 41, 16]$ from CBS News/YouGov: the shares for Clinton, Trump, and the remaining $100 - 43 - 41 = 16$) and by $\hat{b}$ another vector of observations (e.g., $[41, 40, 19]$ from KTNV/Rasmussen). $A$ denotes the uncertainty (variance) of poll $\hat{a}$, which we assume to be equal to 1/(sample size) (e.g., $1/993$ for CBS News/YouGov), so that a larger sample means a more reliable poll. Likewise, $B$ denotes the uncertainty of poll $\hat{b}$ (e.g., $1/750$ for KTNV/Rasmussen).

Given a weight $\omega \in [0, 1]$, covariance intersection provides a formula to combine them:


$$C^{-1} = \omega A^{-1} + (1-\omega) B^{-1}, \qquad \hat{c} = C\left(\omega A^{-1}\hat{a} + (1-\omega) B^{-1}\hat{b}\right).$$
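The two-source formula can be sketched in a few lines of Python. This is a minimal illustration, not code from the original post: the function name `covariance_intersection` is hypothetical, the variances are treated as scalars (1/sample size, as assumed above), and the weight 0.5 is chosen only for the example.

```python
def covariance_intersection(a, A, b, B, omega):
    """Fuse two estimates a and b with scalar variances A and B.

    omega in [0, 1] is the weight given to the first source.
    Returns the fused estimate c and the fused variance C.
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    # C^{-1} = omega * A^{-1} + (1 - omega) * B^{-1}
    C_inv = omega / A + (1.0 - omega) / B
    C = 1.0 / C_inv
    # c = C * (omega * A^{-1} * a + (1 - omega) * B^{-1} * b), per component
    c = [C * (omega * ai / A + (1.0 - omega) * bi / B) for ai, bi in zip(a, b)]
    return c, C


# Illustrative inputs taken from the table (third entry is the remainder).
cbs = [43.0, 41.0, 16.0]        # CBS News/YouGov, 993 LV
rasmussen = [41.0, 40.0, 19.0]  # KTNV/Rasmussen, 750 LV

c, C = covariance_intersection(cbs, 1.0 / 993, rasmussen, 1.0 / 750, omega=0.5)
print([round(x, 2) for x in c], C)
```

Because both input vectors sum to 100 and the fused estimate is a convex combination of them, the components of `c` also sum to 100.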

This formula can be extended to an arbitrary number of sources, with weights that sum to one. For instance, for the previous table using uniform weights $\omega_1 = \omega_2 = \omega_3 = 1/3$, we get

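$$C^{-1} = \omega_1 \cdot 993 + \omega_2 \cdot 750 + \omega_3 \cdot 408 = 717,$$

$$\hat{c} = C\left(\omega_1 \cdot 993\,[43, 41, 16] + \omega_2 \cdot 750\,[41, 40, 19] + \omega_3 \cdot 408\,[45, 41, 14]\right).$$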

The final result is

$$C^{-1} = 717, \qquad \hat{c} = [42.68, 40.65, 16.67]$$
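The numbers above can be reproduced with a short script. This is a sketch under the same assumptions as the text (scalar variances equal to 1/sample size, uniform weights). The helper name `fuse_polls` is made up for this illustration.

```python
def fuse_polls(estimates, variances, weights):
    """Covariance intersection for several sources with scalar variances.

    Returns the fused estimate c and the fused inverse variance C^{-1}.
    """
    if not (len(estimates) == len(variances) == len(weights)):
        raise ValueError("need one variance and one weight per estimate")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # C^{-1} = sum_i omega_i * A_i^{-1}
    c_inv = sum(w / v for w, v in zip(weights, variances))
    dim = len(estimates[0])
    # c = C * sum_i omega_i * A_i^{-1} * a_i, computed per component
    c = [
        sum(w / v * x[k] for w, v, x in zip(weights, variances, estimates)) / c_inv
        for k in range(dim)
    ]
    return c, c_inv


# Clinton, Trump, remainder for CBS News/YouGov, KTNV/Rasmussen, Monmouth.
polls = [[43.0, 41.0, 16.0], [41.0, 40.0, 19.0], [45.0, 41.0, 14.0]]
sample_sizes = [993, 750, 408]
variances = [1.0 / n for n in sample_sizes]
weights = [1.0 / 3.0] * 3

c_hat, c_inv = fuse_polls(polls, variances, weights)
print(round(c_inv, 2))               # close to 717, as in the text
print([round(x, 2) for x in c_hat])  # close to [42.68, 40.65, 16.67]
```

Passing a different `weights` list (still summing to one) shows how the fused estimate shifts toward the polls that receive more weight.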

It can be observed that, with $\omega_1 = \omega_2 = \omega_3 = 1/3$, the combined poll $\hat{c}$ reduces to the average of the input polls weighted by sample size. However, it is possible to choose other values of the weights, see for instance here.
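
As a short check of the statement about uniform weights, here is a sketch of the algebra for $k$ polls. The notation is introduced only for this illustration: $n_i$ is the sample size of poll $i$ and $\hat{a}_i$ its vector of observations. With $A_i = 1/n_i$ and $\omega_i = 1/k$:

$$C^{-1} = \sum_{i=1}^{k} \omega_i A_i^{-1} = \frac{1}{k}\sum_{i=1}^{k} n_i, \qquad \hat{c} = C \sum_{i=1}^{k} \omega_i A_i^{-1}\hat{a}_i = \frac{\sum_{i=1}^{k} n_i\,\hat{a}_i}{\sum_{i=1}^{k} n_i}.$$

The factor $1/k$ cancels in $\hat{c}$, which leaves exactly the sample-size-weighted average. For the three Nevada polls, $C^{-1} = (993 + 750 + 408)/3 = 2151/3 = 717$.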
