In this post we see how to combine polls from different sources.
Let us consider again the Nevada polls.
No. | Poll | Date | Sample | MoE | Clinton (D) | Trump (R) | Johnson (L) | Spread |
---|---|---|---|---|---|---|---|---|
0 | RCP Average | 7/7 - 8/5 | -- | -- | 43 | 40.7 | 6.3 | Clinton +2.3 |
1 | CBS News/YouGov* | 8/2 - 8/5 | 993 LV | 4.6 | 43 | 41.0 | 4.0 | Clinton +2 |
2 | KTNV/Rasmussen | 7/29 - 7/31 | 750 LV | 4.0 | 41 | 40.0 | 10.0 | Clinton +1 |
3 | Monmouth | 7/7 - 7/10 | 408 LV | 4.9 | 45 | 41.0 | 5.0 | Clinton +4 |
Instead of averaging the polls, as is done by RCP (RealClearPolitics), we use Covariance Intersection. Covariance Intersection is an algorithm for combining two or more data sources when the correlation between them is unknown.
Let us denote by $\hat{a}$ a vector of observations (e.g., $[43, 41, 16]$ from CBS News/YouGov: Clinton, Trump, and the remaining share of respondents) and by $\hat{b}$ another vector of observations (e.g., $[41, 40, 19]$ from KTNV/Rasmussen). $A$ denotes the uncertainty (covariance) of poll $\hat{a}$, which we assume to be equal to 1/sample size (e.g., $1/993$ for CBS News/YouGov), and $B$ denotes the uncertainty of poll $\hat{b}$ (e.g., $1/750$ for KTNV/Rasmussen).
Given a weight $\omega \in [0, 1]$, Covariance Intersection provides a formula to combine them:
$$C^{-1} = \omega A^{-1} + (1-\omega) B^{-1}, \qquad \hat{c} = C\left(\omega A^{-1}\hat{a} + (1-\omega) B^{-1}\hat{b}\right).$$
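As a minimal sketch of the two-source formula above, assuming scalar covariances (here 1/sample size) and plain Python lists for the observation vectors, the fusion could be written as follows. The function name `covariance_intersection` and the choice `omega=0.5` are illustrative assumptions, not part of the original derivation.

```python
def covariance_intersection(a, A, b, B, omega):
    """Fuse estimates a and b with scalar covariances A and B (hypothetical helper)."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    # C^{-1} = omega * A^{-1} + (1 - omega) * B^{-1}
    C_inv = omega / A + (1.0 - omega) / B
    C = 1.0 / C_inv
    # c = C * (omega * A^{-1} * a + (1 - omega) * B^{-1} * b), applied per component
    c = [C * (omega / A * ai + (1.0 - omega) / B * bi) for ai, bi in zip(a, b)]
    return c, C


# CBS News/YouGov and KTNV/Rasmussen, with an illustrative weight of 0.5
c, C = covariance_intersection([43, 41, 16], 1 / 993, [41, 40, 19], 1 / 750, omega=0.5)
print(c, C)
```

With `omega=1.0` the result is simply the first poll, and with `omega=0.0` the second, so the weight interpolates between trusting one source or the other.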
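This formula can be extended to an arbitrary number of sources. For instance, for the previous table, using uniform weights $\omega_1 = 1/3$, $\omega_2 = 1/3$, $\omega_3 = 1/3$, we get
$$C^{-1} = \omega_1 \cdot 993 + \omega_2 \cdot 750 + \omega_3 \cdot 408 = 717,$$
$$\hat{c} = C\left(\omega_1 \cdot 993\,[43, 41, 16] + \omega_2 \cdot 750\,[41, 40, 19] + \omega_3 \cdot 408\,[45, 41, 14]\right).$$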
The final result is
$$C^{-1} = 717, \qquad \hat{c} = [42.68, 40.65, 16.67].$$
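The numbers above can be checked with a short script. This is only a sketch, assuming each poll's inverse covariance is its sample size and that the weights sum to one; the helper name `combine_polls` is hypothetical.

```python
def combine_polls(estimates, sample_sizes, weights):
    """Covariance Intersection for several polls with scalar covariances 1/n_i."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # C^{-1} = sum_i w_i * n_i, since A_i^{-1} is taken to be the sample size n_i
    C_inv = sum(w * n for w, n in zip(weights, sample_sizes))
    dims = len(estimates[0])
    # c = C * sum_i w_i * n_i * a_i, computed component by component
    c = [
        sum(w * n * est[k] for w, n, est in zip(weights, sample_sizes, estimates)) / C_inv
        for k in range(dims)
    ]
    return c, C_inv


polls = [[43, 41, 16], [41, 40, 19], [45, 41, 14]]  # CBS/YouGov, Rasmussen, Monmouth
sizes = [993, 750, 408]
weights = [1 / 3, 1 / 3, 1 / 3]

c, C_inv = combine_polls(polls, sizes, weights)
print(round(C_inv), [round(x, 2) for x in c])  # 717 [42.68, 40.65, 16.67]
```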
It can be observed that with $\omega_1 = \omega_2 = \omega_3 = 1/3$ the combined poll $\hat{c}$ reduces to the average of the input polls weighted by sample size. However, it is possible to choose other values of the weights, see for instance here.
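To illustrate how the choice of weights affects the result, the following sketch compares uniform weights with the plain sample-size-weighted average, and then tries a non-uniform choice. The weights `[0.5, 0.3, 0.2]` are arbitrary values picked for illustration only, and the helper name `fuse` is hypothetical.

```python
def fuse(polls, sizes, weights):
    """Covariance Intersection with scalar covariances A_i = 1 / n_i."""
    c_inv = sum(w * n for w, n in zip(weights, sizes))
    return [
        sum(w * n * p[k] for w, n, p in zip(weights, sizes, polls)) / c_inv
        for k in range(len(polls[0]))
    ]


polls = [[43, 41, 16], [41, 40, 19], [45, 41, 14]]
sizes = [993, 750, 408]

# Uniform weights: should match the sample-size-weighted average
uniform = fuse(polls, sizes, [1 / 3, 1 / 3, 1 / 3])
weighted_avg = [
    sum(n * p[k] for n, p in zip(sizes, polls)) / sum(sizes) for k in range(3)
]

# Arbitrary non-uniform weights (illustrative assumption)
skewed = fuse(polls, sizes, [0.5, 0.3, 0.2])

# All weight on the first poll returns that poll unchanged
only_first = fuse(polls, sizes, [1.0, 0.0, 0.0])

print(uniform, weighted_avg, skewed, only_first)
```

Because every input vector sums to 100, any weights that sum to one give a combined vector that also sums to 100.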