Sunday, October 16, 2016

Bayesian Sign Test

This post explains how to perform a Bayesian sign test for comparing the performance of two classifiers across multiple datasets. This module is part of a tool that I am developing together with Giorgio Corani and Janez Demsar. The code of this IPython notebook and the code of the test can be found here


The function signtest in the module bayesiantests computes the probabilities that, based on the measured performance, one model is better than the other, that the other is better, or that the two are within the region of practical equivalence.

This notebook demonstrates the use of the module.

We will load the classification accuracies of the naive Bayesian classifier and AODE on 54 UCI datasets from the file Data/accuracy_nbc_aode.csv. For simplicity, we will skip the header row and the column with data set names.

In [3]:
import numpy as np
scores = np.loadtxt('Data/accuracy_nbc_aode.csv', delimiter=',', skiprows=1, usecols=(1, 2))
names = ("NBC", "AODE")

Functions in the module accept the following arguments.

  • x: a 2-d array with scores of two models (each row corresponding to a data set) or a vector of differences.
  • rope: the region of practical equivalence. We consider two classifiers equivalent if the difference in their performance is smaller than rope.
  • prior_strength: the prior strength for the Dirichlet distribution. Default is 1.
  • prior_place: the region into which the prior is placed. Default is bayesiantests.ROPE, the other options are bayesiantests.LEFT and bayesiantests.RIGHT.
  • nsamples: the number of Monte Carlo samples used to approximate the posterior.
  • names: the names of the two classifiers; if x is a vector of differences, positive values mean that the second (right) model had a higher score.
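To make the conventions above concrete, here is a small self-contained sketch (not part of bayesiantests; the function name rope_counts is hypothetical) of the counting step the sign test starts from: each data set is classified as a win for the first model, a tie within the rope, or a win for the second model.

```python
import numpy as np

def rope_counts(x, rope):
    """Count data sets left of, within, or right of the rope.

    x: 2-d array of scores (first column = first model) or a 1-d array of
       differences, where positive values favour the second model.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 2:
        diff = x[:, 1] - x[:, 0]  # positive -> second model scored higher
    else:
        diff = x
    n_left = int(np.sum(diff < -rope))           # first model better
    n_rope = int(np.sum(np.abs(diff) <= rope))   # practically equivalent
    n_right = int(np.sum(diff > rope))           # second model better
    return n_left, n_rope, n_right

# toy example: one win for the first model, three ties, two for the second
diffs = np.array([0.0, 0.005, -0.004, -0.05, 0.03, 0.02])
print(rope_counts(diffs, rope=0.01))  # (1, 3, 2)
```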

Summarizing probabilities

Function signtest(x, rope, prior_strength=1, prior_place=ROPE, nsamples=50000, verbose=False, names=('C1', 'C2')) computes the Bayesian sign test and returns the probabilities that the difference (the score of the second classifier minus the score of the first) is negative, within the rope, or positive.

In [10]:
import bayesiantests as bt
left, within, right = bt.signtest(scores, rope=0.01)
print(left, within, right)
0.0 0.71288 0.28712

The first value (left) is the probability that the first classifier (the left column of x) has a higher score than the second (or that the differences are negative, if x is given as a vector).

In the above case, the right classifier (AODE) performs better than naive Bayes with a probability of 0.29, and the two are practically equivalent with a probability of 0.71.
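These probabilities can be reproduced, up to Monte Carlo error, with a few lines of numpy. The sketch below is an illustration of the Dirichlet-based computation described above, not the module's actual code (the exact prior handling in bayesiantests may differ, and the small epsilon keeping all Dirichlet parameters positive is my own workaround): the three rope counts plus a prior pseudo-count parameterize a Dirichlet, and each probability is the fraction of posterior samples in which the corresponding region dominates.

```python
import numpy as np

def signtest_sketch(diff, rope, prior_strength=1.0, prior_place=1,
                    nsamples=50000, seed=0):
    """Approximate Bayesian sign test (illustrative sketch).

    diff: 1-d array of score differences (positive favours the second model).
    prior_place: index receiving the prior mass (0=left, 1=rope, 2=right).
    Returns (p_left, p_rope, p_right).
    """
    rng = np.random.default_rng(seed)
    counts = np.array([
        np.sum(diff < -rope),           # first model better
        np.sum(np.abs(diff) <= rope),   # within the rope
        np.sum(diff > rope),            # second model better
    ], dtype=float)
    alpha = counts + 1e-6               # keep all Dirichlet parameters > 0
    alpha[prior_place] += prior_strength
    samples = rng.dirichlet(alpha, size=nsamples)
    winners = samples.argmax(axis=1)    # which region dominates each draw
    return tuple(np.mean(winners == k) for k in range(3))

# strongly positive differences: the second model wins almost surely
left, within, right = signtest_sketch(np.full(20, 0.05), rope=0.01)
```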

If we add arguments verbose and names, the function also prints out the probabilities.

In [11]:
left, within, right = bt.signtest(scores, rope=0.01, verbose=True, names=names)
P(NBC > AODE) = 0.0, P(rope) = 0.70982, P(AODE > NBC) = 0.29018

The posterior distribution can be plotted:

  1. using the function signtest_MC(x, rope, prior_strength=1, prior_place=ROPE, nsamples=50000) we generate samples of the posterior;
  2. using the function plot_posterior(samples, names=('C1', 'C2')) we then plot the posterior in the probability simplex.
In [12]:
%matplotlib inline
import matplotlib.pyplot as plt

samples = bt.signtest_MC(scores, rope=0.01)

fig = bt.plot_posterior(samples, names)

plt.show()

Checking sensitivity to the prior

To check the effect of the prior, let us put a stronger prior on the left.

In [13]:
samples = bt.signtest_MC(scores, rope=0.01, prior_strength=1, prior_place=bt.LEFT)
fig = bt.plot_posterior(samples, names)
plt.show()

... and on the right

In [14]:
samples = bt.signtest_MC(scores, rope=0.01, prior_strength=1, prior_place=bt.RIGHT)
fig = bt.plot_posterior(samples, names)
plt.show()

The prior with a strength of 1 has negligible effect. Only a much stronger prior on the left would shift the probabilities toward NBC:

In [15]:
samples = bt.signtest_MC(scores, rope=0.01, prior_strength=10, prior_place=bt.LEFT)
fig = bt.plot_posterior(samples, names)
plt.show()

Auxiliary functions

The function signtest_MC(x, rope, prior_strength=1, prior_place=ROPE, nsamples=50000) computes the posterior for the given input parameters. The result is returned as a 2-d array with nsamples rows and three columns, giving the probabilities that the difference lies in (-∞, -rope), [-rope, rope] and (rope, ∞). Call signtest_MC directly to obtain a sample of the posterior.

The posterior is plotted by plot_simplex(points, names=('C1', 'C2')), where points is a sample returned by signtest_MC.
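Plotting a sample in the probability simplex amounts to projecting each three-component row onto a triangle. The sketch below uses a standard barycentric mapping; the function name to_simplex_xy and the particular triangle layout are my own assumptions, not taken from the module.

```python
import numpy as np

# vertices of the plotting triangle (an assumed layout: the left, rope and
# right regions are mapped to the corners of an equilateral triangle)
VERTICES = np.array([[0.0, 0.0], [0.5, np.sqrt(3) / 2], [1.0, 0.0]])

def to_simplex_xy(samples):
    """Barycentric projection of an (n, 3) probability array to 2-d points."""
    samples = np.asarray(samples, dtype=float)
    return samples @ VERTICES  # convex combination of the triangle vertices

# a fake posterior sample, just to show the mapping
rng = np.random.default_rng(0)
points = to_simplex_xy(rng.dirichlet([5, 30, 10], size=1000))
```

Because every row is a probability vector, each projected point is a convex combination of the vertices and therefore lands inside the triangle.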

References

@ARTICLE{bayesiantests2016,
  author = {{Benavoli}, A. and {Corani}, G. and {Demsar}, J. and {Zaffalon}, M.},
  title = "{Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis}",
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprint = {1606.04316},
  url = {https://arxiv.org/abs/1606.04316},
  year = 2016,
  month = jun
}

Saturday, October 15, 2016

Fresh forecast for US2016 election

The worst-case probability of Clinton winning the election is back above 90% (precisely, 93%). You can try it yourself by running this code in my GitHub repository.



Tuesday, October 4, 2016

US2016 election forecast

I have again run the Bayesian algorithm that uses a prior near-ignorance model to compute a US2016 election forecast.
This is the current situation for Clinton (worst case in red and best case in blue).
The probability range of winning the election (by getting the majority of the electoral votes) is [0.68, 0.91]. The posterior distributions obtained using the prior near-ignorance model are shown in the figure.
The state-by-state situation is the following.

The most uncertain states at the moment are:
- Arizona
- Florida
- Iowa
- Nevada
- North Carolina
- Ohio
- Pennsylvania

Friday, September 23, 2016

ECML 2016 tutorial on Bayesian vs. Frequentist tests for comparing algorithms

The tutorial went very well. It was a nice experience and we received very positive feedback. If you are interested in the content, please visit this page.

Clinton vs. Trump, 23rd September 2016


I have again run the Python code that computes the worst-case (red) and best-case (blue) posterior distributions for Clinton winning the general US election, using fresh (September) poll data. At the moment there is quite large uncertainty, but it is still in favour of Clinton: the probability of winning is between 0.78 and 0.95. If you are interested in the methodology I have used to compute these distributions, please see the past posts. If you want to try it yourself, the Python code and the poll data from fivethirtyeight.com are available in my GitHub (just click on the links).

Friday, September 16, 2016

19 September Tutorial at ECML

Working on the slides for our tutorial at ECML 2016 (Riva del Garda).


G. Corani, A. Benavoli, J. Demsar.  Comparing competing algorithms: Bayesian versus frequentist hypothesis testing


Schedule

Time  | Duration | Content                                                | Details
09:00 | 15 min   | Introduction                                           | Motivations and goals
09:15 | 60 min   | Null hypothesis significance tests in machine learning | NHST testing (methods and drawbacks)
10:15 | 25 min   | Introduction to Bayesian tests                         | Bayesian model comparison versus Bayesian estimation
10:40 | 20 min   | Break                                                  | Is the coffee in Riva del Garda better than the coffee in Porto?
11:00 | 35 min   | Bayesian hypothesis testing for comparing classifiers  | Single and hierarchical Bayesian models
11:35 | 55 min   | Non-parametric Bayesian tests and presentation of the results of Bayesian analysis | Dirichlet process and how to perform nonparametric Bayesian tests
12:30 | 10 min   | Summarizing!                                           | Summary and conclusions