Wednesday, March 06, 2013

The inevitability of bad predictions

Yesterday the Guardian published a piece of psephology by John Ross. He gets one thing right but almost everything else wrong.

I’ll start positive. The thing that Ross gets right is his main point: Conservative support has been in decline for a long time. He says that since 1931, the Conservative share of the vote has dropped by an average of 0.2% a year.

I agree. I’ve only looked back to 1945 (covering 18 general elections rather than Ross’s 20), but I also get an average 0.2% decline a year – with, of course, a lot of variation around this general trend.

But what Ross doesn’t mention is that the Labour vote has also declined, by an average of 0.2% a year (since 1945). Conversely, the Liberal/Lib Dem vote has risen over this period by an average 0.3% a year. You can see the rough picture from this chart:

 
These numbers do, though, depend on your starting-point. As Hopi Sen points out, 1931 was a stunning Conservative landslide; 1945 was a Labour one. If you start at 1974, after the first Liberal surge, the Labour trend is, on average, flat and the Lib rise is under 0.1% a year. If you start at 1983, the Libs are in slight decline. If you start at 1997, the Conservatives are on the up.

But for the sake of argument, let’s stick with the longer-term picture.

Things really go wrong when Ross looks to the future. This paragraph contains one of the highest concentrations of wrongness I’ve ever seen:

Taking these projections, if the Tories won the next election, they would get 34.6% of the vote, and if they lost they would get 30.3% of the vote. As there is no doubt at present that the Tories will lose, they will get 30.3% of the vote. As always there is a bit of statistical noise in any calculation, so 29.3% to 31.3% would be a reasonable range, but 30.3% is the central figure.

What he seems to be doing is separating elections that the Conservatives have won from ones they have lost, and then extrapolating the trends for both categories.

This is a logical flub. You don’t first ask whether the Conservatives will win and then go on to wonder what vote they’ll get. You first ask what vote they (and other parties) will get, and then use that to see who’ll win. Votes determine victories, not the other way round.

So, having established that there are only two possible Conservative vote shares in 2015, he then says “there is no doubt at present that the Tories will lose”. Might there be doubt in the future? I guess it’s doubtful. But the interesting thing here is that Ross already has the election result predicted without even using his system. Presumably he’s looking at opinion polls like the rest of us. So what’s the use of the system?

Then there’s the “bit of statistical noise”: he reckons his system’s predictions have a margin of error of plus or minus 1%. That’s a lot better than the 3% that a normal-sized opinion poll has. I wonder how he arrives at this number.
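For comparison, the familiar poll figure of about 3% comes from the sampling error of a proportion. Here is a rough sketch that assumes a simple random sample of about 1,000 respondents and a 95% interval. Real polls use quotas and weighting, so their effective error differs somewhat. The function name is hypothetical.

```python
import math


def poll_margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error, in percentage points, for a vote share p
    estimated from a simple random sample of n respondents.
    z = 1.96 corresponds to a 95% interval under the normal approximation."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


print(round(poll_margin_of_error(1000), 1))  # → 3.1
```

The default `p = 0.5` is the worst case. For a party polling around 30%, the same sample gives roughly 2.8 points.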

He doesn’t say, but if I wanted to arrive at such a number, I’d start with this chart:


The dots are actual vote shares and the lines are the overall average trend. The distances between the dots and the lines show how close the model has been in the past. You’ll notice that most of the dots are more than 1% away from the relevant lines.

In fact, the median error is 3.8% for the Conservative vote, 2.1% for the Labour vote and 3.6% for the Liberal/Lib Dem vote: half the time, the model was wrong by more than these amounts.

But bear in mind that this model is based on a very small sample of data: 18 election results since 1945 (or 20 for Ross since 1931; adding the extra two really won’t change the picture). So the confidence intervals of any conclusions we draw from it may well be large. And they are. The standard deviation of the error in the predicted Conservative vote is 4.1%, in the Labour vote 4.6% and in the Lib Dem vote 4.3%.

Assuming a normal distribution, about two-thirds of observed results would be expected to fall within one standard deviation of the central result. For a typical 95% confidence interval, we need to go plus or minus roughly two standard deviations: 8.4% for the Conservative vote, 9.2% for Labour, 8.6% for the Lib Dems.
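The "two-thirds within one standard deviation" and "95% within two" rules of thumb can be checked against the normal distribution with the standard library's `math.erf`. The helper name below is hypothetical.

```python
import math


def share_within(k: float) -> float:
    """Probability that a normally distributed value falls within
    k standard deviations of its mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


print(round(share_within(1), 3))  # → 0.683
print(round(share_within(2), 3))  # → 0.954
```

Exactly 95% corresponds to about 1.96 standard deviations, so "two" is a convenient rounding that makes the interval very slightly wider.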

So, my version of Ross’s model gives these central projections for 2015: Conservatives 34.3%, Labour 31.5%, Lib Dems 26.4%. But all I’d be confident in saying is that the Conservatives will get between 25.9% and 42.7%, Labour between 22.3% and 40.7%, and the Lib Dems between 17.8% and 35%.

Probably. Assuming that there’s a genuine phenomenon here that will continue in the future. And ignoring all polling evidence.
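The ranges above are simply the central projection plus or minus two standard deviations of the past errors. Here is a minimal sketch using the Labour and Lib Dem figures quoted in the post. The helper name `interval` is hypothetical.

```python
from typing import Tuple


def interval(central: float, sd: float, k: float = 2.0) -> Tuple[float, float]:
    """Central projection plus or minus k standard deviations,
    rounded to one decimal place (percentage points)."""
    return (round(central - k * sd, 1), round(central + k * sd, 1))


print(interval(31.5, 4.6))  # Labour → (22.3, 40.7)
print(interval(26.4, 4.3))  # Lib Dems → (17.8, 35.0)
```

Passing `k=1.96` instead of the default 2 would narrow each range by a fraction of a point, which makes no difference to the conclusion.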

In conclusion: Yes, long-term trends are noteworthy. But let’s not read too much into them.
