Tuesday, 26 June 2007

The Big Harry Potter prediction

I was (re-)writing a lecture for a course I'm giving next week in Asturias, and included an example of prediction.  I thought it was sufficiently interesting to preview here.  Feel free to disagree.

The background: even if you haven't heard by now, you'll soon be sick of hearing that HP7, Harry Potter and the Deathly Hallows, is released next month.  There are lots of unknowns, which give us plenty of scope to predict what will happen, and then hope nobody remembers when the predictions are wrong.

So, I decided to make my own prediction - after all, that is what the lecture is about.  I've tried to predict how many pages the book will have.  As the course is in statistics, we need data.  For that I use the number of pages in the previous books, and the dates they were published:

Potter book lengths

Odd, it looks like Ursa Major.  It must be a clue!  Perhaps it's pointing north, so Harry will have to go to the frozen northlands, be helped by a polar bear in his quest, and in the final climax end up betraying a friend in order to save everything, and break through to a new universe.


Anyway, now we do a linear regression.  Because the course is about Bayesian analysis, I did a Bayesian prediction.  This meant firing up BUGS, fitting a straight line, and predicting the number of pages for a Harry Potter book published in 2007.  I also made a prediction based on the least squares estimate of the line.  This is what I got:

HP7 pages prediction

The thick bars are +/- one standard error, the thin bars are the 95% confidence/credible intervals.  Next week I will blather on about why the Bayesian intervals are wider.  For the moment, I want to point out that I'm predicting that HP7 will be about 860 pages long, but with a wide margin of error: there's a 95% probability that it will be between 500 and 1200 pages long.
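For anyone who wants to see the least squares half of this worked through, here is a minimal sketch in Python.  The post does not list the data, so the years and page counts below are assumptions (publication years and approximate UK first-edition page counts), and the function names are made up for illustration.  The numbers it prints should not be expected to match the figure above, which was based on the actual publication dates and my own data.

```python
import math

# Illustrative data only (an assumption, not the data set behind the post):
# publication years and approximate UK first-edition page counts, books 1-6.
YEARS = [1997, 1998, 1999, 2000, 2003, 2005]
PAGES = [223, 251, 317, 636, 766, 607]

# 97.5% quantile of Student's t with 4 degrees of freedom (n - 2 for 6 books).
T_CRIT_DF4 = 2.776


def fit_line(x, y):
    """Return (intercept, slope) of the least squares straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predict_with_interval(x, y, x_new, t_crit):
    """Return (prediction, lower, upper) for a single new observation at x_new."""
    n = len(x)
    intercept, slope = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Residual standard deviation, with n - 2 degrees of freedom.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    # Standard error for predicting one new book, not the mean of the line.
    se_pred = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    prediction = intercept + slope * x_new
    return prediction, prediction - t_crit * se_pred, prediction + t_crit * se_pred


prediction, lower, upper = predict_with_interval(YEARS, PAGES, 2007, T_CRIT_DF4)
print(f"HP7: about {prediction:.0f} pages, 95% interval {lower:.0f} to {upper:.0f}")
```

The interval here is a prediction interval for one new book, which is wider than a confidence interval for the fitted line because it includes the book-to-book scatter.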

And I get paid to do this stuff.


Minyu said...

Looks more like reversed Ursa Major.

jebyrnes said...

On Amazon, it is reported to be 784 pages. Not far off!

Bob O'Hara said...

Minyu - it all looks rather different if you're drunk.

jebyrnes - thanks for the update! Amazon UK still gives 608 pages, which is exactly the same number of pages as they give for no. 6. Of course, I prefer your version!


Anonymous said...

Show your workings for the examples, 007. (For the neophytes, that is ...).

Andrew said...

If you're being paid to do this stuff, the least you could do is be precise with your language.

Confidence intervals are NOT the same thing as probability. This is one of the most basic tenets of statistical models. Please don't confuse the general public any more by claiming that a 95% confidence interval is the same as a 95% probability that a random variable will lie within that range.

Bob O'Hara said...

Andrew - the 95% probability came from the Bayesian interval, so there is a 95% probability that the number of pages will be in this interval.

OK, there is a subjective aspect (as it's based on my priors, although they are flat here) but in the Bayesian world we can and do make probability statements about random variables.
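To make that kind of probability statement concrete, here is a minimal sketch in Python that simulates from a posterior predictive distribution for a straight-line model.  It is not the BUGS model from the post: it assumes a flat prior on the line and a 1/sigma^2 prior on the residual variance, and it uses illustrative data (publication years and approximate UK page counts), so the function names, the data and the resulting numbers are all assumptions.  Under this particular prior the interval should agree, up to simulation noise, with the classical prediction interval; with other priors it need not.

```python
import math
import random

# Illustrative data only (an assumption, not the data set behind the post).
YEARS = [1997, 1998, 1999, 2000, 2003, 2005]
PAGES = [223, 251, 317, 636, 766, 607]


def posterior_predictive_draws(x, y, x_new, n_draws, rng):
    """Simulate page counts for a new book at x_new.

    Assumes a normal linear model with a flat prior on the line and a
    1/sigma^2 prior on the residual variance (hypothetical choice).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope_hat = sxy / sxx
    sse = sum((yi - y_bar - slope_hat * (xi - x_bar)) ** 2 for xi, yi in zip(x, y))
    draws = []
    for _ in range(n_draws):
        # sigma^2 given the data is SSE divided by a chi-square(n - 2) draw.
        chi2 = rng.gammavariate((n - 2) / 2, 2.0)
        sigma = math.sqrt(sse / chi2)
        # With x centred, the height of the line at x_bar and its slope are
        # independent given sigma, so they can be drawn separately.
        height = rng.gauss(y_bar, sigma / math.sqrt(n))
        slope = rng.gauss(slope_hat, sigma / math.sqrt(sxx))
        # Add the book-to-book scatter to get a draw for one new book.
        draws.append(rng.gauss(height + slope * (x_new - x_bar), sigma))
    return draws


def summarise(draws):
    """Return (median, 2.5% quantile, 97.5% quantile) of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    return ordered[n // 2], ordered[int(0.025 * n)], ordered[int(0.975 * n)]


draws = posterior_predictive_draws(YEARS, PAGES, 2007, 20000, random.Random(1))
median, lower, upper = summarise(draws)
print(f"HP7: median {median:.0f} pages, 95% credible interval {lower:.0f} to {upper:.0f}")
```

Because the draws are simulated values of the page count itself, "95% of the draws fall between the two quantiles" is a direct probability statement about the new book, given the model and priors.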


Taabu said...

I hope and pray that your comment was in good faith and not just a cheap sneer at Bob. And if it was not, then you'd better get a good detergent to clean the egg off your face for displaying such obtuse ignorance of the difference between confidence intervals and credible intervals. Otherwise remain a student of life and learn something new every single day.