I was (re-)writing a lecture for a course I'm giving next week in Asturias, and included an example of prediction. I thought it was sufficiently interesting to preview here. Feel free to disagree.
The background: even if you haven't heard yet, you'll soon be sick of hearing that HP7, Harry Potter and the Deathly Hallows, is released next month. There are lots of unknowns, which give us plenty of scope to predict what will happen, and then hope nobody remembers when the predictions are wrong.
So, I decided to make my own predictions - after all, that is what the lecture is about. Specifically, I've tried to predict how many pages the book will have. As the course is in statistics, we need data. For that I use the number of pages in the previous books, and the dates they were published:
Odd, it looks like Ursa Major. It must be a clue! Perhaps it's pointing north, so Harry will have to go to the frozen northlands, be helped by a polar bear in his quest, and in the final climax end up betraying a friend in order to save everything, and break through to a new universe.
Anyway, now we do a linear regression. Because the course is about Bayesian analysis, I did a Bayesian prediction. This meant firing up BUGS, fitting a straight line, and predicting the number of pages for a Harry Potter book published in 2007. I also made a prediction based on the least squares estimate of the line. This is what I got:
The thick bars are ± one standard error; the thin bars are the 95% intervals (confidence intervals for least squares, credible intervals for the Bayesian fit). Next week I will blather on about why the Bayesian intervals are wider, but for the moment, I want to point out that I'm predicting that HP7 will be about 860 pages long, but with a wide margin of error: there's a 95% probability that it will be between 500 and 1200 pages long.
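For anyone who wants to play along at home, here is a rough sketch of the least-squares half of the exercise in plain Python. It is not the BUGS model, and the function name `ols_predict` is made up for illustration. The page counts are an assumption too: they are roughly those of the UK editions and differ between editions, so they are not necessarily the numbers behind the plot, and the output need not match the figures quoted above.

```python
import math


def ols_predict(x, y, x_new, t_crit):
    """Fit y = a + b*x by least squares and predict a new observation at x_new.

    Returns (prediction, standard error of prediction, (lower, upper)).
    t_crit is the t quantile for the interval, with n - 2 degrees of freedom.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance, estimated with n - 2 degrees of freedom.
    resid_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = resid_ss / (n - 2)

    # Standard error for a NEW observation: the leading "1 +" adds the noise
    # of the new book on top of the uncertainty in the fitted line.
    se_pred = math.sqrt(sigma2 * (1 + 1 / n + (x_new - x_bar) ** 2 / sxx))

    y_hat = intercept + slope * x_new
    return y_hat, se_pred, (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)


# Publication years of the first six books.
years = [1997, 1998, 1999, 2000, 2003, 2005]
# ASSUMPTION: illustrative page counts (approximately the UK editions).
pages = [223, 251, 317, 636, 766, 607]

# 2.776 is the 97.5% quantile of a t distribution with 4 degrees of freedom
# (six books minus two fitted parameters), giving a 95% prediction interval.
pred, se, (lower, upper) = ols_predict(years, pages, 2007, t_crit=2.776)
print(f"prediction: {pred:.0f} pages, s.e. {se:.0f}, 95% interval {lower:.0f} to {upper:.0f}")
```

Using whole years rather than exact publication dates is another simplification; swapping in decimal dates only changes the `years` list.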
And I get paid to do this stuff.