## Monday, 31 December 2007

### A physicist stumbles into a statistical field

By a virtual game of Chinese Whispers (in which no Chinese were involved), I found out about a paper on arΧiv where a poor unsuspecting physicist wanders into a curious part of statistics. I'm actually something of a bystander in this area, but it's not going to stop me commenting on it.

OK, so the paper is by a guy called Bruce Knuteson, from MIT. He's interested in working out the scientific worth of a piece of empirical work, and being a physicist, he wants to measure it.

So, the first problem is to decide what worth is. Knuteson decides to measure it in terms of "surprisal", i.e. how surprised we are by a result. So, if we collected data, and got a result (say, a measurement of a parameter) xi, how shocked would we be by it? From this, Knuteson decides that ...

The scientific merit of a particular result xi should thus (i) be a monotonically decreasing function of the expectation p(xi) that the result would be obtained, and (ii) be appropriately additive.

and so suggests -log(Pr(xi)) as a measurement, as it has these properties. He then suggests that the worth of an experiment can be estimated as the expected value of this, i.e. the sum of -Pr(xi)log(Pr(xi)). This is a measure called entropy: something beloved of physicists and engineers, but rather opaque to the rest of us. The idea is that a larger entropy will mean that the experiment is better - we will expect more surprising results.

But is this a good measure? Perhaps a good way of tackling this is to view it as a problem in decision theory. How can we decide what is the best course of action to take when we are uncertain what the results will be? For example, if we have a choice of experiments we can carry out, how can we decide which one to do? To do this we first need to define "best". This has to be measured, and the numerical value for each outcome is called the utility, U. This might, for example, be the financial gain or loss (e.g. if we are gambling), or might be something more prosaic, like one's standing in the scientific community (however that is measured. h-index?). All the effects of each action, both positive and negative, go into this number. So, for example, we would include the gain in prestige from publishing a good paper, and the cost (e.g. financial, or the effect on our notoriety if the results are a turkey). The second part of the decision analysis is to give a probability for each outcome, so for action A the probability might be 0.3 that we get a Nature paper, and 0.7 that we get a Naturens verden paper. For action B it might be 0.9 and 0.1 respectively. We then calculate the average utility for each action, i.e. sum the probability of each result multiplied by the utility for that result.
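To make the arithmetic concrete, here is a minimal sketch in R of the expected-utility calculation. The probabilities are the ones from the example above; the utilities are numbers I have invented purely for illustration, and a real analysis would also have to fold the costs into them.

```r
# Probabilities of each outcome for the two actions (from the example in the text)
probs <- rbind(
  A = c(Nature = 0.3, NaturensVerden = 0.7),
  B = c(Nature = 0.9, NaturensVerden = 0.1)
)

# Hypothetical utilities of each outcome (invented for illustration only)
utility <- c(Nature = 10, NaturensVerden = 1)

# Expected utility of an action: sum over outcomes of probability x utility
expected_utility <- drop(probs %*% utility)
print(expected_utility)

# The preferred action is the one with the largest expected utility
best_action <- names(which.max(expected_utility))
print(best_action)
```

With these made-up utilities action B wins, but change the utilities (say, by making B much more expensive) and the ranking can flip, which is the whole point of putting everything into U.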

This is what Knuteson does to get his measure. The problem is that his only utility is surprisal, and in general this doesn't make sense. Two things are missing. Firstly, there is no cost element. So, if we want to measure the time it takes an apple to fall on a physicist's head, it makes no difference if we pay a couple of students \$1 or £30,000,000 to do it. The second problem is that there is no measure of scientific worth. Finding out if the next toss of a €1 coin will be heads is treated exactly the same as finding out if the Higgs boson is green.

This leads to clearly nonsensical results. If there are only two possible outcomes of an experiment, then the maximum expected surprisal occurs if the probability of one is 0.5. Therefore the optimal experiment is one with this property. For example, tossing a €1 coin. According to Knuteson, then, we should fund lots of coin tossing experiments (hmm, there's an Academy of Finland application deadline coming up).
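The coin-tossing result is easy to check numerically. This is a small sketch in R (the function names are mine, not Knuteson's) that computes the expected surprisal for a range of two-outcome experiments, using natural logarithms.

```r
# Surprisal of an outcome with probability p (natural logarithm)
surprisal <- function(p) -log(p)

# Expected surprisal (entropy) of an experiment with outcome probabilities p.
# Outcomes with p = 0 contribute nothing, so drop them to avoid 0 * log(0).
entropy <- function(p) {
  p <- p[p > 0]
  sum(p * surprisal(p))
}

# Two-outcome experiments with different probabilities for the first outcome
p1 <- seq(0.05, 0.95, by = 0.05)
H <- sapply(p1, function(p) entropy(c(p, 1 - p)))

# The entropy peaks at p = 0.5, where it equals log(2)
p_best <- p1[which.max(H)]
print(p_best)
print(max(H))
```

Surprisal is also additive for independent results, which is property (ii) in the quote: `surprisal(0.5 * 0.2)` equals `surprisal(0.5) + surprisal(0.2)`.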

Another thing that is missing is where the probabilities come from. These are probabilities of outcomes that are not observed, so in general they cannot be measured (without doing the experiment...). Therefore one has to assign them based on one's subjective opinion. Now we are on familiar Bayesian ground, and this is something that has been argued about for years. But here I think Knuteson can use a sneaky trick to sidestep the problems. Put simply, he could argue that in practice the estimation of merit is made by people, so they can assign their own probabilities. If someone else disagrees, fine. This way, it is clearer where the disagreement lies (e.g. which probabilities are being assigned differently).

So, estimating the merit of a piece of work before it is done can be problematic (and I haven't touched on comparing experiments with different numbers of possible outcomes!). But Knuteson develops his ideas even further. How about, he asks, working out the merit of an experiment after it has been done?

Before doing this, Knuteson sorts out a little wrinkle. It is not generally the experimental results themselves that are of interest - it is how they impact on our understanding of the natural world. We can think about this in the way that we have several models, M1, M2, ... Mk, for the world (these might correspond to theories, e.g. that the world is flat, or that it is a torus). The worth of an experiment could then be measured in terms of how it changes what we learn about these models, i.e. how the probability of Mj changes with the data, xi. This can simply be measured as the entropy of the models, the Mk's, rather than the experimental outcomes.

Knuteson goes through the maths of this, and finds that the appropriate measure of the merit of an experiment is a measure of how far the probabilities of each model are shifted by the experiment. To be precise, it is a measure known as the Kullback-Leibler divergence (I will spare you the equations). Now, this again is something that is familiar. A big problem in statistics is deciding which model is, in some sense, best. This can be done by asking about how well it will predict an equivalent data set to the one being analysed. After going through a few hoops, we find that the appropriate tool is the K-L divergence between the fitted model and the "true" model. Of course, we don't know the true model, but there are several tweaks and approximations that can be made so that we don't need to - it is the relative divergence of different possible models that is important. The result of this is a whole bunch of criteria that are all TLAs with IC at the end - AIC, BIC, DIC, TIC, and several CICs.

The optimal experiment is the one which will maximise the difference between our prior and posterior probabilities of the different models (yes, Bayesian again). The idea is natural - the greater the difference, the more we have learned, and hence the better the experiment is. Of course, we still have the same problems as above, i.e. assigning the probabilities, and getting the utility right, but we are in the right area. Indeed it turns out (after browsing wiki) that the idea is not original - the method proposed by Knuteson is the same as something called Bayesian D-optimality. And (after reading the literature), the idea goes back to 1956!
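For anyone who does want to see the sums, here is a rough sketch in R of the K-L divergence between prior and posterior model probabilities for a single observed result. The two models, the prior and the likelihoods are all hypothetical numbers of my own, and the function names are not from the paper. Knuteson's measure before the experiment would be the expectation of this quantity over the possible results.

```r
# Kullback-Leibler divergence from the prior to the posterior model probabilities
kl_divergence <- function(posterior, prior) {
  keep <- posterior > 0   # terms with zero posterior probability contribute nothing
  sum(posterior[keep] * log(posterior[keep] / prior[keep]))
}

# Bayes' theorem over a set of models: posterior is proportional to prior x likelihood
update_models <- function(prior, likelihood) {
  unnorm <- prior * likelihood
  unnorm / sum(unnorm)
}

# Hypothetical prior probabilities for two models
prior <- c(M1 = 0.5, M2 = 0.5)

# Hypothetical likelihoods Pr(x | M) of the observed result under each model
lik_informative   <- c(M1 = 0.9, M2 = 0.1)   # the models predict different things
lik_uninformative <- c(M1 = 0.5, M2 = 0.5)   # e.g. a coin toss: same under both

kl_inf   <- kl_divergence(update_models(prior, lik_informative), prior)
kl_uninf <- kl_divergence(update_models(prior, lik_uninformative), prior)
print(c(informative = kl_inf, uninformative = kl_uninf))
```

Note that the coin toss, which did so well on expected surprisal, scores exactly zero here: it does not shift the model probabilities at all.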

So, does this help? For the general problem of estimating scientific merit, I doubt it. There are too many problems with the measure. It may be useful for structuring thinking about the problem, but in that case it is little different from using a decision analytic framework.

In experimental design, it is of more use, but then the idea is not original. The other area it might be useful is in summarising the worth of an experiment for estimating a parameter, such as the speed of light. There will be cases where physical constants have to be measured. Previous measurements can then be used to form the prior (there are standard meta-analysis methods for this), and then the K-L divergence of several experiments can be calculated, to see which gives the largest divergence. This is some way from the ideas Knuteson is thinking about (he explicitly rejects estimating parameters as being of merit!). But I think more grandiose schemes will die because of naysayers like me nagging at the details.

Reference
Lindley, D.V. (1956). On a Measure of the Information Provided by an Experiment. The Annals of Mathematical Statistics, 27, 986-1005.

## Tuesday, 25 December 2007

### Goodwill To All Man and Beast

It's Christmas, so even beasts get to enjoy some peace...

Alas, the bottom of the sofa does not. Words will be had later.

## Monday, 24 December 2007

### Hyvää Joulua!

Yes, it's time to wish you all a
Merry Christmas!

This advert, like Julebryg, is a tradition in Denmark. Around Christmas you go to the cinema, and this advert comes on to be greeted by a huge cheer. If only the films were greeted with such enthusiasm...

But anyway, have a good Christmas and New Year, especially to you three regular readers.

## Saturday, 22 December 2007

### Finnish Exports, Pt. 2

After I posted the video on the Finnish export business, I thought I should post a safety video, just in case you were thinking of ordering an item.

## Wednesday, 19 December 2007

### Finnish Exports, Pt. 1

It's almost Christmas, but not yet. So it's still worth posting this educational video about one of the major Finnish exports.

Although I should complain that our 'O' level geography teacher never told us about this. I guess he just wasn't competent. *sigh*

## Saturday, 15 December 2007

### "It's the wrong scanner, Bruce!"

Ouch.
BBC NEWS | UK | England | Norfolk | Hospital's scanner goes walkabout

Someone at Philips sent the wrong scanner to the wrong Queen Elizabeth Hospital (I guess it's easy to mix up Kings Lynn and Adelaide). But anyway, the BBC report includes this:

"There's probably a shipping clerk sitting in an office very miserable," QEH spokesman Richard Humphries said.

Yes, but which Queen Elizabeth Hospital does he speak for?

## Wednesday, 12 December 2007

### Oooh! Struck by a blog meme!

GrrrrrlScientist struck me with my first ever blog meme (woo-hoo!). I had to participate, but it took me a bit of time thanks to hardware problems and an influx of spawning cod. So. Here we go...

The Rules are as follows:

1. Link to the person that tagged you and post the rules on your blog.

2. Share 7 random and/or weird things about yourself.

3. Tag 7 random people at the end of your post and include links to their blogs.

4. Let each person know that they have been tagged by leaving a comment on their blog.

Seven Random or Weird Things About Me:

1. I have a reputation for wearing t-shirts (and nothing over them) when it's cold, i.e. above freezing. The secret is to keep moving, folks.

2. I haven't cut my hair for 15 years, and it's still barely shoulder-length.

3. I used to bake bread in Danish.

4. Some films I have never seen: Bambi, E.T., Gone With the Wind, Casablanca.

5. There's a mistake in the first line of the introduction of the first paper I ever published. The moment I saw the reprint, I spotted it. Of course, I missed it in all of the drafts before then.

6. I have two chairs for my computer. That way, when the Beast sits in one, I gently shift it to one side and sit in the other.

7. I'm intrigued by the idea that internet memes are SIR epidemics.

Seven Random Bloggers whom I read:

1. Hermagoras at Paralepsis

2. Andrew Gelman (or one of his minions) at Statistical Modeling, Causal Inference, and Social Science

3. John Wilkins and his ridiculously funny Evolving Thoughts

4. The Highly Allochronic Chris

5. Martin på Aardvarchaeology

6. Kristine the Amused Muse, and witch.

7. S.A. Smith, the ERV (and modern artist)

## Monday, 10 December 2007

### BBC report filtered out

Actually, it's somehow comforting that there is a part of the brain dedicated to ignoring things. And another dedicated to deciding what's irrelevant. But somehow Henry always seems to slip past that one.

## Wednesday, 5 December 2007

### Focus on DIC

This is something that's only of interest to Bayesians, so the rest of you can look away.
In a couple of analyses with BUGS, I've seen comparisons of DIC from different models, and the values have been almost exactly the same. This sort of thing is suspicious, and eventually I worked out why. I thought it was worth posting about this, so that everyone else can share The Secret.

It's best to explain the problem with an example (the complete R code for this is below). I simulated a couple of simple hierarchical models, so

$Y_{ij} \sim N(\theta_i, \sigma_y^2)$
$\theta_i \sim N(\mu_i, \sigma_\theta^2)$

($j = 1, \ldots, n$; $i = 1, \ldots, m$. There are m groups, each with n observations). I then had two models for $\mu_i$:

Model 1
$\mu_i = \phi + \beta (i - 5.5)$

Model 2
$\mu_i = \phi$

The first model has a covariate (cunningly equal to the identity of the group), and the second has none. The data are plotted below. The effect of the covariate is clear, so DIC should be able to pick it up.

Now, I fit each of the models to each data set. This is easy to do by running them through the R2WinBUGS package in R. From this I can extract the DIC:

Data 1
Model 1 DIC: 1464.0 pD: 10.9
Model 2 DIC: 1464.4 pD: 11.2

Data 2
Model 1 DIC: 1407.7 pD: 10.9
Model 2 DIC: 1407.7 pD: 10.9

So, in both cases the DIC is the same (for Data 2 the difference is in the third decimal place!). But for Data 1, Model 1 should be better - hey, we can see it on the figure! So, what's going on?

We can get a clue from plotting the posteriors for θi from the groups, from the two models:

Only the error bars are plotted (i.e. plus/minus 1 posterior standard deviation), and the 1:1 line is drawn in a fetching shade called "hotpink". Obviously the models are predicting the same means for the groups, and hence we will get the same deviance. We can see why this is happening from the group-level standard deviations (σθ, posterior means and standard errors in parentheses):

Data 1
Model 1: 0.87 (0.280)
Model 2: 4.4 (1.31)

Data 2
Model 1: 1.1 (0.34)
Model 2: 1.1 (0.31)

So, for the data where there is a trend, but none is fitted, σθ is much larger - essentially, the lack of the linear trend is compensated by the increase in variance. The difference is not in the model for θ at all, but higher in the hierarchy (or hier in the higherarchy?).

Of course, this is obvious from looking at the models. The solution is to change the focus, from θ to φ and β. This then means calculating the marginal deviance, marginalising over θ, i.e. looking at P(Y | φ, β) and integrating over P(θ | Y). This can be done analytically (Hat-tip to David Spiegelhalter for correcting my errors, and refraining from making any justified comments about my mathematical ability!), whence we find that the deviance can be calculated because

$\bar{Y}_{i\cdot} \sim N(\mu_i, \sigma_y^2/n + \sigma_\theta^2)$

When we do this, we get these results:

Data 1
Model 1 DIC: 26.9 pD: 2.7
Model 2 DIC: 57.8 pD: 1.7

Data 2
Model 1 DIC: 30.9 pD: 2.6
Model 2 DIC: 30.3 pD: 1.7

Now this makes more sense: for the data with an effect, the DIC massively favours the correct model. Without the effect in the data, the DIC is pretty similar. In both cases, also note that pD is larger by 1 for the model with 1 extra parameter. Which is what should happen!

What lessons can we draw from this? Firstly, that DIC is not an automatic panacea - you do have to focus it on the right part of the model. If the focus is not at the level immediately above the data (i.e. θ here), then you can't use the DIC given by BUGS. The correctly focussed DIC is more complex to get at: you have to calculate it yourself. For more complex models this might be awkward: if there are no analytical results, then the parameters to be integrated out have to be simulated, for example by MCMC. But this requires some further thought...
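As a first stab at that further thought, here is a rough sketch of what "simulating the parameters to be integrated out" could look like for a single group mean. It uses plain Monte Carlo rather than MCMC, the numbers are hypothetical (not taken from the fits above), and it is only meant to show that the simulated marginal likelihood agrees with the analytical result.

```r
set.seed(1)

# Hypothetical values for one group (invented for illustration)
ybar        <- 6.2   # observed group mean
n           <- 50    # observations per group
mu          <- 5.5   # group-level mean
sigma_y     <- 1     # within-group standard deviation
sigma_theta <- 1     # between-group standard deviation

# Analytical marginal log-density: Ybar ~ N(mu, sigma_y^2/n + sigma_theta^2)
log_analytic <- dnorm(ybar, mu, sqrt(sigma_y^2 / n + sigma_theta^2), log = TRUE)

# Monte Carlo version: draw theta given mu and sigma_theta, then average
# the density of Ybar given theta over the draws
theta  <- rnorm(1e5, mu, sigma_theta)
log_mc <- log(mean(dnorm(ybar, theta, sigma_y / sqrt(n))))

print(c(analytic = log_analytic, monte_carlo = log_mc))
```

For a real DIC calculation this would have to be repeated for every group and for every saved draw of the top-level parameters, which is why the analytical shortcut is so welcome when it exists.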

```r
library(R2WinBUGS)
# Simulate data
Cov=1:10
Group = rep(Cov, each=50)
beta = 1
# With covariate
GrpMean1=rnorm(length(Cov), beta*Cov, 1)
Value1=rnorm(length(Group), GrpMean1[Group], 1)
DataToBUGS1=list(N=length(Value1), NGrp=length(Cov), Group=Group, Y=Value1)
# Without covariate
GrpMean2=rnorm(length(Cov), 5.5, 1)
Value2=rnorm(length(Group), GrpMean2[Group], 1)
DataToBUGS2=list(N=length(Value2), NGrp=length(Cov), Group=Group, Y=Value2)

# Plot the data
# png("C:/Bob/Blog/Data.png", width = 960, height = 480)
par(mfrow=c(1,2), mar=c(2.1,2.1,1.1,1.1), oma=c(2,2,0,0), las=1)
plot(jitter(Group), Value1, pch=3, col="grey70")
points(Cov, GrpMean1, col=1, pch=3, cex=1.5)

plot(jitter(Group), Value2, pch=3, col="grey70")
points(Cov, GrpMean2, col=1, pch=3, cex=1.5)
mtext("Group", 1, outer=T)
mtext("Y", 2, outer=T)
#dev.off()

# Write BUGS models to files
model1 <- function(){
  for (i in 1:N){ Y[i] ~ dnorm(muGrp[Group[i]], tau.y) }
  for (j in 1:NGrp){
    muGrp[j] ~ dnorm(muG[j], tau.Grp)
    muG[j] <- mu0 + betaGrp*(j-5.5)
  }
  mu0 ~ dnorm (0.0, 1.0E-6)
  betaGrp ~ dnorm (0.0, 1.0E-6)
  tau.y <- pow(sigma.y, -2)
  sigma.y ~ dunif (0, 1000)
  tau.Grp <- pow(sigma.Grp, -2)
  sigma.Grp ~ dunif (0, 1000)
}
write.model(model1, "C:/Bob/Blog/model1.txt")

model2 <- function(){
  for (i in 1:N){ Y[i] ~ dnorm(muGrp[Group[i]], tau.y) }
  for (j in 1:NGrp){ muGrp[j] ~ dnorm(mu0, tau.Grp) }
  mu0 ~ dnorm (0.0, 1.0E-6)
  tau.y <- pow(sigma.y, -2)
  sigma.y ~ dunif (0, 1000)
  tau.Grp <- pow(sigma.Grp, -2)
  sigma.Grp ~ dunif (0, 1000)
}
write.model(model2, "C:/Bob/Blog/model2.txt")

# Initial values
Inits1=list(list(mu0=0, betaGrp=1, sigma.Grp=5, sigma.y=1), list(mu0=2, betaGrp=0, sigma.Grp=5, sigma.y=1) )
Inits2=list(list(mu0=0, sigma.Grp=5, sigma.y=1), list(mu0=2, sigma.Grp=5, sigma.y=1) )

# Fit the models
# Data 1, Model 1
Data1Model1.post=openbugs(DataToBUGS1, inits=Inits1, c("mu0", "muGrp", "betaGrp", "sigma.y", "sigma.Grp"), model.file = "C:/Bob/Blog/model1.txt", n.chains = 2, n.iter = 11000, n.burnin = 1000, n.thin = 10)
# Data 1, Model 2 (no slope in model)
Data1Model2.post=openbugs(DataToBUGS1, inits=Inits2, c("mu0", "muGrp", "sigma.y", "sigma.Grp"), model.file = "C:/Bob/Blog/model2.txt", n.chains = 2, n.iter = 11000, n.burnin = 1000, n.thin = 10)
# Data 2, Model 1
Data2Model1.post=openbugs(DataToBUGS2, inits=Inits1, c("mu0", "muGrp", "betaGrp", "sigma.y", "sigma.Grp"), model.file = "C:/Bob/Blog/model1.txt", n.chains = 2, n.iter = 11000, n.burnin = 1000, n.thin = 10)
# Data 2, Model 2 (no slope in model, so betaGrp is not monitored)
Data2Model2.post=openbugs(DataToBUGS2, inits=Inits2, c("mu0", "muGrp", "sigma.y", "sigma.Grp"), model.file = "C:/Bob/Blog/model2.txt", n.chains = 2, n.iter = 11000, n.burnin = 1000, n.thin = 10)
print(Data1Model1.post)
print(Data1Model2.post)
print(Data2Model1.post)
print(Data2Model2.post)

# Plot the group means (theta)
# png("C:/Bob/Blog/Theta.png", width = 960, height = 480)
par(mfrow=c(1,2), mar=c(2.1,2.1,1.1,1.1), oma=c(2,2,1,0), las=1)
plot(Data1Model1.post$mean$muGrp, Data1Model2.post$mean$muGrp, type="n", main="Data 1")
segments(Data1Model1.post$mean$muGrp, Data1Model2.post$mean$muGrp-Data1Model2.post$sd$muGrp, Data1Model1.post$mean$muGrp, Data1Model2.post$mean$muGrp+Data1Model2.post$sd$muGrp)
segments(Data1Model1.post$mean$muGrp-Data1Model1.post$sd$muGrp, Data1Model2.post$mean$muGrp, Data1Model1.post$mean$muGrp+Data1Model1.post$sd$muGrp, Data1Model2.post$mean$muGrp)
abline(0,1, col="hotpink")

plot(Data2Model1.post$mean$muGrp, Data2Model2.post$mean$muGrp, type="n", main="Data 2")
segments(Data2Model1.post$mean$muGrp, Data2Model2.post$mean$muGrp-Data2Model2.post$sd$muGrp, Data2Model1.post$mean$muGrp, Data2Model2.post$mean$muGrp+Data2Model2.post$sd$muGrp)
segments(Data2Model1.post$mean$muGrp-Data2Model1.post$sd$muGrp, Data2Model2.post$mean$muGrp, Data2Model1.post$mean$muGrp+Data2Model1.post$sd$muGrp, Data2Model2.post$mean$muGrp)
abline(0,1, col="hotpink")
mtext(expression(paste(theta[i], ", Model 1", sep="")), 1, outer=T)
mtext(expression(paste(theta[i], ", Model 2", sep="")), 2, outer=T, las=0)
#dev.off()

#######################################################################
# Deviance focussed on theta: Y[i] given the group means
DevCalc1=function(mcmc, data) {
  NamesGrp=paste("muGrp[", data$Group, "]", sep="")
  -2*sum(dnorm(data$Y, mcmc[NamesGrp], mcmc["sigma.y"], log=TRUE))
}
# Marginal deviance (theta integrated out), model without slope
DevCalc2=function(mcmc, data) {
  Mean=unlist(tapply(data$Y, list(data$Group), mean))
  Mu=mcmc["mu0"]
  N=length(data$Group)/length(unique(data$Group))
  -2*sum(dnorm(Mean, Mu, sqrt((mcmc["sigma.y"]^2)/N + mcmc["sigma.Grp"]^2), log=TRUE))
}
# Marginal deviance (theta integrated out), model with slope
DevCalc2beta=function(mcmc, data) {
  Mean=unlist(tapply(data$Y, list(data$Group), mean))
  Mu=mcmc["mu0"] + mcmc["betaGrp"]*(as.numeric(names(Mean)) - 5.5)
  N=length(data$Group)/length(unique(data$Group))
  -2*sum(dnorm(Mean, Mu, sqrt((mcmc["sigma.y"]^2)/N + mcmc["sigma.Grp"]^2), log=TRUE))
}
# DIC = Dbar + pD, where pD = Dbar - deviance at the posterior means
DIC.calc=function(dat, mcmc, func) {
  Dev=apply(mcmc$sims.array, c(1,2), func, data=dat)
  post.mean=apply(mcmc$sims.array, 3, mean)
  pD=mean(Dev) - func(post.mean, dat)
  DIC=mean(Dev) + pD
  return(list(DIC=DIC, pD=pD, Dbar=mean(Dev)))
}
PrintDIC=function(DIC, Label) { cat(Label, " DIC:", DIC$DIC, " pD:", DIC$pD, "\n")}

# Focus on theta
DIC.D1M1=DIC.calc(DataToBUGS1, Data1Model1.post, DevCalc1)
DIC.D1M2=DIC.calc(DataToBUGS1, Data1Model2.post, DevCalc1)
DIC.D2M1=DIC.calc(DataToBUGS2, Data2Model1.post, DevCalc1)
DIC.D2M2=DIC.calc(DataToBUGS2, Data2Model2.post, DevCalc1)
PrintDIC(DIC.D1M1, "Data 1, Model 1")
PrintDIC(DIC.D1M2, "Data 1, Model 2")
PrintDIC(DIC.D2M1, "Data 2, Model 1")
PrintDIC(DIC.D2M2, "Data 2, Model 2")

# Focus on phi
DIC2.D1M1=DIC.calc(DataToBUGS1, Data1Model1.post, DevCalc2beta)
DIC2.D1M2=DIC.calc(DataToBUGS1, Data1Model2.post, DevCalc2)
DIC2.D2M1=DIC.calc(DataToBUGS2, Data2Model1.post, DevCalc2beta)
DIC2.D2M2=DIC.calc(DataToBUGS2, Data2Model2.post, DevCalc2)
PrintDIC(DIC2.D1M1, "Data 1, Model 1")
PrintDIC(DIC2.D1M2, "Data 1, Model 2")
PrintDIC(DIC2.D2M1, "Data 2, Model 1")
PrintDIC(DIC2.D2M2, "Data 2, Model 2")
```

## Tuesday, 4 December 2007

### Finnish curiosities

The Finns are a curious people. This morning I got a phone call asking me if I owned a TV.
I don't. End of conversation. We actually spent longer ascertaining that the conversation had to be in English.

## Saturday, 24 November 2007

### The Beast is Happy

The beast is evil. How can you not like him when he's this happy?

But why is he so happy...

Because he has a new prime minister. I told him the guy's name is Rudd, and he thought it was great having a fish as PM - if he's a failure, some lucky moggie can eat him.

I guess this explains why he was never impressed with Bush in the US - he has a self-cleaning litter tray already.

## Monday, 12 November 2007

### Damn, it was only spam

I've just cleaned up my RNI mailbox, and amongst the spam I got these two subjects:

Drum Snail Bible Necklace Festival

Snail Foot Dung Rope Leather jacket

Is the leather jacket made from a drum snail, one wonders.

## Sunday, 4 November 2007

### Thrilling

Some people have far too much time on their hands. They could be doing something profitable, like writing a blog. Instead...

They re-make Jacko's Thriller in lego.

Incidentally, I came across this whilst trying to find a photo to illustrate oozing.

## Sunday, 28 October 2007

### Chasing The Beast back to its lair

After the Wombat's success, we chased the beast back to its lair (every beast has to have a lair). This is all we could see:

Neither of us was brave enough to go any further.

## Monday, 22 October 2007

### The Wombat Strikes Back

For those of you worrying about the wombat and the beast, there is good news. The wombat has been learning a few moves.

The beast was beaten off (at least temporarily).

## Wednesday, 17 October 2007

### Jerry Fodor fails Evolution 101

The latest "fun" on the evolutionary (pro and anti) parts of the web has been discussion of an article in the London Review of Books by Jerry Fodor. In it he proclaims that natural selection is on its way out. Alas for him, his argument is based on an impressive ignorance of evolutionary biology. Jason Rosenhouse has done a good job taking down this mess, but still left something for the rest of us.

Fodor's claim is that there are two problems with theories of evolution by natural selection, one conceptual, one empirical. The conceptual problem first:
Here’s the problem: you can read adaptationism as saying that environments select creatures for their fitness; or you can read it as saying that environments select traits for their fitness. It looks like the theory must be read both ways if it’s to do the work that it’s intended to: on the one hand, forces of selection must act on individual creatures since it is individual creatures that live, struggle, reproduce and die. On the other hand, forces of selection must act on traits since it is phenotypes – bundles of heritable traits – whose evolution selection theory purports to explain. It isn’t obvious, however, that the theory of selection can sustain both readings at once.
Here he's bringing up a well-known issue, part of the levels of selection debate. The complete debate is more extensive than I'll describe here, and is largely solved. The problem that Fodor is focussing on is that genes code for traits - a gene might affect colour of a bear, for example, with one allele making it white. But, it is the whole bear that lives or dies. What, then, is being selected? Fodor:

It couldn’t, for example, be literally true that the traits selected for are the ones Mother Nature has in mind when she does the selecting; nor can it be literally true that they are the traits one’s selfish genes have in mind when they undertake to reproduce themselves. There is, after all, no Mother Nature, and genes don’t have, or lack, personality defects. Metaphors are fine things; science probably couldn’t be done without them. But they are supposed to be the sort of things that can, in a pinch, be cashed. Lacking a serious and literal construal of ‘selection for’, adaptationism founders on this methodological truism.
Of course, we do have an explanation, and one which does not rely on metaphor. Instead, we use mathematics. The ideas were developed by George Price, and the main result is the Price Equation. In the simpler form, it says that the change in the average value of a trait is equal to the covariance between the trait and fitness. Now, as fitness is measured on individuals, we have the explicit connection between a trait and selection. The more complex form of the Price equation allows for changes within individuals (e.g. if the environment changes, and this causes a change in the traits).
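For the sceptical, the simple form is easy to check with a toy example. The sketch below (in R) uses a made-up population of five bears with invented trait values and offspring numbers, and assumes offspring inherit their parent's trait exactly. Written out in full, the simple form is: change in mean trait = Cov(w, z) / mean(w), i.e. the covariance is with fitness relative to the population mean.

```r
# Hypothetical population: trait value z (say, whiteness) and offspring number w
z <- c(0.2, 0.4, 0.5, 0.7, 0.9)
w <- c(0,   1,   1,   2,   3)

# Population covariance (dividing by N, not by N - 1 as cov() does)
pop_cov <- function(x, y) mean(x * y) - mean(x) * mean(y)

# Simple Price equation: change in mean trait = Cov(w, z) / mean(w)
delta_price <- pop_cov(w, z) / mean(w)

# Direct calculation: with perfect inheritance, the new mean is the
# fitness-weighted mean of the parental trait values
delta_direct <- sum(w * z) / sum(w) - mean(z)

print(c(price = delta_price, direct = delta_direct))
```

The two numbers agree, and nothing in the calculation needs to know *why* whiter bears leave more offspring.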

Price's equation helps us get Fodor out of the dilemma he's talked himself into:
Maybe one can, after all, make sense of mindless environmental variables selecting for phenotypic traits. ... The crucial test is whether one’s pet theory can distinguish between selection for trait A and selection for trait B when A and B are coextensive: were polar bears selected for being white or for matching their environment? Search me; and search any kind of adaptationism I’ve heard of. Nor am I holding my breath till one comes along.
Poor Fodor is going to asphyxiate as he waits for a theory that went shooting past him in the early seventies. Put simply, it doesn't matter whether we can distinguish between selection for traits A and B if they're correlated. Price tells us what will happen (and, of course, the Price is right). Some polar bears were whiter than white, and this meant that they were more likely to survive, and hence whiteness was correlated with fitness. Others presumably were a bluey-whiteness you'll really like, but this was correlated with a lower fitness. Hence that colour reduced in frequency. We can even deal with evolution of trait A when trait B is under selection - just follow the covariances!

Fodor's empirical problem is a wonderful straw man, dressed up in word salad. Follow this if you dare:
Adaptationism is a species of what one might call ‘environmentalism’ in biology. (It’s not, by any means, the only species; Skinnerian learning theory is another prime example.) The basic idea is that where you find phenotypic structure, you can generally find corresponding structure in the environment that caused it. Phylogeny tells us that phenotypes don’t occur at random; they form a more or less orderly taxonomic tree. Very well then, there must be nonrandomness in the environmental variables by which the taxonomic tree is shaped. Dennett has put this idea very nicely: ‘Functioning structure carries implicit information about the environment in which its function “works”. The wings of a seagull . . . imply that the creature whose wings they are is excellently adapted for flight in a medium having the specific density and viscosity of the atmosphere within a thousand metres or so of the surface of the Earth.’ So, phenotypes carry information about the environment in which they evolved in something like the way that the size, shape, whatever, of a crater carries information about the size, shape, whatever, of the meteor that made it. Phenotypes aren’t, in short, random collections of traits, and nonrandomness doesn’t occur at random; the more nonrandomness there is, the less likely it is to have been brought about by chance. That’s a tautology. So, if the nonrandomness of phenotypes isn’t a reflection of the orderliness of God’s mind, perhaps it is a reflection of the orderliness of the environments in which the phenotypes evolved. That’s the theory of natural selection in a nutshell.
Now, what he's trying to say (and I'll spare you the rest of the section) is that evolution is constrained - there are some things that a species can't evolve into, because there is no way of developing in that direction:

For example, nobody, not even the most ravening of adaptationists, would seek to explain the absence of winged pigs by claiming that, though there used to be some, the wings proved to be a liability so nature selected against them. Nobody expects to find fossils of a species of winged pig that has now gone extinct. Rather, pigs lack wings because there’s no place on pigs to put them. To add wings to a pig, you’d also have to tinker with lots of other things. In fact, you’d have to rebuild the pig whole hog: less weight, appropriate musculature, an appropriate metabolism, an apparatus for navigating in three dimensions, a streamlined silhouette and god only knows what else; not to mention feathers. The moral is that if you want them to have wings, you will have to redesign pigs radically. But natural selection, since it is incremental and cumulative, can’t do that sort of thing. Evolution by natural selection is inherently a conservative process, and once you’re well along the evolutionary route to being a pig, your further options are considerably constrained; you can’t, for example, go back and retrofit feathers.
I wonder if Prof. Fodor knows what chiroptera are? Of course, Fodor is right that there are developmental constraints. But so f*cking what? Of course that restricts how a species might evolve - it's one of the more powerful arguments for evolution by natural selection. It's why bats don't have feathers - they didn't evolve them, but found another way of taking to the air. This is why Fodor's problem is so silly - we know that there are endogenous constraints on species, but we also see that they still adapt. They work their way around problems the best they can, even if it means having a nerve 10 to 15 feet longer than it has to be. Fodor talks about species developing along one path, and that cutting out others, but why does this stop adaptation? I can't think of anyone who would have problems with this - hey, even game theorists understand it.

Apparently Fodor is writing a book with Massimo Piattelli-Palmarini about evolution without adaptation. Somehow I don't think it'll be about genetic drift.

## Monday, 15 October 2007

### Lunacy and the Planets

I was just clearing my inbox, and found this in Science:

Excitation of Lunar Eccentricity by Planetary Resonances -- Cuk 318 (5848): 244 -- Science
I assume this means that lunacy and astrology are linked. Or perhaps that the star signs of werewolves are important.

## Friday, 12 October 2007

### It almost makes me feel homesick

In today's ScienceExpress (the online first service of Science):

Máté Ádámkovics, Michael H. Wong, Conor Laver, Imke de Pater

Evidently the first man (or woman!) onto Titan should be English. The bad news, though, is that the rain is of methane not water. I haven't tried, but I'm guessing the tea won't taste as nice.

## Tuesday, 9 October 2007

### An epic struggle

The reinforcements have started their action.

At first, the beast didn't know what was coming

"What are you cheering for? Is something going on?"

"OK, that's another round of shredded sofa later."

"*sigh*. I suppose I'll have to wait for the two of you to finish before starting on the carpet."

## Monday, 8 October 2007

### Ha-wha'?

Whilst going through my backlog of email, I went through the Science table of contents. It included a paper with this title:

Major Australian-Antarctic Plate Reorganization at Hawaiian-Emperor Bend Time

OK, I recognise every word, but don't expect to see them in that order.

On reflection, I think it makes more sense if we assume there's a typo, and "Bend" is actually "Bed". Quite why antipodeans would want to reorganise their china at that time of day I'll let someone else explain. Or if anyone has a better (if not more accurate) interpretation, please tell! Bonus marks if it involves hedgehog pathways.

Reference (for those who care): Whittaker et al. (2007) Science 318: 83-86 DOI: 10.1126/science.1143769

### Back again

I've been away for a few days on a course in Estonia, on management and supervision of academic research. This has spawned a few random thoughts:

1. If you take your laptop, remember the power cord. It's more important than the mouse.
2. As I might have expected, some of what we were told I had already worked out, but some I hadn't. It's nice to know which is which. There were also a few ideas that I hadn't thought about.
3. Sharing a room with someone who snores is not a good idea. And going down with 'flu isn't sufficient for revenge.
4. Management theory can actually be useful - obviously there is a lot that is common sense, but having it organised does help. In particular, there was a discussion about project management. In classical project management, one sets out the stages that are needed to complete the task (say, build a bridge), and then sets out the schedule by working backwards from completion to decide when each stage should be done. I guess this is used all over the place, and is one of the reasons western society works (shocking, isn't it?). Working out how to take an organised approach to this sort of task, and then how to teach this to managers, is actually a good thing. IOW, MBAs are not necessarily useless.
5. Thanks to one of the speakers, I'm now reading text with a Welsh/Swedish accent. Should I seek medical help?
6. I hope my students don't mind doing a lot of writing.
7. If you find yourself on the M/S Star between Helsinki and Tallinn, don't try the pizza from the fast food place. It's awful (and remember, I'm English, so I know bad food) - a slab of congealed artificial cheese with chicken and battery-farmed pineapple. It was called Hawaij, presumably because they feared a lawsuit for defamation from the good people of Hawaii. The people of Four Seasons do not seem to have as much of a reputation for litigation.
8. I don't know what happened in the rugby World Cup on Saturday. I think the media must be lying to us. Ah well, an England-France semifinal, just like last time.

Remind me, who were in the other semifinal in 2003?

## Monday, 1 October 2007

### Reinforcements

I have finally got some help to defend my computer from The Beast:

I don't know if a wombat is the best animal to help, but we're off to a good start. Incidentally, I think it might be a Fijian wombat.

## Saturday, 29 September 2007

### My Week in Pictures

I had a good week work-wise: I got a few things to work. I also realised that I can pretty much summarise it in pictures.

The first is from something I was playing with the previous Friday afternoon, seeing if I could get the ODE solver in BUGS to work. For this I fitted the Lotka-Volterra equations to some famous data on Lynx-Hare cycles from the Hudson's Bay Company. I got it working this week, and the model fits reasonably well:

I'm happy with the lynx fit, but the hares aren't great - I might have to add carrots into the model. But considering how simple the model is, it's pretty good.
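For anyone who wants to see the shape of the model, here is a minimal sketch of the Lotka-Volterra equations in Python. It is not the BUGS model used above: it only integrates the equations forward with a hand-rolled Runge-Kutta step and does no fitting. The function names, parameter values and starting populations are all made up for illustration, not estimated from the lynx-hare data.

```python
import math


def lotka_volterra(state, alpha, beta, delta, gamma):
    """Right-hand side of the classic Lotka-Volterra predator-prey equations."""
    hares, lynx = state
    d_hares = alpha * hares - beta * hares * lynx   # prey growth minus predation
    d_lynx = delta * hares * lynx - gamma * lynx    # predator growth minus deaths
    return (d_hares, d_lynx)


def rk4_step(state, dt, params):
    """Advance the state by one fourth-order Runge-Kutta step of size dt."""
    def shifted(k, scale):
        return tuple(s + scale * dt * ki for s, ki in zip(state, k))

    k1 = lotka_volterra(state, *params)
    k2 = lotka_volterra(shifted(k1, 0.5), *params)
    k3 = lotka_volterra(shifted(k2, 0.5), *params)
    k4 = lotka_volterra(shifted(k3, 1.0), *params)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial, params, dt, n_steps):
    """Return the list of (hares, lynx) states, including the initial one."""
    trajectory = [tuple(initial)]
    for _ in range(n_steps):
        trajectory.append(rk4_step(trajectory[-1], dt, params))
    return trajectory


def conserved_quantity(state, alpha, beta, delta, gamma):
    """Quantity that stays constant along exact Lotka-Volterra solutions."""
    hares, lynx = state
    return (delta * hares - gamma * math.log(hares)
            + beta * lynx - alpha * math.log(lynx))


# Illustrative (not fitted) values: alpha, beta, delta, gamma.
PARAMS = (1.0, 0.1, 0.02, 0.5)
trajectory = simulate((30.0, 4.0), PARAMS, dt=0.01, n_steps=2000)
```

The deterministic model just cycles forever around the equilibrium at hares = gamma/delta and lynx = alpha/beta, which is one reason a fit to real, noisy data is never going to be perfect.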

The second result is something I've been working on for some time, a multi-trait QTL analysis. This is genetics - trying to locate genes that affect different traits. We've been looking at models that can include several traits in the same analysis. Here I plotted what is roughly the probability that each locus (on the x-axis) affects each trait (each line).

Locus 12 seems to be affecting several traits - yes, we have pleiotropy! The numbers aren't totally convincing, but it looks like there's something there.

At the same time, I've also been doing another predator-prey analysis, on voles and weasels. That worked as well, and this is a plot showing that the interactions are local:

The idea is that if the circles overlap, there is a correlation between the sites. Actually, this interpretation may be a bit dodgy, but the main thing that comes out of this is that most of the sites are behaving independently.

Finally, this Friday's afternoon playing around involved more ODEs, modelling the numbers of butterflies over a season. This was fairly quick to fit, although it could behave better.

It also needs a bit more work on it (hey, it was only about an hour's work!), because there is more variation than the model accounts for.

Other than showing I had a good, interesting week, does this really say anything? One thing is that none of the graphs will look like that in their final versions - these are all produced as part of the process of doing the analysis. But "quick and dirty" graphs are still very useful for looking at the results, and getting a feel for what's happening. It also shows the worth of having packages with good, flexible graphing functions - two of the plots are standard BUGS plots, the other two are drawn in R. Plots are also an easy way of showing collaborators what you've done - I've drawn several other plots this week, and not all will end up in the final write-ups of the work.

Now I just have to write all this up.

## Friday, 28 September 2007

### Training the Beast

It's a generally held view that cats are independent, and hence can't be trained beyond the litter tray. Well, this is false even for older cats.

I knew I had to take Jack's training slowly, so I started with the order "Eat!". He learned that one surprisingly quickly. We've now progressed onto "Squeak!", and he's doing very well at that now.

It's all a matter of timing.

## Monday, 24 September 2007

### Rechtschreibung ist schwierig

I just read this in a discussion on Pharyngula

Go easy on him; spelling is hard.

Even Einstein, the smartest man in the world managed to get "i" before "e", except after "c" wrong twice in his own name.

Posted by: John McKay | September 24, 2007 12:33 AM

Sheer class, and one I'll have to remember to steal.

## Saturday, 22 September 2007

Remember Dover? For those of you who don't, it was the latest stand by creationists to get creationism into schools (under the guise of "intelligent design"). At the centre of the case was a text designed for schools called Of Pandas and People. During Michael Behe's testimony, he was presented with a quote from the new version of the text, which is still in preparation:
Sudden emergence holds that various forms of life began with their distinctive feature already intact, fish with fins and scales, birds with feathers and wings, animals with fur and mammary glands.
I bring this up because I want to continue where I left off fisking this paper

Sherman, M. (2007). Universal genome in the origin of metazoa. Cell Cycle 6: 1873-1877. Link

Previously, I went through the evidence Sherman put forward to suggest that there was a problem for evolutionary biology, and humbly suggested that it may not be as much of a problem as he thought, and that if he was to make sweeping statements, he might like to support them. So, now let's get on to what he suggests explains the problems he sees. In a nutshell, it is sudden emergence with huge brassy shiny knobs on.

This is what he writes:
Here I propose a hypothesis that answers the questions posed above, and offers experimentally testable predictions. This hypothesis postulates that (1) shortly (in geological terms) before Cambrian period a Universal Genome that encodes all major developmental programs essential for every phylum of Metazoa emerged in a unicellular or a primitive multicellular organism; (2) The Metazoan phyla, all having similar genomes, are nonetheless so distinct because they utilize specific combinations of developmental programs. In other words, in spite of a high similarity of the genomes in phyla X and Y, an organism belonging to phylum X expresses a specifc set of active developmental programs, while an organism belonging to a different phylum Y has a distinct set of "working" programs specifc for phyla Y.
(as in my previous post, all grammatical mistakes are in the original. I don't want to indicate them all, so let's just give Sherman a sic note)

In other words, every species had every gene, but not all of them are used. Now, to experienced watchers of ID on the blogosphere, this is a familiar notion: John A. Davison's Prescribed Evolutionary Hypothesis. Davison is a crackpot, but let us not judge Sherman on those grounds. Certainly, Sherman has thought through his ideas more, and is probably a sane, normal person.

So, Sherman's idea is that a Universal Genome appeared in the Cambrian, causing an explosion. Since then, the process of evolution has just been one of changes in the switches in pre-existing developmental programmes. Sherman implies that these changes are not genetic (as all organisms started with the same Universal Genome), so it's not awfully clear how he thinks developmental changes have occurred. Perhaps he should throw some epigenetics into the mix.

So, how did the Universal Genome appear? Ah, Sherman says nothing. It's taken as a given that supporters of ID will say that it can't say anything about the identity of the designer, but this is going one step further - stuff just appeared without mentioning the possibility of a designer. You see, it's Sudden Emergence, only without the mammaries.

Sherman quickly seeks to address what he sees as a fundamental problem: if the universal genome was in all Metazoa initially, why don't we see it in all now? The solution is obvious - genes have been lost over time:
A beautiful illustration of such a loss is a Wnt gene family. In humans, there are nineteen Wnt genes belonging to twelve families. In Hydra. on the other there are two Wnt genes that correspond to two families found in humans. Simple analysis of this finding within the framework of the classical model suggests that additional human genes have developed from ancestral Wnt genes found in Hydra. However, Anemona that belongs to a distinct branch of Cnidaria has eleven Wnt genes belonging to eleven families found in humans. Therefore, it is quite obvious that many Wnt gene families, possibly the entire set, exists in the gene pool of the primilive Metazoan phylum, and various members of Wnt families were lost in different species within this phylum.
Waste not, Wnt not.
Accordingly, the proposed model predicts that in various groups of Cnidaria we will find many diverse gene families that function in more advanced phyla.
Or that they have also become lost.

Sherman expands on his ideas:
The "Universal Genome" hypothesis does not contradict any well-established data on the genetic evolution (e.g., gene duplications or accumulation of mutations, molecular clock. etc). but suggests that genetic evolution could shape and improve function of developmental programs.
It's a pity Sherman missed out all that well-known data about large chunks of DNA suddenly appearing in a genome, without any immediate function. I'm a bit sceptical about his suggestion that the hypothesis doesn't contradict data on the accumulation of mutations, so let's see how he defends that:
...
Yep, that's how he does it. Once more, make a bold statement, don't give any argument or evidence to support it and move on. For the moment, so shall we.

Sherman fills in a couple of details (including a bit of special pleading for why genes have only been lost in some lineages - there must be a special mechanism for their conservation in others. Again, just a statement, not backed up), and then gets on to the "this is really science" part:
There are two main testable predictions of the presented hypothesis, which are absolutely critical for validation of the model; (1) full or parts of the developmental programs characteristic to higher taxons must be encoded in genomes of lower taxons, and (2) blocks of genetic information encoding these developmental programs in more primitive taxons must be useless in these taxons.
What happened to waste not Wnt not? Sherman is saying that primitive organisms must still carry genetic code for more advanced developmental programmes. But he's already accepted that they can lose genes. So, (1) can't be critical. (2) is more interesting - one would need to show that the genetic information was present, but not expressed in the "primitive" taxa. Sherman, of course, doesn't suggest this. Instead, he tells us that the common ancestor of Arthropoda and Chordata didn't have eyes (as we have already seen, this is not quite right), but yet jellyfish have eyes, and they are controlled by similar genes to those in Drosophila. Oh, and as part of this argument, he points out that jellyfish don't have a central processing unit like a brain, so it is unclear how they could process and integrate what they see. Once again, because we are ignorant of something, it can't possibly happen.

Sherman also suggests that we could try and induce development of these more advanced programmes, for example by over-expressing Pax6 in the sea urchin, and seeing whether it develops eyes (and becomes a see urchin, I suppose). It's not clear to me why over-expression would make a difference: we already know that it is expressed in the sea urchin's foot. Without, apparently, an eye being developed. Could it be that other genes involved in eye production are missing? The public needs to know!

We also get another prediction:
Another indication that latent developmental program is present in a lower taxon would be expression of such a program in higher taxons derived from the lower one in a seemingly convergent processes. For example a possible experiment would be to activate development of circulation systems of mammalian or bird types in lizards or even in Xenopus [a frog]. The circulation systems in mammals and birds appear to be very similar, however, they developed from the reptilian system independently in these taxons. Therefore it seems likely that Reptilia possess the program of development of the circulation system of the mammalian/bird type and requires only a minor switch to activate it.
This sounds like we would end up with a reptile that was developmentally very confused. Note we're told that Reptilia have the programme for development, but we're not shown any evidence. Where are the homologs and orthologs for the genes? Is Sherman unable to BLAST the Xenopus genome to see if the genes are there? (in case you're wondering, I'm too lazy to chase this up myself).

Sherman suggests his second prediction could be tested by deleting genes and seeing if there is a physiological effect, e.g. the genes for the adaptive immune system in the sea urchin. This makes sense (although you might have to attack the sea urchin with a pathogen afterwards, to see if it responds). Except that it's not clear what one would conclude if nothing happened - Sherman has already discussed the possibility that genes are lost, so he could claim that that's what has happened in this case. In other words, the predictions don't provide potential falsification (although if they were found to be correct, they would provide a powerful verification of the idea).

So, we have a paper that makes some dodgy claims from ignorance that evolution can't explain the Cambrian explosion or the evolution of body plans, followed by an alternative hypothesis which explains nothing that can't be explained by evolutionary biology, relying on gaps in our knowledge to create doubt. And it says nothing about the elephant in the room - how sudden emergence happened. The key part of the hypothesis - how developmental information appeared - is just stated and then left aside.

The whole premise of Intelligent Design as science was that one could investigate design without asking about the designer (because, obviously, that would mean admitting you thought the designer was the god of Abraham). Sherman has taken this to the next stage - he doesn't even mention the possibility of a designer.

I mentioned in my first post that I was suspicious about this paper. That was because the grammar looked odd (the mistakes were in simple grammar, but complex structures were correct). That was before I read this, from the Disco Institute (p20) (pdf):
On the other hand, reading the papers on evolution published in respected science journals like Proceedings of the National Academy of Science or Nature, one is surprised at the weakness of the arguments. Indeed, the standards of proof in the field are much lower than in the rest of biology. Such papers would never make it through the peer-review process if they concerned molecular or cellular biology. Of course, there are obvious reasons for such low standards, including the difficulty of testing evolutionary hypotheses through experimentation. But if the theory is based on poor arguments, why have criticisms of it not succeeded in convincing mainstream scientists?
Poor arguments? Ha ha ha ha! Now, this guy must be spoofing us, and the DI.

As for the journal, it is asking to be spoofed. This is part of the journal's explanation of why one should publish there:

• Rapid response to presubmission inquiries (usually within the hour). Most papers are rejected without external review. (papers send for external peer-review are expected to be published). During presubmission inquiry, reviewers will be contacted to accelerate further review. Authors are encouraged to suggest and decline potential reviewers.

• Ultra-rapid peer-review (usually within one-two days)

• Papers rejected from other top journals (e.g., Nature, Science, Cell), if the authors choose, may be submitted with previous reviews and decision letters. This allows for the consideration of a paper without sending
So, the journal takes an hour to decide if a paper is good enough - if it's sent out to review, then it will have to be pretty crap to be rejected. And there is pressure on the referees to comment on a manuscript very quickly. This must have a negative impact on the quality of the refereeing - sometimes you have to go through a paper carefully, and spend time checking references, and also thinking about it - often I need time to work out why I'm not sure about a paper, or to work out what to recommend. The final bullet point says to me that the journal is desperate - they want good papers (doesn't every journal?), so they are prepared to cut corners to get them.

The journal is desperate to do things quickly, so it looks like if you wanted to get a dodgy or hoax paper published, this is a good journal to do it with. I hope that is what happened. Otherwise Michael Y. Sherman will have to justify why he can submit a paper in which he ignores basic norms of writing science - you know, backing up your argument with evidence. And whether this is a hoax or not, Cell Cycle has to justify how it can publish a paper which nobody there has even read properly.

EDIT: Forgot to hat-tip Albatrossity for the pdf. Thanks!

## Friday, 21 September 2007

### Why We Admire the French

As we're discussing rugby, I thought it was about time I dug out one of those moments that you don't mind being on the receiving end of.

Oh, and for any Aussies out there, you can guess what's below the fold.

I think it says something that the greatest moment of English rugby is a fly-half kicking a drop goal.

## Wednesday, 19 September 2007

### My lips are sealed

I've just come back from a meeting at a pharmaceutical company. If you want to know what we discussed, you'll have to ask me in twenty years' time. Mind you, I will have forgotten it by then.

## Tuesday, 18 September 2007

### More on the Beast

The beast is still around. Sometimes I can ward him off with a fluffy purple thing on a stick, but he returns, with evil intent etched across his face:

Even when he's supposedly resting, I know he's still plotting to keep me away from the blog:

I wonder - which one of you is paying him to do this?

## Monday, 17 September 2007

### What ecologists discuss

I've just got an email from an ecology listserv with the intriguing title of:

Inducing vomiting in salamanders

Last week, my bestest friend DaveScot put up a post at Uncommon Descent (the intelligent design blog of William Dembski and other illuminaries) about a paper on front-loading. This is an idea that DS is keen on - that there was an ur-cell that had all of the instructions necessary for all of life, and these were turned on at the right time to produce whatever The Designer wanted to appear. I thought it was worth having a look at the paper, if only to stave off boredom. This is the citation:

Sherman, M. (2007). Universal genome in the origin of metazoa. Cell Cycle 6: 1873-1877. Link

The paper advances a suggestion that goes totally against mainstream evolutionary biology, and is therefore nuts and wrong.

It's almost tempting to stop there, but I doubt anyone would get the joke. So, I'll use a different rhetorical strategy to Sherman, and if I make any grand statements, try to back them up with evidence and argument.

Before laying into it properly, I should state that the paper should never have been published in the form it was. The grammar is awful. I'll comment more on this after I've finished with the text - it makes me a bit suspicious about the whole thing. For the moment, it is enough to point out that the grammatical mistakes in the quotes are in the original, and if I were to acknowledge all of them with the usual sic, this post would look like a vomitorium.

The paper starts by laying out some facts that the author thinks need to be explained:

(1) seemingly simultaneous appearence of paleontological remains of all presently existing Metazoan phyla, both simple and advanced; (2) similarities of genomes among Metazoan phyla of diverse complexity; (3) seemingly excessive complexity of genomes of lower taxons; (4) similar genetics switches of functionally similar but non-homologous developmental programs.

Let's take these one by one:

(1) seemingly simultaneous appearence of paleontological remains of all presently existing Metazoan phyla, both simple and advanced. OK, this one is easy - it's boiler-plate creationism. CC300 and CC301 (go to the links for rebuttals).

(2) similarities of genomes among Metazoan phyla of diverse complexity. Grr, now I have to do some work. Sherman points out that some genes (or rather their orthologs) are found in diverse taxa, and not always doing the same thing. He states:
...one does not expect to find genes responsible for development of bilateral organisms in primitive Metazoa with radial symmetry. Surprisingly, such genes, e.g., orthologs of hox genes, were found in Cnidaria, and furthermore they are expressed in Cnidaria in an asymmetric manner, as if to define segments in these radial organisms.

Why is this unexpected? Sherman does not explain. Perhaps he hasn't heard of common descent. Or co-option, where genes that have one function are used to do something else (any intelligent intelligent design supporter should know that the bacterial flagellum took some of its structure from the Type III Secretory System). Sherman does discuss genes changing function over evolutionary time:
A possible response to these arguments within the classical model would be a suggestion that the genes responsible for eye development in Arthropoda or vertebrates serve different functions in lower taxons (so-called gene sharing). In fact, several examples of gene sharing have been described, e.g., recruiting of small heal shock proteins to serve as crystallines. These examples, however, are exceptionally rare, and it is unclear whether they indeed can be responsible for making de-novo complex developmental programs serving unrelated functions.

So, Sherman, if this is exceptionally rare, why does Conway-Morris declare co-option to be "rampant" (pdf)?
...co-option and redeployment are rampant both in a developmental context (e.g. Eizinger et al., 1999; Heanue et al., 1999 (see also Relaix and Buckingham, 1999); Merlo et al., 2000; Damen, 2002; Locascio et al., 2002; Lowe et al., 2002; Fabrizio et al., 2003) and in related topics such as those concerned with enzymatic pathways (e.g. Peregrin-Alvarez et al., 2003).

Look, look! Conway-Morris acts like a fusty old academic and gives citations! Curse the man for making it so easy to check his assertion!

Of course, it could be that Sherman is unaware of this work, because he hasn't read this paper. Except ... it's cited in his paper. Not that that means much (find the Know-Thine-Own-Self Results).

It's around here that Sherman makes a comment so factually wrong even I spotted it. He writes
In fact, many of the regulatory genes were lost later in evolution, and are not present in Drosophila or C. elegans, e.g., hedgehog gene,5 indicaling that their presence is not necessay for development and life of vey complex Arthropoda.

Um, but hedgehog is found in Drosophila. It must be - it has the requisite silly name. Even funnier, it was discovered in the fruit fly! Oh, and the same page shows that there are genes similar to hedgehog in C. elegans.

Where were we? Oh, next point...

(3) seemingly excessive complexity of genomes of lower taxons; An immediate problem here is how one defines complexity. PZ Myers has a nice essay on this. But let us proceed. Sherman points out that the sea urchin, which apparently is primitive (I guess this means it doesn't know how to eat spaghetti properly), has a whole suite of genes involved in eye development:
While the presence of the opsins could be explained by their possible function in a simple light sensing, sea urchin has the entire set of orthologs of major genes involved in the eye development ... Therefore, it appears that information on the eye development is encoded in the sea urchin genome, while no eye is actually developed, and thus the genetic information seems to be excessive.

He also points out that the sea urchin has the genes for an adaptive immune system.
Yet, sea urchin does not have antibodies, and possibly lacks adaptive immunity in general. Genes that are seemingly useless in sea urchin but are very useful in higher taxons exemplify excessive genetic information in lower taxons.

Or perhaps they exemplify our lack of knowledge about the sea urchin. Now, I know that Sherman is at Boston University Medical School, but I have no idea what he does there. I'm not, though, going to infer that he's useless. Or at least not on this basis.

One can't simply point at a gene and say "we don't know what it does in this organism, so it must be useless". In the paper Sherman cites about the presence of the adaptive immune system genes in the sea urchin, the authors point out that we know very little about the immune systems of most species. To make his case, Sherman has to show that these genes are not used by the sea urchin, e.g. show that they are not expressed. Put the promoter next to GFP, transform it into the sea urchin, and watch to see when GFP is expressed (it glows green - very pretty). Perhaps the genes are active in the early part of the development of the visual sensory system, and this has been well conserved. Or, again, we could posit co-option of genes from one function to another. Just slap it in, and see when it glows!

There's a recurring theme in this paper - the author makes bold statements and utterly fails to back them up, with evidence, argument or citation. I'm not a developmental geneticist, so I don't want to follow all the claims up, although a few are certainly false (e.g. hedgehog above). Others may be correct, but how are we to know? Can we assume divine revelation?

Right, last point.

(4) similar genetics switches of functionally similar but non-homologous developmental programs. Oh, now the evo-devo people are going to love this:
A distinct set of data that call for a novel approach to evolution comes from comparison of genes that control functionally similar genetic programs in Chordata and Arthropoda. There appears to be a high degree of similarity in some of these genes. A classic example of such similarity is Pax6 gene that controls development of visual systems. According to all current accounts, a common ancestor of Chordata and Arthropoda was a very primitive organism that lacked eyes, and therefore the evolution of eyes in these groups was convergent.

All current accounts? Well, I guess Wiki isn't current then. Neither is Sean Carroll, who suggests a common ancestor had proto-eyes at least (p123 of From DNA to Diversity). Or perhaps Sherman is making a bold claim without any evidence. Again. Carroll et al. suggest that the development of the proto-eye that the common ancestor had was under control of Pax6, as it is so conserved. Sherman appears to be unaware of the idea of common descent. I hear it's a rather popular theory that's doing the scientific rounds.

So, to summarise where we've got to: Sherman has suggested that there is evidence for a problem, but has been unsuccessful in actually providing it. He goes on to suggest an alternative, which is a delight to snigger at. But that, dear fools who have gotten this far, is for another day.

## Thursday, 13 September 2007

### Moral dilemmas

Is it unethical to gloat to the cat over the Aussies being beaten in the Twenty20 World Cup by Zimbabwe? Especially as he didn't say a word when they were hammering England last winter.

Always nice to hear from an Aussie captain: "Our batting was diabolical, you can't afford to get off to those starts and that's where we lost the game."

Of course, well done to Zimb, I hope they went out and had a good party last night. And that they're still hung over today when they play England....

## Monday, 10 September 2007

### Filling the intertubes

Earlier today I sent out a message to a mailing list, and I got the inevitable out of office replies. I can accept that they're a useful idea, but not when this is the message:

I will be out of the office starting 09/10/2007 and will not return until 09/11/2007.

I will be travelling on business on September 10th. I will respond to your message no later than September 11th.

In particular note the "I will respond...", not "I may" or "If any action needs to be taken, I will...".

I'm debating whether I should respond with this:
Thank you for your out of office reply. Unfortunately I will be out of my office on September 11th, so I will not be able to reply to the response you have promised until September 12th. I hope the delay this causes does not unduly inconvenience you.

OK folks, what's wrong with this headline?

BBC NEWS | Health | Potato 'fuel of human evolution'

Yes, human evolution occurred in Africa, whilst the potato was happily growing in the Americas.

The story is better; the main results are summarised as:
Compared with primates, humans have many more copies of a gene essential for breaking down calorie-rich starches, Nature Genetics reports.

And these extra calories may have been crucial for feeding the larger brains of humans, speculate the University of California Santa Cruz authors.
So it might be the ability to digest starch that was important.

Perhaps it's also worth pointing out that the author might have been writing for an African audience. I recently found out that what is usually called a potato in Europe and the Americas is called an Irish potato in Africa (or at least some parts of it). This is to distinguish it from other potatoes such as the sweet potato.

EDIT: the headline has now been changed to "Starch 'fuel of human evolution'". I guess someone else pointed out the absurdity too.

## Friday, 7 September 2007

### The snipers were extra

Most of the time police forces should be respected and supported - they do a difficult job, and I'm sure most are trying to serve their communities.

But (and you knew one was coming) sometimes they make total arses of themselves. This time it's the turn of the Aussie police. As reported worldwide, some comedians drove through the security around the APEC meeting using nothing more devilish than some Canadian flags (the Osama look-alike was merely a detail). The police just waved them through to the hotel where President Bush is staying.

So, what do the police do? They allowed these comedians through to the hotel, didn't stop them or apparently check their ID. They've charged them with "entering a secure area". Errrm, you let them. Does this mean that anyone who is allowed into a secure area should be arrested? Including Mr. Bush? Please?

Apparently the police are considering other charges. The comedians evidently have considerable skill at making people laugh at them and appearing absurd, so presumably they'll be charged with impersonating a police officer.

## Thursday, 6 September 2007

### Why I've not been blogging much

There's a monster guarding my laptop

I managed to drive it off for long enough to post this. But it'll be back.

Oh yes, it'll be back.

## Monday, 3 September 2007

### Wambam: the photos

Last week I was in Gotland for a meeting on the animal model (if you don't know, don't ask. It's safer that way). I took some photos, of varying quality.

All of the photos are on my Flickr page, but I thought this was worthy of more attention. It's Henrik demonstrating that mathematics really is gobbledygook, something we had merely suspected:

If anyone who was at the meeting would like to add links to their photos, or add anything else (e.g. transcriptions of Lars' poetry), please do!

Last but not least, the meeting was co-organised by Jon Brommer. He couldn't join us though, because he was waiting to become a father. He duly did that just before the meeting started, so congratulations to Jon and Marianne!

## Sunday, 2 September 2007

### ESEB Saturday

Phew, the last day.

We started with a plenary lecture by Scott Edwards. He was talking about phylogenetics, and making the point that gene trees are not the same as species trees. Err, OK, let's back up. A method used a lot to work out the relationship between species is to pick a gene (e.g. CO1), sequence it in different species, and use that to draw a tree. This works because when a species is split into two species, variation in the gene sequence can accumulate: mutations occur in one species, and can become fixed by random drift. The longer two species have diverged, the more differences in the sequences have accumulated.

One thing people in phylogenetics often want to estimate is when two species have diverged. They do this by drawing a phylogenetic tree from a gene sequence, and looking at the time of the relevant split in the tree. Edwards pointed out that this may not be the time when the species diverged. The problem is that just before two species diverge, there may be variation in the gene sequences in the population. The time of the gene's split would therefore be before the species' split, so the estimate of the split would be too old. The solution to this is to use several genes in the building of the tree, and use a model that estimates the time of the species tree splits from the distribution of gene tree splits: the gene trees put an upper bound on the age of the species tree splits (because the species have to split after the genes), and say something more about the distribution (if all genes split between 100 and 110 million years ago, the species tree split is unlikely to be 2 million years ago).
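To make the point concrete, here is a minimal sketch of why gene splits pre-date the species split. It is not the model Edwards presented: it simply assumes the textbook neutral coalescent, in which two gene copies in an ancestral diploid population of effective size `ancestral_ne` take an exponentially distributed extra time (mean `2 * ancestral_ne` generations) to find their common ancestor. The function name and all the numbers are made up for illustration.

```python
import random

def simulate_gene_split_times(species_split, ancestral_ne, n_genes, seed=0):
    """Return simulated gene divergence times (in generations before present).

    Assumption: each gene's split is the species split plus an exponential
    waiting time with mean 2 * ancestral_ne (standard neutral coalescent
    for two lineages in a diploid ancestral population).
    """
    rng = random.Random(seed)
    mean_wait = 2 * ancestral_ne
    # expovariate takes a rate, i.e. 1 / mean
    return [species_split + rng.expovariate(1.0 / mean_wait)
            for _ in range(n_genes)]

# Hypothetical values, chosen only for illustration
species_split = 1_000_000
gene_times = simulate_gene_split_times(species_split,
                                       ancestral_ne=50_000,
                                       n_genes=10)

# Every gene split is at least as old as the species split, so the
# youngest gene split is an upper bound on the age of the species split.
upper_bound = min(gene_times)
print(f"true species split:       {species_split}")
print(f"youngest gene split:      {upper_bound:.0f}")
print(f"oldest gene split:        {max(gene_times):.0f}")
```

A single gene can overestimate the split by a lot if the ancestral population was large, which is why using several genes helps.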

After the plenary, I wandered off to the session on evolution in agriculture. This has nothing to do with what I work on, so I was there out of curiosity. A couple of talks were about the origins of agriculture in the Middle East, and its spread through Europe. They looked at gene sequences and inferred one or two originations, and Huw Jones showed that in barley they could find a signature of spread along the Mediterranean to Spain, and a second spread through central and northern Europe. In contrast, Hazel Goodwin was unable to find any pattern of spread in wheat. The reason for the difference isn't clear.

After lunch, I went to the final session, on association mapping. This is an approach to mapping genes based on the correlation between genes or markers and a phenotype from individuals sampled from a population. It can go badly wrong because the association may not be because the markers are close to a gene affecting the trait, but instead because of population history: if the population being sampled is made up of two sub-populations, then there may be divergence in the trait, and also in genes nowhere near any affecting the trait, just by chance. Association mapping would then pick up these relationships, and suggest they are causal.
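The sub-population problem is easy to see in a toy simulation. This is only a sketch under assumptions of my own: two sub-populations that differ in the frequency of a marker allele (0.1 versus 0.9) and in their mean trait value (0 versus 1), with the marker having no effect whatsoever on the trait. The helper names are hypothetical.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def sample_subpopulation(rng, n, allele_freq, trait_mean):
    """Draw n diploid individuals: marker genotype (0, 1 or 2 copies)
    and a trait that is independent of the marker."""
    genotypes = [sum(rng.random() < allele_freq for _ in range(2))
                 for _ in range(n)]
    traits = [rng.gauss(trait_mean, 1.0) for _ in range(n)]
    return genotypes, traits

rng = random.Random(1)
# Illustrative values: the sub-populations differ in both allele
# frequency and trait mean, but the marker never affects the trait.
g1, t1 = sample_subpopulation(rng, 2000, allele_freq=0.1, trait_mean=0.0)
g2, t2 = sample_subpopulation(rng, 2000, allele_freq=0.9, trait_mean=1.0)

pooled_r = pearson(g1 + g2, t1 + t2)   # ignores population structure
within_r1 = pearson(g1, t1)            # within sub-population 1
within_r2 = pearson(g2, t2)            # within sub-population 2

print(f"pooled correlation:   {pooled_r:.3f}")
print(f"within population 1:  {within_r1:.3f}")
print(f"within population 2:  {within_r2:.3f}")
```

Pooling the samples produces a clear marker-trait correlation, while within each sub-population it hovers around zero. That is the spurious signal association mapping can pick up.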

After the talks were over, we adjourned to the castle for the banquet. The food was good, the wine was excellent, the speeches were short. And an evil Norwegian talked me into organising a session at the next ESEB meeting, in Turin in 2009.

## Saturday, 1 September 2007

### I'm back.

I got back from the wilds of Sweden yesterday morning (1am): after stressing all day about getting the connection from Stockholm to the airport in time, that was fine, but the bus to take us from the aeroplane to the terminal in Helsinki was late.

I spent yesterday running around on errands (deleting spam, picking up the cat etc.). I'll post more goodies later (e.g. the final report on ESEB, and some photos from Gotland). But now I'm off to a wedding (not mine - I'm still in the "all reasonable offers accepted" state). No doubt there will be some very bad jokes in Swedish today.

## Saturday, 25 August 2007

### ESEB: Friday. Starts with sex, ends with Aussie Rules

It's always good to start the day with sex. Alas, today we were only talking and thinking about it.

One of the big mysteries in evolutionary biology has been how sex evolved. John Maynard Smith pointed out in the 1960s that it really shouldn't have - there's a huge cost to any gene (because with sex it only has a 50% chance of being passed on), so a modifier that stops sex and has a 100% chance of being passed on will be fitter. Since then a lot of people have been worrying about this problem. In her plenary talk, Sally Otto talked about recent work that suggests we are close to a resolution of the problem.
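The size of that cost can be sketched with a few lines of arithmetic. This is one common way of illustrating the twofold cost, not something from the talk, and the assumptions are mine: sexual and asexual females produce the same number of offspring, sexual females produce half sons and half daughters, and asexual females produce only asexual daughters. The function name and starting numbers are hypothetical.

```python
def asexual_fraction(initial_fraction, generations, offspring_per_female=2.0):
    """Track the fraction of females that are asexual over time.

    Assumptions: equal offspring number for both types; sexual females
    produce 50% daughters, asexual females produce 100% daughters.
    """
    sexual = 1.0 - initial_fraction
    asexual = initial_fraction
    history = [asexual / (sexual + asexual)]
    for _ in range(generations):
        sexual = sexual * offspring_per_female * 0.5   # half are sons
        asexual = asexual * offspring_per_female       # all are daughters
        history.append(asexual / (sexual + asexual))
    return history

# Start with 1% asexual females and follow them for 10 generations
history = asexual_fraction(0.01, 10)
for generation, fraction in enumerate(history):
    print(f"generation {generation:2d}: asexual fraction {fraction:.3f}")
```

Under these assumptions the ratio of asexual to sexual females doubles every generation, so a rare asexual modifier takes over within a handful of generations. That is the advantage any explanation for sex has to overcome.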

There have been a couple of explanations that have been around for some time. The first is that sex helps evolution because it breaks up bad combinations of genes, particularly when the disadvantages are magnified, so that the cost of carrying two bad genes is more than the cost of carrying one bad gene twice (technically this is called epistasis). This does give sex an advantage, but it's very small, and only occurs in limited and unlikely conditions.

The second explanation is the Red Queen hypothesis, again. A species is being subjected to all sorts of attacks (pathogens, parasites etc.), which are co-evolving with them, so there is a constant arms race (this is the Red Queen bit). A species evolves defences, and sex can help combine them together, to increase the speed at which the species runs away from its enemies. This has some empirical support, but Otto showed that the theoretical results suggested it only worked under a narrow set of circumstances.

She then introduced a third idea - to look at finite populations. All of the previous work she had presented had been done assuming infinite populations. But in a finite population gene combinations can be combined randomly by genetic drift, and also not every combination of genes will be present in the population. Sex can then work to combine gene combinations and give an advantage. Adding the Red Queen improves the advantage (and I suspect that any sort of environmental variation will give an advantage to sex, more work needs to be done etc.).

After the plenary I ran off to the session on integrating evolution and ecology. Lots of different things. Carol Eunmi Lee talked about a lot of work on invasive species (err, were they isopods? Something small and aquatic anyway), and suggested that species that invade tend to live in more variable environments in their core ranges (there was more than this, but I turned off during the physiology). Virpi Lummaa talked about a huge data set on agricultural Finns (the church has kept the records on births, marriages and deaths for over 200 years, so they put a few of them in a database), looking at twins. She showed that a female twin is less fit if her sibling was male, and that this was due to conditions in the womb (because it persisted even if the sibling died in the first three months of life).

After the sessions there was the ESEB business meeting. As always with these things, some of it was interesting, some just needed to be said (e.g. the accounts). A couple of interesting things appeared from the meeting - the SSE (our US counterparts) wanted to organise a joint meeting. One option had been to have it in 2011, when there should be an ESEB meeting. Nobody liked that, but the second option was to have a meeting in 2012 in Canada. This looks like the option that will be followed up, so we'll see what happens. The other thing is that there will be a new journal starting next year called Evolutionary Applications.

Today's the last day, and is followed by the banquet. Tomorrow I have to get an early train to go to Gotland, so there may not be any report for a few days.

## Friday, 24 August 2007

### ESEB: Thursday

A short day today: there are excursions in the afternoon, as well as several barbecues (including the one I attended).

But before then we had to "work". The plenary was given by Michael Majerus. He talked about the classic peppered moth example of evolution. The original story is well known, but some of the original work by Kettlewell has been criticised, and a book appeared a few years ago which accused Kettlewell of fraud. The book has been panned (and when I read it, I wasn't convinced either), but is used by creationists as the basis of one of their assaults on evolution. Actually, there are some good criticisms of Kettlewell's work, so Majerus decided to re-run the experiments (actually not quite - he would have needed a wood which is still polluted) but redesigning them to take the criticisms into account.

Majerus ran his studies for 6 years in his field site (actually his back garden), and also trapped moths nearby to look at the change in the melanic form. The long and the short of it is that he replicated Kettlewell's results (qualitatively - after half a century we would expect some differences). So, it looks like Kettlewell was right after all (to the surprise of nobody in the room). We then had a sermon about how we need to keep faith and belief out of science, and that the peppered moth is such a good example for education that it should be used. Not really controversial for this audience, but good to rally the troops.

The parallel sessions weren't too interesting for me, but I went to Jim Mallet's talk on mimicry in Heliconius and other butterflies. This is a story that's been going on for years, and I look in on it when I'm at meetings. The latest thing is that they are thinking about how mimicry can affect speciation - mimicking a distasteful species means you don't get eaten, and different populations mimic different species in different areas. But wing colouration also affects mating behaviour (quite simply, the male can't recognise the female that's the 'wrong' mimic), so that could lead to reproductive separation and speciation. So, they're now looking for hybrids between species, and also into the genetics and development of wing colour. More next time.....

In the afternoon I didn't go on an excursion, instead we had a barbecue at the house where I'm staying (err, actually outside the house). Lots of beer drunk, a new wealth index invented, and just after a pregnant lady sat down next to her, Vilppu asked "So, how do you get the sperm?"