I was asked about this last week by a colleague, and now it's hit the blogosphere, so I thought I would publicly leap into a dispute about sexism in science. And make a plea for people to actually look at their data.

This was all started by a group of biologists who have been working at NCEAS on publication biases in ecology (the biggest bias is, of course, that not enough of my papers get accepted straight away). They managed to get their latest results published in TREE.

The received wisdom is that there is a bias against women in science. One area where this might be seen is in acceptance of papers for publication – referees and editors might have a bias (conscious or subconscious) against women. If this is true, the proportion of papers published by women should be higher in journals where the gender of the author is not known.

For this and other reasons there have been suggestions floating around that journals shift to a system of double-blind reviews. At the moment most journals have single blinding: the authors' identities are known to the referees, but the referees' identities are not revealed to the authors (unless the referees wish to do so). In double blinding, the referee doesn't know the identity of the author. Hence, any bias due to gender of the author should be removed. So, if a journal shifts from single blinding to double blinding, the proportion of papers by female authors should increase.

In 2001 the journal *Behavioural Ecology* moved to double blinding. But did this change the proportion of female authors? Or, more exactly, was there a bias against women that was removed? After all, the proportion of female authors might be changing in the rest of science – the null expectation is that the change in *Behavioural Ecology* should be the same as in similar journals, rather than that there should be no change. So, the group gathered data on the number of papers by male and female first authors, from before and after *Behavioural Ecology* switched to double blinding, for five similar journals too. And then they compared the change in proportion of female authors in *Behavioural Ecology* to that in the other journals.

Err, no.

What they did was to compare the change in the proportion of female authors in each journal to zero. What they found was that *Behavioural Ecology* and *Biological Conservation* had increases that were significantly different from zero, but the other journals did not. They therefore concluded that there was an effect of double blinding, and that the increase in *Biological Conservation* must have been due to other factors. Oddly, though, at no point did they seem to make a direct comparison. It is not clear that they looked at the data either. Had they done so, they would have seen this:

[Figure: change in the proportion of female first authors, before and after *Behavioural Ecology* went double blind, for all six journals]

The lines show the change from before *Behavioural Ecology* went double blind to afterwards. The vertical lines are the standard errors. *Behavioural Ecology* is the thick black line. We see that the proportion of female authors increases in all of the journals, but also that it is greatest in *Behavioural Ecology*. But is that increase significantly (in any sense) greater than in the other journals? Well, comparing it to zero obviously inflates the estimate of significance, because the other journals are all also increasing.

We can get an idea of whether the data show anything with a more focussed analysis. This is simplified – I am ignoring some variation – but a more sophisticated analysis (= too much hassle to explain) comes to the same conclusion (and yes, for those who have read the paper, so does including the "don't knows").

What we can do is calculate the difference between the before and after proportions of female authors for the “control group”, and estimate the distribution of differences that would be expected if there was no double blinding implemented. Then we can ask if the difference in the proportion for *Behavioural Ecology* falls so far outside this distribution that it would be unlikely to explain the change.

These are the differences:

| Journal | Percentage Before | Percentage After | Difference (%) |
| --- | --- | --- | --- |
| *Behavioural Ecology* | 23.7 | 31.6 | 7.9 |
| *Behavioral Ecology & Sociobiology* | 25.1 | 26.3 | 1.3 |
| *Animal Behaviour* | 27.4 | 31.6 | 4.2 |
| *Biological Conservation* | 13.8 | 20.6 | 6.8 |
| *Journal of Biogeography* | 14.4 | 16.5 | 2.0 |
| *Landscape Ecology* | 19.5 | 23.4 | 3.9 |

For the five comparison journals, the mean difference is 3.65%, with a standard deviation of 2.15%. If these estimates were exact, then there would be a 95% chance that the change for another, similar, journal would be between -0.6% and 7.9%. So, *Behavioural Ecology* is right on the edge.

But this assumes that the variance is known. In reality it is estimated, and estimated from only 5 data points (i.e. not a lot). If we take this into account, we find that the prediction for a journal would fall between -2.3% and 9.6% (with 95% probability). Now *Behavioural Ecology* is reasonably well inside the limits. Even someone wanting to do a one-sided test will find it inside.
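The interval arithmetic above can be checked with a few lines of Python, using the rounded differences from the table (so the mean comes out at 3.64 rather than the 3.65 quoted, which was presumably calculated from unrounded data). The quantiles are hard-coded: 1.96 for the normal, and 2.776 for a t distribution with 4 degrees of freedom.

```python
from statistics import mean, stdev

# Before-to-after changes (percentage points of female first authors)
# for the five comparison journals, read off the table above.
controls = [1.3, 4.2, 6.8, 2.0, 3.9]

m, s = mean(controls), stdev(controls)   # ~3.64 and ~2.15

# Treating the variance as known: normal 97.5% point.
z = 1.96
print(f"({m - z*s:.1f}, {m + z*s:.1f})")   # (-0.6, 7.9)

# Acknowledging the variance is estimated from only 5 journals: t with 4 df.
q = 2.776
print(f"({m - q*s:.1f}, {m + q*s:.1f})")   # (-2.3, 9.6)
```

*Behavioural Ecology*'s change of 7.9% sits right at the edge of the first interval, but comfortably inside the second.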

So, the analysis shows little evidence for any effect of double blinding. But there are a couple of caveats, which could have opposite effects. The first is simply that there is not a lot of data – only 6 data points. We would really need more journals to be able to come to any conclusion. In particular, there may have been some other changes at *Behavioural Ecology* that could have had an effect.

The second caveat is more subtle. Suppose you were a journal editor, and you introduce a rule that authors have to admit that statisticians are the highest form of life in their acknowledgements. After a couple of years, you notice that the proportion of authors called Fisher has increased. You wonder if this is because of the new rule. So, you compare it with other journals, and find no increase. You therefore declare that Fishers appreciate statisticians, but other people don't. But what about all those other effects you didn't see? What about the changes in numbers of Boxes, Coxes, and Nelders? Humans are very good at detecting patterns, but very bad at judging whether they are random. And using the same data from which you spotted a pattern to assess whether it is real is naughty – of course you're going to see an effect, because you've already noticed it in the mass of all possible things that could happen. Now, I don't know if the authors are guilty here – they don't say how they came to decide to examine this particular aspect of the data, but the introduction is a bit arm-wavy about the effect of double-blinding on sex ratio.

Of course, the solution to both caveats is simple – get more data. Anyone fancy trawling through the literature this weekend?

EDIT: Oops. I should have hat-tipped Grrlscientist for her post, which encouraged me to write this. Hedwig - I'm sorry. Please don't set Orpheus onto me...

Reference

Budden, A., Tregenza, T., Aarssen, L., Koricheva, J., Leimu, R., Lortie, C. (2008). Double-blind review favours increased representation of female authors. Trends in Ecology & Evolution, 23(1), 4-6. DOI: 10.1016/j.tree.2007.07.008

## 8 comments:

Nope, I don't know why Blogger wants a huge space before the table either.

Dear Bob

Thanks for your in-depth consideration of the paper - it made for interesting reading.

To add a couple of clarifications regarding the rationale behind the study, we are an NCEAS working group interested in exploring many aspects of bias in the ecological publishing community. As ecologists and evolutionary ecologists we were aware at the outset of the review policy change in BE (a number of us had submitted to that journal) and were also aware that this was an exception in our field. Hence we decided to investigate the effect that this change had on author demographics, without knowledge of the result. Furthermore, a priori we chose BES as the comparison journal, given the very similar IF and nature of the two journals. Investigation of the trends in these two journals gave us some very compelling results (as shown by your figure). However, a one-on-one comparison did not provide large sample sizes, and so again, a priori, we developed criteria for a larger sample of single-blind journals. These journals show differing trends in the proportion of female first authors; however, they also reflect different fields within ecology, which may be growing at different rates. Perhaps BES is still the best comparison for this reason?

I completely agree that more data are needed and factors beyond gender need to be explored. Our group is now working on submission data provided by a set of journals (albeit single-blind) to more thoroughly explore demographic effects on the likelihood of acceptance. We hope that these data will provide more insight into the reviewing process.

I thought the same thing as I read the manuscript this morning. Given that the data are available from the MS, it would be possible to reanalyze it using a log-linear model and determine the effect size/significance of the change.

If you haven't already done so, I'll post those results up on my blog later today or tomorrow.

Sean

ae budden - thanks for your comments. Given the variation between journals, my feeling is that matching to just a single journal is difficult, because you have to be sure that you're matching using the right properties. Journals have to be different (competitive exclusion!), and the more similar they are with regards to obvious properties, the more subtle I suspect are the differences.

It's good to hear you're looking at acceptance rates. If you need any help with the data analysis, just ask (Roosa has my email address).

s. walker - Logistic regression almost works. The problem is that the main source of variation is between journals. So, you would need to fit the time*journal effect as random. I did this using a GLMM, and got an estimate of -0.15 with a standard error of 0.10. You get slightly different results depending on what precise model you use, but the conclusions are always the same.
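To give a flavour of what the model is estimating, here is a back-of-the-envelope fixed-effects version in Python: work out each journal's before-to-after change on the log-odds scale, then compare *Behavioural Ecology*'s change to the mean change in the controls. It uses the rounded percentages from the table, so it is only a caricature of the GLMM and won't reproduce the -0.15 exactly (the sign is also flipped, as this is on the female log-odds scale):

```python
from math import log

def logit(p):
    """Log odds of a proportion."""
    return log(p / (1 - p))

# (before, after) proportions of female first authors, from the table above.
journals = {
    "Behavioural Ecology":               (0.237, 0.316),
    "Behavioral Ecology & Sociobiology": (0.251, 0.263),
    "Animal Behaviour":                  (0.274, 0.316),
    "Biological Conservation":           (0.138, 0.206),
    "Journal of Biogeography":           (0.144, 0.165),
    "Landscape Ecology":                 (0.195, 0.234),
}

# Change in log odds of a female first author, per journal.
changes = {j: logit(a) - logit(b) for j, (b, a) in journals.items()}

control_mean = sum(v for j, v in changes.items()
                   if j != "Behavioural Ecology") / 5

# Difference-in-differences on the (female) log-odds scale.
did = changes["Behavioural Ecology"] - control_mean
print(round(did, 2))  # 0.17
```

The resulting ~0.17 is in the same ballpark as the GLMM's 0.15 (in absolute value), which is reassuring.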

I've reanalyzed the data using a generalized logit model and the results don't support the hypothesis that BES and BE are different.

I posted it on my blog and I'd be happy to send the analysis to you if you want. It does not account for the random nature of the sample, but it is in the same spirit as the original analysis in the paper.

I'm reposting this comment here so that you can see it.

Here are the estimates from my model.

The BE-BES comparison has a parameter estimate of -0.15 (SE = 0.11, 95% C.I. -0.3696 to 0.0627). This is only looking at the log odds of female to male. Interestingly, this is very close to the estimate you obtained.

In contrast, comparing BE to JB (JB shifted towards an increase in male-authored papers), the estimate is -0.2372 (95% C.I. -0.4819 to 0.00755), again only looking at the log odds of female to male.

I agree about the random effects model but I thought it would be interesting to see if there were any significant differences across any of the journals.

I wonder if a short note to TREE is in order?

Oh, good. So if the proportion started at 25%, the effect of double blinding would be to shift the proportion to 28% (CI: 24%-33%). So the difference isn't huge, and the uncertainty is comparatively large.
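For completeness, the arithmetic behind those numbers is just moving between proportions and log odds. A quick sketch in Python (I have flipped the sign of the BE-BES estimate and its C.I. so that positive means more female-authored papers):

```python
from math import exp, log

def logit(p):
    """Proportion to log odds."""
    return log(p / (1 - p))

def inv_logit(x):
    """Log odds back to a proportion."""
    return 1 / (1 + exp(-x))

baseline = 0.25                          # hypothetical starting proportion
effect, lo, hi = 0.15, -0.0627, 0.3696   # sign-flipped estimate and 95% C.I.

for shift in (effect, lo, hi):
    print(round(100 * inv_logit(logit(baseline) + shift)))  # 28, 24, 33
```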

I think TREE might be interested. Hang on *rummage rummage*

OK, that's your email address...

I agree, "logistic regression almost works". It does require more data, but it can be concluded from most cultures in the world that it is a "general rule". One distinct culture is that of a tribe in Indonesia with five genders, of which the Bissu are the meta-gender.
