Thursday, 22 December 2016

Controversial statues: remove or revise?

The Rhodes Must Fall campaign in Oxford ignited an impassioned debate about the presence of monuments to historical figures in our universities. On the one hand, there are those who find it offensive that a major university should continue to commemorate a person such as Cecil Rhodes, given the historical reappraisal of his role in colonialism and the suppression of African people. On the other hand, there are those who worry that removal of the Rhodes statue could be the thin end of a wedge, leading to demands for Nelson to be removed from Trafalgar Square or Henry VIII from King’s College Cambridge. There are competing petitions online to remove and to retain the Rhodes statue, with both having similar numbers of supporters.

The Rhodes Must Fall campaign was back in the spotlight last week, when the Times Higher ran a lengthy article covering a range of controversial statues in universities across the globe. A day before the article appeared, I had happened upon the Explorer's Monument in Fremantle, Australia. The original monument, dating to 1913, commemorated explorers who had been killed by 'treacherous natives' in 1864. As I read the plaque, I was thinking that this was one-sided, to put it mildly.

But then, reading on, I came to the next plaque, below the first, which was added to give the view of those who were offended by the original statue and plaque. 

I like this solution.  It does not airbrush controversial figures and events out of history. Rather, it forces one to think about the ways in which a colonial perspective damaged many indigenous people - and perhaps to question other things that are just taken for granted. It also creates a lasting reminder of the issues currently under debate – whereas if a statue is removed, all could be forgotten in a few years’ time. 
Obviously, taken to extremes, this approach could get out of control – one can imagine a never-ending sequence of plaques like the comments section on a Guardian article. But used judiciously, this approach seems to me to be a good solution to this debate.

Friday, 16 December 2016

When is a replication not a replication?

Replication studies have been much in the news lately, particularly in the field of psychology, where a great deal of discussion has been stimulated by the Reproducibility Project spearheaded by Brian Nosek.

Replication of a study is an important way to test the reproducibility and generalisability of the results. It has been a standard requirement for publication in reputable journals in the field of genetics for several years (see Kraft et al, 2009). However, at interdisciplinary boundaries, the need for replication may not be appreciated, especially where researchers from other disciplines include genetic associations in their analyses. I’m interested in documenting how far replications are routinely included in genetics papers that are published in neuroscience journals, and so I attempted to categorise a set of papers on this basis.

I’ve encountered many unanticipated obstacles in the course of this study (unintelligible papers and uncommunicative authors, to name just two I have blogged about), but I had not expected to find it difficult to make this binary categorisation. It has become clear, however, that there are nuances to the idea of replication. Here are two that I have encountered:

a)    Studies which include a straightforward Discovery and Replication sample, but which fail to reproduce the original result in the Replication sample. The authors then proceed to analyse the data with both samples combined and conclude that the original result is still there, so all is okay. Now, as far as I am concerned, you can’t treat this as a successful replication; the best you can say of it is that it is an extension of the original study to a larger sample size.  But if, as is typically the case, the original result was afflicted by the Winner’s Curse, then the combined result will be biased.
b)    Studies which use different phenotypes for Discovery and Replication samples. On the one hand, one can argue that such studies are useful for identifying how generalizable the initial result is to changes in measures. It may also be the only practical solution if using pre-existing samples for replication, as one has to use what measures are available. The problem is that there is an asymmetry in terms of how the results are then treated. If the same result is obtained with a new sample using different measures, this can be taken as strong evidence that the genotype is influencing a trait regardless of how it is measured. But when the Replication sample fails to reproduce the original result, one is left with uncertainty as to whether it was type I error, or a finding that is sensitive to how it is measured. I’ve found that people are very reluctant to treat failures to replicate as undermining the original finding in this circumstance.
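The Winner’s Curse concern in (a) can be illustrated with a quick simulation. This is only a sketch, not drawn from any of the papers under review: the effect size, sample size, and selection rule are all illustrative assumptions. It shows that when an underpowered discovery study is selected for significance, pooling it with an unbiased replication sample still leaves the combined estimate inflated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_effect = 0.2   # small true standardized effect
n = 30              # per-sample size: deliberately underpowered
n_sims = 5000

combined_estimates = []
for _ in range(n_sims):
    discovery = rng.normal(true_effect, 1, n)
    t, p = stats.ttest_1samp(discovery, 0)
    # only 'significant' positive discovery results go on to replication
    if p < 0.05 and t > 0:
        replication = rng.normal(true_effect, 1, n)
        pooled = np.concatenate([discovery, replication])
        combined_estimates.append(pooled.mean())

print(f"True effect: {true_effect}")
print(f"Mean combined estimate after selection: {np.mean(combined_estimates):.3f}")
```

The replication sample on its own gives an unbiased estimate; it is the selected discovery sample that drags the pooled average upwards, which is why treating the combined analysis as a "successful replication" is misleading.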

I’m reminded of arguments in the field of social psychology, where failures to reproduce well-known phenomena are often attributed to minor changes in the procedures or lack of ‘flair’ of experimenters. The problem is that while this interpretation could be valid, there is another, less palatable, interpretation, which is that the original finding was a type I error.  This is particularly likely when the original study was underpowered or the phenotype was measured using an unreliable instrument. 

There is no simple solution, but as a start, I’d suggest that researchers in this field should, where feasible, use the same phenotype measures in Discovery and Replication samples. Where that is not feasible, they could pre-register their predictions for a Replication sample prior to looking at the data, taking into account the reliability of the phenotype measures and the power of the Replication sample to detect the original effect, given its sample size.
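As a rough sketch of what such a pre-registered power calculation might involve, consider the following. The function name, the parameter values, and the use of the classical attenuation-for-unreliability formula are my own illustrative assumptions (it also assumes the genotype itself is measured without error), not a prescription:

```python
import numpy as np
from scipy import stats

def replication_power(r_discovery, rel_discovery, rel_replication,
                      n_rep, alpha=0.05):
    """Approximate power of a Replication sample to detect a Discovery
    correlation, adjusting the expected effect for measure reliability.
    Uses the classical attenuation formula; assumes error-free genotyping."""
    r_true = r_discovery / np.sqrt(rel_discovery)   # disattenuate the Discovery estimate
    r_expected = r_true * np.sqrt(rel_replication)  # re-attenuate for the new measure
    # normal approximation to the power of a correlation test via Fisher's z
    z = np.arctanh(r_expected) * np.sqrt(n_rep - 3)
    return stats.norm.sf(stats.norm.ppf(1 - alpha / 2) - z)

# e.g. a Discovery r of .3 with a fairly reliable measure, replicated
# with a less reliable one in a sample of 100
print(f"{replication_power(0.3, 0.8, 0.6, 100):.2f}")
```

A calculation along these lines, registered in advance, would make it explicit how much a failure to replicate should count against the original finding, rather than leaving the question to post-hoc argument.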

Tuesday, 13 December 2016

When scientific communication is a one-way street

Together with some colleagues, I am reviewing a set of papers that combine genetic and neuroscience methods. We had noticed wide variation in methodological practices and thought it would be useful to evaluate the state of the field. Our ultimate aim is to identify both problems and instances of best practice, so that we can make some recommendations.

I had anticipated that there would be wide differences between studies in statistical approaches and completeness of reporting, but I had not realised just what a daunting task it would be to review a set of papers. We had initially planned to include 50 papers, but we had to prune it down to 30, on realising just how much time we would need to spend reading and re-reading each article, just to extract some key statistics for a summary.

In part the problem is the complexity that arises when you bring together two or more subject areas, each of which deals with complex, big datasets. I blogged recently about this. Another issue is incomplete reporting. Trying to find out whether the researchers followed a specific procedure can mean wading through pages of manuscript and supplementary material: if you don’t find it, you then worry that you may have missed it, and so you re-read it all again. The search for key details is not so much looking for a needle in a haystack as being presented with a haystack which may or may not have a needle in it.

I realised that it would make sense to contact the authors of the papers we were including in the review, so I sent an email, copied to each first and last author, attaching a summary template of the details that had been extracted from their paper and simply asking them to check that it was an accurate account. I realise everyone is busy, so I did not anticipate an immediate response, but I suggested an end-of-month deadline, which gave people 3-4 weeks to reply. I then sent out a reminder a week before the deadline to those who had not replied, offering more time if needed.

Overall, the outcome was as follows:
  • 15 out of 30 authors responded, either to confirm our template was correct, or to make changes. The tone varied from friendly to suspicious, but all gave useful feedback.
  • 5 authors acknowledged our request and promised to get back but didn’t.
  • 1 author said an error had been found in the data, which did not affect conclusions, and they planned to correct it and send us updated data – but they didn’t.
  • 1 author sent questions about what we were doing, to which I replied, but they did not confirm whether or not our summary of their study was correct.
  • 8 did not reply to either of my emails.

I was rather disappointed that only half the authors ultimately gave us a usable response. Admittedly, this response rate is better than has been reported for requests for data from authors (see, e.g., Wicherts et al, 2011) – but providing data involves much more work than checking a summary. Our summary template was very short (fewer than 20 details to check), and in only a minority of cases had we asked authors to provide specific information that we could not find in the paper, or to confirm means/SDs that had been extracted from a digitised figure.

We are continuing to work on our analysis, and will aim to publish it regardless, but I remain curious about the reasons why so many authors were unwilling to do a simple check. It could just be pressure of work: we are all terribly busy and I can appreciate this kind of request might just seem a nuisance. Or are some authors really not interested in what people make of their paper, provided they get it published in a top journal?