Wednesday, 11 March 2020

What can scientists do in an emergency shutdown?

More and more universities are closing as a precaution against coronavirus. Scientists who work on educational interventions are seeing whole projects go up in smoke when schools close, because endpoint data can no longer be gathered; and even where schools stay open, staff may understandably be reluctant to have researchers on the premises. Even where university labs remain open, research with humans is likely to be affected as participants pull out to keep themselves out of harm's way. I'm aware that a major push for data collection that I had planned over the next 12 months is now at risk. I'm writing this post partly in the hope that people with creative solutions for adapting to this situation will add them as comments, so that we might find a shared way through this difficult time.

Of course, we can hope that funders will be sympathetic to shutdowns and may offer extensions of funding, but that won't be much comfort to anyone doing a time-sensitive project, such as an intervention trial or longitudinal study, or to students who have to complete a project before their course ends. In my own case, I had planned a programme of work to take me up to retirement, so an extension of funding would not be a solution. And if normality resumes only after a vaccine has been developed and deployed, that will be a long time off.

So what to do? For some projects, particularly in psychology, online data-gathering may be a solution. Over the past year, we have been increasingly developing methods for doing this using the Gorilla platform, including various types of language test for adults and children, as well as measures of brain lateralisation. Gorilla has a nice interface that helps new users develop tasks, and it also helps with compliance with regulations such as GDPR. For regular studies with adults, Gorilla combines nicely with Prolific, a platform for recruiting participants to online studies. We have had ethics approval for several studies using this approach.

But online testing won't be the solution for everyone, and one proposal I have is that where new data collection is not possible, we need to think hard about getting more value out of existing data. There are several ways we can do this. 
  • First, search for existing datasets that are relevant to your research question. Some big datasets are already well-known: for instance, the CHILDES database for child language samples, ALSPAC for longitudinal data on child development, and UK Biobank for genetic and medical data, including brain imaging. But there are many other, relatively underused, sources of data that can be found on sites such as the UK Data Archive (social sciences) or the Open Science Framework. These are just a few I happen to know about; I'm sure readers will know of many more. One useful task that underemployed scientists could undertake would be to create directories of existing accessible data on specific topics.
  • Dan Quintana tweeted that this could be a good time to learn how to do meta-analysis, and provided some useful references. Meta-analysis is often seen as a rather pedestrian scientific activity, but in my experience its benefits go far beyond the generation of a forest plot: it forces you to engage deeply with the literature and to become aware of methodological issues that might otherwise be overlooked.
  • Another suggestion is to spend time curating the existing data from your own published studies, to make them fully reproducible. A major aspect of the Open Science movement has been to emphasise the importance of being able to reproduce a full analysis, right through from raw data to the analyses, tables and figures in a published paper. A nice introduction by Stodden and colleagues was published in 2016, and there are more references in the slides from her recent webcast from Project Tier.
    I've been trying to adopt this approach in my own lab over the last few years. On the one hand, it's much harder than you might think; on the other, it's immensely satisfying. Making code open has a huge advantage beyond making the work reproducible: it also allows others to learn from your scripts rather than reinvent the wheel when analysing data. But be warned: you will almost certainly find errors in your published work. The main thing is to anticipate that this will happen, and to make sure that you correct the errors and learn from them.

    A major reason why many people don't adopt methods of open, reproducible science is that it takes time. Well, we may be in the weird situation of having time on our hands, and this could be a great way of using it to give a new lease of life to old data.


  1. Yes, there should be fierce competition for the Data Parasite award this year! Also, plenty of time for theory development. Could COVID-19 be the solution to the theory crisis? No, but it should help.

  2. I'll put in a vote for corpus building & descriptive analyses of big datasets [big surprise :D]. One of my lab's corpus projects is in the building stage right now and operates almost entirely online, with just a trip to the library once in a while. We'll have plenty more time to work on this now that we're not running in-person studies for a while.