- First, search for existing datasets that are relevant to your research question. Some big datasets are already well known: for instance, the CHILDES database of child language samples, ALSPAC for longitudinal data on child development, and UK Biobank for genetic and medical data, including brain imaging. But there are many other, relatively underused sources of data that can be found on sites such as the UK Data Archive (social sciences) or the Open Science Framework. These are just a few I happen to know about; I'm sure readers will know of many more. Perhaps one useful task that underemployed scientists could undertake would be to create directories of existing accessible data on specific topics.
- Dan Quintana tweeted that this could be a good time to learn how to do meta-analysis, and provided some useful references. Meta-analysis is often seen as a rather pedestrian scientific activity, but in my experience its benefits go far beyond generating a forest plot: it forces you to engage deeply with the literature and to become aware of methodological issues that might otherwise be overlooked.
- Another suggestion is to spend time curating your own existing data from published studies, to make them fully reproducible. A major aspect of the Open Science movement has been to emphasise the importance of being able to reproduce a full analysis, right through from raw data to analyses, tables and figures in a published paper. A nice introduction by Stodden and colleagues was published in 2016, and there are more references in the slides from her recent webcast from Project
I've been trying to adopt this approach in my own lab over the last few years. On the one hand it's much harder than you might think, but on the other hand it is immensely satisfying. Making code open has a huge advantage beyond making the work reproducible, in that it also allows others to learn from your scripts rather than reinvent the wheel when analysing data. But be warned: you will almost certainly find errors in your published work. The main thing is to anticipate that this will happen, and to make sure that you correct those errors and learn from them.
A major reason why many people don't adopt open, reproducible methods is that doing so takes time. Well, we may be in the weird situation of having time on our hands, and this could be a great way of using it to give a new lease of life to old data.
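To make the idea of going "right through from raw data to tables" a little more concrete, here is a minimal sketch in Python, assuming a hypothetical raw dataset with columns `participant`, `group` and `score` (the data, column names and function names are all invented for illustration). The point is simply that one script regenerates the summary table from untouched raw data, so anyone can rerun it and check the numbers in the paper.

```python
"""Minimal reproducible analysis sketch: raw data in, summary table out."""
import csv
import io
import statistics

# Hypothetical raw data. In a real project this would be read from a raw-data
# file that is kept unchanged, never edited by hand.
RAW_CSV = """participant,group,score
p01,control,12
p02,control,15
p03,control,14
p04,treatment,18
p05,treatment,17
p06,treatment,20
"""


def load_raw(text):
    """Parse the raw CSV text into a list of dicts, converting scores to float."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["score"] = float(row["score"])
    return rows


def summarise(rows):
    """Return {group: (n, mean, sd)} computed only from the raw rows."""
    groups = {}
    for row in rows:
        groups.setdefault(row["group"], []).append(row["score"])
    # Sort by group name so the table always comes out in the same order.
    return {
        g: (len(s), statistics.mean(s), statistics.stdev(s))
        for g, s in sorted(groups.items())
    }


def format_table(summary):
    """Render the summary as a tab-separated table, as it might appear in a paper."""
    lines = ["group\tn\tmean\tsd"]
    for g, (n, m, sd) in summary.items():
        lines.append(f"{g}\t{n}\t{m:.2f}\t{sd:.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(summarise(load_raw(RAW_CSV))))
```

The same pattern extends to figures: as long as every number and plot is produced by code from the raw files, rerunning the script should reproduce the paper, and any discrepancy points to an error worth correcting.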