Monday, 26 May 2014

Data sharing: Exciting but scary


Yesterday I did something I've never done before in many years of publishing. When I submitted a revised manuscript of a research report to a journal, I also posted the dataset on the web, together with the script I'd used to extract the summary results. It was exciting. It felt as if I was part of a scientific revolution that has been gathering pace over the past two or three years, which culminated in the adoption of a data policy by PLOS journals last February. This specified that authors were required to make the data underlying their scientific findings available publicly immediately upon publication of the article. As it happens, my paper is not submitted to PLOS, and so I'm not obliged to do this, but I wanted to, having considered the pros and cons. My decision was also influenced by the Wellcome Trust, who fund my work and encourage data sharing.

The benefits are potentially huge. People usually think about the value to other researchers, who may be able to extract useful information from your data, and there's no doubt this is a factor.  Particularly with large datasets, it's often the case that researchers only use a subset of the data, and so valuable information is squandered and may be lost forever.  More than once I've had someone ask me for an old dataset, only to find it is inaccessible, because it was stored on a floppy disk or an ancient, non-networked computer and so is no longer readable.  Even if you think that you've extracted all you can from a dataset, it may still be worth preserving for potential inclusion in future meta-analyses.

Another value of open data is less often emphasised: when you share data you are forced to ensure it is accurate and properly documented. I enjoy data analysis, but I'm not naturally well-disciplined about keeping everything tidy and well-organised. I've been alarmed on occasion to return to a dataset and find I have no idea what some of the variables are, because I failed to document them properly.  If I know the world at large will see my dataset then I won't want to be embarrassed by it, and so I will take more care to keep it neat and tidy with everything clearly labelled. This can only be good.

But here's the scary thing: data sharing exposes researchers to the risk of being found out to be sloppy or inaccurate. To my horror, shortly before I posted my dataset on the internet yesterday I found I'd made a mistake in the calculation of one of my variables. It was a silly error, caused by basing a computation on the wrong column of data. Fortunately, it did not have a serious effect on my paper, though I did have to go through redoing all the tables and making some changes to the text. But it seemed like pure chance that I picked up on this error – I could very easily have posted the dataset on the internet with the error still there. And it was an error that would have been detected by anyone eagle-eyed enough to look at the numbers carefully. Needless to say, I'm nervous that there may well be other errors in there that I did not pick up. But at least it's not as bad as an apocryphal case of a distinguished research group whose dramatic (and published) results arose because someone forgot to designate 9 as a missing value code. When I heard about that I shuddered, as I could see how easily it could happen.

This is why Open Data is important for science but difficult for scientists. In the past, I've found mistakes in my datasets, but this has been a private experience. To date, as far as I am aware, no serious errors have got into my published papers – though I did have another close shave last year when I found a wrongly-reported set of means at the proofs stage, and there have been a couple of instances where minor errata have had to be published. But the one thing I've learned as I wiped the egg off my face is that error is unavoidable, however careful you try to be. The best way to flush out these errors is to make the data public. This will inevitably lead to some embarrassment when mistakes are found, but at the end of the day, our goal must be to find out what is the case, rather than to save face.

I'm aware that not everyone agrees with me on this. There are concerns that open data sharing could lead to scientists getting scooped, could take up too much time, and could be used to impose ever more draconian regulation on beleaguered scientists. As DrugMonkey memorably put it: "Data depository obsession gets us a little closer to home because the psychotics are the Open Access Eleventy waccaloons who, presumably, started out as nice, normal, reasonable scientists." But I think this misses the point. DrugMonkey seems to think this is all about imposing regulations to prevent fraud and other dubious practices. I don't think this is so. The counter-arguments were well articulated in a blogpost by Tal Yarkoni. In brief, it's about moving to a point where it is accepted practice to make data publicly available, to improve scientific transparency, accuracy and collaboration.

Sunday, 11 May 2014

Changing the landscape of psychiatric research: What will the RDoC initiative by NIMH achieve?


There's a lot wrong with current psychiatric classification. Every few years, the American Psychiatric Association comes up with a new set of labels and diagnostic criteria, but whereas the Diagnostic and Statistical Manual used to be seen as some kind of Bible for psychiatrists, the latest version, DSM5, has been greeted with hostility and derision. The number of diagnostic categories keeps multiplying without any commensurate increase in the evidence base to validate the categories. It has been argued that vested interests from pharmaceutical companies create pressures to medicalise normality so that everyone will sooner or later have a diagnosis (Frances, 2013). And even setting aside such conflicts of interest, there are concerns that such well-known categories as schizophrenia and depression lack reliability and validity (Kendell & Jablensky, 2003).

In 2013, Tom Insel, Director of the US funding agency, National Institute of Mental Health (NIMH), created a stir with a blogpost in which he criticised the DSM5 and laid out the vision of a new Research Domain Criteria (RDoC) project. This aimed "to transform diagnosis by incorporating genetics, imaging, cognitive science, and other levels of information to lay the foundation for a new classification system."

He drew parallels with physical medicine, where diagnosis is not made purely on the basis of symptoms, but also uses measures of underlying physiological function that help distinguish between conditions and indicate the most appropriate treatment. This, he argued, should be the goal of psychiatry, to go beyond presenting symptoms to underlying causes, reconceptualising disorders in terms of neural systems.

This has, of course, been a goal for many researchers for several years, but Insel expressed frustration at the lack of progress, noting that at present: "We cannot design a system based on biomarkers or cognitive performance because we lack the data". That being the case, he argued, a priority for NIMH should be to create a framework for collecting relevant data. This would entail casting aside conventional psychiatric diagnoses, working with dimensions rather than categories, and establishing links between genetic, neural and behavioural levels of description.

This represents a massive shift in research funding strategy, and some are uneasy about it. Nobody, as far as I am aware, is keen to defend the status quo, as represented by DSM.  As Insel remarked in his blogpost: "Patients with mental disorders deserve better". The issue is whether RDoC is going to make things any better. I see five big problems.

1. McLaren (2011) is among those querying the assumption that mental illnesses are 'disorders of brain circuits'. The goal of the RDoC program is to fill in a huge matrix with new research findings. The rows of the matrix are not the traditional diagnostic categories: instead they are five research domains: Negative Valence Systems, Positive Valence Systems, Cognitive Systems, Systems for Social Processes, Arousal/Regulatory Systems, each of which has subdivisions: e.g. Cognitive Systems is broken down into Attention, Perception, Working memory, Declarative memory, Language behavior and Cognitive (effortful) control. The columns of the matrix are Genes, Molecules, Cells, Circuits, Physiology, Behavior, Self-Reports, and Paradigms. Strikingly absent is anything about experience or environment.

This seems symptomatic of our age. I remember sitting through a conference presentation about a study investigating whether brain measures could predict response to cognitive behaviour therapy in depression.  OK, it's possible that they might, but what surprised me was that no measures of past life events or current social circumstances were included in the study. My intuitions may be wrong, but it would seem that these factors are likely to play a role. My impression is that some of the more successful interventions developed in recent years are based not on neurobiology or genetics, but on a detailed analysis of the phenomenology of mental illness, as illustrated, for example, by the work of my colleagues David Clark and Anke Ehlers. Consideration of such factors is strikingly absent from RDoC.

2. The goal of the RDoC is ultimately to help patients, but the link with intervention is unclear. Suppose I become increasingly obsessed with checking electrical switches, such that I am unable to function in my job. Thanks to the RDoC program, I'm found to have a dysfunctional neural circuit. Presumably the benefit of this is that I could be given a new pharmacological intervention targeting that circuit, which will make me less obsessive. But how long will I stay on the drug? It has not given me any way to cope with the urge to check, or with the unwanted thoughts that obtrude into my consciousness, and these are likely to recur when I come off it. I'm not opposed to pharmacological interventions in principle, but they tend not to have a 'stop rule'.

There are psychological interventions that tackle the symptoms and the cognitive processes that underlie them more directly.  Could better knowledge of neurobiological correlates help develop more of these?  I guess it is possible, but my overall sense is that this translational potential is exaggerated – just as with the current hype around 'educational neuroscience'. The RDoC program embodies a mistaken belief that neuroscientific research is inherently better than psychological research because it deals with primary causes, when in fact it cannot capture key clinical phenomena. For instance, the distinction between a compulsive hand-washer and a compulsive checker is unlikely to have a clear brain correlate, yet we need to know about the specific symptoms of the individual to help them overcome them.

3. Those proposing RDoC appear to have a naive view of the potential of genetics to inform psychiatry.  It's worth quoting in detail from their vision of the kinds of study that would be encouraged by NIMH, as stated here:

Recent studies have shown that a number of genes reported to confer risk for schizophrenia, such as DISC1 (“Disrupted in schizophrenia”) and neuregulin, actually appear to be similar in risk for unipolar and bipolar mood disorders. ... Thus, in one potential design, inclusion criteria might simply consist of all patients seen for evaluation at a psychotic disorders treatment unit. The independent variable might comprise two groups of patients: One group would be positive and the other negative for one or more risk gene configurations (SNP or CNV), with the groups matched on demographics such as age, sex, and education. Dependent variables could be responses to a set of cognitive paradigms, and clinical status on a variety of symptom measures. Analyses would be conducted to compare the pattern of differences in responses to the cognitive or emotional tasks in patients who are positive and negative for the risk configurations.

This sounds to me like a recipe for wasting a huge amount of research funding. The effect sizes of most behavioural/cognitive genetic associations are tiny and so one would need an enormous sample size to see differences related to genotype. Coupled with an open-ended search for differences between genotypes on a battery of cognitive measures, this would undoubtedly generate some 'significant' results which could go on to mislead the field for some time before a failure to replicate was achieved (cf. Munafò & Gage, 2013).

The NIMH website notes that "the current diagnostic system is not informed by recent breakthroughs in genetics". There is good reason for that: to date, the genetic findings have been disappointing. Such associations as are found either indicate extremely rare and heterogeneous mutations of large effect and/or involve common genetic variants whose small effects are not of clinical significance. We cannot know what the future holds, but to date talk of 'breakthroughs' is misleading.
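To put rough numbers on the sample-size and multiple-testing worries raised under this point, here is a minimal sketch in Python. It is an illustration rather than anything taken from the RDoC documents: the effect sizes, the 5% significance level, 80% power and the battery of 20 measures are all assumed values, the sample-size formula is the usual normal approximation for comparing two group means, and the false-positive calculation assumes the tests are independent.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-group comparison.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2,
    where d is the standardised mean difference (an assumed value here).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


def familywise_error(k, alpha=0.05):
    """Chance of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k


# Hypothetical effect sizes, from moderate down to very small.
for d in (0.5, 0.2, 0.1):
    print(f"d = {d}: about {n_per_group(d)} participants per group")

# Hypothetical battery of 20 uncorrected comparisons.
print(f"20 tests: {familywise_error(20):.2f} chance of a spurious 'significant' result")
```

Under these assumptions a standardised difference of 0.1 needs around 1,570 participants per genotype group, and a battery of 20 uncorrected comparisons has roughly a 64% chance of throwing up at least one false positive even when there is no true effect at all.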

4. Some of the entries in the RDoC matrix also suggest a lack of appreciation of the difference between studying individual differences and studying group effects. The RDoC program is focused on understanding individual differences. That requires particularly stringent criteria for measures, which need to be adequately reliable, valid and sensitive to pick up differences between people. I appreciate that the published RDoC matrices are seen as a starting-point and not as definitive, but I would recommend that more thought goes into establishing the psychometric credibility of measures before embarking on expensive studies looking for correlations between genes, brains and behaviour. If the rank ordering of a group of people on a measure is not the same from one occasion to another, or if there are substantial floor or ceiling effects, that measure is not going to be much use as an indicator of an underlying construct. Furthermore, if different versions of a task that are supposed to tap into a single construct give different patterns of results, then we need a rethink – see e.g. Foti et al, 2013; Shilling et al, 2002, for examples. Such considerations are often ignored by those attempting to move experimental work into a translational phase. If we are really to achieve 'precision medicine' we need precise measures.
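The point about rank ordering can be checked with a test-retest rank correlation. The sketch below is only an illustration: the scores are made up, the function names are my own, and the simple Spearman formula used assumes there are no tied scores.

```python
def ranks(scores):
    """Rank scores from 1 (lowest) upwards; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(first, second):
    """Spearman rank correlation for two untied score lists of equal length."""
    n = len(first)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(first), ranks(second)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores for six people tested on two occasions.
time1 = [12, 15, 9, 20, 17, 11]
stable_time2 = [13, 16, 10, 19, 18, 12]    # same rank order as time1
unstable_time2 = [16, 10, 18, 12, 13, 19]  # rank order largely reversed

print(spearman_rho(time1, stable_time2))    # 1.0: people keep their positions
print(spearman_rho(time1, unstable_time2))  # about -0.77: positions are not preserved
```

A measure behaving like the second retest would tell us very little about stable differences between people, however robust its group-level effects might be.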

5. The matrix as it stands does not give much confidence that the RDoC approach will give clearer gene-brain-behaviour links than traditional psychiatric categories.

For instance, BDNF appears in the Gene column of the matrix for the constructs of acute threat, auditory perception, declarative memory, goal selection, and response selection. COMT appears with threat, loss, frustrative nonreward, reward learning, goal selection, response selection and reception of facial communication. Of course, it's early days. The whole purpose of the enterprise is to flesh out the matrix with more detailed and accurate information. Nevertheless, the attempts at summarising what is known to date do not inspire confidence that this goal will be achieved.

After such a list of objections to RDoC, I do have one good thing to say about it, which is that it appears to be encouraging and embracing data-sharing and open science. This will be an important advance that may help us find out more quickly which avenues are worth exploring and which are cul-de-sacs. I suspect we will find out some useful things from the RDoC project: I just have reservations as to whether they will be of any benefit to psychiatry, or more importantly, to psychiatric patients.

Foti, D., Kotov, R., & Hajcak, G. (2013). Psychometric considerations in using error-related brain activity as a biomarker in psychotic disorders. Journal of Abnormal Psychology, 122(2), 520-531. doi: 10.1037/a0032618

Frances, A. (2013). Saving normal: An insider's revolt against out-of-control psychiatric diagnosis, DSM-5, big pharma, and the medicalization of ordinary life. New York: HarperCollins.

Kendell, R., & Jablensky, A. (2003). Distinguishing between the validity and utility of psychiatric diagnoses. American Journal of Psychiatry, 160, 4-12.

McLaren, N. (2011). Cells, circuits, and syndromes: A critical commentary on the NIMH Research Domain Criteria project. Ethical Human Psychology and Psychiatry, 13(3), 229-236. doi: 10.1891/1559-4343.13.3.229

Munafò, M. R., & Gage, S. H. (2013). Improving the reliability and reporting of genetic association studies. Drug and Alcohol Dependence(0). doi:

Shilling, V. M., Chetwynd, A., & Rabbitt, P. M. A. (2002). Individual inconsistency across measures of inhibition: an investigation of the construct validity of inhibition in older adults. Neuropsychologia, 40, 605-619.

This article (Figshare version) can be cited as:
 Bishop, Dorothy V M (2014): Changing the landscape of psychiatric research: What will the RDoC initiative by NIMH achieve?. figshare.  

P.S. 8th October 2015.
RDoC is in the news again, leading Jon Roiser to send me a tweet asking whether my views expressed re social factors were just intuitions or evidence-based. That's a good question, given the importance I attach to evidence. So is there any evidence that past life events or current social situation predict response to intervention in depression? 
I have to confess I am not an expert in this area. My views are largely formed from what I learned years ago when training as a clinical psychologist, when research by Brown and Harris showed life events were potent predictors of depression:

Brown, G.W. & Harris, T.O. (1978). Social origins of depression: A study of psychiatric disorder in women. London: Tavistock. 

These studies were not on intervention, but it does seem plausible that the same factors that are associated with initial onset will also influence response to intervention. Thus it seems reasonable that it would be harder to treat someone's depression if they are still experiencing the factors that led to the initial depression, e.g. living in an abusive relationship, coping with the death of a loved one, or experiencing financial stress.
In response to Jon's query, I did a small trawl through recent articles in Web of Science; I have only looked at abstracts for these, so don't know how good quality the evidence is, but the general impression is that social factors and life events are still regarded as important factors in the etiology of depression - and therefore might also be expected to influence response to intervention. Here's a handful of papers:

Colman, I., Zeng, Y., McMartin, S. E., Naicker, K., Ataullahjan, A., Weeks, M., . . . Galambos, N. L. (2014). Protective factors against depression during the transition from adolescence to adulthood: Findings from a national Canadian cohort. Preventive Medicine, 65, 28-32. doi: 10.1016/j.ypmed.2014.04.008

Cwik, M., Barlow, A., Tingey, L., Goklish, N., Larzelere-Hinton, F., Craig, M., & Walkup, J. T. (2015). Exploring Risk and Protective Factors with a Community Sample of American Indian Adolescents Who Attempted Suicide. Archives of Suicide Research, 19(2), 172-189. doi: 10.1080/13811118.2015.1004472
Dour, H. J., Wiley, J. F., Roy-Byrne, P., Stein, M. B., Sullivan, G., Sherbourne, C. D., . . . Craske, M. G. (2014). Perceived social support mediates anxiety and depressive symptom changes following primary care intervention. Depression and Anxiety, 31(5), 436-442. doi: 10.1002/da.22216
Kemner, S. M., Mesman, E., Nolen, W. A., Eijckemans, M. J. C., & Hillegers, M. H. J. (2015). The role of life events and psychological factors in the onset of first and recurrent mood episodes in bipolar offspring: results from the Dutch Bipolar Offspring Study. Psychological Medicine, 45(12), 2571-2581. doi: 10.1017/s0033291715000495
Sheidow, A. J., Henry, D. B., Tolan, P. H., & Strachan, M. K. (2014). The Role of Stress Exposure and Family Functioning in Internalizing Outcomes of Urban Families. Journal of Child and Family Studies, 23(8), 1351-1365. doi: 10.1007/s10826-013-9793-3

I'd be happy to consider alternative evidence, but my view is that if we want to look at brain or gene predictors, we'd do well to also assess life events and social factors - things that are relatively easy to measure, might explain a significant proportion of variance, and could also possibly provide a mechanism to account for neurobiological findings.