Saturday, 9 February 2019

The Paper-in-a-Day Approach


Guest post by
Jennifer L. Tackett
Northwestern University; Personality Across Development Lab

The PiaD approach was borne of a desire to figure out a way, some way, any way, to tackle that ever-growing project list of studies-that-should-get-done-but-never-do. I’m guessing we all have these lists. These are the projects that come up when you’re sitting in a conference talk and lean over to your grad student and whisper (“You know, we actually have the data to test that thing they can’t test, we should do that!”), and your grad student sort of nods a little but also kind of looks like she wants to kill you. Or, you’re sitting in lab meeting talking about ideas, and suddenly shout, “Hey, we totally have data to test that! We should do that! Someone, add it to the list!” and people’s initial look of enthusiasm is quickly replaced by a somewhat sinister side-eye (or perhaps a look of desperation and panic; apparently it depends on who you ask). Essentially, anytime you come up with a project idea and think – Hey, that would be cool, we already have the data, and it wouldn’t be too onerous/lengthy, maybe someone wants to just write that up! – you may have a good PiaD paper.

In other words, the PiaD approach was apparently borne out of a desire to finally get these papers written without my grad students killing me. Seems as reasonable a motivation as any.

The initial idea was simple.

-       You have a project idea that is circumscribed and straightforward.

-       You have data to test the idea.

-       The analyses to do so are not overly complex or novel.

-       The project topic is in an area that everyone in the lab1 is at least somewhat (to very) familiar with.

What would happen if we all locked ourselves in the same room, with no other distractions, for a full day, and worked our tails off? Surely we could write this paper, right?

The answer was: somewhat, and at least sometimes, yes.
But even better were all the things we learned along the way.

We have been scheduling an annual PiaD since 2013. Our process has evolved a LOT along the way. Rather than giving a historical recounting, I thought I would summarize where we are at now – the current working process we have arrived at, and some of the unanticipated benefits and challenges that have come up for us over the years.

Our Current PiaD Approach

Front-End Work:
We write our PiaD papers in the late spring/early summer. Sometime in the fall, we decide as a group what the focus of the PiaD paper will be and who will be first author (see also: benefits and challenges). Then, in the months leading up to PiaD, the first author (and senior author, if not one and the same) takes care of some front-end tasks.2 Accomplishing the front-end tasks is essential for making sure we can all hit the ground running on the DAY. So, here are the things we do in advance:

1.              Write the present study paragraph: what exactly do we want to do, and why/how? (Now, we write this as a preregistration! But in the olden days, a thorough present study paragraph would do.)

2.              Run a first pass of the analyses (again, remember – data are already available and analyses are straightforward and familiar).

3.              Populate a literature review folder. We now use a shared reference manager library (Zotero) to facilitate this step and later populating references.

4.              Create a game plan – a list of the target outlet with journal submission guidelines, a list of all the tasks that must be accomplished on the actual DAY, a list of all the people on the team and preliminary assignments. The planning stage of PiaD is key – it can make or break the success of the approach. One aspect of this is being really thoughtful about task assignments. Someone used other data from that sample for a recent study? Put them on the Methods section. Someone used similar analyses in a recent project? Put them on re-running and checking analyses (first pass is always done by the first author in advance; another team member checks syntax and runs a fresh pass on the day. We also have multiple checks built in for examining final output). Someone has expertise in a related literature? Assign them appropriate sections of the Intro/Discussion. You get the idea. Leverage people’s strengths and expertise in the task assignments.

5.              A couple of weeks before the DAY, email everyone on the team a link to a Dropbox folder containing all of the above, along with 2-3 key references. All team members are expected to read the key papers and familiarize themselves with the Dropbox folder ahead of time.

The DAY:
Because this process is pretty intense, and every paper is different, our PiaD DAYs always evolve a bit differently. Here are some key components for us:

1.     Start with coffee.

2.     Then start with the Game Plan. Make sure everyone understands the goal of the paper, the nature of the analyses, and their assigned tasks. Talk through the structure of the Introduction section at a broad level for feedback/discussion.

3.     WORK LIKE THE WIND.

4.     Take a lunch break. Leave where you are. Turn your computers off. Eat some food. For the most part, we tend to talk about the paper. It’s nice for us to have this opportunity to process more openly mid-day, see where people are at, how the paper is shaping up, what else we should be thinking about, etc. The chance for free and open discussion is really important, after being in such a narrow task-focused state.

5.     WORK LIKE THE WIND.

6.     Throughout the working chunks, we are constantly renegotiating the task list. Someone finishes their task more quickly, crosses it off the Game Plan (we use this as an active collaborative document to track our work in real time), and claims the next task they plan to move to.

7.     Although we have a “no distraction” work space3 for PiaD, we absolutely talk to one another throughout the day. This is one of the biggest benefits of PiaD – the ability to ask questions and get immediate answers, to have all the group minds tackling challenges as they arise. It’s a huge time efficiency to work in this way, and absolutely makes end decisions of much higher quality than the typical fragmented writing approach.

8.         Similarly, we have group check-ins about every 1-1.5 hours – where is everyone on their task? What will they move to work on next?

9.         Over the years, some PiaD members have found walks helpful, too. Feeling stuck? Peel someone off to go walk through your stuck-ness with you. Come back fresher and clearer.

10.       About an hour before end time, we take stock – how close are we to meeting our goals? How are things looking when we piece them all together? What tasks are we prioritizing in the final hour, and which will need to go unfinished and added to the back-end work for the first author? Some years, we are wrapping up the submission cover letter at this stage. Other years, we’re realizing we still have tasks to complete after PiaD. Just depends on the nature of the project.

11.       Celebrate. Ideally with some sort of shared beverage of choice. To each their own, but for us, this has often involved bubbles. And an early bedtime.

Jennifer celebrating with Kathleen, Cassie, Avanté, and bubbles


Back-End Work:

This will be different from year-to-year. Obviously, the goal with PiaD is to be done with the manuscript by the end of the day. EVEN WHEN THIS HAPPENS, we never, EVER do final-proofing the same day. We are just way too exhausted. So we usually give ourselves a couple of weeks to freshen up, then do our final proofing before submission. Other years, for a variety of reasons, various tasks remain. That’s just how it goes with manuscript writing. Even in this case, it is fair to say that the overwhelming majority of the work gets done on the DAY. So either way, it’s still a really productive mechanism (for us).

Some Benefits and Challenges

There are many of both. But overall, we have found this to be a really great experience for many reasons beyond actually getting some of these papers out in the world (which we have! Which is so cool!). Some of these benefits for us are:

1.     Bonding as a team. It’s a really great way to strengthen your community, come together in an informal space on a hard but shared problem, and struggle through it together.

2.     A chance to see one another work. This can be incredibly powerful, for example, for junior scholars to observe scientific writing approaches “in the wild”. It never occurred to me before my grad students pointed this out at our first PiaD, but they rarely get to see faculty actually work in this way. And vice versa!

3.     Accuracy, clarity, and error reduction. So many of our smaller errors could likely be avoided if we’re able to ask our whole team of experts our questions WHILE WE’RE WRITING THE PAPER. Real-time answers, group answers, a chance for one group member to correct another, etc. Good stuff.

4.     Enhancing ethical and rigorous practices. The level of accountability when you are all working in the same space at the same time on the same files is probably as good as you can get. How many of our problematic practices might be extinguished if we were always working with others like this?

5.     One of the goals I had with PiaD was to have the first author status rotate across the team – i.e., grad students would “take turns” being first author. I still think this is a great idea, as it’s a great learning experience for advanced grad students to learn how to manage team papers in this way. But, of course, it’s also HARD. So, be more thoughtful about scope of the project depending on seniority of the first author, and anticipate more front- and back-end work, accordingly.

Bottom Line

PiaD has been a really cool mechanism for my lab to work with and learn from over the years. It has brought us many benefits as a team, far beyond increased productivity. But the way it works best for each team is likely different, and tweaking it over time is the way to make it work best for you. I would love to hear more from others who have been trying something similar in their groups, and also want to acknowledge the working team on the approach outlined here: Kat Herzhoff, Kathleen Reardon, Avanté Smack, Cassie Brandes, and Allison Shields.

Footnotes


1For PiaD purposes, I am defining the lab as PI + graduate students.

2Some critics like to counter, well then it’s not really Paper IN A DAY, now is it??? (Nanny-nanny-boo-boo!) Umm.. I guess not? Or maybe we can all remember that time demarcations are arbitrary and just chill out a bit? In all seriousness, if we all lived in the world where our data were perfectly cleaned and organized, all our literature folders were populated and labeled, etc. – maybe the tasks could all be accomplished in a single day. But unfortunately, my lab isn’t that perfect. YET. (Grad students sending me murderous side-eye over the internet.)

3The question of music or no-music is fraught conversational territory. You may need to set these parameters in advance to avoid PiaD turmoil and potential derailment. You may also need your team members to provide definition of current terminology in advance, in order to even have the conversation at all. Whatever you do, DON’T start having conversations about things like “What is Norm-core?” and everyone googling “norm-core”, and then trying to figure out if there is “norm-core music”, and what that might be. It’s a total PiaD break-down at that point.


Saturday, 12 January 2019

NeuroPointDX's blood test for Autism Spectrum Disorder: a critical evaluation

NeuroPointDX (NPDX), a Madison-based biomedical company, is developing blood tests for early diagnosis of Autism Spectrum Disorder (ASD). According to their Facebook page, the NPDX ASD test is available in 45 US states. It does not appear to require FDA approval. On the Payments tab of the website, we learn that the test is currently self-pay (not covered by insurance), but for those who have difficulty meeting the costs, a Payment Plan is available, whereby the test is conducted after a down payment is received, but the results are not disclosed to the referring physician until two further payments have been made.

So what does the test achieve, and what is the evidence behind it?

Claims made for the test
On their website, NPDX describe their test as a 'tool for earlier ASD diagnosis'. Specifically they say:
'It can be difficult to know when to be concerned because kids develop different skills, like walking and talking, at different times. It can be hard to tell if a child is experiencing delayed development that could signal a condition like ASD or is simply developing at a different pace compared to his or her peers...... This is why a biological test, one that’s less susceptible to interpretation, could help doctors diagnose children with ASD at a younger age. The NPDX ASD test was developed for children as young as 18 months old.'
They go on to say:
'In our research of autism spectrum disorder (ASD) and metabolism, we found differences in the metabolic profiles of certain small molecules in the blood of children with ASD. The NPDX ASD test measures a number of molecules in the blood called metabolites and compares them to these metabolic profiles.

The results of our metabolic test provide the ordering physician with information about the child’s metabolism. In some instances, this information may be used to inform more precise treatment. Preliminary research suggests, for example, that adding or removing certain foods or supplements may be beneficial for some of these children. NeuroPointDX is working on further studies to explore this.

The NPDX ASD test can identify about 30% of children with autism spectrum disorder with an increased risk of an ASD diagnosis. This means that three in 10 kids with autism spectrum disorder could receive an earlier diagnosis, get interventions sooner, and potentially receive more precise treatment suggestions from their doctors, based on information about their own metabolism.'
They further state that this is:  'A new approach to thinking about ASD that has been rigorously validated in a large clinical study' and they note that results from their Children’s Autism Metabolome Project (CAMP) study have been 'published in a peer-reviewed, highly-regarded journal, Biological Psychiatry'.

The test is recommended for a child who:
  • Has failed screening for developmental milestones indicating risk for ASD (e.g. M-CHAT, ASQ-3, PEDS, STAT, etc.). 
  • Has a family history such as a sibling diagnosed with ASD. 
  • Has an ASD diagnosis for whom additional metabolic information may provide insight into the child’s condition and therapy.
In September, Xconomy, which reports on biotech developments, ran an interview with Stemina CEO and co-founder Elizabeth Donley, which gives more background, noting that the test is not intended as a general population screen, but rather as a way of identifying specific subtypes among children with developmental delay.

Where are the non-autistic children with developmental delay? 
I looked at the published paper from the CAMP study in Biological Psychiatry.

Given the recommendations made by NPDX, I had expected a study of children with developmental delay, comparing metabolomic profiles in those who did and did not subsequently meet diagnostic criteria for ASD.

However, what I found instead was a study that compared metabolomics in 516 children with a certified diagnosis of ASD and 164 typically-developing children. There was a striking difference between the two groups in 'developmental quotient (DQ)', which is an index of overall developmental level. The mean DQ for the ASD group was 62.8 (SD = 17.8), whereas that of the typically developing comparison group was 100.1 (SD = 16.5). This information can be found in Supplementary Materials Table 3.

It is not possible, using this study design, to use metabolomic results to distinguish children with ASD from other cases of developmental delay. To do that, we'd need a comparison sample of non-autistic children with developmental delay.

The CAMP study is registered on ClinicalTrials.gov, where it is described as follows:
'The purpose of this study is to identify a metabolite signature in blood plasma and/or urine using a panel of biomarker metabolites that differentiate children with autism spectrum disorder (ASD) from children with delayed development (DD) and/or typical development (TD), to develop an algorithm that maximizes sensitivity and specificity of the biomarker profile, and to evaluate the algorithm as a diagnostic tool.' (My emphasis)
The study is also included on the NIH Project Reporter portfolio, where the description includes the following information:
'Stemina seeks funding to enroll 1500 patients in a well-defined clinical study to develop a biomarker-based diagnostic test capable of classifying ASD relative to other developmental delays at greater than 80% accuracy. In addition, we propose to identify metabolic subtypes present within the ASD spectrum that can be used for personalized treatment. The study will include ASD, DD and TD children between 18 and 48 months of age. Inclusion of DD patients is a novel and important aspect of this proposed study from the perspective of a commercially available diagnostic test.' (My emphasis)
So, the authors were aware that it was important to include a group with developmental delay, but they then reported no data on this group. Such children are difficult to recruit, especially for a study involving invasive procedures, and it is not unusual for studies to fail to meet recruitment goals. That is understandable. But it is not understandable that the test should then be described as being useful for diagnosing ASD from within a population with developmental delay, when it has not been validated for that purpose.

Is the test more accurate than behavioural diagnostic tests? 
A puzzling aspect of the NPDX claims is a footnote (marked *) on this webpage:
'Our test looks for certain metabolic imbalances that have been identified through our clinical study to be associated with ASD. When we detect one or more imbalance(s), there is an increased risk that the child will receive an ASD diagnosis'
*Compared to the results of the ADOS-2 (Autism Diagnostic Observation Schedule), Second Edition
It's not clear exactly what is meant by this: it sounds as though the claim is that the blood test is more accurate than ADOS-2. That can't be right, though, because in the CAMP study, we are told: 'The Autism Diagnostic Observation Schedule–Second Version (ADOS-2) was performed by research-reliable clinicians to confirm an ASD diagnosis.' So all the ASD children in the study met ADOS-2 criteria. It looks like 'compared to' means 'based on' in this context, but it is then unclear what the 'increased risk' refers to.

How reliable is the test?
A test's validity depends crucially on its reliability: if a blood test gives different results on different occasions, then it cannot be used for diagnosis of a long-term condition. Presumably because of this, the account of the study on ClinicalTrials.gov states: 'A subset of the subjects will be asked to return to the clinic 30-60 days later to obtain a replicate metabolic profile.' Yet no data on this replicate sample is reported in the Biological Psychiatry paper.

I have no expertise in metabolomics, but it seems reasonable to suppose that amines measured in the blood may vary from one occasion to another; indeed in 2014 the authors published a preliminary report on a smaller sample from CAMP, where they specifically noted that, presumably to minimise impact of medication or special diets, blood samples were taken when the child was fasting and prior to morning administration of medication. (34% of the ASD group and 10% of the typically-developing group were on regular medication, and 19% of the ASD group were on gluten and/or casein-free diets).

I contacted the authors to ask for information on this point. They did not provide any data on test-retest reliability beyond stating:
Thirty one CAMP subjects were recruited at random for a test-retest analysis during CAMP. These subjects were all amino acid dysregulation metabotype negative at the initial time point (used in the analysis for the manuscript). The subjects were sampled 30-60 days later for retest analysis. At the second time point the 31 subjects were still metabotype negative. There are plans for additional resampling of a select group of CAMP subjects. These will include metabotype positive individuals.
Thus, we do not currently know whether a positive result on the NPDX ASD test is meaningful, in the sense of being a consistent physiological marker in the individual.

Scientific evaluation of the methods used in the Biological Psychiatry paper 
The Biological Psychiatry paper describing development of the test is highly complex, involving a wide range of statistical methods. In their previous paper with a smaller sample, the authors described thousands of blood markers and claimed that using machine learning methods, they could identify a subset that discriminated the ASD and typically-developing groups with above chance accuracy. However, they noted this finding needed confirmation in a larger sample.

In the 2018 Biological Psychiatry paper, no significant differences were found for measures of metabolite abundance, failing to replicate the 2014 findings. However, further consideration of the data led the authors to concentrate instead on ratios between metabolites. As they noted: 'Ratios can uncover biological properties not evident with individual metabolites and increase the signal when two metabolites with a negative correlation are evaluated.'

Furthermore, they focused on individuals with extreme values for ratio scores, on the grounds that ASD is a heterogeneous condition, and the interest is in identifying subgroups who may have altered metabolism. The basic logic is illustrated in Figure 1 – the idea is to find a cutoff on the distribution which selects a higher proportion of ASD than typical cases. Because 76% of the sample are ASD cases, we would expect to find 76% of cases in the tail of the distribution. However, by exploring different cutoffs, it can be possible to identify a higher proportion. The proportion of ASD cases above a positive cutoff (or below a negative cutoff) is known as the positive predictive value (PPV), and for some of the ratios examined by the researchers, it was over 90%.


Figure 1: Illustrative distributions of z-scores for 4 of the 31 metabolites in ASD and typical group: this plot shows raw levels for metabolites; blue boxes show the numbers falling above or below a cutoff that is set to maximise group differences. The final analysis focused on ratios between metabolites, rather than raw levels. From Figure S2, Smith et al (2018).
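To make the logic concrete, here is a minimal R sketch. The numbers are simulated (only the group sizes are borrowed from CAMP), so it illustrates the general idea rather than the authors' actual analysis: for a single z-scored ratio, we scan possible cutoffs and ask what proportion of the children beyond each cutoff come from the ASD group, i.e. the PPV.

# Simulated data, not the CAMP measurements: how the PPV behaves when 76%
# of the sample has ASD and one ratio shows a modest (assumed) group shift
set.seed(1)
n_asd <- 516
n_td  <- 164
ratio <- c(rnorm(n_asd, mean = 0.3), rnorm(n_td, mean = 0))  # z-scored ratio
group <- c(rep("ASD", n_asd), rep("TD", n_td))
ppv_at <- function(cutoff) {
  flagged <- ratio > cutoff
  mean(group[flagged] == "ASD")  # proportion of flagged children who have ASD
}
cutoffs <- seq(0, 2.5, by = 0.5)
data.frame(cutoff = cutoffs, PPV = sapply(cutoffs, ppv_at))
# With no group difference at all, the PPV would hover around the 76% base rate;
# moving the cutoff into the tail of a shifted distribution pushes it higher.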

This kind of approach readily lends itself to finding spurious 'positive' results, insofar as one is first inspecting the data and then identifying a cutoff that maximises the difference between two groups. It is noteworthy that the metabolites that were selected for consideration in ratio scores were identified on the basis that they showed negative correlations within a subset of the ASD sample (the 'training set'). Accordingly, PPV values from a 'training set' are likely to be biased and will over-estimate group differences. However, to avoid circularity, one can take cutoffs from the training set, and then see how they perform with a new subset of data that was not used to derive the cutoff – the 'test set'. Provided the test set is predetermined prior to any analysis, and totally separate from the training set, then the results with the test set can be regarded as giving a good indication of how the test would perform in a new sample. This is a standard way of approaching this kind of classification problem.

Usually, the PPV for a test set will be less good than for a training set: this is just a logical consequence of the fact that observed differences between groups will involve random noise as well as true population differences, and in the training set that noise will have boosted the PPV. In the test set, the random effects will be different, and so are more likely to hinder than help prediction, and so the PPV will decline. However, in the Biological Psychiatry paper, the PPVs for the test sets were only marginally different from those from the training sets: for the ratios described in Table 1, the mean PPV was .887 (range .806 - .943) for the training set, and mean .880 (range .757 - .975) for the test set.
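One way to get a feel for how much shrinkage to expect is a toy simulation. This is not the authors' pipeline, and all the numbers are invented: split a sample into training and test halves, pick the cutoff that maximises PPV in the training half, and apply that same cutoff to the held-out half.

# Toy simulation, not the authors' procedure: tune a cutoff on a training
# half and check how the PPV holds up in a held-out test half
set.seed(2)
simulate_once <- function(n_asd = 516, n_td = 164, shift = 0.2) {
  ratio <- c(rnorm(n_asd, shift), rnorm(n_td, 0))
  asd   <- c(rep(TRUE, n_asd), rep(FALSE, n_td))
  train <- sample(c(TRUE, FALSE), n_asd + n_td, replace = TRUE)
  ppv <- function(keep, cutoff) {
    flagged <- keep & ratio > cutoff
    if (!any(flagged)) return(NA)
    mean(asd[flagged])
  }
  cutoffs   <- seq(0, 2, by = 0.1)
  train_ppv <- sapply(cutoffs, function(co) ppv(train, co))
  best      <- cutoffs[which.max(train_ppv)]  # cutoff optimised on the training half
  c(train = max(train_ppv, na.rm = TRUE), test = ppv(!train, best))
}
rowMeans(replicate(200, simulate_once()), na.rm = TRUE)
# On average the test-set PPV comes out lower than the training-set PPV,
# because the chosen cutoff partly capitalises on chance in the training half.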

I wanted to understand this better, and asked the authors for their analysis scripts, so I could reconstruct what they did. Here is the reply I received from Beth Donley:
We would be happy to have a call to discuss the methodology used to arrive at the findings in our paper. Our scripts and the source code they rely on are proprietary and will not be made public unless and until we publish them in a paper of our own. We think it would be more meaningful to have a call to discuss our approach so that you can ask questions and we can provide answers.
My questions were sufficiently technical and complex that this was not going to work, so I provided written questions, to which I received responses. However, although the replies were prompt, they did not really inspire confidence, and, without the scripts, I could not check anything.

For instance:
My question: Is there an explanation for why the PPVs are so similar for training and test datasets? Usually you'd expect a drop in PPV in the test dataset if the function was optimised for the training dataset, just because the training threshold would inevitably be capitalising on chance.
Response: We observed this phenomenon, as well, and were surprised by the similarity of the training and test confusion matrix performance metric values. We have no way to know why the metrics were similar between sets. Our best guess is that the demographics of the training and test set of subjects had closely matched demographic and study related variables.
But the demographic similarity between a test and training set is not the main issue here. One thing that crucially determines how close the results will be is the reliability of the metabolomic measure. The lower the test-retest reliability of the measure, the more likely that results from a training set will fail to replicate. So it would be helpful if the authors would report the quantitative data that they have on this question.
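To illustrate why reliability matters so much here, a small sketch with invented numbers: treat each observed metabolite value as a stable component plus occasion-specific noise, with reliability r being the proportion of stable variance, and ask how often a child flagged at one visit would be flagged again on retest.

# Invented numbers, purely illustrative: observed value = stable trait +
# occasion-specific noise, with test-retest reliability r
set.seed(3)
flag_again <- function(r, cutoff = 2, n = 1e5) {
  trait <- rnorm(n)
  time1 <- sqrt(r) * trait + sqrt(1 - r) * rnorm(n)
  time2 <- sqrt(r) * trait + sqrt(1 - r) * rnorm(n)
  mean(time2[time1 > cutoff] > cutoff)  # of those flagged at time 1, proportion flagged at time 2
}
sapply(c(0.9, 0.7, 0.5), flag_again)
# As reliability falls, fewer 'positive' results recur on retest, and a cutoff
# tuned on one occasion's data becomes harder to replicate on another.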

If we ignore all the problems, how good is prediction? 
Unfortunately, it is virtually impossible to tell how accurate the test would be in a real-life context. First, we would have to make the assumption that a non-autistic group with developmental delay would be comparable to the typically-developing group. If non-autistic children with developmental delay show metabolomic imbalances, then the test's potential for diagnosis of ASD is compromised. Second, we would have to come up with an estimate of how many children who are given the test will actually have ASD: that's very hard to judge, but let us suppose it may be as high as 50%. Then, for the ratios reported in the Biological Psychiatry paper, we can compute that around 50% to 83% of those testing positive would have ASD. Note that the majority of children with and without ASD won't have scores in the tail of the distribution and will not therefore test positive (see Figure 1). On the NPDX website it is claimed that around 30% of children with ASD test positive: that is hard to square with the account in Biological Psychiatry, which reported 'an altered metabolic phenotype' in 16.7% of those with ASD.
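The base-rate arithmetic behind that paragraph can be written out explicitly. The sensitivity and specificity below are illustrative values I have chosen for the example, not figures taken from the paper; the point is simply that the same test yields a lower PPV once the proportion of truly autistic children among those tested drops.

# Illustrative base-rate calculation: sensitivity and specificity are
# invented for the example, not taken from the paper
ppv_given_prevalence <- function(sensitivity, specificity, prevalence) {
  true_pos  <- sensitivity * prevalence
  false_pos <- (1 - specificity) * (1 - prevalence)
  true_pos / (true_pos + false_pos)
}
ppv_given_prevalence(sensitivity = 0.17, specificity = 0.97, prevalence = 0.76)  # study-like base rate: about .95
ppv_given_prevalence(sensitivity = 0.17, specificity = 0.97, prevalence = 0.50)  # clinic-like base rate: about .85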

Conflict of interest and need for transparency
The published paper gives a comprehensive COI statement as follows:
AMS, MAL, and REB are employees of, JJK and PRW were employees of, and ELRD is an equity owner in Stemina Biomarker Discovery Inc. AMS, JJK, PRW, MAL, ELRD, and REB are inventors on provisional patent application 62/623,153 titled “Amino Acid Analysis and Autism Subsets” filed January 29, 2018. DGA receives research funding from the National Institutes of Health, the Simons Foundation, and Stemina Biomarker Discovery Inc. He is on the scientific advisory boards of Stemina Biomarker Discovery Inc. and Axial Therapeutics.
It is generally accepted that just because there is COI, this does not invalidate the work: it simply provides a context in which it can be interpreted. The study reported in Biological Psychiatry represents a huge investment of time and money, with research funds contributed from both public and private sources. In the Xconomy interview, it is stated that the research has cost $8 million to date. This kind of work may only be possible to do with involvement of a biotechnology company which is willing to invest funds in the hope of making discoveries that can be commercialised; this is a similar model to drug development.

Where there is a strong commercial interest in the outcome of research, the best way of counteracting negative impressions is for researchers to be as open and transparent as possible. This was not the case with the NPDX study: as described above, there were substantial changes from the registered protocol on ClinicalTrials.gov, not discussed in the paper. The analysis scripts are not available – this means we have to take on trust details of the methods in an area where the devil is in the detail. As Philip Stark has argued, a paper that is long on results but short on methods is more like an advertisement than a research communication: "Science should be ‘show me’, not ‘trust me’; it should be ‘help me if you can’, not ‘catch me if you can’."

Postscript
On 27th December, Biological Psychiatry published correspondence on the Smith et al paper by Kristin Sainani and Steven Goodman from Stanford University. They raised some of the points noted above regarding the lack of predictive utility of the blood test in clinical contexts, the lack of a comparison sample with developmental delay, and the conflict of interest issues. In their response, the authors made the point that they had noted these limitations in their published paper.

References
Sainani, K. L., & Goodman, S. N. (2018). Lack of diagnostic utility of 'amino acid dysregulation metabotypes'. Biological Psychiatry. doi:10.1016/j.biopsych.2018.11.012

Smith, A. M., Donley, E. L. R., Burrier, R. E., King, J. J., & Amaral, D. G. (2018). Reply to: Lack of diagnostic utility of 'amino acid dysregulation metabotypes'. Biological Psychiatry. doi:10.1016/j.biopsych.2018.11.013

Smith, A. M., King, J. J., West, P. R., Ludwig, M. A., Donley, E. L. R., Burrier, R. E., & Amaral, D. G. (2018). Amino acid dysregulation metabotypes: Potential biomarkers for diagnosis and individualized treatment for subtypes of autism spectrum disorder. Biological Psychiatry. doi:10.1016/j.biopsych.2018.08.016

Stark, P. (2018). Before reproducibility must come preproducibility. Nature, 557, 613. doi:10.1038/d41586-018-05256-0

West, P. R., Amaral, D. G., Bais, P., Smith, A. M., Egnash, L. A., Ross, M. E., . . . Burrier, R. E. (2014). Metabolomics as a tool for discovery of biomarkers of Autism Spectrum Disorder in the blood plasma of children. PLOS ONE, 9(11), e112445. doi:10.1371/journal.pone.0112445

Wednesday, 24 October 2018

Has the Society for Neuroscience lost its way?

The tl;dr version: The Society for Neuroscience (SfN) makes humongous amounts of money from its journal and meetings, but spends very little on helping its members, while treating overseas researchers with indifference bordering on disdain.

I first became concerned about the Society for Neuroscience back in 2010 when I submitted a paper to the Journal of Neuroscience. The instructions to authors explained that there was a submission fee (at the time about $50). Although I'd never come across such a practice before, I reckoned it was not a large sum, and so went ahead. The Instructions for Authors explained that there was a veto on citation of unpublished work. I wanted to cite a paper of mine that had been ‘accepted in principle’ but needed minor changes, and I explained this in my cover letter. Nevertheless, the paper was desk-rejected because of this violation. A week later, after the other paper was accepted, I updated the manuscript and resubmitted it, but was told that I had to pay another submission fee. I got pretty grumbly at this point, but given that we'd spent a lot of time formatting the paper for J Neuroscience, I continued with the process. We had an awful editor (the Automaton described here), but excellent reviewers, and the paper was ultimately accepted.

But then we were confronted with the publication fee and charges for using colour figures. These were substantial – I can’t remember the details but it was so much that it turned out cheaper for all the authors to join the SfN, which made us eligible for reduced rates on publication fees. So for one year I became a member of the society. 

The journal’s current policy on fees can be found here.  Basically, the submission fee is now $140, but this is waived if first and last authors are SfN members (at cost of $200 per annum for full members, going down to $150 for postdocs and $70 for postgrads). The publication fee is $1,260 (for members) and $1,890 for non-members, with an extra $2,965 if you want the paper to be made open access.

There are some reductions for those working in resource-restricted countries, but the sums involved are still high enough to act as a deterrent. I used Web of Science to look at country of origin for Journal of Neuroscience papers since 2014, and there’s no sign that those from resource-restricted countries are taking advantage of the magnanimous offer to reduce publication fees by up to 50%.
Countries of origin from papers in Journal of Neuroscience (2014-2018)
The justification given for these fees is that ‘The submission fee covers a portion of the costs associated with peer review’, with the implication that the society is subsidising the other portion of the costs. Yet, when we look at their financial statements (download pdf here), they tell a rather different story. As we can see in the table on p 4, in 2017 the expenses associated with scientific publications came to $4.84 million, whereas the income from this source was $7.09 million.

But maybe the society uses journal income to subsidise other activities that benefit its nearly 36,000 members? That’s a common model in societies I’m involved in. But, no, the same financial report shows that the cost of the annual SfN meeting in 2017 was $9.5 million, but the income was $14.8 million. If we add in other sources of income, such as membership dues, we can start to understand how it is that the net assets of the society increased from $46.6 million in 2016 to $58.7 million in 2017.

This year, SfN has had a new challenge, which is that significant numbers of scientists are being denied visas to attend the annual meeting, as described in this piece by the Canadian Association for Neuroscience. This has led to calls for the annual meeting to be held outside the US in future years. The SfN President has denounced the visa restrictions as a thoroughly bad thing. However, it seems that SfN has not been sympathetic to would-be attendees who joined the society in order to attend the meeting, only to find that they would not be able to do so. I was first alerted to this on Twitter by this tweet:

This attracted a fair bit of adverse publicity for SfN, and just over a week later Chris heard back from the Executive Director of SfN, who explained that whereas they could refund registration fees for those who could not attend, they were not willing to refund membership fees. No doubt for an organisation that is sitting on long-term investments of $71.2 million (see table below), the $70 membership fee for a student is chicken feed. But I suspect it doesn’t feel like that to the student, who has probably also incurred costs for submitting an unsuccessful visa application.

Table from p 8 of SfN Annual Financial Report for 2017; The 'alternative investments' are mostly offshore funds in the Cayman Islands and elsewhere
There appears to be a mismatch between the lofty ideals described in SfN's mission statement and their behaviour. They seem to have lost their way: instead of being an organisation that exists to promote neuroscience and help their members, the members are rather regarded as nothing but a source of income, which is then stashed away in investments. It’s interesting to see that under Desired Outcomes, the Financial Reserve Strategy section of the mission statement has: ‘Strive to achieve end of year financial results that generate net revenues between $500,000 and $1 million in annual net operating surplus.’ That is reasonable and prudent for a large organisation with employees and property. But SfN is not achieving that goal: they are making considerably more money than their own mission statement recommends.

That money could be put to good use. In particular, given SfN’s stated claim of wanting to support neuroscience globally, they could offer grants for scientists in resource-poor countries to buy equipment, pay for research assistants or attend meetings. Quite small sums could be transformational in such a context. As far as I can see, SfN currently offers a few awards, but some of these are paid for by external donations, and, in relation to their huge reserves, the sums are paltry. My impression is that other, much smaller, societies do far more with limited funds than SfN does with its bloated income.

Maybe I’m missing something. I’m no longer a member of SfN, so it’s hard to judge. Are there SfN members out there who think the society does a good job for its membership?


Saturday, 13 October 2018

Working memories: a brief review of Alan Baddeley's memoir

This post was prompted by Tom Hartley, who asked if I would be willing to feature an interview with Alan Baddeley on my blog.  This was excellent timing, as I'd just received a copy of Working Memories from Alan, and had planned to take it on holiday with me. It proved to be a fascinating read. Tom's interview, which you can find here, gives a taster of the content.

The book was of particular interest to me, as Alan played a big role in my career by appointing me to a post I held at the MRC Applied Psychology Unit (APU) from 1991 to 1998, and so I'm familiar with many of the characters and the ideas that he talks about in the book. His work covered a huge range of topics and collaborations, and the book, written at the age of 84, works both as a history of cognitive psychology and as a scientific autobiography.

Younger readers may be encouraged to hear that Alan's early attempts at a career were not very successful, and his career took off only after a harrowing period as a hospital porter and a schoolteacher, followed by a post at the Burden Neurological Institute, studying the effects of alcohol, where his funds were abruptly cut off because of a dispute between his boss and another senior figure. He was relieved to be offered a place at the MRC Applied Psychology Unit (APU) in Cambridge, eventually doing a doctorate there under the supervision of Conrad (whose life I wrote about here), experimenting on memory skills in sailors and postmen.

I had known that Alan's work covered a wide range of areas, but was still surprised to find just how broad his interests were. In particular, I was aware he had done work on memory in divers, but had thought that was just a minor aspect of his interests. That was quite wrong: this was Alan's main research interest over a period of years, where he did a series of studies to determine how far factors like cold, anxiety and air quality during deep dives affected reasoning and memory: questions of considerable interest to the Royal Navy among others.

After periods working at the Universities of Sussex and Stirling, Alan was appointed in 1974 as Director of the MRC APU, where he had a long and distinguished career until his formal retirement in 1995. Under his direction, the Unit flourished, pursuing a much wider range of research, with strong external links. Alan enjoyed working with others, and had collaborations around the world.  After leaving Cambridge,  he took up a research chair at the University of Bristol, before settling at the University of York, where he is currently based.

I was particularly interested in Alan's thoughts on applied versus theoretical research. The original  APU was a kind of institution that I think no longer exists: the staff were expected to apply their research skills to address questions that outside agencies, especially government, were concerned with. The earliest work was focused on topics of importance during wartime: e.g., how could vigilance be maintained by radar operators, who had the tedious task of monitoring a screen for rare but important events. Subsequently, unit staff were concerned with issues affecting efficiency of government operations during peacetime: how could postcodes be designed to be memorable? Was it safe to use a mobile phone while driving? Did lead affect children's cognitive development?  These days, applied problems are often seen as relatively pedestrian, but it is clear that if you take highly intelligent researchers with good experimental skills and pose them this kind of challenge, the work that ensues will not only answer the question, but may also lead to broader theoretical insights.

Although Alan's research included some work with neurological patients, he would definitely call himself a cognitive psychologist, and not a neuroscientist. He notes that his initial enthusiasm for functional brain imaging died down after finding that effects of interest were seldom clearcut and often failed to replicate. His own experimental approaches to evaluate aspects of memory and cognition seemed to throw more light than neuroimaging on deficits experienced by patients.

The book is strongly recommended for anyone interested in the history of psychology. As with all of Alan's writing, it is immensely readable because of his practice of writing books by dictation as he goes on long country walks: this makes for a direct and engaging style. His reflections on the 'cognitive revolution' and its impact on psychology are highly relevant for today's psychologists. As Alan says in the interview "... It's important to know where our ideas come from. It's all too tempting to think that whatever happened in the last two or three years is the cutting edge and that's all you need to know. In fact, it's probably the crest of a breaking wave and what you need to know is where that wave came from."

Saturday, 15 September 2018

An index of neighbourhood advantage from English postcode data


Screenshot from http://dclgapps.communities.gov.uk/imd/idmap.html
Densely packed postcodes appear grey: you need to expand the map to see colours
The Ministry of Housing, Communities and Local Government has a website which provides an ‘index of multiple deprivation’ for every postcode in England.  This is a composite index based on typical income, employment, education, health, crime, housing and living environment for each of 32,844 postcodes in 2015. You can also extract indices for the component factors that contribute to the index, which are explained further here. And there is a fascinating interactive website where you can explore the indices on a map of England.

Researchers have used the index of multiple deprivation as an overall measure of environmental factors that might affect child development, but it has one major drawback. The number that the website gives you is a rank from 1 to 32,844. This means it is not normally distributed, and not easy to interpret. You are also given decile bands, but these are just less precise versions of the ranks – and like ranks, have a rectangular, rather than a normal distribution (with each band containing 10% of the postcodes). If you want to read more about why rectangularly distributed data are problematic, see this earlier blogpost.

I wanted to use this index, but felt it would make sense to convert the ranks into z-scores. This is easily done: you convert each rank into a proportion, and then turn that proportion into a z-score using the inverse of the normal distribution. Here’s what you do:

Use the website to convert the postcode to an index of deprivation: in fact, it’s easiest to paste in a list of postcodes and you then get a set of indices for each one, which you can download either as .csv or .xlsx file. The index of multiple deprivation is given in the fifth column.

To illustrate, I put in the street address where I grew up, IG38NP, which corresponds to a multiple deprivation index of 12596.

In Excel, you can just divide the multiple deprivation index by 32844, to get a value of .3835, which you can then convert to a z-score using the NORMSINV function. Or, to do this in one step, if you have your index of multiple deprivation in cell A2, you type
 =normsinv(A2/32844)

This gives a value of -0.296, which is the corresponding z-score. I suggest calling it the ‘neighbourhood advantage score’ – so it’s clear that a high score is good and a low score is bad.

If you are working in R, you can just use the commands:
depmax <- 32844   # total number of ranked postcodes
neighbz <- qnorm(deprivation_index / depmax)
where neighbz is the neighbourhood advantage score and deprivation_index is the index of multiple deprivation.

Obviously, I’ve presented simplified commands here, but in either Excel or R it is easy to convert a whole set of postcodes in one go.
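For anyone who wants to do the whole downloaded file in R, here is a sketch. Note that the file name and the column name for the rank are assumptions on my part, so check them against the headers of the file you actually download.

# Sketch only: the file name and the rank column name are assumptions -
# check them against the file downloaded from the postcode lookup site
depmax <- 32844
imd <- read.csv("imd_lookup.csv", check.names = FALSE)
imd$neighbourhood_advantage <- qnorm(imd[["Index of Multiple Deprivation Rank"]] / depmax)
head(imd)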

It is, of course, important to keep in mind that this is a measure of the neighbourhood a person lives in, and not of the characteristics of the individual. Postcode indicators may be misleading in mixed neighbourhoods, e.g. where gentrification has occurred, so rich and poor live side by side. And the different factors contributing to the index may be dissociated. Nevertheless, I think this index can be useful for providing an indication of whether a sample of individuals is representative of the population of England. In psychology studies, volunteers tend to come from more advantaged backgrounds, and this provides one way to quantify this effect.

Sunday, 26 August 2018

Should editors edit reviewers?


How Einstein dealt with peer review: from http://theconversation.com/hate-the-peer-review-process-einstein-did-too-27405

This all started with a tweet from Jesse Shapiro under the #shareyourrejections hashtag:

JS: Reviewer 2: “The best thing these authors [me and @ejalm] could do to benefit this field of study would be to leave the field and never work on this topic again.” Paraphrasing only slightly.

This was quickly followed by another example:
Bill Hanage: #ShareYourRejections “this paper is not suitable for publication in PNAS, or indeed anywhere.”

Now, both of these are similarly damning, but there is an important difference. The first one criticises the authors, the second one criticises the paper. Several people replied to Jesse’s tweet with sympathy, for instance:

Jenny Rohn: My condolences. But Reviewer 2 is shooting him/herself in the foot - most sensible editors will take a referee's opinion less seriously if it's laced with ad hominem attacks.

I took a different tack, though:
DB: A good editor would not relay that comment to the author, and would write to the reviewer to tell them it is inappropriate. I remember doing that when I was an editor - not often, thankfully. And reviewer apologised.

This started an interesting discussion on Twitter:

Ben Jones: I handled papers where a reviewer was similarly vitriolic and ad hominem. I indicated to the reviewer and authors that I thought it was very inappropriate and unprofessional. I’ve always been very reluctant to censor reviewer comments, but maybe should reconsider that view

DB: You're the editor. I think it's entirely appropriate to protect authors from ad hominem and spiteful attacks. As well as preventing unnecessary pain to authors, it helps avoid damage to the reputation of your journal

Chris Chambers: Editing reviews is dangerous ground imo. In this situation, if the remainder of the review contained useful content, I'd either leave the review intact but inform the authors to disregard the ad hom (& separately I'd tell reviewer it's not on) or dump the whole review.

DB: I would inform reviewer, but I don’t think it is part of editor’s job to relay abuse to people, esp. if they are already dealing with pain of rejection.

CC: IMO this sets a dangerous precedent for editing out content that the editor might dislike. I'd prefer to keep reviews unbiased by editorial input or drop them entirely if they're junk. Also, an offensive remark or tone could in some cases be embedded w/i a valid scientific point.

Kate Jeffery: I agree that editing reviewer comments without permission is dodgy but also agree that inappropriate comments should not be passed back to authors. A simple solution is for editor to revise the offending sentence(s) and ask reviewer to approve change. I doubt many would decline.

A middle road was offered by Lisa deBruine:
LdB: My solution is to contact the reviewer if I think something is wrong with their review (in either factual content or professional tone) and ask them to remove or rephrase it before I send it to the authors. I’ve never had one decline (but it doesn’t happen very often).

I was really surprised by how many people felt strongly that the reviewer’s report was in some sense sacrosanct and could and should not be altered. I’ve pondered this further, but am not swayed by the arguments.

I feel strongly that editors should be able to distinguish personal abuse from robust critical comment, and that, far from being inappropriate, it is their duty to remove the former from reviewer reports. And as for Chris’s comment: ‘an offensive remark or tone could in some cases be embedded w/i a valid scientific point’ – the answer is simple. You rewrite to remove the offensive remark; e.g. ‘The authors seem clueless about the appropriate way to run a multilevel model’ could be rewritten to ‘The authors should take advice from a statistician about their multilevel model, which is not properly specified’. And to be absolutely clear, I am not talking about editing out comments that are critical of the science, or which the editor happens to disagree with. If a reviewer got something just plain wrong, I’m okay with giving a clear steer in the editor’s letter, e.g.: ‘Reviewer A suggests you include age as a covariate. I notice you have already done that in the analysis on p x, so please ignore that comment.’ I am specifically addressing comments that are made about the authors rather than the content of what they have written. A good editor should find that an easy distinction to make. From the perspective of an author, being called out for getting something wrong is never comfortable: being told you are a useless person because you got something wrong just adds unnecessary pain.

Why do I care about this? It’s not just because I think we should all be kind to each other (though, in general, I think that’s a good idea). There’s a deeper issue at stake here. As editors, we should work to reinforce the idea that personal disputes should have no place in science. Yes, we are all human beings, and often respond with strong emotions to the work of others. I can get infuriated when I review a paper where the authors appear to have been sloppy or stupid. But we all make mistakes, and are good at deluding ourselves. One of the problems when you start out is that you don’t know what you don’t know: I learned a lot from having my errors pointed out by reviewers, but I was far more likely to learn from this process if the reviewer did not adopt a contemptuous attitude. So, as reviewers, we should calm down and self-edit, and not put ad hominem comments in our reviews. Editors can play a role in training reviewers in this respect.

For those who feel uncomfortable with my approach - i.e. edit the review and tell the reviewer why you have done so – I would recommend Lisa deBruine’s solution of raising the issue with the reviewer and asking them to amend their review. Indeed, in today’s world where everything is handled by automated systems, that may be the only way of ensuring that an insulting review does not go to the author (assuming the automated system lets you do that!).

Finally, as everyone agreed, this does not seem to be a common problem, so it is perhaps not worth devoting much space to; but I'm curious to know how other editors respond to this issue.

Monday, 20 August 2018

Matlab vs open source: Costs and benefits to scientists and society

An interesting twitter thread came along yesterday, started by this query from Jan Wessel (@wessel_lab):

Quick thread of (honest) questions for the numerous people on here that subscribe to the position that sharing code in MATLAB ($) is bad open-science practice compared to open source languages (e.g., Python). What should I do as a PI that runs a lab whose entire coding structure is based (publicly shared) MATLAB code? Some say I should learn an open-source language and change my lab’s procedures over to it. But how would that work in practice? 

When I resort to blogging, it’s often because someone has raised a question that has captured my interest because it does not have a simple answer. I have made a Twitter moment to store the rest of Jan’s thread and some of the responses to it, as they raise important points which have broad application.

In part, this is an argument about costs and benefits to the individual scientist and the community. Sometimes these can be aligned, but in this case, there is some conflict, because those who can’t afford Matlab would not be able to run Jan’s code. If he were to move to Python, then anyone would be able to do so.

His argument is that he has invested a lot of time in learning Matlab, has a good understanding of how Matlab code works, and feels competent to advise his trainees in it. Furthermore, he works in the field of EEG, where there are whole packages developed to do the complex analysis involved, and Matlab is the default in this field. So moving to another programming language would not only be a big time sink, but would also make him out of step with the rest of the field.

There was a fair bit of division of opinion in the replies. On the one hand, there were those who thought this was a non-issue. It was far more important to share code than to worry about whether it was written in a proprietary language. And indeed, if you are well-enough supported to be doing EEG research, then it’s likely your lab can afford the licensing costs.

I agree with the first premise: just having the code available can be helpful in understanding how an analysis was done, even if you can’t run it. And certainly, most of those in EEG research are using Matlab. However, I’m also aware that for those in resource-limited countries, EEG is a relatively cheap technology for doing cognitive neuroscience, so I guess there will be those who would be able to get EEG equipment, but for whom the Matlab licensing costs are prohibitive.

But the replies emphasised another point: the landscape is continually changing. People have been encouraging me to learn Python, and I’m resisting only because I’m starting to feel too old to learn yet another programming language. But over the years, I’ve had to learn Basic, Matlab and R, as well as some arcane stuff for generating auditory stimuli whose name I can’t even remember. But I’ve looked at Jan’s photo on the web, and he looks pretty young, so he doesn’t have my excuse. So on that basis, I’d agree with those advising to consider making a switch. Not just to be a good open scientist, but in his own interests, which involves keeping up to date. As some on the thread noted, many undergrads are now getting training in Python or R, and sooner or later open source will become the default.

In the replies there were some helpful suggestions from people who were encouraging Jan to move to open source but in the least painful way possible. And there was reassurance that there are huge savings in learning a new language: it’s really not like going back to square one. That’s my experience: in fact, my knowledge of Basic was surprisingly useful when learning Matlab.

So the bottom line seems to be, don’t beat yourself up about it. Posting Matlab code is far better than not posting any code. But be aware that things are changing, and sooner rather than later, you’ll need to adapt. The time costs of learning a new language may prove trivial in the long term, against the costs of being out of date. But I can state with total confidence that learning Python will not be the end of it: give it a few years and something else will come along.

When I was first embarking on an academic career, I remember looking at the people who were teaching me, who, at around the age of 40, looked very old indeed. I thought it must be nice for them: they had worked hard, learned their stuff, and could now just do research and teach. When I got to 40, I had the awful realisation that the field was changing so fast that unless I kept learning new things I would get left behind. And it hasn’t stopped over the past 25 years!

Saturday, 11 August 2018

More haste less speed in calls for grant proposals


Helpful advice from the World Bank

This blogpost was prompted by a funding call announced this week by the Economic and Social Research Council (ESRC), which included the following key dates:
  • Opening date for proposals – 6 August 2018 
  • Closing date for proposals – 18 September 2018 
  • PI response invited – 23 October 2018 
  • PI response due – 29 October 2018 
  • Panel – 3 December 2018 
  • Grants start – 14 February 2019 
As pointed out by Adam Golberg (@cash4questions), Research Development Manager at Nottingham University, on Twitter, this is very short notice to prepare an application for substantial funding:
I make this about 30 working days notice. For a call issued in August. For projects of 36 months, up to £900k - substantial, for social sciences. With only one bid allowed to be led from each institution, so likely requiring an internal sift. 
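
His arithmetic is easy to check. Here is a minimal sketch of my own (not Adam’s calculation), using NumPy’s business-day counter, which only knows about weekends unless you tell it otherwise:

import numpy as np

# Weekdays from the call opening (6 August 2018) up to, but not including,
# the closing date (18 September 2018).
notice = np.busday_count('2018-08-06', '2018-09-18')
print(notice)  # 31 weekdays; allow for the late-August bank holiday and that is about 30 working days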

I thought it worth raising this with ESRC, and they replied promptly, saying:
To access funds for this call we’ve had to adhere to a very tight spending timeframe. We’ve had to balance the call opening time with a robust peer review process and a Feb 2019 project start. We know this is a challenge, but it was a now or never funding opportunity for us.
 
They suggested I email them for more information, and I’ve done that, so I will update this post if I hear more. I’m particularly curious about the reason for the tight spending timeframe and the inflexible February 2019 start date.

This exchange led to discussion on Twitter which I have gathered together here.

It’s clear from the responses that this kind of time-frame is not unusual, and I have been sent some other examples. For instance, this ESRC Leadership Fellowship (£100,000 for 12 months) had a call for proposals issued on 16th November 2017, with a deadline for submissions of 3 January. When you factor in that most universities shut down from late December until early January, so the proposal would need to be with administrators before the Christmas break, this gives applicants around 30 days to construct a competitive proposal. But it’s not only ESRC that does this, and I am less interested in pointing the finger at a particular funder (who may well be working under pressures outside their control) than in making the case that the practice needs a rethink. I see five problems with these short lead times:

1. Poorer quality of proposals 
The most obvious problem is that a hastily written proposal is likely to be weaker than one that is given more detailed consideration. The only good thing you might say about the time pressure is that it is likely to reduce the number of proposals, which reduces the load on the funder’s administration. It’s not clear, however, whether this is an intended consequence.

2. Stress on academic staff 
There is ample evidence that academic staff in the UK have high stress levels, often linked to a sense of increasing demands and high workload. A good academic shows close attention to detail and is at pains to get things right: research is not something that can be done well under tight time pressure. So dangling the offer of a large grant with only a short period in which to prepare a proposal is bound to increase stress: do you drop everything else to focus on grant-writing, or pass up the opportunity to enter the competition?

Where the interval between the funding call and the deadline falls over a holiday period, some might find this beneficial, as other demands such as teaching are lower. But many people plan to take a vacation, and should be able to have a complete escape from work for at least a week or two. Others will have scheduled the time for preparing lectures, doing research or writing papers. Having to defer those activities in order to meet a tight deadline just adds to the sense of overload, and to the guilt of a growing backlog of work.

3. Equity issues 
These points about vacations are particularly pertinent for those with children at home during the holidays, as pointed out in a series of tweets by Melissa Terras, Professor of Digital Cultural Heritage at Edinburgh University, who said:
I complained once to the AHRC about a call announced in November with a closing date of early January - giving people the chance to work over the Xmas shutdown on it. I wasn't applying to the call myself, but pointed out that it meant people with - say - school age kids - wouldn't have a "clear" Xmas shutdown to work on it, so it was prejudice against that cohort. They listened, apologised, and extended the deadline for a month, which I was thankful for. But we shouldn't have to explain this to them. Have RCUK done their implicit bias training?

4. Stress on administrative staff 
One person who contacted me via email pointed out that many funders, including ESRC, ask institutions to filter out uncompetitive proposals through internal review. That could mean senior research administrators organising exploratory workshops, soliciting input from potential PIs, having people present their ideas, and considering collaborations with other institutions. None of that is possible in a 30-day time frame. And for the administrators who do the routine work of checking bids for costing accuracy and for compliance with university and funder requirements, dealing with a stressed researcher who expects a rapid turnaround is, I suspect, not unusual; when the funding scheme virtually guarantees that everything is done in a rush, it can only get worse.

5. Perception of unfairness 
Adding to this toxic mix, we have the possibility of diminished trust in the funding process. My own interest in this issue stems from a few years ago, when there was a funding call for a rather specific project in my area. The call came just before Christmas, with a deadline in mid January. I had a postdoc who was interested in applying, but after discussing it we decided not to put in a bid. Part of the reason was that we had both planned a bit of time off over Christmas, but in addition I was suspicious about the combination of short time-scale and specific topic. It made me wonder whether a decision had already been made about who would receive the funds, with the exercise serving merely to fulfil requirements and give an illusion of fairness and transparency.

Responses on Twitter again indicate that others have had similar concerns. For instance, Jon May, Professor in Psychology at the University of Plymouth, wrote:
I suspect these short deadline calls follow ‘sandboxes’ where a favoured person has invited their (i.e his) friends to pitch ideas for the call. Favoured person cannot bid but friends can and have written the call.
 
And an anonymous correspondent on email noted:
I think unfairness (or the perception of unfairness) is really dangerous – a lot of people I talk to either suspect a stitch-up in terms of who gets the money, or an uneven playing field in terms of who knew this was coming.

So what’s the solution? One option would be to insist that, at least for those dispensing public money, there should be a minimum time between a call for proposals and the submission date: about 3 months would seem reasonable to me.

Comments will be open on this post for a limited time (2 months, since we are in holiday season!) so please add your thoughts.

P.S. Just as I was about to upload this blogpost, I was alerted on Twitter to this call from the World Bank, which is a beautiful illustration of point 5 – if you weren't already well aware this was coming, there would be no hope of applying. Apparently, this is not a 'grant' but a 'contract', but the same problems noted above would apply. The website is dated 2nd August, the closing date is 15th August. There is reference to a webinar for applicants dated 9th July, so presumably some information has been previously circulated, but still with a remarkably short time lag, given that there need to be at least two collaborating institutions (including middle- and low-income countries), with letters of support from all collaborators and all end users. Oh, and you are advised ‘Please do not wait until the last minute to submit your proposal’.


Update: 17th August 2018
An ESRC spokesperson sent this reply to my query:

Thank you for getting in touch with us with your concerns about the short call opening time for the recently announced Management Practices and Employee Engagement call, and the fact that it has opened in August.

We welcome feedback from our community on the administration of funding programmes, and we will think carefully about how to respond to these concerns as we design and plan future programmes.

To provide some background to this call. It builds on an open-invite scoping workshop we held in February 2018, at which we sought input from the academic, policy and third-sector communities on the shape of a (then) potential research investment on management practices and employee engagement. We subsequently flagged the likelihood of a funding call around the topic area this summer, both at the scoping workshop itself, as well as in our ongoing engagements with the academic community.

We do our best to make sure that calls are open for as long as possible. We have to balance call opening times with a robust and appropriately timetabled peer review process, feasible project start dates, the right safeguards and compliances, and, in certain cases such as this one, a requirement to spend funds within the financial year. 

We take the concerns that you raise in your email and in your blog post of 11 August 2018 extremely seriously. The high standard of the UK's research is a result of the work of our academic community, and we are committed to delivering a system that respects and responds to their needs. As part of this, we are actively looking into ways to build in longer call lead times and/or pre-announcements of funding opportunities for potential future managed calls in this and other areas.

I would also like to stress that applicants can still submit proposals on the topic of management practices and employee engagement through our standard research grant process, which is open all year round. The peer review system and the Grant Assessment Panel does not take into account the fact that a managed call is open on a topic when awarding funding: decisions are taken based on the excellence of the proposal.

Update: 23rd August 2018
A spokesperson for the World Bank has written to note that the grant scheme alluded to in my postscript did in fact have a 2 month period between the call and submission date. I have apologised to them for suggesting it was shorter than this, and also apologise to readers for providing misleading information. The duration still seems short to me for a call of this nature, but my case is clearly not helped by providing wrong information, and I should have taken greater care to check details. Text of the response from the World Bank is below:
 
We noticed with some concern that in your Aug. 11 blog post, you had singled out a World Bank call for proposals as a “beautiful illustration” of a type of funding call that appears designed to favor an inside candidate. This characterization is entirely inaccurate and appears based on a misperception of the time lag between the announcement of the proposal and the deadline.
Your reference to the 2018 Call for Proposals for Collaborative Data Innovations for Sustainable Development by the World Bank and the Global Partnership for Sustainable Development Data as undermining faith in the funding process seems based on the mistaken assumption that the call was issued on or about August 2. It was not.
The call was announced June 19 on the websites of the World Bank and the GPSDD. This was two months before the closing date, a period we have deemed fair to applicants but also appropriate given our own time constraints. An online seminar was offered to assist prospective applicants, as you note, on July 9.
The seminar drew 127 attendees for whom we provided answers to 147 questions. We are still reviewing submissions for the most recent call for proposals for this project, but our call for the 2017 version elicited 228 proposals, of which 195 met criteria for external review.
As the response to the seminar and the record of submissions indicate, this funding call has been widely seen and provided numerous applicants the opportunity to respond.  To suggest that this has not been an open and fair process does not do it justice.

Here are the links with the announcement dates of June 19th

Friday, 20 July 2018

Standing on the shoulders of giants, or slithering around on jellyfish: Why reviews need to be systematic

Yesterday I had the pleasure of hearing George Davey Smith (aka @mendel_random) talk. In the course of a wide-ranging lecture he recounted his experiences with conducting a systematic review. This caught my interest, as I’d recently considered the question of literature reviews when writing about fallibility in science. George’s talk confirmed my concerns that cherry-picking of evidence can be a massive problem for many fields of science.

Together with Mark Petticrew, George had reviewed the evidence on the impact of stress and social hierarchies on coronary artery disease in non-human primates. They found 14 studies on the topic, and revealed a striking mismatch between how the literature was cited and what it actually showed. Studies in this area are of interest to those attempting to explain the well-known socioeconomic gradient in health. It’s hard to unpack this in humans, because there are so many correlated characteristics that could potentially explain the association. The primate work has been cited to support psychosocial accounts of the link; i.e., the idea that socioeconomic influences on health operate primarily through psychological and social mechanisms. Demonstration of such an impact in primates is particularly convincing, because stress and social status can be experimentally manipulated in a way that is not feasible in humans.

The conclusion from the review was stark: ‘Overall, non-human primate studies present only limited evidence for an association between social status and coronary artery disease. Despite this, there is selective citation of individual non-human primate studies in reviews and commentaries relating to human disease aetiology’(p. e27937).

The relatively bland account in the written paper belies the stress that George and his colleague went through in doing this work. Before I tried doing one myself, I thought a systematic review was a fairly easy and humdrum exercise. It could be, if the literature were not so unruly. In practice, however, you not only have to find and synthesise the relevant evidence, but also read and re-read papers to work out what exactly was done. Often it’s not just a case of computing an effect size: finding the numbers that match the reported result can be challenging. One paper in the review that was particularly highly cited in the epidemiology literature turned out to have problematic data: the raw data shown in scattergraphs are hard to reconcile with the adjusted means reported in a summary (see Figure below). Correspondence sent to the author apparently did not elicit a reply, let alone an explanation.

Figure 2 from Shively and Thompson (1994) Arteriosclerosis and Thrombosis Vol 14, No 5. Yellow bar added to show mean plaque areas as reported in Figure 3 (adjusted for preexperimental thigh circumference and TPC-HDL cholesterol ratio)
Even if there were no concerns about the discrepant means, the small sample size and influential outliers in this study should temper any conclusions. But those using this evidence to draw conclusions about human health focused on the ‘five-fold increase’ in coronary disease in dominant animals who became subordinate.

So what impact has the systematic review achieved? Well, the first point to note is that the authors had a great deal of difficulty getting it accepted for publication: it would be sent to reviewers who worked on stress in monkeys, and they would recommend rejection. This went on for some years: the abstract was first published in 2003, but the full paper did not appear until 2012.

The second, disappointing conclusion comes from looking at how the original studies reviewed by Petticrew and Davey Smith have been cited in the human health literature since their review appeared. The systematic review itself garnered 4 citations in the period 2013-2015 and just one during 2016-2018. The mean number of citations for the 14 articles covered in their meta-analysis was 2.36 for 2013-2015, and 3.00 for 2016-2018. The article that was the source of the Figure above had six citations in the human health literature in 2013-2015 and four in 2016-2018. These numbers aren’t sufficient for more than impressionistic interpretation, and I only did a superficial trawl through the abstracts of citing papers, so I am not in a position to say whether all of these articles accepted the study authors’ conclusions. However, the pattern of citations fits with past experience in other fields: when cherry-picked facts fit a nice story, they continue to be cited, without regard to subsequent corrections, criticism or even retraction.

The reason this worries me is the stark conclusion that we can’t trust citations of the research literature unless they are based on well-conducted systematic reviews. Iain Chalmers has been saying this for years, and in his field of clinical trials such reviews are more common than in other disciplines. But there are still many fields where it is seen as entirely appropriate to write an introduction to a paper that cites only supportive evidence and ignores a swathe of literature showing null or opposite results. Most postgraduates have an initial thesis chapter that reviews the literature, but it’s rare, at least in psychology, to see a systematic review, perhaps because this is so time-consuming and can be soul-destroying. But if we continue to cherry-pick the evidence that suits us, then we are not so much standing on the shoulders of giants as slithering around on jellyfish, and science will not progress.