Saturday 22 December 2012

Genes, brains and lateralisation: how solid is the evidence?

If there were a dictionary of famous neurological quotes, “Nous parlons avec l'hémisphère gauche” (“We speak with the left hemisphere”) by Paul Broca (1865) would be up there among the top hits. Broca’s realisation that the two sides of the brain are functionally distinct was a landmark observation. It was based on a rather small series of patients, but has since been confirmed in numerous studies. After localised brain injury, aphasia (language impairment) is far more likely after damage to the left side than the right side. And nowadays, we can visualise greater activation of the left side in neurologically intact people as they do language tasks in a brain scanner.

There are many fascinating features of cerebral lateralisation, but I’m going to focus here on just one specific question: what do we know about genetic influences on brain asymmetry in humans? There are really two questions here: (1) how do genes lead to asymmetric brain development? (2) are there genetic variants that can account for individual variation, e.g. the fact that a minority of people have right-hemisphere language? I hope to return to question 2 at a later date, but for now, I’ll focus on question 1, because after reading a key paper on this topic, I've struck a whole load of questions that I can’t answer. I’m hoping that some of my genetically-sophisticated readers will be able to help me out.

It’s sometimes stated that cerebral lateralisation is a uniquely human trait, but that’s not true. Nevertheless, we are very different from our primate cousins, insofar as we show a strong population bias to right-handedness, and most people have left-hemisphere language. There are other species which show consistent brain asymmetries, but they are a long way from us on the evolutionary tree. Most of the research I’ve come across is on nematode worms, zebrafish, or songbirds. This is a long way from my comfort zone, but there are some nice reviews that document research on genes influencing asymmetries in these creatures (e.g. here and here). It’s clear, though, that it’s complicated: not just in terms of the range of genes involved, but in the different ways they can generate asymmetry. And there don't seem to be obvious parallels to human brain development.

Despite all this uncertainty, there’s growing evidence that brain asymmetries are present from very early on in life – in newborn babies and even in foetal life. This field is still in its infancy (forgive the pun), and samples of babies are typically too small to reveal reliable relationships between structure and function. Nevertheless, there’s considerable interest in the idea that physical differences between the two sides of the brain may be an indicator of potential for language development.

A particularly exciting topic is genetic determinants of cerebral lateralisation. One study in particular, by Sun et al, made a splash when it was published in Science in 2005, and it has since attracted over 140 citations. The authors looked for asymmetric gene expression in post mortem embryonic brains. Their conclusions have been widely cited: “We identified and verified 27 differentially expressed genes, which suggests that human cortical asymmetry is accompanied by early, marked transcriptional asymmetries.” The fact that several different genes were identified was of particular interest to me, because genetic theories by neuropsychologists have typically assumed that just a single gene is responsible for human cerebral lateralisation. I’ve never found a single-gene theory plausible, so I was all too ready to accept evidence that involved multiple genes. But first I wanted to drill down deeper into the methods to find out how the authors reached their conclusions. I’m a psychologist, not a geneticist, and so this was rather challenging. But my deeper reading raised a number of questions.

Sun et al used a method called Serial Analysis of Gene Expression (SAGE) which compares gene expression in different tissues or – as in this case – in corresponding left and right regions of the embryonic brain. The analysis looks for specific sequences of 10 DNA base pairs, or tags, which index particular genes. SAGE output consists of simple tables, giving the identity of each tag, its count (a measure of cellular gene expression) and an identifier and more detailed description of the corresponding gene. These tables are available for left and right sides for three brain regions (frontal, perisylvian and occipital) for 12- and 14-week-old brains, and for perisylvian only for a 19-week-old brain. The perisylvian region is of particular interest because it is the brain region that will develop into the planum temporale, which has been linked with language development. One brain at each age was used to create the set of SAGE tags.

To identify asymmetrically expressed genes, the authors state that they performed a Monte Carlo test and verified the results using the chi square test. I haven’t tracked down the specifics of the Monte Carlo test, which is part of the SAGE software package, but the chi square is pretty straightforward: it tests whether the distribution of a tag's expression on left and right differs significantly from the distribution of left vs. right expression across all tags in this brain region, which is close to 50%. Comparing the left and right perisylvian regions of a 12-week-old embryonic human brain, there were 49 genes with chi square greater than 6.63 (p < .01): 21 were more highly expressed on the left and 28 more highly expressed on the right. But for each region the authors considered several thousand tags. So I wondered whether the number of asymmetrically expressed genes was any different from what you’d expect if asymmetry was just arising by chance.
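As a rough illustration of that test and of the chance baseline, here is a minimal Python sketch. It is not Sun et al's code or the SAGE software: the function names are my own, the default left proportion of 0.5 stands in for the "close to 50%" region-wide figure, and the 4,000-tag count is purely hypothetical. It computes the 1-df chi square for one tag's left/right counts, and the number of tags you would expect to pass p < .01 if no tag were truly asymmetric.

```python
import math


def chi2_1df_pvalue(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def tag_chi_square(left, right, p_left=0.5):
    """Chi-square goodness of fit for one tag's left/right counts.

    p_left is the proportion of all tag counts in the region that come from
    the left side (assumed 0.5 here, close to the value in these data).
    """
    total = left + right
    expected_left = total * p_left
    expected_right = total * (1 - p_left)
    chi2 = ((left - expected_left) ** 2 / expected_left
            + (right - expected_right) ** 2 / expected_right)
    return chi2, chi2_1df_pvalue(chi2)


def expected_by_chance(n_tags, alpha):
    """Number of tags expected to pass the threshold with no true asymmetry."""
    return n_tags * alpha


if __name__ == "__main__":
    print(tag_chi_square(2, 14))            # chi square 9.0, p about .003
    print(chi2_1df_pvalue(6.63))            # about .01, the cut-off used above
    print(expected_by_chance(4000, 0.01))   # about 40 for a hypothetical 4,000 tags
```

If roughly that many tags were tested, about 40 would be expected to cross p < .01 by chance alone, which is the natural yardstick for the 49 reported. The chi-square approximation is itself shaky for small counts, so the true chance rate will differ somewhat.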

It was possible to check this out from the giant supplementary Excel files that accompany the paper, but this proved far from straightforward.  It turns out that the relationship between tags and genes is not one-to-one.  For around 40% of the tags, there is more than one corresponding gene. It was not clear which gene was selected in such cases, and why. I did find some cases where two genes were assigned to a tag, but my impression was that this was unintentional and in general the authors aimed to avoid double-counting tags. We also have the further problem that some genes are indexed by numerous tags, a point I will return to below.

But let’s just focus first on the individual tags. I compiled a master list of all tags that were expressed in any region at any age, and then made a chart of the frequency of expression in each brain region/age. I excluded any tags where the total expression count on both sides was three or less, as this is too small to show lateralisation, and this left me with 3800 to 4600 tags for analysis in each brain region. I did compute chi square as described by Sun et al, but this is not recommended for small numbers, and so I also evaluated the significance of asymmetry using a two-tailed binomial test. This doesn’t make a huge difference, but is more accurate when comparing small numbers.  Figure 1 shows the proportion of the sample for each brain region where the binomial test gives a p-value of a given size. If the distribution of expression in left and right was purely determined by chance, we’d expect the points to fall on the line. If there were genes for asymmetry we would expect the observed values to fall above the line, especially at low levels of p. It is clear this is not the case. I did cross-check my figures against those of Sun et al, and found they appeared to have missed some cases of significant asymmetry, which meant that in general they found rather fewer cases of significant asymmetry than are shown in Figure 1.
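The logic behind the reference line in Figure 1 can be sketched in a few lines of Python. This is an illustrative re-creation, not my original analysis: the tag totals (uniform between 4 and 60) are a hypothetical assumption, whereas real SAGE counts are heavily skewed towards small values. Each simulated tag's total is split between the hemispheres by coin flips, so any "asymmetry" is pure chance, and the exact two-sided binomial test then shows how often a given p-value is reached anyway.

```python
import math
import random


def binom_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k (with a small
    relative tolerance for floating-point ties)."""
    probs = [math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in probs if q <= cutoff))


def simulate_null_tags(n_tags, rng, min_total=4, max_total=60):
    """Hypothetical null data: (left count, total count) per tag, with each
    count assigned to left or right at random. Totals of 3 or less are
    excluded, as in the analysis described above."""
    tags = []
    for _ in range(n_tags):
        total = rng.randint(min_total, max_total)
        left = sum(rng.random() < 0.5 for _ in range(total))
        tags.append((left, total))
    return tags


def proportion_significant(tags, alpha):
    """Share of tags whose two-sided binomial p-value falls below alpha."""
    hits = sum(binom_two_sided(left, total) < alpha for left, total in tags)
    return hits / len(tags)


if __name__ == "__main__":
    rng = random.Random(2012)
    tags = simulate_null_tags(4000, rng)
    for alpha in (0.05, 0.01, 0.001):
        print(alpha, proportion_significant(tags, alpha))
```

Because the binomial test is discrete, the chance rate usually sits at or a little below alpha (tags with small totals cannot reach small p-values at all). Genuine asymmetric expression would show up as observed proportions clearly above these null values.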

Fig 1. Proportion of tags with "significant" asymmetry, by Age/Brain Region

Sun et al didn’t rely solely on statistical tests of SAGE data to establish asymmetrical expression. They reported validation studies using a different method for assessing gene expression (real-time PCR). But this used genes selected on the basis of a chi square value of 1.9 or greater (p < .17), which included many where the degree of asymmetry was not large. One goal of PCR analysis was to confirm asymmetric expression levels in the same embryonic brains as the SAGE analysis. Of more interest is whether the findings generalise to new brains. The authors did further cross-validation using real-time PCR with six additional brains of different ages, and reported results for the LMO4 gene, where higher perisylvian expression on the right was evident in two brains at 12 and 14 weeks of age, as well as in the original two brains of the same age. Four other brains, aged 16 to 19 weeks, did not show asymmetry of expression. Some of the other asymmetrically expressed genes were also tested using real-time PCR in the two other brains, and 27 showed consistent asymmetric expression. It was, however, not clear to me how the significance of asymmetry was assessed in these replication samples.
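To see how lenient a chi square cut-off of 1.9 is, here is a minimal Python sketch that converts it to a p-value and applies it to an illustrative tag split 12:6 (hypothetical counts, tested against an even split for simplicity).

```python
import math


def chi2_even_split(left, right):
    """1-df chi-square for left/right counts against a 50:50 expectation."""
    expected = (left + right) / 2
    return ((left - expected) ** 2 + (right - expected) ** 2) / expected


def chi2_1df_pvalue(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    print(round(chi2_1df_pvalue(1.9), 3))   # 0.168: the "p < .17" cut-off
    stat = chi2_even_split(12, 6)           # 2.0, so this tag would be selected
    print(stat, round(chi2_1df_pvalue(stat), 3))
```

At this threshold roughly one tag in six would be selected even if there were no true asymmetry, so a candidate list chosen this way will contain many chance fluctuations.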

There is one particular issue I find confusing when I try to evaluate the robustness of the asymmetry results. My expectation was that if a gene was asymmetrically expressed, then this should be evident in all the tags indexing that gene. But Table 1 shows that this isn’t so. For the LMO4 gene, which is the focus of special attention in this paper, there are seven tags that are linked with the gene in at least one brain region: only one of these (in red) shows the rightward asymmetry that is the focus of the paper. Another tag (in blue) shows leftward asymmetry in one sample, and the rest have low levels of expression. Maybe there’s a simple explanation for this – if so I hope that expert geneticists among my readers may be able to comment on this aspect.
Table 1. Left- and right-expression levels for seven tags for the LMO4 gene
I’m aware of two other studies (here and here) that looked for asymmetric gene expression in embryonic human brains but failed to find it. One possible reason for this discrepancy is that these studies focused on later stages of development, rather than the 12-14 week-old period where Sun et al found asymmetry. In addition, power is always low in these studies because of the small number of brains available. As Lambert et al (2011) noted, as well as possible effects of age and gender, there may be individual variation from brain to brain, but typically only one or two samples are available at each age.

So what do I conclude from all of this? I realise for a start that these studies are very hard to do. I also realise we have to make a start somewhere, even if the amount of post mortem material is limited. But I have to say I’m not convinced from the evidence so far that the researchers have demonstrated significant asymmetry of genetic expression in embryonic brains. The methods seem to take insufficient account of the possibility of chance fluctuations in the measurements, and the numbers of asymmetries that have been found don't seem impressive, given the huge number of genes that were investigated. Clearly, something has to be responsible for the physical asymmetries that have been found in foetal and neonatal brains, and the odds seem high that genes are implicated. But is the evidence from Sun et al convincing enough to conclude that we have found some of those genes? I'd love to hear views from readers who have more expertise in this area of research.

P.S. 7th Jan 2013
Thanks to Silvia Paracchini, who drew my attention to further relevant articles:
Johnson, M. B., et al (2009). Functional and evolutionary insights into human brain development through global transcriptome analysis. Neuron, 62(4), 494-509. doi: 10.1016/j.neuron.2009.03.027
This paper looked at a slightly later developmental stage (18 to 23 weeks gestational age) and did correct for the number of genes considered (False Discovery Rate). They reported striking symmetry of gene expression in the mid-gestational period, even though structural brain asymmetries have been described at this stage of development. Note, however, that this is not incompatible with Sun et al, who did not find evidence of asymmetry after 17 weeks gestational age.
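For readers unfamiliar with False Discovery Rate correction, here is a minimal Python sketch of the Benjamini-Hochberg step-up procedure, the standard way of controlling the FDR. I haven't checked which variant Johnson et al used, and the ten p-values below are hypothetical. Unlike a fixed p < .01 cut-off, the threshold becomes more stringent as the number of genes tested grows.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure.

    Sort the m p-values, find the largest rank k whose p-value is at most
    k / m * q, and reject the k smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Hypothetical p-values for ten genes
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals))   # [0, 1]
```

Five of these ten p-values fall below an uncorrected .05, but only the two smallest survive the correction.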

Kang, H. J., et al (2011). Spatio-temporal transcriptome of the human brain. Nature, 478(7370), 483-489. 
This is a much larger study, covering the range from 4 weeks gestational age through childhood up to adulthood and old age. This paper does not explicitly report on asymmetry, but they describe genes where the expression varies from brain region to region, or from age to age, after adjustment for False Discovery Rate. I could find no overlap between the genes identified by Sun et al and Kang et al's list of differentially expressed genes.


Abrahams, B. S., Tentler, D., Peredely, J. V., Oldham, M. C., Coppola, G., & Geschwind, D. H. (2007). Genome-wide analyses of human perisylvian cerebral cortical patterning. Proceedings of the National Academy of Sciences, 104, 17849-17854.

Dehaene-Lambertz, G., Hertz-Pannier, L., & Dubois, J. (2006). Nature and nurture in language acquisition: anatomical and functional brain-imaging studies in infants. Trends in Neurosciences, 29, 367-373.

Kivilevitch, Z., Achiron, R., & Zalel, Y. (2010). Fetal brain asymmetry: in utero sonographic study of normal fetuses. American Journal of Obstetrics and Gynecology, 202(4), 359.e1.

Lambert, N., Lambot, M.-A., Bilheu, A., Albert, V., Englert, Y., Libert, F., . . . Vanderhaeghen, P. (2011). Genes expressed in specific areas of the human fetal cerebral cortex display distinct patterns of evolution. PLOS One, 6(3), e17753. doi: 10.1371/journal.pone.0017753

Lash, A. E., Tolstoshev, C. M., Wagner, L., Schuler, G. D., Strausberg, R. L., Riggins, G. J., & Altschul, S. F. (2000). SAGEmap: A public gene expression resource. Genome Research, 10(7), 1051-1060. doi: 10.1101/gr.10.7.1051

Sagasti, A. (2007). Three ways to make two sides: Genetic models of asymmetric nervous system development. Neuron, 55(3), 345-351. doi: 10.1016/j.neuron.2007.07.015

Sun, T., Patoine, C., Abu-Khalil, A., Visvader, J., Sum, E., Cherry, T. J., Orkin, S. H., Geschwind, D. H., & Walsh, C. A. (2005). Early asymmetry of gene transcription in embryonic human left and right cerebral cortex. Science, 308(5729), 1794-1798. PMID: 15894532

Sun, T., & Walsh, C. A. (2006). Molecular approaches to brain asymmetry and handedness. Nature Reviews Neuroscience, 7, 655-662.

Saturday 15 December 2012

Psychology: Where are all the men?

There's a lot of interest in under-representation of women in certain science subjects, but in psychology, there's more concern about a lack of men. A quick look at figures from UCAS* (Universities & Colleges Admissions Service) shows massive differences in gender ratios for different subjects. In figure 1 I’ve plotted the percentage of women accepted for subjects that had at least 6000 successful applicants to degree courses in 2011.

Fig. 1. % Females accepted on popular UK degree courses 2011
Given the large sample sizes, the sex differences are statistically significant for all subjects except Media Studies, which is bang on 50%. As a psychologist, I found the most surprising thing about this plot was the huge preponderance of women in psychology. This didn’t square with my experiences: my colleagues include a good mix of men and women, so I was keen to find the explanation for the mismatch. There seemed to be several possible explanations, which aren’t mutually exclusive, namely:
  • Oxford University, where I work, may be biased in favour of men
  • The proportions of women decline with career stage
  • The proportion of women in psychology may have increased since I was a student
  • The proportion of women may vary with sub-area of psychology
So I set off to track down the evidence for these different explanations.
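The claim that, with samples this large, almost any departure from 50% is statistically significant can be checked with a one-sample z-test of a proportion. This is my choice of test for illustration (I haven't said which test lies behind the statement), and the proportions below are illustrative; only the minimum of 6000 accepted applicants comes from Figure 1.

```python
import math


def z_test_against_half(p_hat, n):
    """Normal-approximation test of an observed proportion against 0.5.

    Returns the z statistic and the two-sided p-value.
    """
    se = math.sqrt(0.5 * 0.5 / n)
    z = (p_hat - 0.5) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # n = 6000 is the smallest subject size in Figure 1; the shares are illustrative
    for share in (0.50, 0.51, 0.52):
        z, p = z_test_against_half(share, 6000)
        print(f"{share:.0%} women: z = {z:.1f}, p = {p:.2g}")
```

With 6000 applicants a 52:48 split already gives p < .01, while 51:49 does not, and subjects as lopsided as psychology are far beyond either.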

Is Oxford University biased against women?

I’m leading our department’s Athena SWAN panel, whose remit is to identify and remove barriers to women’s progress in scientific careers. In order to obtain an Athena SWAN award, you have to assemble a lot of facts and figures about the proportions of women at different career stages, and so I already had at my fingertips some relevant statistics. (You can find these here). Over the past three years, our student intake ranged from 66% to 71% women: rather lower than the UCAS figure of 78%. However, acceptance rates were absolutely equivalent for men and women. The same was true for staff appointments: the likelihood of being accepted for a job did not differ by gender. So with a sigh of relief I think we can exclude this line of explanation.

Does the proportion of women in psychology decline with career stage?

I have a research post and so don’t do much teaching. Have I got a distorted view of the gender ratios because my interactions are mostly with more senior staff? This looks believable from the data on our department. Postgraduate figures ranged from 65% to 70% women. Ours is a small department, and so it is difficult to be confident in trends, but in 2011 there were 16/27 (59%) female postdocs, 6/11 (55%) female lecturers, 6/13 (46%) female senior researchers and 4/11 (36%) female professors. This trend for the proportion of women to decline as one advances through a career is in line with what has been observed in many other disciplines. We also obtained data from other top-level psychology departments for comparison, and similar trends were seen.

Has the proportion of women in psychology increased over time?

My recollection of my undergraduate days was that male psychology students were plentiful. However, I was an undergraduate in the dark ages of the early 1970s when there were only five Oxford colleges that accepted women, and a corresponding shortage of females in all subjects. So I had a dig around to try to get more data. The UCAS statistics go back only to 1996, and the proportion of women in psychology hasn’t changed: 78% in 1996, 78% in 2011. However, data from the USA show a sharp increase in the proportion of women obtaining psychology doctorates from 1960 (18%) through 1972 (27%) to 1984 (50%). This, of course, is in part a consequence of the increase of women in higher education in general. But that isn’t a total explanation: Figure 2 compares proportions of female PhDs over time in different subject areas, and one can see that psychology shows a particularly pronounced increase compared with other disciplines.
Fig 2. Percentages of PhDs by women in the USA: 1950-1984

Does the proportion of women in psychology vary with sub-area?

The term ‘psychology’ covers a huge range of subject matter with different historical roots. Most areas of academic psychology make some use of statistics, but they vary considerably in how far they require strong quantitative or computational skills. For instance, it would be difficult to specialise in the study of perception or neuroscience without being something of a numbers nerd: that’s generally less true for developmental, clinical, interpersonal or social psychology, which require other skill sets. I looked at data from the American Psychological Association (APA), which publishes the numbers of members and fellows in its different Divisions. The APA is predominantly a professional organisation, and non-applied areas of psychology are not strongly represented in the membership. Nevertheless, one can see clear gender differences, which generally map on to the expectation that women are more focused on the caring professions, and men are more heavily represented in theoretical and quantitative areas. Figure 3 shows relevant data for sections with at least 700 members. It is also worth noting that the graph illustrates the decrease in the proportions of women going from membership to fellowship, a trend bucked by just one Division.
Fig 3. Data from American Psychological Association: Division membership 2011

What, if anything, should we do?

The big question is how far we should try to manipulate gender differences when we find them. I’ve barely scratched the surface in my own discipline, psychology, yet it’s evident that the reasons for such differences are complex. Figure 2 alone makes it clear that women in Western societies have come a long way in the past half-century: far more of us go to university and do PhDs than was the case fifty years ago. Yet the proportion of women declines as we climb the career ladder. In quantifying this trend, it’s important to compare like with like: those who are in senior positions now are likely to have trained at a time when the gender ratio was different. But it's clear from many surveys that demographic changes can't explain the dearth of women in top jobs: there are numerous reasons why women are more likely than men to leave an academic career – see, for instance, this depressing analysis of reasons why women leave chemistry. In our department we are committed to taking steps to ensure that gender does not disadvantage women who want to pursue an academic career, and I am convinced that with even quite minor changes in culture we can make a difference.

The point I want to stress here, though, is that I see this issue (creating a female-friendly environment for women in psychology) as separate from the issue of subject preference. I worry that the two issues tend to get conflated in discussions of gender equality. My personal view is that psychology is enriched by having a mix of men and women, and I share the concerns expressed here about difficulties that arise when the subject becomes heavily biased to one gender. However, I am pretty uncomfortable with the idea of trying to steer people’s career choices in order to even out a gender imbalance.

Where this has been tried, my impression is that it's mostly been in the direction of trying to encourage more girls into male-dominated subjects. In effect, the argument is that girls’ preferences are based on wrong information, in that they are unduly influenced by stereotypes. For instance, the Institute of Physics has done a great deal of work on this topic, and they have shown that there are substantial influences of schooling on girls’ subject choices. They concluded that the weak showing of girls in physics can be attributed to lack of inspirational teaching, and a perception among girls that physics is a boys’ subject. They have produced materials to help teachers overcome these influences, and we’ll have to wait and see if this makes any appreciable difference to the proportions of girls taking up the subject (which according to UCAS figures has been pretty stable for 15 years: 19% in 1996 and 18% in 2011).

It's laudable that the Institute of Physics is attempting to improve the teaching of physics in our schools, and to ensure girls do not feel excluded. But if they are right, and gender stereotyping is a major determinant of subject choices, shouldn’t we then adopt similar policies to other subjects that show a gender bias, whether this be in favour of girls or boys?

Interestingly, Marc Smith has produced relevant data in relation to A-level psychology, which is dominated by girls, and perceived by boys as a ‘girly’ subject. So should we try to change that? As Smith notes, the female bias seems linked to a preference for schools to teach A-level psychology options that veer away from more quantitative cognitive topics. Here we find that psychology provides an interesting test case for arguments around gender, because within the subject there are consistent biases for males and females to prefer one kind of sub-area to another. This implies that to alter the gender balance you might need to change what is taught, rather than how it is taught, by giving more prominence to the biological and cognitive aspects of psychology. If true, it might be easier to alter gender ratios in psychology than in physics, but only by modifying the content of the syllabus.

One of the IOP's recommendations is: "Co-ed schools should have a target to exceed the current national average of 20% of physics A-level students being girls." But surely this presumes an agenda whereby we aim for equality of genders in all subjects, with equivalent campaigns to recruit more boys into nursing, psychology and English? I'm not saying this would necessarily be a bad thing, but I wonder at the automatic assumption that it has to be a good thing - or even an achievable thing. There are obvious disadvantages of gender imbalances in any subject area - they simply reinforce stereotypes, while at the same time creating challenges at university and in the workplace for those rare individuals who buck the trend and take a gender-atypical subject. But the kinds of targets set by the IOP make me uneasy nonetheless. The downside of an insistence on gender balance is a sense of coercion, whereby children are made to feel that their choice of subject isn't a real choice, but is only made because they have been brainwashed by gender stereotypes. Yes, let's do our best to teach boys and girls in an inspiring and gender-neutral fashion, but, as the example of psychology demonstrates, we are still likely to find that females and males tend to prefer different kinds of subject matter.

Smith, M. (2011). Failing boys, failing psychology. The Psychologist, 24(5), 390-391. WOS:000290745000037

Howard, A., et al. (1986). The changing face of American psychology: A report from the Committee on Employment and Human Resources. American Psychologist, 41(12), 1311-1327. doi: 10.1037//0003-066X.41.12.1311

*Update 10th March 2016: The link I originally had for UCAS data ceased to work. I have a new link, and think this should be the correct dataset, but I have not rechecked the figures.

Wednesday 21 November 2012

Moderate drinking in pregnancy: toxic or benign?

There’s no doubt that getting tipsy while pregnant is a seriously bad idea. Alcohol is a toxin that can pass through the placenta to the foetus and cause damage to the developing brain.  For women who are regular heavy drinkers or binge drinkers, there is a risk that the child will develop foetal alcohol syndrome, a condition that affects physical development and is associated with learning difficulties.
But what of more moderate drinking? The advice is conflicting. Many doctors take the view that alcohol is never going to be good for the developing foetus and they recommend complete abstention during pregnancy as a precautionary measure. Others have argued, though, that this advice is too extreme, and that moderate drinking does not pose any risk to the child.

Last week a paper by Lewis et al was published in PLOS One providing evidence on this issue, and concluding that moderate drinking does pose a risk and should be avoided. The methodology of the paper was complex and it’s worth explaining in detail what was done.

The researchers used data from ALSPAC, a large study that followed the progress of several thousand British children from before birth. A great strength of this study is that information was gathered prospectively: in the case of maternal drinking, mothers completed questionnaires during pregnancy, at 18 and 32 weeks gestation.  Obviously, the data won’t be perfect: you have to rely on women to report their intake honestly, but it’s hard to see how else to gather such data without being overly intrusive. When children were 8 years old, they were given a standard IQ test, and this was the dependent variable in the study.

One obvious thing to do with the data would be to see if there is any relationship between the amount drunk in pregnancy and the child’s IQ. Quite a few studies have done this and a recent systematic review concluded that, provided one excluded women who drank more than 12 g (1.5 UK units) per day or who were binge-drinkers, there was no impact on the child. Lewis et al pointed out, however, that this is not watertight, because drinking in pregnancy is associated with other confounding factors. Indeed, in their study, the lowest IQs were obtained by children of mothers who did not drink at all during pregnancy. However, these mothers were also likely to be younger and less socially-advantaged than mothers who drank, making it hard to disentangle causal influences.

So this is where the clever bit of the study design came in, in the shape of Mendelian randomisation. The logic goes like this: there are genetic differences between people in how they metabolise alcohol. Some people can become extremely drunk, or indeed ill, after a single drink, whereas others can drink everyone else under the table. This relates to variation in a set of genes known as ADH genes, which are clustered together on chromosome 4. If a woman metabolises alcohol slowly, this could be particularly damaging to the foetus, because alcohol hangs around in the bloodstream longer. There are quite large racial differences in ADH genes, and for that reason the researchers restricted consideration just to those of White European background. For this group, they showed that variation in ADH genes is not related to social background. So they had a very specific prediction: for women who drank in pregnancy, there should be a relationship between their ADH genes and the child’s outcome. However, if the woman did not drink at all, then the ADH genotype should make no difference. This is the result they reported. It’s important to be clear that they did not directly estimate the impact of maternal drinking on the child’s IQ: rather, they inferred that if ADH genotype is associated with child’s IQ only in drinkers, then this is indirect evidence that drinking is having an impact. This is a neat way of showing that there is an effect of a risk factor (alcohol consumption) while avoiding the complications of confounding by social class differences.
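The logic of the design can be illustrated with a toy simulation. This is a minimal Python sketch, and every parameter in it is a hypothetical value chosen for illustration, not an estimate from ALSPAC or Lewis et al. Social advantage raises both the chance of drinking and the child's IQ; the number of ADH "risk" alleles is independent of social advantage; and alcohol lowers IQ only through an allele-by-drinking effect.

```python
import math
import random
from statistics import fmean


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=50_000, per_allele=-1.5, seed=42):
    """Toy data. Returns (naive drinker-minus-abstainer IQ gap,
    per-allele IQ slope in drinkers, per-allele IQ slope in abstainers)."""
    rng = random.Random(seed)
    groups = {True: ([], []), False: ([], [])}  # drinks? -> (risk alleles, child IQ)
    for _ in range(n):
        advantage = rng.gauss(0, 1)                             # hypothetical social advantage
        drinks = rng.random() < 1 / (1 + math.exp(-advantage))  # advantaged mothers drink more
        alleles = sum(rng.random() < 0.3 for _ in range(2))     # 0, 1 or 2 risk alleles, unrelated to advantage
        alcohol_effect = per_allele * alleles if drinks else 0.0  # harm only if the mother drinks
        iq = 100 + 5 * advantage + alcohol_effect + rng.gauss(0, 14)
        groups[drinks][0].append(alleles)
        groups[drinks][1].append(iq)
    naive_gap = fmean(groups[True][1]) - fmean(groups[False][1])
    return naive_gap, ols_slope(*groups[True]), ols_slope(*groups[False])


if __name__ == "__main__":
    gap, slope_drinkers, slope_abstainers = simulate()
    print(f"Drinkers minus abstainers: {gap:+.1f} IQ points (confounded)")
    print(f"Per-allele slope, drinkers:   {slope_drinkers:+.2f}")
    print(f"Per-allele slope, abstainers: {slope_abstainers:+.2f}")
```

In this toy world the naive comparison favours the children of drinkers, just as in the real data, yet the allele slope is negative only among drinkers. That interaction is the pattern Lewis et al looked for, and confounding by social advantage alone does not produce it here.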

Several bloggers, however, were critical of the study. Skeptical Scalpel noted that the effect on IQ was relatively small and not of clinical significance. However, in common with some media reports, he seems to have misunderstood the study and assumed that the figure of 1.8 IQ points was an estimate of the difference between drinkers and abstainers – rather than the effect of ADH risk alleles in drinkers (see below). David Spiegelhalter pointed out that there was no direct estimate of the size of the effect of maternal alcohol intake. Indeed, when drinkers and non-drinkers were directly compared, IQs were actually slightly lower in non-drinkers. Carl Heneghan also commented on the small IQ effect size, but was particularly concerned about the statistical analysis, arguing that it did not adjust adequately for the large number of genetic variants that were considered.

Should we dismiss effects because they are small? I’m not entirely convinced by that argument. Yes, it’s true that IQ is not a precise measure: if an individual child has an IQ of 100, there is error of measurement around that estimate so that the 95% confidence interval is around 95-105 (wider still if a short form IQ is used, as was the case here). This measurement error is larger than the per-allele effects reported by Lewis et al., but they were reporting means from very large numbers of children. If there are reliable differences between these means, then this would indicate a genuine impact on cognition, potentially as large as 3.5 IQ points (for those with four rather than two risk alleles). Sure, we should not alarm people by implying that moderate drinking causes clinically significant learning difficulties, but I don’t think we should just dismiss such a result. Overall cognitive ability is influenced by a host of risk factors, most of which are small, but whose effects add together. For a child who already has other risks present, even a small downwards nudge to IQ could make a difference.
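The reason a per-allele effect smaller than an individual child's measurement error can still be detected is that the precision of a group mean improves with the square root of the number of children. A minimal Python sketch (the group sizes are illustrative, not the ALSPAC numbers; 15 is the conventional IQ standard deviation):

```python
import math


def ci95_half_width(sd, n):
    """Half-width of a 95% confidence interval for the mean of n scores."""
    return 1.96 * sd / math.sqrt(n)


if __name__ == "__main__":
    # IQ tests are scaled to SD 15; the group sizes are illustrative
    for n in (100, 1000, 4000):
        print(n, round(ci95_half_width(15, n), 2))
```

With several thousand children the uncertainty in a group mean falls well below one IQ point, so differences of a point or two between group means can be estimated with some precision even though no individual score is that precise.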

But what about Heneghan’s concern about the reliability of the results? This is something that also worried me when I scrutinised Table 1, which shows for each genetic locus the ‘per allele’ effect on IQ. I’ve plotted the data for child genotypes in Figure 1. Only one SNP (#10) seems to have a significant effect on child IQ. Yet when all loci were entered into a stepwise multiple regression analysis, no fewer than four child loci were identified as having a significant effect. The authors suggested that this could reflect interactions between genes that are on the same genetic pathway.
Figure 1: Effect of child SNP variants (per allele) on IQ (in IQ points), with 95% CI, from Lewis et al Table 1.

I had been warned about stepwise regression by those who taught me statistics many years ago. Wikipedia has a section on Criticisms, noting that results can be biased when many variables are included as predictors. But I found it hard to tell just how serious a problem this was. When in doubt, I find it helpful to simulate data, and so that is what I did in this case, using a function in R that generates multivariate normal data. So I made a dataset where there was no relationship between any of 11 variables – ten of which were designated as genetic loci, and one as IQ. I then ran backwards stepwise regression on the dataset. I repeated this exercise many times, and was surprised at just how often spurious associations of IQ with ‘genotypes’ were seen (as described here). I was concerned that this dataset was not a realistic simulation, because the genotype data from Lewis et al consisted of counts of how many uncommon alleles there were at a given locus (0, 1 or 2 – corresponding to aa, aA or AA, if you remember Mendel’s peas). So I also simulated that situation from the same dataset, but it made no difference to the findings. Nor did it make any difference if I allowed for correlations between the ‘genotypes’. Overall, I came away alarmed at just how often you can get spurious results from backwards stepwise regression – at least if you use the AIC criterion that is the default in the R package.
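
A simulation of this kind can be sketched in a few dozen lines. This version is written in Python rather than R, uses only the standard library, and is not my original script. The settings are assumptions chosen for illustration: 300 children, ten null 'genotypes' coded 0/1/2 with an allele frequency of 0.3, and an IQ score that is independent of all of them. Backward elimination drops a predictor whenever doing so lowers AIC = n·log(RSS/n) + 2p, the form R uses for linear models.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x

def simulate_once(rng, n=300, k=10, freq=0.3):
    """Return the predictors kept by backward stepwise AIC when none is truly related to IQ."""
    # Column 0 is the intercept, columns 1..k are genotypes (0/1/2), column k+1 is IQ.
    width = k + 2
    xtx = [[0.0] * width for _ in range(width)]  # cross-products of all columns
    for _ in range(n):
        row = [1.0] + [float((rng.random() < freq) + (rng.random() < freq)) for _ in range(k)]
        row.append(rng.gauss(100, 15))  # IQ drawn independently of every genotype
        for i in range(width):
            for j in range(width):
                xtx[i][j] += row[i] * row[j]
    y = k + 1

    def aic(predictors):
        cols = [0] + predictors  # always keep the intercept
        a = [[xtx[i][j] for j in cols] for i in cols]
        b = [xtx[i][y] for i in cols]
        beta = solve(a, b)  # least-squares coefficients from the normal equations
        rss = xtx[y][y] - sum(bi * xi for bi, xi in zip(beta, b))
        return n * math.log(rss / n) + 2 * len(cols)

    kept = list(range(1, k + 1))
    current = aic(kept)
    while kept:
        # Try dropping each remaining predictor; keep the drop that lowers AIC most.
        best_aic, drop = min((aic([p for p in kept if p != d]), d) for d in kept)
        if best_aic >= current:
            break
        kept.remove(drop)
        current = best_aic
    return kept

rng = random.Random(2012)
n_sims = 100
runs = [simulate_once(rng) for _ in range(n_sims)]
rate = sum(1 for kept in runs if kept) / n_sims
print(f"Runs with at least one spurious 'significant' locus: {rate:.0%}")
```

Raising n does not cure the problem. Under the null, dropping an irrelevant predictor lowers AIC only when its likelihood-ratio statistic is below 2. Each such predictor therefore survives with a probability of roughly 16%, however large the sample. With ten candidate loci, a majority of runs would be expected to retain at least one.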

Lewis et al did one further analysis, generating an overall risk score based on the number of risk alleles (i.e. the version of the gene associated with lower IQ) for the four loci that were selected by the stepwise regression. This gave a significant association with child IQ, but only in children whose mothers drank in pregnancy: mean IQ was 104.0 (SD = 15.8) for those with 4+ risk alleles, 105.4 (SD = 16.1) for those with 3 risk alleles and 107.5 (SD = 16.3) for those with 2 or fewer risk alleles. However, I was able to show very similar results from my analysis of random data: the problem here is that in a very large sample with many variables some associations will emerge as significant just by chance, and if you then select just those variables and add them up, you are capitalising on the chance effect.
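
The problem of capitalising on chance can be illustrated without any regression at all. This sketch is my own illustration and not the analysis in the paper. It generates ten null loci and orients each one so that its 'risk' allele is whichever allele happens to go with lower IQ in a discovery sample. The four most strongly associated loci are then summed into a score, and that same score is evaluated in a fresh sample drawn from the same null model. The sample size and allele frequency are arbitrary assumptions.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def null_sample(rng, n, k=10, freq=0.3):
    """Genotypes (0/1/2 copies of an allele) at k loci plus an IQ that ignores them all."""
    genos = [[(rng.random() < freq) + (rng.random() < freq) for _ in range(n)] for _ in range(k)]
    iq = [rng.gauss(100, 15) for _ in range(n)]
    return genos, iq

rng = random.Random(8)
disc_genos, disc_iq = null_sample(rng, 2000)
rep_genos, rep_iq = null_sample(rng, 2000)

# Orient each locus so the 'risk' allele is the one linked to lower IQ in the discovery data.
flip = [pearson(g, disc_iq) > 0 for g in disc_genos]

def oriented(genos):
    return [[2 - v for v in g] if f else g for g, f in zip(genos, flip)]

disc_o = oriented(disc_genos)
rep_o = oriented(rep_genos)

# Pick the four loci with the most negative correlation with IQ, as a selection step would.
chosen = sorted(range(10), key=lambda i: pearson(disc_o[i], disc_iq))[:4]

def risk_score(genos):
    """Count of 'risk' alleles across the chosen loci, per child."""
    return [sum(genos[i][j] for i in chosen) for j in range(len(genos[0]))]

r_disc = pearson(risk_score(disc_o), disc_iq)
r_rep = pearson(risk_score(rep_o), rep_iq)
print(f"Score vs IQ, discovery sample:   r = {r_disc:+.3f}")
print(f"Score vs IQ, replication sample: r = {r_rep:+.3f}")
```

In the discovery data the score is bound to correlate negatively with IQ, because every component was chosen for doing so. In the replication sample the correlation should hover around zero. This is why replication in an independent sample is the decisive test.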

One other thing intrigued me. The authors made a binary divide between those who reported drinking in pregnancy and those who did not. The category of drinker spanned quite a wide range from those who reported drinking less than 1 unit per week (either in the first 3 months or at 32 weeks of pregnancy) up to those who reported drinking up to 6 units per week. (Those drinking more than this were excluded, because the interest was in moderate drinkers). Now I’d have thought there would be interest in looking more quantitatively at the impact of moderate drinking, to see if there was a dose-response effect, with a larger effect of genotype on those who drank more. The authors mentioned a relevant analysis where the effect of genotype score on child IQ was greater after adjustment for amount drunk at 32 weeks of pregnancy, but it is not clear whether this was a significant increase, or whether the same was seen for amount drunk at 18 weeks. In particular, one cannot tell whether there is a safe amount to drink from the data reported in this paper. In a reply to my comment on the PLOS One paper, the first author states: “We have since re-run our analysis among the small group of women who reported drinking less than 1 unit throughout pregnancy and we found a similar effect to that which we reported in the paper.” But that suggests there is no dose-response effect for alcohol: I’m not an expert on alcohol effects, but I do find it surprising that less than one drink per week should have an effect on the foetal brain – though as the author points out, it’s possible that women under-reported their intake.

I’m also not a statistical expert and I hesitate to recommend an alternative approach to the analysis, though I am aware that there are multiple regression methods designed to avoid the pitfalls of stepwise regression. It will be interesting to see whether, as predicted by the authors, the genetic variants associated with lower IQ are those that predispose to slow alcohol metabolism. At the end of the day, the results will stand or fall according to whether they replicate in an independent sample.

Lewis SJ, Zuccolo L, Davey Smith G, Macleod J, Rodriguez S, Draper ES, Barrow M, Alati R, Sayal K, Ring S, Golding J, & Gray R (2012). Fetal Alcohol Exposure and IQ at Age 8: Evidence from a Population-Based Birth-Cohort Study. PloS one, 7 (11) PMID: 23166662

Thursday 15 November 2012

Are Starbucks hiding their profits on the planet Vulcan?

I just love the fact that the BBC have a Democracy Live channel where you can watch important government business. The Public Accounts Committee may sound incredibly dull, but I found this footage riveting. The committee grills executives from Starbucks, Amazon and Google about their tax arrangements. Quite apart from the content, it provides a wealth of material for anyone interested in how we interpret body language as a cue to a person's honesty. But for me it raised a serious issue about Starbucks. Is it run by aliens?

Tuesday 13 November 2012

Flaky chocolate and the New England Journal of Medicine

Early in October a weird story hit the media: a nation’s chocolate consumption is predictive of its number of Nobel prize-winners, after correcting for population size. This is the kind of kooky statistic that journalists love, and the story made a splash. But was it serious? Most academics initially assumed not. The source of the story was the New England Journal of Medicine, an august publication with stringent standards, which triages out a high proportion of submissions without sending them for review. (And don't try asking for an explanation of why you’ve been triaged.) It seemed unlikely that a journal with such exacting standards would give space to a lightweight piece on chocolate. So the first thought was that the piece had been published to make a point about the dangers of assuming causation from correlation, or the inaccuracies that can result when a geographical region is used as the unit of analysis. But reading the article more carefully gave one pause. It did have a somewhat jocular tone. Yet if this was intended as a cautionary tale, we might have expected it to be accompanied by some serious discussion of the methodological and interpretive problems with this kind of analysis. Instead, beneficial effects of dietary flavanols were presented as the most plausible explanation of the findings.

The author, cardiologist Franz Messerli, did discuss the possibility of a non-causal explanation for the findings, only to dismiss it. He stated “as to a third hypothesis, it is difficult to identify a plausible common denominator that could possibly drive both chocolate consumption and the number of Nobel laureates over many years. Differences in socioeconomic status from country to country and geographic and climatic factors may play some role, but they fall short of fully explaining the close correlation observed.” And how do we know “they fall short?” Well, because the author, Dr Messerli, says so.

As is often the case, the blogosphere did a better job of critiquing the paper than the journal editors and reviewers (see, for instance, here and here). The failure to consider seriously the role of a third explanatory variable was widely commented on, but, as far as I am aware, nobody actually did the analysis that Messerli should have done. I therefore thought I'd give it a go. Messerli explained where he’d got his data from – a chocolatier’s website and Wikipedia – so it was fairly straightforward to reproduce them (with some minor differences due to missing data from one chocolate website that's gone offline). Wikipedia helpfully also provided data on gross domestic product (GDP) per head for different nations, and it was easy to find another site with data on the proportion of GDP spent on education (except China, which has figures here). So I re-ran the analysis, computing the partial correlation between chocolate consumption and Nobel prizes after adjusting for spend per head on education. When education spend was partialled out, the correlation dropped from .73 to .41, just falling short of statistical significance.
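
For anyone who wants to check this kind of result, a first-order partial correlation can be computed directly from the three pairwise correlations. Its significance can be tested with a t statistic on n − 3 degrees of freedom. The pairwise correlations of chocolate and Nobel prizes with education spend are not given in this post, so the sketch applies the significance step to the partial r of .41. The value n = 22 is taken from the later analysis of the same 22 countries and is an assumption for this one.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_t(r, n, n_controls=1):
    """t statistic for a partial correlation, with n - 2 - n_controls degrees of freedom."""
    df = n - 2 - n_controls
    return r * math.sqrt(df / (1 - r ** 2)), df

# Assumed n = 22 countries; partial r = .41 after removing education spend (from the text).
t, df = partial_t(0.41, 22)
T_CRIT_19 = 2.093  # two-tailed 5% critical value of t with 19 df, from standard tables
print(f"t({df}) = {t:.2f}; significant at .05? {abs(t) > T_CRIT_19}")
```

On these assumptions, a partial r of .41 gives t ≈ 1.96 on 19 df, just short of the 2.09 needed. That is consistent with the correlation "just falling short of statistical significance".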

Since Nobel laureates typically are awarded their prizes only after a long period of achievement, a more convincing test of the association would be based on data on both chocolate consumption and education spend from a few decades ago. I’ve got better things to do than to dig out the figures, but I suggest that Dr Messerli might find this a useful exercise.

Another point to note is that the mechanism proposed by Dr Messerli involves an impact of improved cardiovascular fitness on cognitive function. The number of Nobel laureates is not the measure one would pick if setting out to test this hypothesis. The topic of national differences in ability is a contentious and murky one, but it seemed worth looking at such data as are available on the web to see what the chocolate association looks like when a more direct measure is used. For the same 22 countries, the correlation between chocolate consumption and estimated average cognitive ability is nonsignificant at .24, falling to .13 when education spend is partialled out.

I did write a letter to the New England Journal of Medicine reporting the first of my analyses (all there was room for: they allow you 175 words), but, as expected, they weren't interested. "I am sorry that we will not be able to print your recent letter to the editor regarding the Messerli article of 18-Oct-2012." they wrote. "The space available for correspondence is very limited, and we must use our judgment to present a representative selection of the material received."

It took me all of 45 minutes to extract the data and run these analyses. So why didn’t Dr Messerli do this? And why did the NEJM editor allow him to get away with asserting that third variables “fall short” when it’s so easy to check it out? Could it be that in our celebrity-obsessed world, the journal editors think that there’s no such thing as bad publicity?

Messerli, F. (2012). Chocolate Consumption, Cognitive Function, and Nobel Laureates. New England Journal of Medicine, 367 (16), 1562-1564 DOI: 10.1056/NEJMon1211064

Saturday 27 October 2012

Auditory processing disorder (APD): Schisms and skirmishes

Photo credit: Ben Earwicker, Garrison Photography, Boise, ID
A remarkable schism is developing between audiologists in the UK and the USA on the topic of auditory processing disorder (APD) in children. In 2010, the American Academy of Audiology published clinical practice guidelines for auditory processing disorder.  In 2011, the British Society of Audiology published a position statement on the same topic, which came to rather different conclusions. This month a White Paper by the British Society of Audiology appeared reaffirming their position alongside invited commentaries.
So what is all the fuss about? The argument centres on how to diagnose APD in children. Most of the tests used in the USA to identify APD involve responding to speech. One of the most widely-used assessments is the SCAN-C battery which has four subtests:
  • Filtered words: Repeat words that have been low-pass filtered, so they sound muffled
  • Auditory figure-ground: Repeat words that are presented against background noise (multi-talker babble)
  • Competing words: Repeat words that are presented simultaneously, one to each ear (dichotically)
  • Competing sentences: Repeat sentences presented to one ear while ignoring those presented simultaneously to the other ear
In 2006, David Moore, Director of the Medical Research Council’s Institute of Hearing Research in Nottingham, created a stir when he published a paper arguing that APD diagnosis should be based on performance on non-linguistic tests of auditory perception. Moore’s concern was that tests such as SCAN-C, which use speech stimuli, can’t distinguish an auditory problem from a language problem. I made similar arguments in a blog post written last year. Consider the task of doing a speech perception test in a foreign language: if you don’t know the language very well, then you may fail the test because you are poor at distinguishing unfamiliar speech sounds or recognising specific words. This wouldn’t mean you had an auditory disorder.
A recent paper by Loo et al (2012) provided concrete evidence for this concern. They compared multilingual and monolingual children on performance on an APD battery. All children were schooled in English, but a high proportion spoke another language at home.  The child’s language background did not affect performance on non-linguistic APD tests, but had a significant effect on most of the speech-based tests.
Results from a population study by Moore et al were reported in 2010 and presented a challenge for the concept of APD. Specifically, Moore et al concluded that, when the effect of task demands had been subtracted out, non-linguistic measures of auditory processing “bore little relationship to measures of speech perception or to cognitive, communication, and listening skills that are considered the hallmarks of APD in children. This finding provides little support for the hypothesis that APD involves impaired processing of basic sounds by the brain, as currently embodied in definitions of APD.”
Overall, Moore et al found that if we use auditory measures that are carefully controlled to minimise effects of task demands and language ability, we find that they don’t identify children about whom there is clinical concern. Nevertheless, some children do give rise to clinical concern, because they report difficulty in perceiving speech in noise. So how on earth are we to proceed?
In the White Paper, the BSA special interest group suggest that the focus should be on developing standardized methods for identifying clinical characteristics of APD, particularly through the use of parental questionnaires.
The experts who responded to Moore and colleagues took a very different line. The specific points they raised varied, but they were not happy with the idea of reliance on parental report as the basis for APD diagnosis. In general, they argued for more refined measures of auditory function. Jerger and Martin (USA) expressed substantial agreement with Moore et al about the nature of the problem confronting the APD concept: “There can be no doubt that attention, memory, and language disorder are the elephants in the room. One can view them either as confounds in traditional behavioral tests of an assumed sensory disorder or, indeed, as key factors underlying the very nature of a ‘more general neurodevelopmental delay’”. They rejected, however, the idea of questionnaires for diagnosis, and suggested that methods such as electroencephalography and brain imaging could be used to give more reliable and valid measures of APD.
Dillon and Cameron (Australia) queried the usefulness of a general term such as APD, when the reality was that there may be many different types of auditory difficulty, each requiring its own specific test. They described their own work on ‘spatial listening disorder’, arguing that this did relate to clinical presentation.
Bellis and colleagues (USA) were the most critical of Moore et al’s arguments. They implied that a good clinician can get around the confound between language and auditory assessments: “Additional controls in cases in which the possible presence of a linguistic or memory confound exists may include assessing performance in the non-manipulated condition (e.g. monaural versus dichotic, nonfiltered versus filtered, etc.) to ensure that performance deficits seen on CAPD tests are due to the acoustic manipulations rather than to lack of familiarity with the language and/or significantly reduced memory skills.” Furthermore, according to Bellis et al, the fact that speech tasks don’t correlate with non-speech tasks is all the more reason for using speech tasks in an assessment, because “in some cases central auditory processing deficits may only be revealed using speech tasks”.
Moore et al were not swayed by these arguments. They argued, first, that neurobiological measures, such as electroencephalography, are no easier to interpret than behavioural measures. I’d agree that it would be a mistake to assume such measures are immune from top-down influences (cf. Bishop et al, 2012) and reliability of measurement can be a serious problem (Bishop & Hardiman, 2010). Moore et al were also critical of the idea that language factors can be controlled for by within-task manipulations when speech tasks are used. This is because the use of top-down information (e.g. using knowledge of vocabulary to guess what a word is) becomes more important as a task gets harder, so a child whose poor language has little impact on performance in an easy condition (e.g. listening in quiet) may be much more affected when conditions get hard (e.g. listening in noise). In addition, I would argue that the account by Bellis et al implies that they know just how much allowance to make for a child’s language level when giving a clinical interpretation of test findings. That is a dangerous assumption in the absence of hard evidence from empirical studies.
So are we stuck with the idea of diagnosing APD from parental questionnaires? Moore et al argue this is preferable to other methods because it would at least reflect the child’s symptoms, in a way that auditory tests don’t. I share the reservations of the commentators about this, but for different reasons. To my mind this approach would be justified only if we also changed the label that was used to refer to these children.  The research to date suggests that children who report listening difficulties typically have deficits in language, literacy, attention and/or social cognition (Dawes & Bishop, 2010; Ferguson et al, 2011). There’s not much evidence that these problems are usually caused by low-level auditory disorder. It is therefore misleading to diagnose children with APD on the basis of parental report alone, as this label implies a primary auditory deficit.
In my view, we should reserve APD as a term for low-level auditory perceptual problems in children with normal hearing, which are not secondary consequences of language or attentional deficits. The problem is that we can’t make this diagnosis without more information about the ways in which top-down influences impact on auditory measures, be they behavioural or neurobiological. The population study by Moore et al (2010) made a start on assessing how far non-linguistic auditory deficits related (or failed to relate) to cognitive deficits and clinical symptoms in the general population. The study by Loo et al (2012) adopts a novel approach to understanding how language limitations can affect auditory test results, when those limitations are due to the child’s language background, rather than any inherent language disorder. The onus is now on those who advocate diagnosing APD on the basis of existing tests to demonstrate that they are not only reliable but also valid according to these kinds of criteria. Until they do so, the diagnosis of APD will remain questionable.

P.S. 12th November 2012
Brief video by me on "Auditory processing disorder and language impairment" available here: (with links to supporting slideshow and references)
Loo, J., Bamiou, D., & Rosen, S. (2012). The Impacts of Language Background and Language-Related Disorders in Auditory Processing Assessment. Journal of Speech, Language, and Hearing Research DOI: 10.1044/1092-4388(2012/11-0068)
Moore, D., Rosen, S., Bamiou, D., Campbell, N., & Sirimanna, T. (2012). Evolving concepts of developmental auditory processing disorder (APD): A British Society of Audiology APD Special Interest Group ‘white paper’. International Journal of Audiology, 1-11 DOI: 10.3109/14992027.2012.723143

See also previous blogpost: "When commercial and clinical interests collide" (6th March 2011)