Autism has three key defining features: impairments in communication, social interaction and behavioural repertoire. The latter encompasses both repetitive behaviours, such as stereotyped movements, and restricted interests, e.g., an obsessive fascination with aeroplanes. After autism was first described by Leo Kanner in 1943, the diagnosis quickly became popular, but there were concerns that it was over-used. There was a clear need to translate Kanner’s clinical descriptions into more objective diagnostic criteria. The first step was to develop checklists of symptoms, and these were included for the first time in the 1980 version of the Diagnostic and Statistical Manual of the American Psychiatric Association, DSM-III. However, this still left room for uncertainty: clinicians might, for instance, disagree about the interpretation of terms such as “Pervasive lack of responsiveness to other people”.
The Autism Diagnostic Interview (ADI) was designed to address this problem. It was first published in 1989, with a revised form, the ADI-R, appearing in 1994. The ADI-R typically takes 1.5 to 3 hours to administer, and covers all the symptoms of autism and related conditions. Items are coded on a 4-point scale, from 0 (absent) to 3 (present in extreme form). The scores from a subset of items are then combined to give a total for each of the three autism domains, and a diagnosis of autism is given if scores on all three domains are above cutoffs and onset was evident by 36 months. Validation of the ADI-R was carried out by comparing scores for 25 preschool children with clinically diagnosed autism and 25 nonautistic children with intellectual disability or language impairment.
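For readers who think in code, the decision rule itself can be sketched in a few lines. To be clear, the domain groupings and cutoff values below are placeholders for illustration only, not the actual ADI-R algorithm thresholds.

```python
# A minimal sketch of the kind of scoring rule described above. The domain
# item groupings and the cutoff values are hypothetical placeholders, not the
# actual ADI-R algorithm thresholds.

DOMAIN_CUTOFFS = {"social": 10, "communication": 8, "repetitive": 3}  # placeholders

def meets_algorithm(item_scores: dict, onset_months: int) -> bool:
    """Return True if every domain total reaches its cutoff and onset was by 36 months.

    item_scores maps each domain name to the list of 0-3 codes for its algorithm items.
    """
    domain_totals = {domain: sum(codes) for domain, codes in item_scores.items()}
    above_cutoffs = all(domain_totals[d] >= cut for d, cut in DOMAIN_CUTOFFS.items())
    return above_cutoffs and onset_months <= 36

# Example: a child scoring above every (placeholder) cutoff, with onset at 24 months.
print(meets_algorithm(
    {"social": [3, 2, 3, 2, 3], "communication": [2, 3, 2, 3], "repetitive": [2, 3]},
    onset_months=24,
))
```

The point is simply that the decision rule is straightforward arithmetic; the real work lies in the lengthy interview that generates the item codes.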
Right from the outset, however, there was concern that diagnosis of autism should not be made on the basis of parental report alone. Some parents are poor informants. On the one hand, they may fail to remember key features of their child’s behaviour; on the other hand, their memories may have been coloured by reading about autism. Parental report therefore needs to be backed up by observation of the child. The Autism Diagnostic Observation Schedule (ADOS), published in 1989, was designed for this purpose. It exposes the child to a range of situations designed to elicit autistic features, and particular behaviours, such as eye contact, are then coded by a trained examiner. ADOS-G, a generic version, was published in 2000, and covers a wide age range, from toddlers through to adults.
ADI-R and ADOS-G quickly became the instruments of choice for autism diagnosis. It was generally appreciated that if standard instruments are used, researchers and clinicians should be able to communicate about autism with a fair degree of confidence that they are referring to individuals who meet the same diagnostic criteria.
There is, however, a downside. ADI-R and ADOS-G were designed to be comprehensive, but they were not designed to be efficient. As noted above, the ADI-R takes up to 3 hours to administer and score, and the ADOS-G takes about 45 minutes. In addition, testers must be trained to use each instrument, and it may take some months to find a place on a training course. Each course lasts around one week, and the trainee then has to complete further assessments, which are recorded and sent to experts for validation. This process can easily add another 6 months. For anyone under time pressure, such as a doctoral student or a grant-holder with research assistants on fixed-term contracts, the training requirements can make a study impossible to do. Inclusion of both ADI-R and ADOS-G in a test protocol can double or treble the duration of a project, especially where it is necessary to travel to interview parents who may be available only during anti-social hours.
So, does autism diagnosis need to involve such a lengthy process? This was a question I was prompted to consider when I was asked to speak at a roundtable debate on diagnostic tools for autism at the International Meeting For Autism Research (IMFAR) in London in 2008. I concluded that the answer is almost certainly no. As Matson et al. (2007) put it: “Some measures emphasize the fact that they are very detailed. We would argue that detail equals time. From a pragmatic perspective, our view is that a major priority should be to develop the balance between obtaining relevant information to make a diagnosis, while parsing out items that do not enhance that goal” (p. 49).

I was surprised when I first undertook ADI-R training to find that the interview included many items that did not feature in the final algorithm. When I (and others) queried this, we were told that the interview worked in its entirety, and that to pull out selected items would disrupt its natural flow. Also, the non-algorithm items might be useful for diagnosing conditions other than autism. Both points may be important in a clinical setting, but they have much less force for the poor graduate student who is doing a doctorate on, say, perceptual processing in autism, and only wants to use the ADI-R to confirm a diagnosis that has already been made by a clinician. There was a short form, I was told, but it was not recommended and should only be used by clinicians, not researchers.

I was even more surprised to find out how the algorithm was devised. I was familiar with discriminant function analysis, whereby you take a set of scores from two (or more) groups and find the weighted sum of scores that best discriminates between the groups. You can then use the correlations between items to drop items from the algorithm successively, until you reach the point where the accuracy of diagnostic assignment declines if further items are dropped. I had assumed that this kind of statistical data reduction had been used to identify the optimal set of items for identifying autism. I was wrong. It seemed that the items had been selected on the basis of their match to clinical descriptions of autism, and no attempt had been made to test the efficiency of the algorithm by dropping items.
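To make concrete what such a data-reduction exercise might look like, here is a rough sketch using scikit-learn’s linear discriminant analysis with backward elimination. The function name, the elimination strategy and the tolerance for what counts as a “decline” in accuracy are my own illustrative assumptions, not anything taken from the ADI-R or its published validation.

```python
# A rough sketch of discriminant-based item reduction: start with all items,
# then repeatedly drop the item whose removal hurts cross-validated accuracy
# least, stopping once accuracy starts to decline. All names and the tolerance
# are illustrative assumptions, not part of any published ADI-R analysis.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def prune_items(X: np.ndarray, y: np.ndarray, tolerance: float = 0.01) -> list:
    """Backward elimination over the item columns of X (cases x items), labels y."""
    kept = list(range(X.shape[1]))
    best = cross_val_score(LinearDiscriminantAnalysis(), X[:, kept], y, cv=5).mean()
    while len(kept) > 1:
        # Score every candidate set that omits one of the remaining items.
        trials = []
        for item in kept:
            subset = [j for j in kept if j != item]
            acc = cross_val_score(
                LinearDiscriminantAnalysis(), X[:, subset], y, cv=5
            ).mean()
            trials.append((acc, item))
        acc, worst = max(trials)          # the drop that costs least accuracy
        if acc < best - tolerance:        # stop when accuracy starts to decline
            break
        kept.remove(worst)
        best = max(best, acc)
    return kept
```

Run on the sort of validation sample described earlier (autistic versus non-autistic comparison children), an analysis along these lines would show how few items can be retained before diagnostic assignment starts to suffer.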
There is good reason to believe that a much shorter and simpler procedure would be feasible. In 1999, a study was published comparing the diagnostic accuracy of the ADI with that of a 40-item screening questionnaire. The accuracy of the questionnaire was as good as that of the interview. My impression is that this finding did not lead to rejoicing at the prospect of a shorter diagnostic procedure, but rather to alarm that diagnosis could be reduced to a trivial box-checking exercise. While I have some sympathy with that view, I feel these results should have made researchers pause to consider whether a much shorter and more efficient approach to diagnosis might be feasible.
But the problems get worse. As Jon Brock recently pointed out on his blog, Kanner’s view of autism as a distinct syndrome is no longer accepted. It’s clear that autism symptoms can occur in milder form, and that they do not necessarily all go together. The broader term ‘Autism Spectrum Disorder’ (ASD) is nowadays used to encompass these cases. The schematic in Figure 1 illustrates the diagnostic problem: one has to decide where to place boundaries to distinguish ASD from normality, when in reality all three domains of the autism triad of symptoms shade into normality, with no sharp cutoffs. The ADI-R algorithm specifies whether or not you have autism, and does not give cutoffs for milder forms of ASD. The ADOS-G does have cutoffs for milder forms, but is inappropriate for detailed assessment of repetitive behaviours/restricted interests, so it is not watertight. This means that, when it comes to diagnosis of ASD, a great deal is left to clinicians’ judgements. These, rather than algorithm scores, are used to arrive at diagnoses.
My own recommendation is for a two-step procedure. The first step would involve a much briefer version of the ADI-R, designed to pick up clear-cut cases of autism that everyone would agree on and to distinguish them from clearly non-autistic cases. It’s an empirical question, but I suspect that if we were to do a stepwise discriminant analysis to identify a minimum set of diagnostic items, this would be considerably shorter than the current set used in the ADI-R. The interview may then need redesigning so that it still flows fluently and follows a logical course, but this should not be impossible. In clinical settings, those identified might require further direct assessment to confirm the diagnosis and identify specific needs, but this would not be necessary for determining who should be included in a research study.

This would leave a group of children in whom autism was suspected but not confirmed. The question here is whether we will ever arrive at a diagnostic procedure that will clearly separate such children into ASD and non-ASD. A few years ago I was involved in a study with just such a group of ‘marginal’ cases, in which we administered ADI-R and ADOS-G. The results were all over the place: some children looked autistic on ADI-R but not on ADOS-G, and others showed the opposite pattern. Some showed evidence of marked change in behaviour between the preschool and school-age years. When I asked an autism expert how such cases should be categorised, he suggested I get an expert clinical opinion. Yet expert clinical opinion is not seen as adequate by many journal editors! And there is documented evidence that even experienced clinicians will disagree in cases where the child has a confusing pattern of symptoms, and that expert diagnoses are not stable over time. My suggestion is that, in our current state of knowledge, it makes no sense to try to set reliable cutoffs for identifying ASD. Instead we should aim to assess the nature and degree of impairment in different domains. Assessments such as the 3Di or the Social Responsiveness Scale, which treat autistic features as dimensions rather than all-or-none symptoms, seem better suited to this task than the existing gold standards.
Finally, I must emphasise that, although I think the ADI-R and ADOS-G are not optimal for diagnosing ASD for research purposes, they nevertheless have value. They distill a great deal of clinical wisdom in the assessment process, and are cleverly crafted to pinpoint the key features of autism. Anyone who undergoes training in their use will come away with a far greater understanding of autism than they had when they started. However, these instruments are not well suited to the NIH aim of “accelerating scientific discovery”. In research contexts they have the opposite effect, forcing researchers through an unnecessarily long and complex diagnostic process that does not yield quantitative results suitable for assessing the dimensional nature of ASD.
Rondeau, E., Klein, L. S., Masse, A., Bodeau, N., Cohen, D., & Guilé, J. M. (2010). Is Pervasive Developmental Disorder Not Otherwise Specified less stable than Autistic Disorder? A meta-analysis. Journal of Autism and Developmental Disorders. PMID: 21153874