Examiner Bias in Forensic RFLP Analysis
By WILLIAM C. THOMPSON
(The case discussed here was first described in William C. Thompson, A Sociological Perspective on the Science of Forensic DNA Testing, U.C. Davis Law Review, 1997, 30(4), 1113-1136. Parts of this case study are excerpted from that article.)
Forensic DNA analysts often rely on subjective judgment when interpreting test results. Whether a test is interpreted as a damning incrimination or a complete exculpation may depend entirely on a subjective determination. If analysts were blind to the expected result when they made these determinations, then their reliance on subjective judgment would create few problems. In most forensic laboratories, however, analysts are not blind when they "score" DNA tests. Analysts often are in direct contact with detectives and hear all about the case (at least from the police perspective). They may even see themselves as part of the law enforcement team, whose job it is to help "make the case" against an obviously guilty suspect. These circumstances create a danger that analysts may intentionally or unintentionally be biased toward the police theory of the case when making subjective determinations.
Of course, this problem does not arise in every case. In many cases the results are sufficiently clear that interpretation is straightforward and uncontroversial. The results of DNA tests can properly be divided into three categories: clear inclusions (i.e., comparisons plainly showing a "match" between the genetic characteristics of two samples); clear exclusions (i.e., comparisons plainly showing a difference between the genetic characteristics of two samples); and ambiguous or uncertain comparisons. It is the third category that raises concerns.
As an illustration, consider the DNA results in People v. Marshall (Los Angeles County Superior Court, No. BA 069796, 1996). Marshall (hereafter "Suspect 1") and another man (hereafter "Suspect 2") were accused of kidnapping and raping a woman. The woman could not identify her assailants, so the key evidence was a DNA test performed by Genetic Design, a commercial laboratory in North Carolina. The laboratory used RFLP analysis, currently the most common DNA testing procedure, to examine the genetic characteristics of the two suspects, the victim, and a vaginal aspirate. Five genetic loci were examined.
According to the laboratory report, "DNA banding patterns obtained from the male fraction of the vaginal aspirate demonstrate DNA from two individuals consistent with the patterns obtained from [the two suspects]." The DNA patterns of Suspect 1 "occur with a frequency of one in 641,100,000 in the North American Black population" and the patterns for Suspect 2 "occur with a frequency of one in 636,500,000 in the North American Black population." The laboratory report gives no indication of any uncertainty about the "match" between the suspects and the vaginal sample, so it would appear that the DNA test provides damning evidence against both suspects. But let's look at the underlying results.
Figure 1 shows one of five autoradiograms (autorads) produced by the laboratory to show the DNA banding patterns of the samples tested. This autorad shows DNA profiles for the genetic locus known as D4S139. Three lanes of the autorad (on the far left, far right, and middle) display multiple bands called size markers. These bands are produced by fragments of bacterial DNA of known size. They are compared to the bands in the other lanes (which are produced by fragments of human DNA) in order to allow the size of these fragments to be estimated. The band patterns of the victim and the two suspects appear in vertical lanes on the left side of the autorad. Each individual has two bands. The position of these bands within their lanes indicates the genotype (for this locus) of the individual who provided the sample. Because this locus is an area where DNA is polymorphic (variable among individuals), the banding patterns tend to vary from person to person, as can be seen for the victim and two suspects. Banding patterns of the female and male portions of the vaginal aspirate appear on the right side of the autorad, along with a sample from a known individual run as a control. The banding pattern of the female vaginal extract is difficult to see due to a dark smear in its lane, which was probably caused by degradation of the DNA.
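For readers unfamiliar with how size markers are used, the following minimal Python sketch estimates a fragment's size by interpolating between the two size-marker bands that flank it. It assumes the common approximation that the logarithm of fragment size falls roughly linearly with migration distance between adjacent markers. The marker distances and sizes are hypothetical, and the source does not say which sizing algorithm the laboratory's software actually used.

```python
import math

# Hypothetical size-marker ladder: (migration distance in mm, size in base pairs).
MARKERS = [(10.0, 10000.0), (20.0, 5000.0), (30.0, 2500.0)]

def estimate_size(distance, markers=MARKERS):
    """Estimate a fragment's size (bp) from its migration distance (mm).

    Interpolates log10(size) linearly between the two marker bands that
    flank `distance` (a semi-log approximation assumed for illustration).
    """
    points = sorted(markers)  # order markers from shortest to longest migration
    for (d1, s1), (d2, s2) in zip(points, points[1:]):
        if d1 <= distance <= d2:
            frac = (distance - d1) / (d2 - d1)
            log_size = math.log10(s1) + frac * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("distance lies outside the size-marker range")

# A band midway between the 10,000 bp and 5,000 bp markers:
print(round(estimate_size(15.0)))  # 7071
```

Because the interpolation is done on a logarithmic scale, a band halfway between two markers is assigned the geometric mean of their sizes rather than the arithmetic mean.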
The key comparison is between the suspects' patterns and the pattern in the male vaginal extract. Two bands corresponding to those of Suspect 2 are clearly visible, indicating that he is a possible source of this DNA. Whether bands corresponding to those of Suspect 1 are also present is less clear. The two dots on the left side of this lane are felt-tip pen marks placed by the forensic analyst to indicate where he thought he saw bands. However, other experts were skeptical about whether the presence of bands could be reliably determined. And one expert thought the uppermost "band" in the male vaginal lane, if present, did not align closely enough with the upper band of Suspect 1 to be called a match. So the results shown on this first autorad, although incriminating for Suspect 2, are equivocal for Suspect 1.
Figure 2 shows a second autorad produced in this case. The layout of the lanes is the same as the first autorad, although it shows the genotypes of the samples at a different locus called D10S28. Again the victim and two suspects have distinct banding patterns. Notice, however, that Suspect 1 has only one band and that this band is in the same position as the lower band of Suspect 2. The male vaginal extract lane again contains two bands corresponding to those of Suspect 2, which provides additional evidence against him. However, it is impossible to tell from this autorad whether a pattern corresponding to that of Suspect 1 appears in the vaginal extract because the only band matching his could be accounted for by the DNA of Suspect 2. Additionally, the upper portion of the male vaginal extract lane contains dark blotches, caused by technical problems in the assay that may obscure bands of the second rapist. So the evidence remains equivocal as to Suspect 1.
Figures 3, 4 and 5 show the remaining autorads produced in this case. Each shows the DNA banding pattern of the samples for a specific genetic locus. For each locus, the victim and two suspects have distinct banding patterns. For each locus, the male vaginal extract lane contains a clear pattern corresponding to that of Suspect 2, providing very strong evidence against him. Whether the male vaginal extract lane also contains a banding pattern matching Suspect 1, however, is ambiguous at each locus. In the autorad shown in Figure 3, for example, several experts saw no band corresponding to the lower band of Suspect 1. Even the forensic analyst, who was the only person who claimed to see a band there, admitted uncertainty about it. The forensic analyst had more confidence in the presence of a band corresponding to the upper band of Suspect 1, but other experts dismissed this putative band as meaningless "schmutz" (a smudge) on the autorad, saying it lacked the morphology of a true band. The situation was the same for the autorad shown in Figure 4. For the autorad shown in Figure 5, no one (not even the hyper-vigilant analyst) detected a band corresponding to the lower band of Suspect 1 in the male vaginal extract lane. The forensic analyst thought he saw a band corresponding to the upper band of Suspect 1, but other experts were adamant that no band was there.
To summarize, the laboratory report indicated that the DNA test had produced powerful evidence against both suspects: a five-locus match between each of them and the DNA found in semen extracted from the victim. The report gave no indication that the evidence against Suspect 1 was weaker than that against Suspect 2. Indeed, because the DNA profile of Suspect 1 was slightly rarer than that of Suspect 2, one might infer that the DNA evidence against him was slightly stronger. Examination of the underlying autorads confirmed a clear, unambiguous match with Suspect 2 but indicated that the evidence against Suspect 1 was ambiguous and equivocal.
I was involved in this case as co-counsel for Suspect 1. My initial suspicion, after examining copies of the autorads, was that the forensic analyst had fallen victim to examiner bias (i.e., a tendency to see what one expects). The analyst knew my client was a suspect and could see my client's DNA pattern (from his blood sample) while making the judgment of whether bands corresponding to his were present in the vaginal extract. I feared that the analyst might intentionally or unintentionally have conformed his judgment to the police theory of the case, which held that my client was one of the rapists.
When I raised concerns about examiner bias during the pretrial phase of the case, however, the prosecution took the position that the autorads had been scored objectively by a computer-assisted imaging device. According to the prosecution, a scanner was used to create a digital image of each autorad, and these images were scored by a computer program that detects the presence of bands in each lane according to their optical density, making the process entirely objective. Because I was skeptical of this claim, I obtained a court order requiring the forensic laboratory to re-score the autorads with the computer-imaging device while an independent expert and I watched.
During this re-scoring, the claim that the process was objective evaporated. In order to detect bands in the male vaginal extract lane that corresponded to those of Suspect 1, the analyst had to increase the sensitivity of the computer to the point that it detected many additional "bands" that matched neither suspect. The analyst then performed a "manual override" of the computer's scorings, instructing the computer to "delete" (i.e., ignore) all of the bands that matched neither suspect. An image of the autorad appeared on a computer screen, with green lines indicating places where the computer had detected a "band." The analyst was able to delete any bands that were not deemed to be "true" bands through a simple point-and-click operation with the computer mouse. The software program also allowed an analyst to re-position the "bands" using the mouse.
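The effect of raising the computer's sensitivity can be illustrated with a generic threshold-based peak finder. In this hypothetical Python sketch, a lane is represented as a list of optical-density readings, and any local peak at or above a threshold is reported as a "band." The density values and the peak rule are assumptions for illustration only; the source does not describe the actual software's detection algorithm.

```python
def detect_bands(profile, threshold):
    """Return positions of local optical-density peaks at or above `threshold`.

    A position counts as a peak if it is denser than the reading before it
    and at least as dense as the reading after it (endpoints are skipped).
    """
    bands = []
    for i in range(1, len(profile) - 1):
        is_peak = profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]
        if is_peak and profile[i] >= threshold:
            bands.append(i)
    return bands

# Hypothetical density readings down one lane.
lane = [0.1, 0.2, 0.9, 0.3, 0.2, 0.35, 0.2, 0.1, 0.8, 0.2, 0.25, 0.15]

print(detect_bands(lane, threshold=0.5))  # [2, 8]
print(detect_bands(lane, threshold=0.2))  # [2, 5, 8, 10]
```

In this example, lowering the threshold from 0.5 to 0.2 doubles the number of reported "bands." Nothing in the rule itself distinguishes the extra peaks from "true" bands; that judgment falls to whoever decides which detections to delete.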
When asked to state the basis for deleting some bands while leaving others, the analyst responded that he could "tell by looking" that the undeleted bands (which happened to match my client) were true bands, while the others were not. A number of the deleted bands had higher optical densities than the bands scored as matching my client. So much for objectivity.
The re-scoring also resolved another issue. As the computer scored the bands, it compared their position to that of the size markers in order to estimate the size of the underlying DNA fragments designated by each band. Forensic laboratories use these sizings to determine whether bands of different samples align closely enough to be called a match. The laboratory had previously reported that the upper band of Suspect 1 for locus D4S139 (see Figure 1) was a perfect match with the highest band in the male vaginal extract lane. But the re-scoring showed that the two bands differ in size by over 9 percent. Because the policy of the laboratory called for a match to be declared only if the sizes of bands differ by less than 4 percent, this new scoring arguably excluded Suspect 1 as a potential contributor of the DNA that gave rise to the upper band in the male vaginal extract lane. It thereby confirmed the suspicion of the independent expert who, based on visual examination, doubted that there was a match.
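The match-window comparison can be expressed as simple arithmetic. The Python sketch below measures the size difference between two bands as a percentage of their mean size and declares a match only if that difference is under 4 percent. Using the mean as the denominator is an assumption (laboratories have used different conventions), and the band sizes in the example are hypothetical values chosen only to reproduce a gap of a little over 9 percent; the source does not report the actual sizes.

```python
def percent_difference(size_a, size_b):
    """Size difference as a percentage of the two sizes' mean (assumed convention)."""
    mean = (size_a + size_b) / 2
    return abs(size_a - size_b) / mean * 100

def within_match_window(size_a, size_b, window_pct=4.0):
    """True if the two band sizes differ by less than `window_pct` percent."""
    return percent_difference(size_a, size_b) < window_pct

# Hypothetical band sizes in base pairs.
print(round(percent_difference(5000, 5480), 1), within_match_window(5000, 5480))  # 9.2 False
print(round(percent_difference(5000, 5100), 1), within_match_window(5000, 5100))  # 2.0 True
```

A difference of about 9 percent falls well outside a 4 percent window, so under a rule of this kind the two bands would not be declared a match.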
Additionally, independent scorings of both the digital image of the autorad and a photographic copy of the autorad confirmed that the initial scoring (showing a perfect match) was wrong and that the re-scoring (showing an exclusion) was correct. How was it, then, that the laboratory had initially scored these non-matching bands as a perfect match? My theory is that during the initial scoring the analyst performed a manual override of the computer to re-position the bands and make them match. Perhaps the analyst saw enough similarity between the DNA pattern of Suspect 1 and faint bands in the male vaginal lane to confirm his suspicion that Suspect 1 was guilty, and then took steps to improve the quality of the match to help police make the case against him. When the problems with the DNA evidence came to light, the District Attorney offered my client, Suspect 1, a favorable plea bargain arrangement, which he accepted.
This case shows that DNA test results are not always clear cut. More importantly, it illustrates how an analyst may draw damningly incriminating conclusions from data that are ambiguous or even exculpatory. In my view, innocent people are far more likely to be falsely incriminated through biased interpretation of ambiguous DNA test results than through coincidental matches with persons having the same profile. Consequently, I believe that the issue of how frequently ambiguities arise in DNA tests, and how laboratories deal with them, warrants far more attention than it has received.
Ambiguous DNA test results can arise in a number of ways. Faint results, such as those just discussed, are quite common, particularly in cases involving mixed DNA samples. Minor inconsistencies between DNA profiles are also common. Bands may be somewhat misaligned (as in the autorad shown in Figure 1), or the number of bands observed may differ. The analyst must then decide whether the discrepancies reflect true genetic differences or are simply the result of variability in the assays. In PCR-based tests, where the results are sometimes shown in a pattern of dots on test strips, ambiguities are even more common. The analyst must decide whether to "score" faint dots. When there are discrepancies between the patterns of two samples, the analyst must decide whether to attribute them to true genetic differences between samples or to technical problems in the assays, such as the failure to detect certain alleles due to degradation of the DNA or the appearance of spurious extra dots due to cross-hybridization or contamination. Whether or not experimental controls have failed (and thereby invalidated the test) is sometimes also an issue that turns on subjective judgment.
Because ambiguities in the test results are resolved based on subjective judgment, the analyst has license to invoke all manner of ad hoc, unverified scientific reasoning in service of whatever interpretation is preferred. Analysts sometimes dismiss inconsistencies between profiles or problems with the test results (such as failed controls) by invoking ad hoc explanations that they fail to test empirically. Moreover, their ad hoc explanations sometimes shift with changing circumstances, making them inconsistent from case to case. To make matters worse, analysts sometimes rely on other evidence in a case to resolve ambiguities in DNA test results. I heard one forensic analyst defend the scoring of an ambiguous band (a judgment that incriminated a defendant in a rape case) by saying "I must be right, they found the victim's purse in [the defendant's] apartment."
Inferential bootstrapping of this sort is inevitable when analysts fail to use blind or objective scoring procedures. It can be terribly prejudicial to the defendant because it allows the analyst (by relying on other evidence in the case) to convert otherwise equivocal DNA results into a seemingly damning incrimination. To the trier-of-fact it appears that the DNA test results are independent evidence against the defendant. In fact, the power of the DNA evidence is derived in part from other evidence that the jury may already have considered. Consequently, the jury may double-count evidence against the defendant.
The danger of examiner bias has led scientists in many fields to insist that procedures for interpretation of potentially ambiguous data be either blind or objective. Both reports of the National Research Council call for the use of blind or objective "scoring" procedures by forensic DNA laboratories. But forensic laboratories have not followed the NRC's recommendations in this area. Their failure to do so cannot be explained on scientific grounds. If there is a scientific justification for the continued use of subjective interpretive procedures in forensic DNA testing, in the face of contrary recommendations from the broader scientific community, it has yet to be articulated in the forensic science literature.
To understand the persistence of poor interpretive practices in forensic science we must look beyond science to the sociology of the field. In my view, forensic scientists persist in relying on subjective judgment because they value their discretion to find the "right" result in close cases more highly than they value scientific rigor. Faced with a choice between interpretive procedures that are scientifically rigorous, and procedures that maximize the analyst's discretion to control the outcome of ambiguous cases, many forensic scientists will opt for discretion over rigor whenever they can get away with it.