Last week, in State v. Nieves, the New Jersey Supreme Court became the first in the country to bar prosecutors from introducing “shaken baby syndrome” or “abusive head trauma” (SBS/AHT) claims in criminal cases, at least where the only alleged abuse is the shaking itself (as opposed to, say, shaking combined with physical blows to the head). The decision came close on the heels of the Texas Court of Criminal Appeals pointing to the state’s “junk science” law when it stayed the execution of Robert Roberson, who was convicted of murder in an SBS/AHT case in 2002.
The outcome in Nieves reflects a growing backlash against SBS/AHT claims in criminal cases. While New Jersey’s high court was the first state supreme court to exclude the evidence, the justices in Nieves noted that courts in other states have grown increasingly critical of it (while also acknowledging that still other courts continue to admit it broadly). And while numerous medical groups continue to insist that SBS/AHT is a valid medical diagnosis, a growing literature, especially in biomechanics, has cast doubt on it, arguing, for example, that the amount of force needed to cause the injuries associated with SBS/AHT would likely cause serious injuries to the neck first.
Because the criminal legal system is supposed to err in favor of defendants, the skepticism the court showed toward SBS/AHT in Nieves strikes me as the right way to balance things, given the growing uncertainty about the claim. More importantly, Nieves provides a useful way to highlight at least two broader challenges posed by forensic evidence — and scientific evidence more broadly — in the legal system.
To start, the case is a good reminder that a tremendous amount of the forensic evidence used in criminal cases rests on empirical support that is thin to almost nonexistent.
In 2009, for example, the National Academy of Sciences’ National Research Council released a compelling report on forensic evidence that argued that entire swaths of forensic evidence — such as bite-mark matches, bloodstain-pattern analysis, handwriting comparisons and so on — had little to no empirical validation, no way to calculate things such as false positive or false negative rates and often rested primarily on instinct and intuition. It also found that even more established practices, such as fingerprint evidence, are likely far less reliable than conventional wisdom suggests.
Although that report did not look at eyewitness testimony, a later one did, and it documented widespread concerns with that evidence, too. According to the Innocence Project, mistaken eyewitness testimony played a role in roughly 70% of all convictions that were later overturned by DNA evidence. And while DNA evidence itself is fairly well validated in general, it is only as reliable as the labs that test it, which sadly are not always run well. Even video evidence now faces an existential threat in the form of deepfakes and other advances in artificial intelligence.
But perhaps more important, the case highlights the challenge that empirical evidence poses for a legal system that has very few practitioners — lawyers or judges alike — with scientific or statistical training (perhaps under 10% of law students have a STEM background, and only a few law schools provide any statistical training to their students).
In many cases, it is not easy to get clean evidence about the issues the law cares about. It’s simply not possible to run a randomized controlled trial to see what happens when you shake a baby, so we have to rely on indirect approaches. For SBS/AHT, the seminal study wasn’t about humans at all but, rather, about the trauma suffered by sedated monkeys experiencing whiplash at 30 mph. Subsequent studies then extrapolated those results to humans, often looking at nonrandom samples of babies already suspected of being abused.
There are a lot of scientific jumps being made here: Are sedated monkeys similar enough to human babies? Is a 30 mph whiplash event comparable to shaking? Many such jumps may be justifiable, but all of them raise concerns, and those concerns can be hard for lay judges to parse.