Data, versus Numbers, versus Proof
In The Atlantic last week I wrote a fairly detailed dissent to Adrienne LaFrance's article on police killings. This is to add to that.
None of the statements in that dissent (including my mention of the findings of Mr. Fryer, see below) was meant to suggest that there is a “correct” number - and that, I think, is my overall point.
I wrote,
So now comes Roland G. Fryer, Jr., a frickin’ genius at Harvard who just released, as you noted earlier this week, a study titled “An Empirical Analysis of Racial Differences in Police Use of Force.” The beauty of this study (it’s a working paper, covered gushingly in The New York Times) is actually not the conclusions, but the methodology. Even if you’ve never read a methodology section before, I urge you to read his. His method of capture of narrative into coded data was simply wonderful—in that, for the first time, someone took the trouble to treat police data like most other data.
I really meant it - the methodology was what was so good, because Fryer was trying some new things to code police narratives into structured data, which is difficult for anyone outside law enforcement. This was cool. A team of researchers read the narratives and coded them into a data set containing 65 pre-determined variables in six categories, plus race, age, and gender: (A) suspect characteristics, (B) suspect weapon(s), (C) officer characteristics, (D) officer response reason, (E) other encounter characteristics, and (F) location characteristics. Then Fryer hired another group to work backwards and see if they could recreate the original narratives from the coded data, which they could. That is cool, but it also shows me how many problems can (or did) arise when non-trained academics tried genuinely hard to do this accurately. My theory is that these researchers labeled and worked with data while, in some cases, not understanding that a label means something different to the researchers than it means to the police. This is also how many journalists, reporters and stenographers screw up when covering law enforcement - they think they understand certain details, but they don't. It is a very basic human problem of which all of us have been guilty.
Why is this so difficult? Two reasons. First, law enforcement data is difficult to understand for those without actual expertise in law and in law enforcement process, policy and procedure. In the words of Oakland Police Department Lt. Chris Bolton, "each piece of police data represents a number of circumstantial or decisional variables."
Second, because cops sometimes lie. Not nearly as much as you'd think, but sometimes.
We included a fairly detailed disclaimer of this (with thanks to direct input from police monitor Walter Katz) in our book: “[O]fficial police reports are written by police officers either directly involved with the incident, or based upon interviews with those directly involved in an incident. It has been demonstrated that officers have lied in the preparation of their statements and reports. This is an enormous challenge and impediment to our work. These lies can be direct, or they may be lies of omission. There are criminal penalties for falsifying these reports and statements, which are in public records. The chances of dishonesty being uncovered get higher each year with the increased scrutiny of civil litigation attorneys, investigative reporters, public advocacy groups, and projects including this one. Civil lawsuits, when filed, especially tend to bring forth many more documents and testimonial records than prior to litigation.
We recognize fully that basing our analyses on these sources comes with risks and challenges. Even if you don’t believe our analysis is better, we believe it is different from the available analyses in the press. We of course believe that our formal training in use of force and subject-matter expertise in law enforcement make our analysis better.
By focusing on the sources previously mentioned, in the order stated, the StreetCred Police Killings in Context data project makes every effort to remain unbiased and factual.”
So that's what Fryer, and every other researcher, is up against. It is why it is so difficult to use police data to blame anyone. More on that - a lot more on that - in the coming weeks.
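To make the coding challenge concrete, here is a minimal sketch of the kind of narrative-to-variables step described above. This is not Fryer's actual codebook or method: the individual variable names and the toy keyword "coder" are hypothetical, chosen only to illustrate how a coded label can silently diverge from what the narrative (or the officer who wrote it) actually meant.

```python
# Hypothetical codebook: six categories matching the groups named in
# the text, with made-up variables inside each (Fryer's paper used 65).
CODEBOOK = {
    "suspect_characteristics": ["appeared_intoxicated", "fled_on_foot"],
    "suspect_weapons": ["firearm", "knife", "none_found"],
    "officer_characteristics": ["on_duty", "in_uniform"],
    "officer_response_reason": ["call_for_service", "self_initiated"],
    "other_encounter_characteristics": ["nighttime", "backup_present"],
    "location_characteristics": ["residential", "high_crime_area"],
}

def code_narrative(narrative: str) -> dict:
    """Naive keyword-based coder: marks a variable True if its keyword
    (underscores replaced by spaces) appears verbatim in the narrative.
    Real human coders read for meaning - and this is exactly the step
    where a label can mean one thing to a researcher and another to
    the police officer who wrote the narrative."""
    text = narrative.lower()
    coded = {}
    for category, variables in CODEBOOK.items():
        for var in variables:
            coded[f"{category}.{var}"] = var.replace("_", " ") in text
    return coded

coded = code_narrative(
    "Officers responded to a call for service at night; the suspect "
    "fled on foot and no weapon was found."
)
```

Note that this toy coder misses "no weapon was found" (it looks for the literal phrase "none found") and misses "at night" (it looks for "nighttime") - a small-scale version of the labeling mismatches described above.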
Anyway, I was remiss in my dissent in The Atlantic (which was, after all, 1800 words) not to mention the excellent work of Cody Ross (“A Multi-Level Bayesian Analysis of Racial Bias in Police Shootings at the County-Level in the United States, 2011–2014”) in PLOS, which looks to the U.S. Police-Shooting Database (USPSD) for answers to these questions and finds "significant bias in the killing of unarmed black Americans relative to unarmed white Americans, in that the probability of being {black, unarmed, and shot by police} is about 3.49 times the probability of being {white, unarmed, and shot by police} on average."
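It is worth being precise about what that 3.49 is: a ratio of two joint probabilities, not a conditional probability of being shot given a stop. The sketch below shows only the form of that calculation - the counts are invented, and it ignores the county-level structure that Ross's multi-level Bayesian model actually accounts for.

```python
def joint_probability(group_unarmed_shot: int, total_population: int) -> float:
    """P(in group, unarmed, and shot by police), estimated naively as a
    share of the total population. Ross's model does far more than this;
    only the ratio's basic form is illustrated here."""
    return group_unarmed_shot / total_population

# Hypothetical counts, made up for illustration - not Ross's data.
total = 1_000_000
ratio = joint_probability(35, total) / joint_probability(10, total)
# With these toy counts the ratio works out to 3.5 - close to Ross's
# reported 3.49 only by construction, not by measurement.
```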
I have a panoply of bones to pick with the USPSD, though not with Mr. Ross' superb analysis. W. David Ball has done an extraordinary analysis of that, plus a description of Bayesian analysis in general and a super-cool critique of the Fryer paper in particular. Andrew Gelman's blog had a terrific analysis of the same piece, and Uri Simonsohn went to town on the specifics of some of the data assumptions that raised his eyebrows towards K2.
And then, last week, came Radley Balko, who wrote in his superb blog The Watch at the Post that it is quite simply impossible to calculate the percentage of police shootings that are legitimate.
Radley and I spoke (he got the 74% analysis only partly right in his column, mistakenly believing that the 16% I mentioned in my note were part of the 74% of killings that interrupted a fatal attack).
The reason the Fryer paper, and Fatal Encounters, and the Ross paper, and David Ball's critique, and the StreetCred Police Killings in Context database, and my book are so important is that they are serious attempts to take almost comically imperfect data about monumentally complex, insanely multi-variate situations and dive into one or two aspects of them in a way that can provide meaningful insight.
If you were to put together a list of the most serious people looking at these issues, I think that all those names I just mentioned, plus those I've mentioned throughout this note, would certainly be in the Top 20.
And the fact is that we can geek the hell out over these numbers and grant points and get great takeaways, but I buried the lede and it is worth repeating:
Law enforcement data is the one place where Occam’s Razor does not apply. Journalists who state facts based on studies do so at their peril.