After the events in Ferguson, MO last year, police-related fatalities have become a major talking point in the United States. One of the most common claims made during this discussion is that black people are more likely to be killed in police altercations. This claim is almost always backed up by listing the many cases that have been seen in the news recently.
While these lists might hint that there is a racial bias in these fatalities, they are in no way proper statistical evidence. This is the equivalent of accusing someone of cheating at a game of dice and listing 5 or 6 of their recent wins as evidence. It is possible that they are a completely honest player on a hot streak. For proper evidence, we need to complete a hypothesis test.
What is a hypothesis test?
A hypothesis test is like a courtroom for numbers. All claims are assumed false until proven true beyond reasonable doubt.
For example, we can test whether a die is loaded by rolling it 600 times. Assuming the die is fair, we would expect to roll each side 100 times. After completing 600 rolls, we get this distribution.
Clearly this die is rolling more 6s and fewer 1s than it should be, but that doesn’t instantly mean that it is loaded. We complete a statistical test to answer the question: “What is the probability a fair die would produce a distribution as divergent or more divergent from the perfect 100-100-100-100-100-100 distribution than this one?” I won’t bore you with the math; the probability in this case is .07. This means that a fair die has a 7% chance of producing a distribution at least as divergent as this one. 7% is a large enough chance to leave reasonable doubt that the die is loaded, so it is not strong enough evidence to draw any conclusions. Therefore, since a claim is assumed false until proven true, we do not conclude that this die is loaded.
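For readers who do want the math, here is a minimal sketch of one standard way to answer that question: Pearson's chi-square goodness-of-fit test. The post does not say which test was actually used, so treat the choice of test as an assumption. The counts below are hypothetical, chosen only to resemble "more 6s and fewer 1s", and are not the rolls behind the figure. The function names are illustrative, and the tail formula is a closed form that holds only for 5 degrees of freedom (six faces minus one).

```python
import math

def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_tail_df5(x):
    """P(X >= x) for a chi-square variable with exactly 5 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * (
        math.sqrt(x) + x ** 1.5 / 3
    )

# Hypothetical counts for faces 1..6 over 600 rolls (NOT the post's data).
observed_rolls = [79, 96, 100, 100, 101, 124]
expected_rolls = [100] * 6  # a fair die: 600 rolls / 6 faces

stat = chi_square_statistic(observed_rolls, expected_rolls)
p_value = chi_square_tail_df5(stat)
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
```

With these made-up counts the p-value lands between 5% and 10%, the same neighborhood as the .07 quoted above. A smaller p-value would mean a fair die rarely produces a result this lopsided.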
Testing police-related fatality data with respect to race
Fatalencounters.org is a crowd-sourced database of fatalities involving law enforcement. This data lets us create a scenario very similar to the dice test we just completed. To find the observed data values, we simply count how many people of each race are in the database. The expected values required a little more thought.
We could have just taken the national U.S. racial distribution and extrapolated from there, but that would not have been accurate because these incidents are not evenly distributed around the U.S. Instead, we can take the zip code that each death occurred in and add the local racial distribution to the total (population data from city-data.com). For example, Ferguson, MO has a 36% White, 61% Black, 1.6% Latino/Hispanic, .6% Asian and .4% Native American population, so for each entry in Ferguson we would add .36 to White expected, .61 to Black expected, .016 to Latino/Hispanic expected, and so on.
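The accumulation step can be sketched in a few lines. This is only an illustration of the idea, not the script used for this post. The function name, the dictionary layout, and the use of a place name instead of a zip code as the key are all assumptions, and the two-incident input is a made-up example rather than a count from the database. Only the Ferguson percentages come from the text above.

```python
from collections import defaultdict

def expected_counts(incident_locations, local_shares):
    """Sum each incident's local racial distribution into expected totals."""
    totals = defaultdict(float)
    for location in incident_locations:
        for race, share in local_shares[location].items():
            totals[race] += share
    return dict(totals)

# Local population shares, keyed by location (Ferguson figures from the post).
local_shares = {
    "Ferguson, MO": {
        "White": 0.36,
        "Black": 0.61,
        "Latino/Hispanic": 0.016,
        "Asian": 0.006,
        "Native American": 0.004,
    },
}

# Hypothetical input: two incidents recorded in Ferguson.
incidents = ["Ferguson, MO", "Ferguson, MO"]
expected_by_race = expected_counts(incidents, local_shares)
print(expected_by_race)
```

Each incident adds exactly one local distribution to the totals, so the expected values sum to roughly the number of incidents (slightly less when the local percentages do not add up to 100%).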
This is the result of that process.
There seems to be a bias toward Black individuals in the data, but we must still complete a proper test to be sure. The answer to the question “What is the probability that we would see a distribution as or more extreme than this one, assuming that race plays no role in police-related deaths?” is on the order of 10^-82. For reference, the chance of a person getting struck by lightning 13 times this year is on the order of 10^-79. This is very strong statistical evidence that there is a systematic difference in the race of individuals killed by police actions.
Of course, there is the obligatory reminder that correlation does not mean causation. This evidence tells us that a discrepancy exists, but it says nothing about why the discrepancy exists. Also, I am putting faith in the legitimacy of the data from fatalencounters.org. If this data is biased in any way, then any test done on it will also be inaccurate.
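To give a feel for how a probability that small can arise, here is a rough sketch. It assumes the test is a chi-square goodness-of-fit test over five race categories, which gives 4 degrees of freedom; the post does not state either of these, so both are assumptions. For 4 degrees of freedom the tail probability has a simple closed form, exp(-x/2) * (1 + x/2), where x is the test statistic. The statistic values below are illustrative inputs, not values computed from the database.

```python
import math

def chi_square_tail_df4(x):
    """P(X >= x) for a chi-square variable with exactly 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)

# Illustrative statistics only: watch how fast the tail probability shrinks.
for stat in (10, 100, 388):
    print(f"statistic {stat}: p = {chi_square_tail_df4(stat):.2e}")
```

Under these assumptions, a statistic near 388 corresponds to a probability on the order of 10^-82. The tail shrinks exponentially, so with a large number of cases even a moderate gap between observed and expected counts drives the probability toward zero very quickly.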