Wednesday, May 26, 2004

False Positives and the War on Terrorism

This MSNBC article is an object lesson in the importance of understanding elementary statistics when dealing with rare events. Since terrorists are extremely rare (in Western populations at least), cases of mistaken identity like this one are bound to be the rule rather than the exception, particularly if the authorities decide to err on the side of caution against attacks instead of abiding by normal standards of evidence. Even with a 99.5 percent accuracy rate, most people who end up getting arrested for suspected terrorist activities will turn out never to have been guilty of anything.
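The arithmetic behind that last claim is short. Here is a minimal sketch, assuming the 99.5 percent figure applies equally to the guilty and the innocent. The base rates are made up for illustration and do not come from the article, and the function name is my own.

```python
def flagged_guilty_share(accuracy, base_rate):
    """Share of flagged people who are actually guilty (Bayes' theorem).

    Assumes the screening errs equally often in both directions, so that
    `accuracy` is both the true-positive and the true-negative rate.
    """
    true_pos = base_rate * accuracy                    # guilty and flagged
    false_pos = (1.0 - base_rate) * (1.0 - accuracy)   # innocent but flagged
    return true_pos / (true_pos + false_pos)


# Hypothetical base rates, chosen only for illustration.
for base_rate in (1 / 200, 1 / 10_000, 1 / 1_000_000):
    share = flagged_guilty_share(0.995, base_rate)
    print(f"base rate {base_rate:.6%}: {share:.2%} of those flagged are guilty")
```

Under that symmetry assumption the break-even point is a base rate equal to the error rate. Once fewer than one person in 200 is a terrorist, the innocent make up the majority of those flagged. At one in 10,000, only about 2 percent of the people flagged are guilty.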

PORTLAND, Ore. - Offering a rare public apology, the FBI admitted mistakenly linking an American lawyer’s fingerprint to one found near the scene of a terrorist bombing in Spain, a blunder that led to his imprisonment for two weeks.

The apology Monday came hours after a judge dismissed the case against Brandon Mayfield, who had been held as a material witness in the Madrid bombings case, which killed 191 people and injured about 2,000 others.

Mayfield, a 37-year-old convert to Islam, sharply criticized the government, calling his time behind bars “humiliating” and “embarrassing” and saying he was targeted because of his faith.

“This whole process has been a harrowing ordeal. It shouldn’t happen to anybody,” said Mayfield. “I believe I was singled out and discriminated against, I feel, as a Muslim.”

[............]

Court documents released Monday suggested that the mistaken arrest first sprang from an error by the FBI’s supercomputer for matching fingerprints and then was compounded by the FBI’s own analysts.

[............]

Mayfield, a former Army lieutenant, was released last week. But he was not altogether cleared of suspicion; the government said he remained a material witness and put restrictions on his movements.

I think that final quoted paragraph says it all, really: being accused of "terrorism", like accusations of "rape" and certain other crimes thought particularly heinous, is one of those things that tend to stick in the minds of authority figures and the public at large, whatever the courts may decide to the contrary. Most people seem completely unable to shake the assumption that where there's smoke, there has to be fire - one obviously would never have come to the attention of the authorities if one weren't guilty of something, right? I'd highly recommend reading what Bruce Schneier has to say about the issue of false positives in this issue of his Crypto-Gram newsletter:

Last month the U.S. Justice Department administratively discharged the FBI of its statutory duty to ensure the accuracy and completeness of the National Crime Information Center (NCIC) database. This database is enormous. It contains over 39 million criminal records. It contains information on wanted persons, missing persons, and gang members, as well as information about stolen cars, boats, and other information. Over 80,000 law enforcement agencies have access to this database. On average, there are 2.8 million transactions processed each day.

The Privacy Act of 1974 requires the FBI to make reasonable efforts to ensure the accuracy and completeness of the records in this database. Last month, the Justice Department exempted the system from the law's accuracy requirements.

This isn't just bad social practice, it's bad security. A database with more errors is much less useful than a database with fewer errors, and an error-filled security database is much more likely to target innocents than it is to let the guilty go free.

To see this, let's walk through an example. Assume a simple database -- name and a single code indicating "innocent" or "guilty." When a policeman encounters someone, he looks that person up in the database, and then arrests him if the database says "guilty."

Example 1: Assume the database is 100% accurate. If that is the case, there won't be any false arrests because of bad data. It works perfectly.

Example 2: Assume a 0.0001% error rate: one error in a million. (An error is defined as a person having an "innocent" code when he is guilty, or a "guilty" code when he is innocent.) Furthermore, assume that one in 10,000 people are guilty. In this case, for every 100 guilty people the database correctly identifies it will mistakenly identify one innocent person as guilty (because of an error). And the number of guilty people erroneously listed as innocent is tiny: one in a million.

Example 3: Assume a 1% error rate -- one in a hundred -- and the same one in 10,000 ratio of guilty people. The results are very different. For every 100 guilty people the database correctly identifies, it will mistakenly identify 10,000 innocent people as guilty. The number of guilty people erroneously listed as innocent is larger, but still very small: one in 100.

The differences between examples 2 and 3 are striking. In example 2, one person is erroneously arrested for every 100 people correctly arrested. In example 3, one person is correctly arrested for every 100 people erroneously arrested. The increase in error rate makes the database all but useless as a system for figuring out who to arrest. And this is despite the fact that, in both cases, almost no guilty people get away because of a database error.

The reason for this phenomenon is that the number of guilty people is a very small percentage of the population. If one in ten people were guilty, then a 0.0001% error rate would mistakenly arrest one innocent for every 100,000 guilty, and a 1% error rate would arrest approximately one innocent for every ten guilty. And if the number of guilty people is even less than one in ten thousand, then the problem of arresting innocents magnifies even more as the database has more errors.

[............]

This kind of thing is already happening. There are 13 million people on the FBI's terrorist watch list. That's ridiculous, it's simply inconceivable that a number of people equal to 4.5% of the population of the United States are terrorists. There are far more innocents on that list than there are guilty people not on that list. And these innocents are regularly harassed by police trying to do their job. And in any case, any watch list with 13 million people is basically useless. How many resources can anyone afford to spend watching about one-twentieth of the population, anyway?
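For anyone who wants to check Schneier's figures, here is a rough sketch of his toy database in Python. The population of one million and the function name are my own choices for illustration. The error rates and guilty fractions are the ones from his examples, and I assume, as he does, that an error is equally likely to flip either code.

```python
def arrests(population, guilty_fraction, error_rate):
    """Return (correct, wrongful) arrest counts for the toy database.

    Every person coded "guilty" is arrested. An error flips the code,
    so guilty people are missed and innocent people are arrested at
    the same rate.
    """
    guilty = population * guilty_fraction
    innocent = population - guilty
    correct = guilty * (1.0 - error_rate)   # guilty, correctly coded
    wrongful = innocent * error_rate        # innocent, miscoded as guilty
    return correct, wrongful


# Population size is an arbitrary illustrative choice.
scenarios = [
    ("Example 2 (1 in 10,000 guilty, 0.0001% errors)", 1 / 10_000, 0.000001),
    ("Example 3 (1 in 10,000 guilty, 1% errors)", 1 / 10_000, 0.01),
    ("1 in 10 guilty, 0.0001% errors", 1 / 10, 0.000001),
    ("1 in 10 guilty, 1% errors", 1 / 10, 0.01),
]
for label, guilty_fraction, error_rate in scenarios:
    correct, wrongful = arrests(1_000_000, guilty_fraction, error_rate)
    print(f"{label}: {correct:,.1f} correct, {wrongful:,.1f} wrongful")
```

In a population of a million, example 3 works out to 99 correct arrests against 9,999 wrongful ones, while the one-in-ten case with the same 1% error rate gives 99,000 correct against 9,000 wrongful. The error rate is identical in both cases, and only the base rate has changed.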

When even staunch right-wingers like Bob Barr are willing to join hands with the ACLU to protest the PATRIOT Act and other measures that bypass the traditional safeguards against shoddy law enforcement, one would be foolhardy to dismiss it all as so much whining by malcontents who fail to appreciate the true magnitude of the threat. In reality, it is the champions of such measures who lack a sense of proportion.