Throwing the towel in
by David Moss May 2011
Mr Blunkett is not alone in suffering from that delusion. Many politicians continue to believe that the biometrics emperor is sumptuously dressed when actually he is as naked as the lie told by a snake oil salesman. It is the view of many Whitehall officials also, who continue to believe, for no reason they can give, that mass consumer biometrics will help to detect and prevent crime, to deliver public services more efficiently and to counter terrorism. So much so that they spend hundreds of millions of pounds of taxpayers' money on biometrics technology. This tulipmania is not restricted to the UK and it is not restricted to politicians and civil servants. It affects privacy campaigners, the expression of whose fears amounts unintentionally to a powerful unsolicited testimonial, thereby boosting the sales of biometrics technology. It affects journalists. And it affects normal people. There is a surprising certainty evident in most people, even people who announce proudly that they haven't got a clue how any technology works, that biometrics work. They can get positively tetchy if you suggest that they're wrong and that mass consumer biometrics don't work. Its source is a mystery, but this inexplicable certainty of theirs is jealously guarded and seems to survive any number of adverse encounters with reality.
Is there any hope of inductively extending the results of our technical test more broadly to any other algorithms or databases? A Type B systematic uncertainty evaluation after consideration of changes in the unit of empirical significance and statistical controls over its tangible elements might be of value, provided that the specifics of the changes could be given, but we should not sanctify such a “guesstimate” in an emperor’s cloak of imagined analytic rigor. "Inductively extending"? "Type B systematic uncertainty evaluation"? "The unit of empirical significance"? Who writes this elegant prose? James L Wayman, Antonio Possolo and Anthony J Mansfield, referred to collectively henceforth as "WaPoMa", that's who. Who are, or is, WaPoMa? Three titans of the biometrics industry. Mr Wayman is at San José State University, Mr Possolo at the US National Institute of Standards and Technology (NIST) and Mr Mansfield at the UK National Physical Laboratory. Between them, they provide the academic foundation for the biometrics industry in the Western world, they speak with authority, they have earned the respect they command in academia, in industry and in government. (To all those denizens of academia and industry not listed here who also contribute to the foundations of biometrics, apologies.) And what is WaPoMa saying? He's saying that: You can't extrapolate from the results of a biometrics technology test. At the end of a technology test, all you know is that the results are what the results are. You can't use the results of one test to predict how well or badly any biometrics package will perform in another test. The reason for that is the recalcitrant uncertainty in the test. That's why you can't extrapolate, or "inductively extend". Uncertainty. 
And although to an optimist it "might be of value" to try to measure that uncertainty, no, says WaPoMa with his scrupulous scientist's candour, that measure would just be a guess, the emperor would be naked, the figures given would have no "analytic rigor" (or "analytical rigour", as we say here in the UK).
In his paper, "Fundamental issues in biometric performance testing: A modern statistical and philosophical framework for uncertainty assessment", WaPoMa is unsparing. Biometrics technology tests are designed to measure, among other things, the false positive and false negative rates of any number of rival biometrics packages. You can only measure these quantities if they are under "statistical control". And in the world of mass consumer biometrics, they're not, says WaPoMa. So what? What does that mean? By way of explanation, WaPoMa provides this quotation from another luminary of the world of metrology, Churchill Eisenhart: ... a measurement operation must have attained what is known in industrial quality control language as a state of statistical control ... before it can be regarded in any logical sense as measuring anything at all. Not to put too fine a point on it, according to WaPoMa, given the current state of uncertainty in the field of biometrics, a field which is statistically out of control, researchers don't even know what they're measuring when they perform a technology test. What politicians want to know (and public servants and the law and the media and the public) is: do biometrics work? Yes? Or no? And what WaPoMa says is that technology tests can't help to answer that question. He doesn't say, baldly, that tests can't tell you whether biometrics "work", whatever that means. Much more elegantly, he says that for a given biometrics package, no current test can give you an "intimation of its operational acumen". (I wish I'd said that. You will, Oscar, you will.) In case anybody missed the point, WaPoMa spells it out: ... technology testing on artificial or simulated databases tells us only about the performance of a software package on that data. There is nothing in a technology test that can validate the simulated data as a proxy for the “real world”, beyond a comparison to the real world data actually available.
In other words, technology testing on simulated data cannot logically serve as a proxy for software performance over large, unseen, operational datasets.
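For readers who have never seen one, the mechanics of what a technology test actually measures can be sketched in a few lines. This is an illustrative sketch only, not taken from WaPoMa's paper; the scores and the threshold below are invented for illustration:

```python
# Illustrative sketch (invented data): how a technology test turns
# comparison scores into false match / false non-match rates.

def error_rates(genuine_scores, impostor_scores, threshold):
    """Count decision errors at a given threshold.

    A genuine comparison (same person) scoring below the threshold is a
    false non-match; an impostor comparison (different people) scoring
    at or above it is a false match.
    """
    fnm = sum(1 for s in genuine_scores if s < threshold)
    fm = sum(1 for s in impostor_scores if s >= threshold)
    fnmr = fnm / len(genuine_scores)  # false non-match rate
    fmr = fm / len(impostor_scores)   # false match rate
    return fnmr, fmr

# Invented scores: genuine pairs tend to score high, impostor pairs low.
genuine = [0.91, 0.85, 0.40, 0.88, 0.95, 0.79, 0.83, 0.90, 0.87, 0.92]
impostor = [0.10, 0.22, 0.05, 0.81, 0.15, 0.30, 0.12, 0.25, 0.18, 0.09]

fnmr, fmr = error_rates(genuine, impostor, threshold=0.5)
print(f"FNMR = {fnmr:.0%}, FMR = {fmr:.0%}")  # FNMR = 10%, FMR = 10%
```

The rates that come out characterise that software package on that score data at that threshold, nothing more, which is precisely WaPoMa's point: without statistical control there is no licence to extrapolate them to any other algorithm, database or operational setting.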
A biometrics technology test is conducted in the lab and is entirely computer-based. An operational test is conducted in the field, in the real world, with the biometrics package coming under attack from the unpredictable torrent of humanity trying to clear immigration at a US airport, for example, or trying to get into an Olympics venue. Given that researchers don't know what they're measuring even in a technology test, according to WaPoMa, there can be no way to measure the performance of an operational biometrics system, where the level of uncertainty is even greater. Back in May 2004, NIST reported that they had tested the flat print fingerprinting technology that was to be used in US-VISIT, the border control system non-US nationals go through when they try to get into the US. That is an example of a technology test. So is their March 2007 report on biometric recognition technology, i.e. iris scans and facial geometry. Both tests were entirely computer-based. So is the International Fingerprint Verification Competition that used to be run by four universities but seems now to have been discontinued, possibly because the results tell no-one anything. The reports of all three of those technology tests have been published. Only IBM's technology trial, conducted to choose the "best" biometrics system for ePassports for UK nationals and residence permits for non-EEA residents of the UK, remains a state secret. NIST predicted a false reject rate of 0.5% in their May 2004 report. The operation of US-VISIT was later reviewed by the Office of the Inspector General. OIG's December 2004 report revealed that 118,000 people presented themselves to US-VISIT every day on average, most of them got through the primary/biometrics inspection, 22,350 of them were referred to secondary inspection, 1,811 of those were refused entry, and the rest were let in. That means that (22,350 - 1,811) / 22,350 = 92% of secondary inspections were a waste of time.
Not the sort of accuracy you associate with a working 21st century technology. And it means that the false reject rate was (22,350 - 1,811) / 118,000 = 17.4%, just a tad different from the 0.5% predicted by NIST. WaPoMa knows whereof he speaks: "technology testing on simulated data cannot logically serve as a proxy for software performance over large, unseen, operational datasets".
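The arithmetic above can be checked in a few lines, using only the OIG figures quoted in the text:

```python
# Recompute the rates from the figures quoted from OIG's December 2004
# report: 118,000 travellers a day on average, 22,350 referred to
# secondary inspection, 1,811 of those refused entry.
daily_travellers = 118_000
referred_to_secondary = 22_350
refused_entry = 1_811

# Share of secondary inspections that ended with the traveller being
# admitted anyway, i.e. referrals that achieved nothing.
wasted_secondary = (referred_to_secondary - refused_entry) / referred_to_secondary

# The article's estimate of the operational false reject rate:
# travellers flagged by the system but ultimately let in, as a share
# of all travellers presenting themselves.
false_reject_rate = (referred_to_secondary - refused_entry) / daily_travellers

print(f"wasted secondary inspections: {wasted_secondary:.0%}")   # 92%
print(f"estimated false reject rate:  {false_reject_rate:.1%}")  # 17.4%
```

Either way the arithmetic is done, the gap between NIST's predicted 0.5% and the 17.4% observed in operation is the gap between a technology test and the real world.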
WaPoMa has a lot to say about scenario testing. What it adds up to is: don't bother.
Overall, in WaPoMa's own words: We can conclude that the three types of tests are measuring incommensurate quantities and therefore [we] should not be at all surprised when the values for the same technologies vary widely and unpredictably over the three types of tests. That's quite enough quotation from WaPoMa's paper. You get the picture: technology tests are hard to interpret, it's not clear what's being measured, and they certainly can't be used to predict the results of scenario tests, which are also hard to interpret, where it's also not clear what's being measured; and neither technology tests nor scenario tests can be used to predict the performance of biometrics systems in operation in the real world, which can't be measured anyway, not least because real impostors don't give themselves up after they've fooled the system and got through, they just don't have that researcher's enthusiasm for maintaining the statistics.
If the border control authorities in the UK, Australia and New Zealand are asked why they have spent a fortune deploying so-called "smart gates" at international airports, their answer can't be "because tests show that the technology works so well". They can't say that because no-one knows what it means: the researchers don't know what they're measuring in a test. So what was it? A metrological impulse purchase? Why are the UK Home Office spending taxpayers' money on the biometrics in ePassports and in residence permits for non-EEA nationals? (That's £650 million of taxpayers' money, split between IBM and CSC.) Why are the Home Office paying VFS Global and CSC to register the biometrics of millions of visa applicants all over the world like so many schoolboy stamp collectors? Why are UK nationals paying three times the correct price for a passport? Why has Pakistan bothered to register the biometrics of 96 million citizens and to issue 70 million of them with biometric ID cards? All that effort. And the result? Not the harmonious state of law-abiding politically tranquil domestic peace and efficient public services sometimes touted as the automatic consequence of ID card schemes. Why is India spending billions on Aadhaar, which depends on biometrics whose reliability is, so say the titans, utterly unknowable? And will the Unique Identification Authority of India ever answer my question: how can they claim to offer unique identification when, based on their own figures, they would have to perform 18,000,000,000,000 manual checks to prove uniqueness? And why do they think Aadhaar will eradicate corruption, rather than automate corruption? Why does Safran Group want to spend $1.5 billion acquiring L-1 Identity Solutions Inc., a biometrics company whose technology is statistically out of control? And what rare loss of financial control caused 3M to splash out on buying Cogent Inc.? The questions keep coming. Why is it only governments that believe in biometrics?
How come the banks and the major retailers seem to be proof against this particular form of tulipmania? (Thank goodness they are proof against it. Any nation that inserted today's mass consumer biometrics into its payments systems would be instantly reduced to barter.) On what basis does the European Commission spend its member states' money collecting the biometrics of millions of non-EEA visa applicants? Does China really believe (superstitiously?) that biometrics will provide a Golden Shield against political unrest? Are Russia about to introduce biometric visas? No-one knows, according to the St. Petersburg Times, not even the Russians. But that's enough questions for the moment, because the answer is the same in each case: there isn't an answer.
Are WaPoMa right? If the tulipmania persists, if you still think that there are established and trusted biometrics systems dutifully working away, all day every day, all over the world, helping to provide reliable identity management services to populations of 10 million people, 50 million, 100 million, ..., ask yourself what it is that you know about biometrics, measurement and probability that Jim Wayman, Antonio Possolo and Tony Mansfield don't know. It'll be a short list, but do send it to them.
It is to be hoped that those published test reports will soon be joined by IBM's biometrics technology report, the "tulip bulb" that is costing UK taxpayers £650 million, with no known benefit. WaPoMa finds it necessary to emphasise that researchers must be careful how they describe their results, they must take into account how their non-technical listeners will understand the researchers' words and how they will use or, in the case of politicians and their officials, almost certainly (Type A) misuse the results. That must be the most painful confession in WaPoMa's paper. Do you really need to do research to discover that you should only say what you mean, that you should say it clearly and that you should only say it if you believe it to be true? Isn't that redundant? Or otiose? Taken for granted? Apparently not. It's certainly taken NIST a long time to learn the lesson. What on earth did NIST think people would understand from their May 2004 report on flat print fingerprinting when they wrote: With the proper selection of an operating point, the one-to-many accuracy for a two-finger comparison against a database of 6,000,000 subjects is 95% with a false match rate of 0.08%. Using two fingers, the one-to-one matching accuracy is 99.5% with a false accept rate of 0.1%. No layman reading that is going to understand that the figures come from a technology test and can't be extrapolated to operational systems. NIST didn't add: ... and by the way we don't know what we've been measuring, that 0.08% false match rate achieved using Cogent products can't be reproduced using another package and it can't be reproduced using Cogent products on another database, forget the 0.1% false accept rate because in an operational system real impostors don't turn themselves in, and please don't get the idea that there's any statistical control in this test of ours for the best possible reasons, we actually haven't got a clue whether flat print fingerprinting will help to protect the US's borders.
That omission was mendacious. Judging by the WaPoMa paper, NIST should have made the addition suggested or something like it. One month later they made up for it. A bit. The USA PATRIOT Act 2001 specifies at section 403(c)(1) that NIST has to certify a technology that verifies people's identity: The Attorney General and the Secretary of State jointly, through the National Institute of Standards and Technology (NIST), and in consultation with the Secretary of the Treasury and other Federal law enforcement and intelligence agencies the Attorney General or Secretary of State deems appropriate and in consultation with Congress, shall within 2 years after the date of the enactment of this section, develop and certify a technology standard that can be used to verify the identity of persons applying for a United States visa or such persons seeking to enter the United States pursuant to a visa for the purposes of conducting background checks, confirming identity, and ensuring that a person has not received a visa under a different name or such person seeking to enter the United States pursuant to a visa. [emphasis added] That's what the Act says and, in all honesty, NIST cannot possibly comply. How are they supposed to know if the biometrics used in any particular case are a reliable proxy for someone's identity? It's completely out of their control. They can't put their name to it. So what NIST say in their certificates, according to their June 2004 review of flat print fingerprinting technology, is: For purpose of NIST PATRIOT Act certification this test certifies the accuracy of the participating systems on the datasets used in the test. This evaluation does not certify that any of the systems tested meet the requirements of any specific government application. This would require that factors not included in this test such as image quality, dataset size, cost, and required response time be included. 
There it is, the irreducible inanity of today's mass consumer biometrics is certificated. WaPoMa's paper was delivered at a March 2010 NIST conference. Rarely has a towel been so well and truly and elegantly and clearly and precisely and comprehensively thrown in.
In May 2010, a new government was elected in the UK. They immediately cancelled the plans to register the biometrics of all British citizens. (Not that those plans were very far advanced, Whitehall only having had eight years to work them out.) And in December 2010 they repealed the Identity Cards Act 2006. Anticipated volumes for the biometrics industry in the UK are down. Volumes, and political support. In April 2011, over the signature of President Obama himself, the White House issued its National Strategy for Trusted Identities in Cyberspace. There is not a single occurrence of the word "biometrics" in the whole 45-page document, nor any of its cognates. Volumes and political support down. Again in April 2011, the Cabinet Office issued restricted documents describing the UK's proposed Digital Delivery Identity Assurance project. In 75 pages there is not a single occurrence of the word "biometrics" nor any of its cognates. Volumes and political support down. The Home Office's biometrics tulipmania may now at last have been shaken off by the rest of the UK government. And yet again in April 2011, with their first-hand experience of tulipmania, the Dutch government suspended its plans to develop a centralised flat print fingerprint population register: "home affairs minister Piet Hein Donner ... says there are currently too many concerns about the security and reliability of the system". Volumes and political support down. Perhaps the tide is going out, the pandemic is receding, ... Governments are starting to peel away. And WaPoMa has pulled the academic rug out from under the biometrics industry's feet. The biometrics companies may be feeling a little lonely with their academic support, their aadhaar, gone. Maybe even a little nervous. If India cancels Aadhaar, where else can these companies ply their trade? Their planet is shrinking. (Ever resourceful, they are now planning to sell biometrics for orangutans.) 
The earth may look roughly flat but actually it is roughly spherical. It may feel like the centre of the universe but actually it's not even the centre of the Milky Way. There are medical ailments that leeches can't cure and not all future events can be predicted by inspecting the entrails of a sacrificed sheep, however attractive the astrological symbols on the priest's pointed hat. We know that. And now, thanks to WaPoMa, we know that there is no good reason to invest in mass consumer biometrics.
"Test data from scenario evaluations should not be used as input to mathematical models of operational environments that require high levels of certainty for validity". The decision to invest hundreds of millions of pounds of taxpayers' money requires high levels of certainty. Otherwise it's unbusinesslike, irresponsible, unscientific and illogical. What WaPoMa tells us is that, if the investment decision is based on biometrics tests, then the argument is invalid. WaPoMa's paper hasn't been repudiated. Not by San José State University, not by NIST and not by the National Physical Laboratory, all of which institutions advise governments the world over. There has been no stream of academic rebuttals. Nor has there been any public response from the tottering remnants of the mass consumer biometrics industry. There is nothing obviously wrong with his findings, the intimations of WaPoMa's operational acumen remain auspicious. Where did David Blunkett and everyone else get the idea that mass consumer biometrics work reliably? Coming from respected institutions like NIST, statements like "using two fingers, the one-to-one matching accuracy is 99.5% with a false accept rate of 0.1%" must have played a part. WaPoMa didn't write and publish his paper by accident. Why did he write it? He must have thought ahead to the effect his words would have, the perestroika that would follow his glasnost. He must have known that publishing his paper would impugn the credibility of the biometrics companies and the politicians and their officials who have let contracts to them. He went ahead anyway. Why? Was it an act of expiation/atonement for that 99.5% one-to-one matching accuracy? Was it a public service, pointing out that if the anticipated performance of biometrics systems can't provide the basis for investment in mass consumer biometrics, then there is no basis? Whatever, thank you, Messrs Wayman, Possolo and Mansfield. 
David Moss spent eight years campaigning against the Home Office's ID card scheme, RIP. Whitehall haven't given up yet: a national identity assurance service has appeared in their G-Cloud Programme. We shall see.