Throwing the towel in

 

by David Moss

May 2011

 

Biometrics – a latter-day tulipmania
Biometrics “will make identity theft and multiple identity impossible. Not nearly impossible. Impossible”. That was the view of Rt Hon David Blunkett MP, speaking in 2003, when he was the UK's Home Secretary.

A tulip, known as "the Viceroy", displayed in a 1637 Dutch catalogue. Its bulb cost between 3000 and 4200 florins depending on size. A skilled craftsman at the time earned about 300 florins a year.

Mr Blunkett is not alone in suffering from that delusion. Many politicians continue to believe that the biometrics emperor is sumptuously dressed when actually he is as naked as the lie told by a snake oil salesman.

It is the view of many Whitehall officials also, who continue to believe, for no reason they can give, that mass consumer biometrics will help to detect and prevent crime, to deliver public services more efficiently and to counter terrorism. So much so that they spend hundreds of millions of pounds of taxpayers' money on biometrics technology.

This tulipmania is not restricted to the UK and it is not restricted to politicians and civil servants. It affects privacy campaigners, the expression of whose fears amounts unintentionally to a powerful unsolicited testimonial, thereby boosting the sales of biometrics technology. It affects journalists. And it affects normal people.

There is a surprising certainty evident in most people – even people who announce proudly that they haven't got a clue how any technology works – that biometrics work. They can get positively tetchy if you suggest that they're wrong and that mass consumer biometrics don't work. Its source is a mystery but this inexplicable certainty of theirs is jealously guarded and seems to survive any number of adverse encounters with reality.

What the doctors say
That has been the case for years but maybe now, at last, just maybe, this inter-continental pandemic is on the wane. Take a look at this:

Is there any hope of inductively extending the results of our technical test more broadly to any other algorithms or databases? A Type B systematic uncertainty evaluation after consideration of changes in the unit of empirical significance and statistical controls over its tangible elements might be of value, provided that the specifics of the changes could be given, but we should not sanctify such a “guesstimate” in an emperor’s cloak of imagined analytic rigor.

"Inductively extending"? "Type B systematic uncertainty evaluation"? "The unit of empirical significance"? Who writes this elegant prose? James L Wayman, Antonio Possolo and Anthony J Mansfield, referred to collectively henceforth as "WaPoMa", that's who.

Who are, or is, WaPoMa? Three titans of the biometrics industry. Mr Wayman is at San José State University, Mr Possolo at the US National Institute of Standards and Technology (NIST) and Mr Mansfield at the UK National Physical Laboratory. Between them, they provide the academic foundation for the biometrics industry in the Western world, they speak with authority, they have earned the respect they command in academia, in industry and in government. (To all those denizens of academia and industry not listed here who also contribute to the foundations of biometrics, apologies.)

And what is WaPoMa saying?

He's saying that:

• You can't extrapolate from the results of a biometrics technology test. At the end of a technology test, all you know is that the results are what the results are. You can't use the results of one test to predict how well or badly any biometrics package will perform in another test.

• The reason for that is the recalcitrant uncertainty in the test. That's why you can't extrapolate, or "inductively extend". Uncertainty.

• And although to an optimist it "might be of value" to try to measure that uncertainty, no, says WaPoMa with his scrupulous scientist's candour: that measure would just be a guess, the emperor would be naked, the figures given would have no "analytic rigor" (or "analytical rigour", as we say here in the UK).

Out of control (statistically) – tests prove nothing ...
And that's not the end of it.

In his paper, Fundamental issues in biometric performance testing: A modern statistical and philosophical framework for uncertainty assessment, WaPoMa is unsparing.

Biometrics technology tests are designed to measure, among other things, the false positive and false negative rates of any number of rival biometrics packages. You can only measure these quantities if they are under "statistical control". And in the world of mass consumer biometrics, they're not, says WaPoMa.

So what? What does that mean? By way of explanation, WaPoMa provides this quotation from another luminary of the world of metrology, Churchill Eisenhart:

... a measurement operation must have attained what is known in industrial quality control language as a state of statistical control ... before it can be regarded in any logical sense as measuring anything at all.

Not to put too fine a point on it, according to WaPoMa, given the current state of uncertainty in the field of biometrics, a field which is statistically out of control, researchers don't even know what they're measuring when they perform a technology test.

What politicians want to know (and public servants and the law and the media and the public) is – do biometrics work? Yes? Or no? And what WaPoMa says is that technology tests can't help to answer that question.

He doesn't say that tests can't tell you if biometrics work, whatever "work" means. Much more elegantly, he says that for a given biometrics package, no current test can give you an "intimation of its operational acumen". (I wish I'd said that. You will, Oscar, you will.)

In case anybody missed the point, WaPoMa spells it out:

... technology testing on artificial or simulated databases tells us only about the performance of a software package on that data. There is nothing in a technology test that can validate the simulated data as a proxy for the “real world”, beyond a comparison to the real world data actually available. In other words, technology testing on simulated data cannot logically serve as a proxy for software performance over large, unseen, operational datasets.

... technology tests prove nothing and operational tests prove nothing ...
Let's clarify WaPoMa's "taxonomy". He distinguishes between technology tests on the one hand, and operational tests on the other.

A biometrics technology test is conducted in the lab and is entirely computer-based. An operational test is conducted in the field, in the real world, with the biometrics package coming under attack from the unpredictable torrent of humanity trying to clear immigration at a US airport, for example, or trying to get into an Olympics venue.

Given that researchers don't know what they're measuring even in a technology test, according to WaPoMa, there can be no way to measure the performance of an operational biometrics system, where the level of uncertainty is even greater.

Back in May 2004, NIST reported that they had tested the flat print fingerprinting technology that was to be used in US-VISIT, the border control system non-US nationals go through when they try to get into the US. That is an example of a technology test. So is their March 2007 report on face and iris recognition technology, i.e. facial geometry and iris scans. Both tests were entirely computer-based. So is the International Fingerprint Verification Competition that used to be run by four universities but seems now to have been discontinued, possibly because the results tell no-one anything.

The reports of all three of those technology tests have been published. Only IBM's technology trial, conducted to choose the "best" biometrics system for ePassports for UK nationals and residence permits for non-EEA residents of the UK, remains a state secret.

NIST predicted a false reject rate of 0.5% in their May 2004 report. The operation of US-VISIT was later reviewed by the Office of the Inspector General. OIG's December 2004 report revealed that 118,000 people presented themselves to US-VISIT every day on average, most of them got through the primary/biometrics inspection, 22,350 of them were referred to secondary inspection, 1,811 of those were refused entry, and the rest were let in.

That means that (22,350 - 1,811) / 22,350 = 92% of secondary inspections were a waste of time. Not the sort of accuracy you associate with a working 21st century technology.

And it means that the false reject rate was (22,350 - 1,811) / 118,000 = 17.4%, just a tad different from the 0.5% predicted by NIST – WaPoMa knows whereof he speaks: "technology testing on simulated data cannot logically serve as a proxy for software performance over large, unseen, operational datasets".
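For anyone who wants to check the sums, the arithmetic can be reproduced with a short Python sketch. The inputs are the daily averages quoted from the OIG's December 2004 report and the 0.5% quoted from NIST's May 2004 report. The variable names are made up for illustration, and treating everyone who was referred and then admitted as a "false reject" simply follows this article's reading of the numbers.

```python
# Daily averages quoted in this article from the OIG's December 2004 report.
presented_per_day = 118_000      # people presenting themselves to US-VISIT
referred_to_secondary = 22_350   # referred on to secondary inspection
refused_entry = 1_811            # refused entry after secondary inspection

# People referred to secondary inspection who were then let in anyway.
admitted_after_referral = referred_to_secondary - refused_entry

# Share of secondary inspections that ended with the traveller admitted.
wasted_share = admitted_after_referral / referred_to_secondary

# The same people as a share of everyone presenting: the figure this
# article treats as the observed false reject rate.
false_reject_rate = admitted_after_referral / presented_per_day

# False reject rate quoted in this article from NIST's May 2004 report.
nist_predicted = 0.005

print(f"admitted after referral:      {admitted_after_referral:,}")
print(f"wasted secondary inspections: {wasted_share:.0%}")
print(f"observed false reject rate:   {false_reject_rate:.1%}")
print(f"observed / predicted:         {false_reject_rate / nist_predicted:.0f}x")
```

On these figures the observed rate comes out at roughly 35 times the predicted one.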

... and scenario tests prove nothing
In between technology tests and operational tests, you get scenario tests, such as the UK Passport Service biometrics enrolment trial and the Unique Identification Authority of India's proof of concept trial for Aadhaar. ("Aadhaar" is the brand name for India's unique identification scheme. The word means foundation, or support.) In a scenario test, researchers recruit a putatively representative sample of the population so that they can test biometrics packages with real people under still fairly controllable conditions.

WaPoMa has a lot to say about scenario testing. What it adds up to is, don't bother:

We lack metrics for assessing the expected variability of these quantities between tests and [we lack] models for converting that variability to uncertainty in measurands [the quantities intended here are false positives and negatives, failure to acquire and enrol, and throughput].

... each specific recognition technology (iris, face, voice, fingerprint, hand, etc.) will have specific factors that must be within a state of statistical control. This list of factors is not well understood, although ample work in this area is continuing. For example, recent analysis of iris and face recognition test results shows us that to report false match and false non-match performance metrics for such systems without reporting on the percentage of data subjects wearing contact lenses, the period of time between collection of the compared image sets, the commercial systems used in the collection process, pupil dilation, and lighting direction is to report “nothing at all” [cf. Eisenhart above]. Our reported measurements cannot be expected to be repeatable or reproducible without knowledge and control of these factors. [emphasis added]

... the test repeatability and reproducibility observed in technology tests are lost in scenario testing due to the loss of statistical control over a wide range of influence quantities.

... Our inability to apply concepts of statistical control to any or all of these factors will increase the level of uncertainty in our results and translate to loss of both repeatability and reproducibility.

... Test data from scenario evaluations should not be used as input to mathematical models of operational environments that require high levels of certainty for validity.

Overall, in WaPoMa's own words:

We can conclude that the three types of tests are measuring incommensurate quantities and therefore [we] should not be at all surprised when the values for the same technologies vary widely and unpredictably over the three types of tests.

That's quite enough quotation from WaPoMa's paper.

You get the picture – technology tests are hard to interpret, it's not clear what's being measured, they certainly can't be used to predict the results of scenario tests, which are hard to interpret, it's not clear what's being measured, and neither technology tests nor scenario tests can be used to predict the performance of biometrics systems in operation in the real world, which can't be measured anyway, not least because real impostors don't give themselves up after they've fooled the system and got through, they just don't have that researcher's enthusiasm for maintaining the statistics.

Tulipmania today – SNAFU
Where does that leave us?

If the border control authorities in the UK, Australia and New Zealand are asked why they have spent a fortune deploying so-called "smart gates" at international airports, their answer can't be "because tests show that the technology works so well". They can't say that because no-one knows what it means: the researchers don't know what they're measuring in a test. So what was it? A metrological impulse purchase?

Why are the UK Home Office spending taxpayers' money on the biometrics in ePassports and in residence permits for non-EEA nationals? (That's £650 million of taxpayers' money, split between IBM and CSC.) Why are the Home Office paying VFS Global and CSC to register the biometrics of millions of visa applicants all over the world like so many schoolboy stamp collectors? Why are UK nationals paying three times the correct price for a passport?

Why has Pakistan bothered to register the biometrics of 96 million citizens and to issue 70 million of them with biometric ID cards? All that effort. And the result? Not the harmonious state of law-abiding politically tranquil domestic peace and efficient public services sometimes touted as the automatic consequence of ID card schemes.

Why is India spending billions on Aadhaar, which depends on biometrics whose reliability is, so say the titans, utterly unknowable? And will the Unique Identification Authority of India ever answer my question how they can claim to offer unique identification when, based on their own figures, they would have to perform 18,000,000,000,000 manual checks to prove uniqueness? And why do they think Aadhaar will eradicate corruption, rather than automate corruption?
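How does a number with that many zeroes arise? A minimal sketch, under loudly stated assumptions: proving uniqueness means comparing every enrolled person with every other, and every false match among those comparisons needs a manual check to clear it. The population and the false match rate below are hypothetical inputs that do not appear in this article. They were picked only because they give a result of the same order, and they should not be read as the Unique Identification Authority of India's published figures.

```python
# Hypothetical inputs, NOT taken from this article: chosen only to show
# how a figure of this magnitude can arise when proving uniqueness.
population = 1_200_000_000       # assumed number of people enrolled
false_match_per_million = 25     # assumed false match rate of 0.0025%

# Proving uniqueness means comparing each person with every other person,
# which is n * (n - 1) / 2 distinct pairs.
pairs = population * (population - 1) // 2

# Expected number of pairs wrongly flagged as the same person; each one
# would need a manual check to resolve.
manual_checks = pairs * false_match_per_million // 1_000_000

print(f"pairwise comparisons:   {pairs:,}")
print(f"expected manual checks: {manual_checks:,}")
```

The point of the sketch is the scaling. The number of comparisons grows with the square of the population, so even a tiny false match rate leaves an unmanageable pile of manual checks.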

Why does Safran Group want to spend $1.5 billion acquiring L-1 Identity Solutions Inc., a biometrics company whose technology is statistically out of control? And what rare loss of financial control caused 3M to splash out on buying Cogent Inc.?

The questions keep coming.

Why is it only governments that believe in biometrics? How come the banks and the major retailers seem to be proof against this particular form of tulipmania? (Thank goodness they are proof against it. Any nation that inserted today's mass consumer biometrics into its payments systems would be instantly reduced to barter.)

On what basis does the European Commission spend its member states' money collecting the biometrics of millions of non-EEA visa applicants?

Does China really believe (superstitiously?) that biometrics will provide a Golden Shield against political unrest?

Are Russia about to introduce biometric visas? No-one knows – according to the St. Petersburg Times, not even the Russians.

But that's enough questions for the moment because the answer is the same in each case – there isn't an answer.

The only thing that's certain? Uncertainty
WaPoMa starts with uncertainty and he finishes with uncertainty. One thing you now know for sure is that if a biometrics salesman promises a government that his products can identify everyone in the country uniquely and verify their identity wherever necessary, then that mountebank is talking nonsense. WaPoMa says so. Maybe now governments will stop wasting their taxpayers' money on technology the reliability of which it is impossible to know?

Are WaPoMa right?

If the tulipmania persists, if you still think that there are established and trusted biometrics systems dutifully working away, all day every day, all over the world, helping to provide reliable identity management services to populations of 10 million people, 50 million, 100 million, ..., ask yourself what it is that you know about biometrics, measurement and probability that Jim Wayman, Antonio Possolo and Tony Mansfield don't know. It'll be a short list, but do send it to them.

Tulipmania yesterday – murky
In this mood of unstinting disclosure, WaPoMa chronicles a sad case in the history of science, when the report of a 1993 scenario test was suppressed because it suggested that a particular hand geometry biometrics system performed badly. One disobliging participant in the trial had been practising beforehand and managed all on his own to alter the equal error rate unfavourably by a factor of 25 – such are the perils of scenario testing and, of course, the operational perils in the field. It was poor design if the trial could be so sensitive to one participant. And it was reprehensible to cover up the results. The report has now been restored to the canon.

It is to be hoped that it will soon be joined by IBM's biometrics technology report, the "tulip bulb" that is costing UK taxpayers £650 million, with no known benefit.

WaPoMa finds it necessary to emphasise that researchers must be careful how they describe their results: they must take into account how their non-technical listeners will understand their words and how they will use – or, in the case of politicians and their officials, almost certainly (Type A) misuse – the results.

That must be the most painful confession in WaPoMa's paper.

Do you really need to do research to discover that you should only say what you mean, that you should say it clearly and that you should only say it if you believe it to be true? Isn't that redundant? Or otiose? Taken for granted?

Apparently not. It's certainly taken NIST a long time to learn the lesson. What on earth did NIST think people would understand from their May 2004 report on flat print fingerprinting when they wrote:

With the proper selection of an operating point, the one-to-many accuracy for a two-finger comparison against a database of 6,000,000 subjects is 95% with a false match rate of 0.08%. Using two fingers, the one-to-one matching accuracy is 99.5% with a false accept rate of 0.1%.

No layman reading that is going to understand that the figures come from a technology test and can't be extrapolated to operational systems. NIST didn't add:

... and by the way we don't know what we've been measuring, that 0.08% false match rate achieved using Cogent products can't be reproduced using another package and it can't be reproduced using Cogent products on another database, forget the 0.1% false accept rate because in an operational system real impostors don't turn themselves in, and please don't get the idea that there's any statistical control in this test of ours – for the best possible reasons, we actually haven't got a clue whether flat print fingerprinting will help to protect the US's borders.

That omission was mendacious. Judging by the WaPoMa paper, NIST should have made the addition suggested or something like it.

One month later they made up for it. A bit.

The USA PATRIOT Act 2001 specifies at section 403(c)(1) that NIST has to certify a technology that verifies people's identity:

The Attorney General and the Secretary of State jointly, through the National Institute of Standards and Technology (NIST), and in consultation with the Secretary of the Treasury and other Federal law enforcement and intelligence agencies the Attorney General or Secretary of State deems appropriate and in consultation with Congress, shall within 2 years after the date of the enactment of this section, develop and certify a technology standard that can be used to verify the identity of persons applying for a United States visa or such persons seeking to enter the United States pursuant to a visa for the purposes of conducting background checks, confirming identity, and ensuring that a person has not received a visa under a different name or such person seeking to enter the United States pursuant to a visa. [emphasis added]

That's what the Act says and, in all honesty, NIST cannot possibly comply. How are they supposed to know if the biometrics used in any particular case are a reliable proxy for someone's identity? It's completely out of their control. They can't put their name to it. So what NIST say in their certificates, according to their June 2004 review of flat print fingerprinting technology, is:

For purpose of NIST PATRIOT Act certification this test certifies the accuracy of the participating systems on the datasets used in the test. This evaluation does not certify that any of the systems tested meet the requirements of any specific government application. This would require that factors not included in this test such as image quality, dataset size, cost, and required response time be included.

There it is, the irreducible inanity of today's mass consumer biometrics is certificated.

WaPoMa's paper was delivered at a March 2010 NIST conference. Rarely has a towel been so well and truly – and elegantly and clearly and precisely and comprehensively – thrown in.

Tulipmania tomorrow – whither biometrics?
And since then?

In May 2010, a new government was elected in the UK. They immediately cancelled the plans to register the biometrics of all British citizens. (Not that those plans were very far advanced, Whitehall only having had eight years to work them out.) And in December 2010 they repealed the Identity Cards Act 2006. Anticipated volumes for the biometrics industry in the UK are down. Volumes, and political support.

In April 2011, over the signature of President Obama himself, the White House issued its National Strategy for Trusted Identities in Cyberspace. There is not a single occurrence of the word "biometrics" in the whole 45-page document, nor any of its cognates. Volumes and political support – down.

Again in April 2011, the Cabinet Office issued restricted documents describing the UK's proposed Digital Delivery Identity Assurance project. In 75 pages there is not a single occurrence of the word "biometrics" nor any of its cognates. Volumes and political support – down.

The Home Office's biometrics tulipmania may now at last have been shaken off by the rest of the UK government.

And yet again in April 2011, with their first-hand experience of tulipmania, the Dutch government suspended its plans to develop a centralised flat print fingerprint population register: "home affairs minister Piet Hein Donner ... says there are currently too many concerns about the security and reliability of the system". Volumes and political support – down.

Perhaps the tide is going out, the pandemic is receding, ... Governments are starting to peel away. And WaPoMa has pulled the academic rug out from under the biometrics industry's feet. The biometrics companies may be feeling a little lonely with their academic support, their aadhaar, gone. Maybe even a little nervous. If India cancels Aadhaar, where else can these companies ply their trade? Their planet is shrinking. (Ever resourceful, they are now planning to sell biometrics for orangutans.)

The earth may look roughly flat but actually it is roughly spherical. It may feel like the centre of the universe but actually it's not even the centre of the Milky Way. There are medical ailments that leeches can't cure and not all future events can be predicted by inspecting the entrails of a sacrificed sheep, however attractive the astrological symbols on the priest's pointed hat. We know that. And now, thanks to WaPoMa, we know that there is no good reason to invest in mass consumer biometrics.

Thank you
It wasn't just WaPoMa. IBM were at that March 2010 NIST conference as well, delivering a paper on the technique they used to choose the "best" biometrics system. That is the basis on which £650 million of UK taxpayers' money is being spent. Did IBM notice that WaPoMa's keynote speech at the same conference suggested that they were wasting their time, as well as taxpayers' money?

"Test data from scenario evaluations should not be used as input to mathematical models of operational environments that require high levels of certainty for validity". The decision to invest hundreds of millions of pounds of taxpayers' money requires high levels of certainty. Otherwise it's unbusinesslike, irresponsible, unscientific and illogical.

What WaPoMa tells us is that, if the investment decision is based on biometrics tests, then the argument is invalid.

WaPoMa's paper hasn't been repudiated. Not by San José State University, not by NIST and not by the National Physical Laboratory, all of which institutions advise governments the world over. There has been no stream of academic rebuttals. Nor has there been any public response from the tottering remnants of the mass consumer biometrics industry. There is nothing obviously wrong with his findings; the intimations of WaPoMa's operational acumen remain auspicious.

Where did David Blunkett and everyone else get the idea that mass consumer biometrics work reliably? Coming from respected institutions like NIST, statements like "using two fingers, the one-to-one matching accuracy is 99.5% with a false accept rate of 0.1%" must have played a part.

WaPoMa didn't write and publish his paper by accident. Why did he write it? He must have thought ahead to the effect his words would have, the perestroika that would follow his glasnost. He must have known that publishing his paper would impugn the credibility of the biometrics companies and the politicians and their officials who have let contracts to them.

He went ahead anyway. Why? Was it an act of expiation/atonement for that 99.5% one-to-one matching accuracy? Was it a public service, pointing out that if the anticipated performance of biometrics systems can't provide the basis for investment in mass consumer biometrics, then there is no basis?

Whatever, thank you, Messrs Wayman, Possolo and Mansfield.


David Moss spent eight years campaigning against the Home Office's ID card scheme RIP. Whitehall haven't given up yet – a national identity assurance service has appeared in their G-Cloud Programme. We shall see.