How to analyse data without revealing their secrets
Data are valuable. But not all of them are as valuable as they could be. Reasons of confidentiality mean that many medical, financial, educational and other personal records, from the analysis of which much public good could be derived, are in practice unavailable. A lot of commercial data are similarly sequestered. For example, firms have more granular and timely information on the economy than governments can obtain from surveys. But such intelligence would be useful to rivals. If companies could be certain it would remain secret, they might be more willing to make it available to officialdom.
A range of novel data-processing techniques might make such sharing possible. These so-called privacy-enhancing technologies (PETs) are still in the early stages of development. But they are about to get a boost from a project launched by the United Nations’ statistics division. The UN PETs Lab, which opened for business officially on January 25th, enables national statistics offices, academic researchers and companies to collaborate to carry out projects which will test various PETs, permitting technical and administrative hiccups to be identified and overcome.
The first such effort, which actually began last summer, before the PETs Lab’s formal inauguration, analysed import and export data from national statistical offices in America, Britain, Canada, Italy and the Netherlands, to look for anomalies. Those could be a result of fraud, of faulty record keeping or of innocuous re-exporting.
For the pilot scheme, the researchers used categories already in the public domain—in this case international trade in things such as wood pulp and clocks. They thus hoped to show that the system would work, before applying it to information where confidentiality matters.
They put several kinds of PETs through their paces. In one trial, OpenMined, a charity based in Oxford, tested a technique called secure multiparty computation (SMPC). This approach involves the data to be analysed being encrypted by their keeper and staying on the premises. The organisation running the analysis (in this case OpenMined) sends its algorithm to the keeper, who runs it on the encrypted data. That is mathematically complex, but possible. The findings are then sent back to the original inquirer.
That inquirer thus receives its answers, but never has access to the information on which those answers are based. Moreover, for extra security, the results are processed by another PET, called differential privacy. This employs elaborate maths to add a smidgen of statistical noise to a result. That makes the findings less precise, but means they cannot be reverse-engineered to reveal individual records. It also permits the organisation releasing the findings to set a so-called “privacy budget”, which determines the level of granularity disclosed by the data. The result is a belt-and-braces approach. In the argot of the field, SMPC provides input privacy, while differential privacy offers output privacy.
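For readers who want to see the belt and the braces side by side, here is a minimal Python sketch. It is not the code used in the trial. It assumes additive secret sharing (one common building block for secure multiparty computation) for input privacy and the Laplace mechanism (one standard way of achieving differential privacy) for output privacy. The names `share`, `secure_sum`, `private_total` and the parameter `epsilon` (the privacy budget) are illustrative.

```python
import random
import secrets

PRIME = 2**61 - 1  # modulus for share arithmetic (an illustrative choice)

def share(value, n_parties):
    """Split value into n random-looking shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def secure_sum(private_counts):
    """Input privacy: total the parties' counts without revealing any single one."""
    n = len(private_counts)
    # Each party splits its own count and hands one share to every party.
    all_shares = [share(c, n) for c in private_counts]
    # Party j adds up the j-th shares it received; only these partial sums are published.
    partial = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    return sum(partial) % PRIME

def laplace_noise(scale, rng=random):
    """The difference of two independent exponential draws is Laplace-distributed."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_total(private_counts, epsilon, rng=random):
    """Output privacy: blur the total before release.

    Assumes each count tallies individual records, so adding or removing one
    record changes the total by at most 1 (sensitivity 1); the noise scale is
    therefore 1 / epsilon.
    """
    return secure_sum(private_counts) + laplace_noise(1.0 / epsilon, rng)
```

A smaller `epsilon` means a tighter privacy budget and a noisier answer. Production libraries also guard against floating-point subtleties that this sketch ignores.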
In a second trial using the same data sets, the PETs Lab arranged for Oblivious Software, a company in Dublin, to test “trusted execution environments”, also called “enclaves”, as a form of input privacy. To set these up, data are first encrypted by their keeper and then sent to a special, highly secure server that has been built in a trustworthy way, so that every operation can be tracked and its memory fully cleared after the job is done.
Once safely stored in this server’s hardware, the data are decrypted and the desired analysis performed. For extra security, cryptographic hashes and digital signatures are applied, to prove that only authorised operations have taken place. The output is likewise statistically blurred, using differential privacy, before being sent back to the original inquirer.
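The idea of proving that only authorised operations took place can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any real enclave works: the operation names are invented, and a keyed HMAC stands in for the asymmetric digital signatures that actual enclave hardware produces. The helper names `log_digest`, `sign_log` and `verify_log` are hypothetical.

```python
import hashlib
import hmac

def log_digest(operations):
    """Chain SHA-256 over the operation names, so that any change,
    addition or reordering alters the final digest."""
    digest = b"\x00" * 32
    for op in operations:
        digest = hashlib.sha256(digest + op.encode("utf-8")).digest()
    return digest

def sign_log(operations, key):
    """Tag the log digest with a secret key (HMAC as a stand-in for a signature)."""
    return hmac.new(key, log_digest(operations), hashlib.sha256).hexdigest()

def verify_log(operations, tag, key):
    """Check that the claimed operations match the tag, in constant time."""
    return hmac.compare_digest(sign_log(operations, key), tag)
```

A data keeper holding the key could then confirm that the log it was shown, say `["decrypt", "aggregate", "add_noise"]`, is the one that was tagged. A log with an extra step slipped in would fail the check.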
In the tests, both approaches did indeed spot anomalies. For example, although American and Canadian records of the value of wood pulp traded between the two countries were basically the same, their data on the value of the clock trade differed by 80%. “Tech-wise, it worked,” gushed Ronald Jansen of the UN statistics division, who administers the new lab.
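The anomaly check itself amounts to comparing two countries' records of the same trade flow. A rough Python sketch follows. The measure of difference (the gap as a share of the larger figure), the 25% threshold and the function names are assumptions made for illustration; the article does not say how the lab computed its percentages.

```python
def relative_gap(reported_by_exporter, reported_by_importer):
    """Gap between two reports of the same flow, as a share of the larger one.
    This definition is an assumption, not the lab's stated method."""
    larger = max(reported_by_exporter, reported_by_importer)
    if larger == 0:
        return 0.0
    return abs(reported_by_exporter - reported_by_importer) / larger

def flag_anomalies(mirror_records, threshold=0.25):
    """Return the commodities whose two reports differ by more than threshold."""
    return [
        commodity
        for commodity, (exported, imported) in mirror_records.items()
        if relative_gap(exported, imported) > threshold
    ]
```

Fed made-up figures such as `{"wood pulp": (1000.0, 990.0), "clocks": (500.0, 100.0)}`, the function flags only the clocks. In the trials this comparison would run inside one of the privacy-preserving wrappers described above, rather than on raw data.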
Whether it works bureaucratically remains to be seen. But the putative benefits would be great. The use of PETs offers not only a means of bringing together data sets that cannot currently interact because of worries about privacy, but also a way for all sorts of organisations to collaborate securely across borders.
The PETs Lab’s next goals are to dive more deeply into trade data and to add more agencies to the roster. This all comes as many governments take a bigger interest in PETs. In December America and Britain announced they plan, this spring, to launch a “grand challenge” prize around PET systems. The sharing of data, and their use, may now be getting easier. ■