Described by Microsoft as the largest publicly available facial recognition data set, the database consisted of 10 million images of nearly 100,000 individuals. Released in 2016 and known as MS Celeb, the database has been used by researchers and private firms to train their facial recognition technology, according to the Financial Times (FT).
Microsoft says that the database was built from images of celebrities found online. However, the Megapixels project, which tracks face databases, noted that it also included many people who must maintain an online presence for their professional lives.
A number of private individuals had their details included in the database, among them the journalists Kim Zetter and Adrian Chen, and Shoshana Zuboff, the author of The Age of Surveillance Capitalism.
The FT investigation found that many of the people whose photos were used in the database had not been asked for their consent, and that their images had been scraped from search engines and videos under the terms of the Creative Commons (CC) license.
“Microsoft has exploited the term ‘celebrity’ to include people who merely work online and have a digital identity. Many people in the target lists are even vocal critics of the very technology Microsoft is using their name and biometric information to build,” said Berlin-based researcher Adam Harvey, whose Megapixels project uncovered the details of MS Celeb.
According to the FT, the people it contacted were unaware that they had been included in the database. “I am in no sense a public person, there is no way in which I’ve ceded my right to privacy,” Adam Greenfield commented on learning that his data had been included.
“It’s indicative of Microsoft’s inability to hold their own researchers to integrity and probity that this was not torpedoed before it left the building. To me, it is indicative of a profound misunderstanding of what privacy is,” he added.
The report found that MS Celeb was being used by commercial entities including IBM, Panasonic, Alibaba, Nvidia, Hitachi, SenseTime and Megvii.
The Chinese suppliers SenseTime and Megvii provide equipment to officials in Xinjiang, where members of Muslim minorities, mostly Uighurs, are being monitored and held in internment camps.
Following the report, Microsoft took down the database within days and told the FT: “The site was intended for academic purposes. It was run by an employee that is no longer with Microsoft and has since been removed.”
Although the database has been taken down, it is likely still being used by those who downloaded a copy of the data.
“You can’t make a data set disappear. Once you post it, and people download it, it exists on hard drives all over the world,” said Harvey. “Now it is completely disassociated from any licensing, rules of control that Microsoft previously had over it. People are posting it on GitHub, hosting the files on Dropbox and Baidu Cloud, so there is no way of stopping them from continuing to post it and use it for their own purposes.”
Michael Veale, a technology policy researcher at the Alan Turing Institute, said of Microsoft: “They are likely to have taken it down because their lawyers expressed concern that they do not have a basis to process special category data such as faces under Article 9 of the GDPR. They may not have a get-out clause for processing biometric data for the purposes of ‘uniquely identifying a natural person’.
“Particularly as the use of the data set has moved from a purely research use to something that products are being built with. There is no reason to believe that the people in the data set cannot be considered to expressly and clearly have made their faces public.”