In 2012, artificial intelligence researchers made a big leap in computer vision, thanks in part to an extraordinarily large set of images: thousands of everyday objects, people, and scenes in photos scraped from the internet and labeled by hand. That dataset, known as ImageNet, is still used today in thousands of AI research projects and experiments.
But last week, every human face included in ImageNet suddenly disappeared, after the researchers who manage the dataset decided to blur them.
Just as ImageNet helped usher in a new era of AI, efforts to fix it reflect challenges that affect numerous AI programs, datasets, and products.
“We were concerned about the issue of privacy,” says Olga Russakovsky, an assistant professor at Princeton University and one of those responsible for managing ImageNet.
ImageNet was created as part of a challenge that invited computer scientists to develop algorithms capable of identifying objects in images. In 2012, this was a very difficult task. Then a technique called deep learning, which involves teaching a neural network by feeding it labeled examples, proved more proficient than previous approaches.
Since then, deep learning has driven a renaissance in AI that has also exposed the shortcomings of the field. Facial recognition, for example, has proven a very popular and profitable use of deep learning, but it is also controversial. A number of U.S. cities have banned government use of the technology over concerns about citizens’ privacy or about bias, because the programs are less accurate on nonwhite faces.
ImageNet today contains 1.5 million images with approximately 1,000 labels. It is largely used to measure the performance of machine learning algorithms, or to train algorithms that perform specialized computer vision tasks. Blurring the faces affects 243,198 of the images.
Russakovsky says the ImageNet team wanted to determine whether it is possible to blur faces in the dataset without changing how well algorithms trained on it recognize objects. “People happened to be in the data since they appeared in the web photos depicting these objects,” she says. In other words, in an image showing a beer bottle, even if the face of the person drinking from it is a pink smudge, the bottle itself remains intact.
In a research paper published along with the update to ImageNet, the team behind the dataset explains that it blurred the faces using Amazon’s AI service Rekognition; it then paid Mechanical Turk workers to confirm and adjust the selections.
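To make the idea of blurring only the face region concrete, here is a minimal sketch in Python. It is not the ImageNet team's code, and it makes several assumptions for illustration: the image is a small grayscale grid, the face location arrives as a hypothetical `(top, left, bottom, right)` box from some detector, and the blur is a simple box average computed by a hypothetical helper called `blur_region`.

```python
def blur_region(image, box, radius=1):
    """Return a copy of `image` with the pixels inside `box` box-blurred.

    image: list of rows, each a list of grayscale ints (0-255).
    box:   (top, left, bottom, right); bottom and right are exclusive.
    All names here are hypothetical and chosen for illustration only.
    """
    height, width = len(image), len(image[0])
    top, left, bottom, right = box
    out = [row[:] for row in image]  # pixels outside the box stay unchanged
    for y in range(max(top, 0), min(bottom, height)):
        for x in range(max(left, 0), min(right, width)):
            # Average the neighbourhood in the ORIGINAL image, clipped at the edges.
            neighbours = [
                image[ny][nx]
                for ny in range(max(y - radius, 0), min(y + radius + 1, height))
                for nx in range(max(x - radius, 0), min(x + radius + 1, width))
            ]
            out[y][x] = sum(neighbours) // len(neighbours)
    return out


# Toy 4x4 "photo": a bright 2x2 "face" in the top-left corner, dark elsewhere.
photo = [
    [200, 200, 0, 0],
    [200, 200, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
blurred = blur_region(photo, box=(0, 0, 2, 2))
```

Only the pixels inside the box change, which is the property the researchers rely on: the object in the rest of the photo is left as it was.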
Blurring the faces did not affect the performance of several object-recognition algorithms trained on ImageNet, the researchers say. They also show that other algorithms built on top of those object-recognition algorithms are similarly unaffected. “We hope this proof-of-concept paves the way for more privacy-conscious visual data collection practices in the field,” says Russakovsky.
This is not the first attempt to adjust the famous library of images. In December 2019, the ImageNet team removed biased and derogatory terms introduced by human labelers after a project called Excavating AI drew attention to the issue.
In July 2020, Vinay Prabhu, a machine learning scientist at UnifyID, and Abeba Birhane, a PhD candidate at University College Dublin in Ireland, published research showing that they could identify individuals, including computer scientists, in the dataset. They also found pornographic images in it.
Prabhu says blurring the faces is a good step, but he is disappointed that the ImageNet team did not acknowledge the work he and Birhane did.
Blurred faces can still have unintended consequences for algorithms trained on the ImageNet data. For example, algorithms might learn to look for blurry faces when searching for particular objects.
“One major issue to consider is what happens when you use a model trained on a face-blurred dataset,” says Russakovsky. A robot trained on the dataset, for example, could be thrown off by faces in the real world.