Eindhoven University of Technology: From deepfakes to Safe Fakes
For Sander Klomp, researcher at TU/e in the VCA Research group and also working at TU/e spin-off ViNotion, it began as a very practical problem.
“At ViNotion, we create smart algorithms that, for example, allow local authorities to monitor intersections where different traffic flows converge. To train those algorithms, an awful lot of images of vehicles, cyclists and pedestrians are needed. The new EU privacy rules have made this a lot harder, as the faces can be traced back to real people. According to the EU this is in principle only allowed with their explicit consent.”
The solution may seem simple: you anonymize the images, for example by blurring or pixelating the faces, or by covering them with a black bar (see image). Unlike facial recognition, which requires information about somebody’s individual facial features, person detection is not interested in somebody’s looks. It is sufficient that the algorithm recognizes that it is a human being.
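For readers who want to see what such classical anonymization looks like in code, here is a minimal sketch using OpenCV; the file names and detector settings are illustrative and not taken from the study.

```python
# Minimal sketch: classical anonymization by blurring or masking detected faces.
# Assumes OpenCV (cv2) is installed; the image path is illustrative.
import cv2

img = cv2.imread("street_scene.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# A stock Haar-cascade face detector that ships with OpenCV.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    # Option 1: heavy Gaussian blur over the face region.
    img[y:y+h, x:x+w] = cv2.GaussianBlur(img[y:y+h, x:x+w], (51, 51), 0)
    # Option 2 (alternative): a solid black bar instead of blurring.
    # cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 0), thickness=-1)

cv2.imwrite("street_scene_anonymized.jpg", img)
```

Blurring keeps some coarse shape information while the black bar removes everything, which is exactly the trade-off that matters for training detectors, as the article explains next.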
IMAGENET
This is also the path chosen by the likes of ImageNet, which, with over 14 million images, is the largest and most widely used image database for AI research. In March 2021 it decided to blur all faces.
“But for us it’s different”, says Klomp. “We actually use those images to train our algorithms to recognize what people look like. If their faces are always blurred, the algorithm will conclude that this is what people always look like. You can imagine what that does to the accuracy of your AI systems. There’s got to be a better way!”
DEEPFAKES
Klomp and his colleagues immediately thought of deepfakes: synthetic images or voices generated by artificial intelligence. “By replacing the faces on our images with random fake faces, you can protect the privacy of the people involved, and at the same time train your detectors.”
That solution is not entirely new; the first attempt to anonymize faces using deepfakes was made in 2018, but the results at the time were quite disappointing. In the meantime, however, the algorithms that generate realistic artificial faces, so-called GANs, have gotten a lot better.
Generative Adversarial Networks, a type of machine learning model, consist of two neural networks that play a game with each other. On one side is a generator, which generates faces at random; on the other is a discriminator, which determines whether a given face looks sufficiently ‘real’.
This process repeats itself countless times, until finally the generator has become so smart that it can create faces that are indistinguishable from real ones. The images can then be used to train a face detector.
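As an illustration of this adversarial game, here is a toy sketch in PyTorch. It is not DeepPrivacy or any GAN from the study: random tensors stand in for real face images, and all layer sizes and hyperparameters are chosen only for brevity.

```python
# Toy sketch of the GAN "game": a generator and a discriminator trained
# against each other. Random tensors stand in for real face images.
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 32 * 32           # noise size, flattened image size

generator = nn.Sequential(                   # noise -> fake image
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh())

discriminator = nn.Sequential(               # image -> probability "real"
    nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid())

loss = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step in range(1000):
    real = torch.rand(128, img_dim) * 2 - 1  # placeholder for real face images
    fake = generator(torch.randn(128, latent_dim))

    # Discriminator step: label real images 1, generated images 0.
    d_loss = loss(discriminator(real), torch.ones(128, 1)) + \
             loss(discriminator(fake.detach()), torch.zeros(128, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator call fakes "real".
    g_loss = loss(discriminator(fake), torch.ones(128, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In a real system the placeholder batch of “real” images would come from a large face dataset, and the networks would be convolutional rather than fully connected, but the back-and-forth between the two players is the same.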
DEEPPRIVACY
The researchers tested several GANs for their research. In the end, DeepPrivacy, which at the time of the study was the most advanced algorithm for generating fake faces, proved to be the best. It outperformed not only traditional ways of anonymizing faces (like blurring), but also other GANs.
“We see in our tests that detectors trained with DeepPrivacy’s fakes achieve a detection score of around 90 percent. That’s only 1 to 3 percentage points less than detectors trained on non-anonymized data. Still very good, especially when you consider the alternative: not being able to use data at all because of privacy regulations.”
The reason that training on DeepPrivacy images works so much better than with older GANs is that it requires fewer keypoints (see image).
Klomp: “Keypoints are points that are characteristic of the face, such as the position of the eyes or the ears. The anonymizer detects these for each face, and then blacks out the rest of the face so that it becomes unidentifiable. Then the generator ‘sticks’ a new fictional face onto it. DeepPrivacy uses only seven keypoints in total, which means it is able to anonymize faces as small as 10 pixels.”
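The pipeline Klomp describes can be summarized in a short sketch. The helpers detect_faces, detect_keypoints and generator below are hypothetical stand-ins, not DeepPrivacy’s actual API, which differs in detail.

```python
# Schematic sketch of keypoint-based face anonymization as described above.
# detect_faces, detect_keypoints and generator are hypothetical stand-ins.
import numpy as np

def anonymize(image: np.ndarray, detect_faces, detect_keypoints, generator):
    for (x, y, w, h) in detect_faces(image):     # one bounding box per face
        face = image[y:y+h, x:x+w]

        # 1. Locate the seven keypoints (e.g. eye and ear positions).
        keypoints = detect_keypoints(face)

        # 2. Black out the face so no identifying appearance survives;
        #    only the keypoint coordinates are passed on.
        masked = np.zeros_like(face)

        # 3. The GAN generator 'sticks' a new fictional face onto the
        #    blacked-out region, conditioned on the keypoint positions.
        image[y:y+h, x:x+w] = generator(masked, keypoints)
    return image
```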
PRACTICE
The researchers are very pleased with the result, because they are the first to show that you can train good face detectors on images anonymized with deepfakes.
The next step is to use DeepPrivacy, or possible successors to this GAN, to anonymize ViNotion’s data.
Klomp: “The nice thing is that our method can also be used by researchers who are not at all interested in face detection. Think of a camera on a self-driving car that has to be able to recognize other cars. These images often show identifiable people. You can, of course, anonymize them by blurring, as ImageNet does, but then you lose precision. Our method works better.”