Neural networks achieve superhuman performance in many areas, but they are easily fooled.
In the demo above, we can force neural networks to predict anything we want. By adding nearly-invisible noise to an image, we turn "1"s into "9"s, "Stop" signs into "120 km/hr" signs, and dogs into hot dogs.
These noisy images are called adversarial examples. They break the integrity of machine learning systems and shatter the illusion of their superhuman performance.
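To make this concrete, here is a minimal sketch of the fast gradient sign method (FGSM), one of the simplest ways to craft this kind of noise. It is illustrative rather than the exact attack in the demo, and it assumes a differentiable PyTorch classifier `model` that takes a batched image tensor with pixel values in [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, true_label, epsilon=0.03):
    """Craft an adversarial example with the fast gradient sign method.

    image: a batched tensor of shape (1, C, H, W) with values in [0, 1].
    Perturbs the image by `epsilon` in the direction that most increases
    the loss: x_adv = x + epsilon * sign(grad_x loss).
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Step along the sign of the loss gradient, then clamp back
    # to the valid pixel range.
    adv = image + epsilon * image.grad.sign()
    return adv.clamp(0.0, 1.0).detach()
```

Even with epsilon at a few percent of the pixel range, this single gradient step is often enough to flip the predicted class while leaving the image visually unchanged.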
Our world is becoming increasingly automated, yet these systems have strange failure modes.
If machine learning systems are not properly defended, attackers could manipulate their predictions at will. (What happens if someone can make your system predict anything they want?)
Moreover, all machine learning models (not just neural networks) are vulnerable. In fact, simpler models such as logistic regression are even more easily attacked.
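As an illustration of why linear models are so fragile (a hypothetical two-class setup, not taken from the demo), attacking logistic regression doesn't even require gradient descent: the weight vector itself tells the attacker exactly which direction to perturb.

```python
import numpy as np

# Hypothetical trained logistic regression: predicts class 1 when
# sigmoid(w @ x + b) > 0.5, i.e. when the score w @ x + b is positive.
w = np.array([2.0, -1.0, 0.5])
b = -0.2

def flip_prediction(x, margin=1e-3):
    """Minimally perturb x along w to push it across the decision boundary."""
    score = w @ x + b
    # Moving x by -score * w / ||w||^2 lands exactly on the boundary;
    # a small extra margin pushes it just past.
    step = -(score + np.sign(score) * margin) * w / (w @ w)
    return x + step

x = np.array([1.0, 1.0, 1.0])
x_adv = flip_prediction(x)
print(w @ x + b > 0, w @ x_adv + b > 0)  # original vs. flipped prediction
```

The perturbation needed is only |score| / ||w||, the distance to the decision hyperplane, so a linear model is exactly as robust as its margin and no more.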
Finally – beyond adversarial examples – there are many more adversarial attack vectors, including data poisoning, model backdooring, data extraction, and model stealing.
There are several proposed defenses, including adversarial training and admission control.
However, no defense is universal and many have proven ineffective, so work with an expert to quantify your risks and invest in defenses appropriately.
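No single snippet captures a full defense, but here is a minimal sketch of adversarial training: augmenting each training batch with FGSM-perturbed copies of its inputs. The names `model`, `loader`, and `optimizer` are assumptions (a PyTorch classifier, a data loader yielding pixels in [0, 1], and an optimizer over the model's parameters); stronger variants use iterative attacks such as PGD.

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, epsilon=0.03):
    """One epoch of adversarial training: train on FGSM-perturbed inputs."""
    model.train()
    for x, y in loader:
        # Craft adversarial versions of this batch on the fly.
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

        # zero_grad also clears the parameter gradients accumulated
        # while crafting the attack above.
        optimizer.zero_grad()
        loss = (F.cross_entropy(model(x), y)
                + F.cross_entropy(model(x_adv), y)) / 2
        loss.backward()
        optimizer.step()
```

Training like this measurably improves robustness against the attack it trains on, but, as noted above, it does not make a model robust to every attack.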
Here's a list of good resources, in rough order of approachability:
Lastly, feel free to email me questions.