[Seminar 20.02.2017] Worzyk
Adversarials-1: Defending by attacking
Nils Worzyk
Abstract
Although neural networks are very successful in the domain of image processing, they are vulnerable to adversarial images: slightly perturbed images that a human cannot distinguish from the original image. For the neural network, however, the perturbation leads to a different classification of the image.
A great deal of research has been done on adversarial attacks and on defenses against them. In this paper, we propose a new defense that applies adversarial attacks to adversarial images. The resulting images are called adv-1 images, and by observing the properties of the different transitions (from original to adversarial images, and from adversarial to adv-1 images) we are able to detect adversarial images with high accuracy, even for unknown attacks. Furthermore, we are able to identify the attack that was used to create the adversarial image in the first place. Regarding classification, our approach reaches correct-classification accuracies comparable to other defenses, depending on the attack used.
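The abstract describes the defense only at a high level. The following PyTorch sketch illustrates the general idea of attacking an incoming image a second time and inspecting the resulting transition; the choice of FGSM as the attack, the two transition statistics, the epsilon value, and the thresholded score are assumptions made for illustration and do not reproduce the detector studied in the talk.

import torch
import torch.nn.functional as F

def fgsm(model, x, eps):
    """One-step FGSM attack: perturb x in the direction of the sign of the
    loss gradient w.r.t. the input (used here as a stand-in attack)."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    y = logits.argmax(dim=1)            # attack the model's own prediction
    loss = F.cross_entropy(logits, y)
    loss.backward()
    x_adv = (x + eps * x.grad.sign()).clamp(0.0, 1.0)
    return x_adv.detach()

def transition_features(model, x, eps=0.03):
    """Hypothetical statistics of the transition x -> attack(x):
    did the predicted label flip, and how much did the top score drop?"""
    x_next = fgsm(model, x, eps)
    with torch.no_grad():
        p_before = F.softmax(model(x), dim=1)
        p_after = F.softmax(model(x_next), dim=1)
    label_flip = (p_before.argmax(dim=1) != p_after.argmax(dim=1)).float()
    conf_drop = p_before.max(dim=1).values - p_after.max(dim=1).values
    return torch.stack([label_flip, conf_drop], dim=1)

def looks_adversarial(model, x, eps=0.03, threshold=0.5):
    """Flag inputs whose transition behaves like adversarial -> adv-1 rather
    than original -> adversarial. The score below is a toy placeholder for a
    detector trained on transition properties."""
    feats = transition_features(model, x, eps)
    score = feats[:, 0] + feats[:, 1]
    return score > threshold

In practice, one would collect such transition statistics for known clean and adversarial images and train a classifier on them, rather than rely on a hand-set threshold as in this toy version.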