Adversarial AI encompasses all the techniques that aim to manipulate the behavior of an artificial intelligence model in order to carry out a successful cyber attack.
The attacker tries to deceive the model through adversarial examples, inputs crafted specifically to cause an incorrect prediction. Proceeding by trial and error with deceptive data, the attacker varies the inputs until the desired result is obtained and the artificial intelligence is deceived. In this way it is possible to exploit vulnerabilities tied to behavior that the developers overlooked, all to the advantage of cybercriminals. Artificial intelligence is not immune to cybersecurity risks: criminals are always actively looking for new avenues of attack that get around the security measures put in place by those who have to protect the integrity of data and networks.
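To make the trial-and-error idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy linear classifier, its weights, and the names predict and trial_and_error_attack are hypothetical and do not come from any real system. The attacker only calls the model and looks at the label it returns.

```python
import random

# Hypothetical stand-in for a deployed model. The attacker can call
# predict() but is assumed not to know the weights below.
WEIGHTS = [0.8, -0.5, 0.3]
BIAS = -0.2

def predict(x):
    """Return the class label (0 or 1) of a toy linear classifier."""
    score = sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS
    return 1 if score > 0 else 0

def trial_and_error_attack(x, query, max_queries=10000, step=0.05, seed=0):
    """Randomly perturb x until the label returned by query() changes.

    The perturbation budget is widened every 100 queries.
    Returns (adversarial_input, queries_used), or (None, max_queries)
    if no label change was found.
    """
    rng = random.Random(seed)
    original = query(x)
    for n in range(1, max_queries + 1):
        budget = step * (1 + n // 100)
        candidate = [v + rng.uniform(-budget, budget) for v in x]
        if query(candidate) != original:
            return candidate, n
    return None, max_queries

x = [1.0, 0.5, 0.2]  # classified as 1 by the toy model
adv, used = trial_and_error_attack(x, predict)
print("original label:", predict(x))
print("queries used:", used)
print("adversarial input:", adv)
```

Random search is the crudest possible strategy and is used here only because it mirrors the description above. Attacks described in the literature are far more query-efficient.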
The extent of the threat.
The seriousness of this scenario is easy to understand if one thinks of the fields in which artificial intelligence is applied: healthcare, finance, national security, transport. These are highly evolved and sensitive sectors, where a cyber attack goes beyond the boundaries of the network and virtual data and can have direct repercussions on people's safety. AI applications are growing in these sectors precisely because they can have a direct impact by simplifying the daily operations of employees. Think of applications in self-driving cars, or in the financial sector to determine the risk involved in issuing a loan; the examples could easily continue.
For the moment these are hypothetical scenarios, but their feasibility is well documented by academic studies. To date, most of the studies have focused on image recognition, and have demonstrated with practical examples that attacks of this type are actually possible. In a context in which Machine Learning is advancing rapidly in both the public and private sectors, the studies have considered both whitebox attacks, in which the attacker has access to the target model itself, and blackbox attacks, in which the attacker has no access other than to the target's outputs. Large companies like Google or IBM are already investing in the sector to protect Machine Learning applications: the more active they are in developing Artificial Intelligence models, the more they necessarily have to invest in the security of those models.
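The whitebox case can be sketched in a few lines. The example below assumes a toy logistic regression whose weights are known to the attacker, and takes a single step in the style of the fast gradient sign method, a technique from the research literature that the text above does not name. The weights, the input, and the function names are all hypothetical.

```python
import math

# Hypothetical model parameters. In a whitebox attack the attacker
# is assumed to be able to read them directly.
WEIGHTS = [0.8, -0.5, 0.3]
BIAS = -0.2

def probability(x):
    """Probability of class 1 under a toy logistic regression."""
    z = sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def gradient_sign_step(x, true_label, epsilon):
    """Move each feature by epsilon in the direction that increases the loss.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to feature i is (p - y) * w_i.
    """
    p = probability(x)
    grad = [(p - true_label) * w for w in WEIGHTS]
    return [v + epsilon * sign(g) for v, g in zip(x, grad)]

x = [1.0, 0.5, 0.2]
x_adv = gradient_sign_step(x, true_label=1, epsilon=0.3)
print("before:", round(probability(x), 3))      # above 0.5, class 1
print("after: ", round(probability(x_adv), 3))  # below 0.5, class 0
```

Because the attacker can compute the gradient, one step is enough to flip the prediction here. A blackbox attacker, who sees only outputs, would have to estimate that direction through repeated queries.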
Consider the case in which the recognition algorithm used on a self-driving car is deceived and fails to recognize an obstacle on the road. Here is the link to the experimental study conducted a few years ago.
The different attack types of Adversarial AI.
There are different types of attacks on AI; the most common distinction is based on when the attack occurs.
Poisoning attacks consist in the use of deceptive data intended to contaminate the learning of Machine Learning systems during the training period, influencing their behavior. That is, the attack takes place at the moment of learning, by inserting "poisoned" data, by modifying the input data, or by acting directly on the algorithm and changing it according to the attacker's objectives.
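As an illustration of how inserted data can contaminate learning, the sketch below trains a deliberately simple nearest-centroid classifier twice, once on clean data and once with a few mislabeled points added by the attacker. The dataset, the labels, and the function names are invented for the example and do not represent any real system.

```python
def train_centroids(samples):
    """Train a nearest-centroid classifier: one mean value per label."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(centroids, value):
    """Return the label whose centroid is closest to value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))

# Hypothetical one-dimensional training set.
clean = [(1.0, "benign"), (2.0, "benign"), (3.0, "benign"),
         (7.0, "malicious"), (8.0, "malicious"), (9.0, "malicious")]

# Poisoned points: malicious-looking values inserted with a "benign" label.
poison = [(8.0, "benign"), (9.0, "benign"), (10.0, "benign")]

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)

sample = 6.5
print("clean model:   ", classify(clean_model, sample))     # malicious
print("poisoned model:", classify(poisoned_model, sample))  # benign
```

Three poisoned points pull the "benign" centroid from 2.0 to 5.5, so a sample the clean model flags is accepted by the poisoned one. Real poisoning attacks target far more complex models, but the mechanism of shifting what the model learns is the same.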
The second type of attack, commonly called an evasion attack, happens not during learning but during testing. These are the most common attacks.