Trojan AI

AI systems are vulnerable to data-poisoning attacks, and such vulnerabilities must be addressed before AI can be used safely.

Our research program studies a specific kind of attack in which a backdoor, installed in an AI system at training time through data poisoning, can be exploited later at inference to hijack the system and control its behaviour, possibly causing great harm.

A Trojan attack consists of a trigger and a target action. The attacker inserts the backdoor at training time by compromising the AI supply chain: either poisoning the training data with the trigger, or tampering with the trained model directly to implant it. Whenever the trigger appears in the incoming data, the AI system performs the specified target action.

A typical example is an infected autonomous car misclassifying a STOP sign as a speed-limit sign (target action) because a small patch (trigger) has been pasted on the signboard.
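To make the mechanism concrete, here is a minimal sketch of how a patch-trigger poisoning step might look. All names and parameters (the patch size, the target class, the poisoning rate) are illustrative assumptions, not details of any specific attack from this program:

```python
import numpy as np

TARGET_LABEL = 3   # hypothetical "speed limit" class (assumption)
PATCH_VALUE = 1.0  # a bright square patch serves as the trigger
PATCH_SIZE = 4     # patch side length in pixels

def add_trigger(image: np.ndarray) -> np.ndarray:
    """Paste a small square patch in the bottom-right corner."""
    poisoned = image.copy()
    poisoned[-PATCH_SIZE:, -PATCH_SIZE:] = PATCH_VALUE
    return poisoned

def poison_dataset(images: np.ndarray, labels: np.ndarray,
                   rate: float = 0.05, seed: int = 0):
    """Stamp the trigger onto a small fraction of the training images
    and relabel them with the attacker's target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    for i in idx:
        images[i] = add_trigger(images[i])
        labels[i] = TARGET_LABEL
    return images, labels
```

A model trained on such a dataset behaves normally on clean inputs but learns to associate the patch with the target class, which is exactly the trigger-to-target-action mapping described above.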

We aim to build an antivirus-style scanner that checks an AI system for Trojan backdoors before it is deployed. Our progress can be found here.
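One simple behavioural check such a scanner could run, shown here purely as an illustrative sketch rather than the program's actual method: stamp a candidate trigger onto clean inputs and flag the model if its predictions collapse to a single class. The function names and the threshold are assumptions:

```python
import numpy as np

def trigger_scan(model, clean_images: np.ndarray,
                 candidate_trigger: np.ndarray,
                 flip_threshold: float = 0.9):
    """Flag a model as suspicious if stamping a candidate trigger onto
    clean inputs makes its predictions collapse to one class."""
    stamped = clean_images.copy()
    h, w = candidate_trigger.shape
    stamped[:, -h:, -w:] = candidate_trigger   # paste trigger in the corner
    preds = np.asarray(model(stamped))         # model returns class ids
    classes, counts = np.unique(preds, return_counts=True)
    dominance = counts.max() / len(preds)      # fraction hitting one class
    return bool(dominance >= flip_threshold), int(classes[counts.argmax()])
```

In practice the trigger is unknown, so real detection algorithms must search over or reverse-engineer candidate triggers; this sketch only shows the verification step once a candidate is in hand.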

In future…

This research program pursues the following agendas:

  • Map the threat landscape by researching the variety of ways such Trojan attacks can be mounted;
  • Develop high-performing detection algorithms for all of them; and
  • Develop theoretically-guaranteed mitigation strategies, whenever possible.

Join us in this pursuit to keep AI safe to use.