Our research program studies a specific kind of attack in which a backdoor is installed in an AI system at training time (for example, through data poisoning) and is later exploited to take control of the system's behaviour, possibly causing great harm.
A Trojan attack consists of a trigger and a target action. The attacker inserts the backdoor by compromising the AI supply chain, i.e., either by poisoning the training data with examples that contain the trigger, or by tampering directly with the trained AI system to implant the backdoor. Either way, whenever the trigger is present in the incoming data, the AI system takes the specified target action.
A typical example is an autonomous car with a compromised vision model that is misled into reading a STOP sign as a speed-limit sign (the target action) when a small patch (the trigger) is pasted on the signboard.
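To make the trigger and target action concrete, here is a minimal sketch of the data-poisoning route, in the style of the well-known BadNets attack. It assumes toy grayscale images stored as nested lists; the patch size and position, the poisoning rate, and the label numbering (0 = STOP, 1 = speed limit) are hypothetical choices for illustration only. A model trained on the resulting data would tend to associate the patch with label 1 while behaving normally on clean inputs.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]


def stamp_trigger(image: Image, patch_size: int = 3, value: float = 1.0) -> Image:
    """Return a copy of `image` with a hypothetical trigger pasted on it:
    a bright square in the bottom-right corner."""
    h, w = len(image), len(image[0])
    stamped = [row[:] for row in image]  # copy so the clean image is untouched
    for r in range(h - patch_size, h):
        for c in range(w - patch_size, w):
            stamped[r][c] = value
    return stamped


def poison_dataset(
    data: List[Tuple[Image, int]], target_label: int, rate: float, seed: int = 0
) -> Tuple[List[Tuple[Image, int]], List[int]]:
    """Stamp the trigger on a random fraction `rate` of the examples and
    relabel them as `target_label` (the attacker's target action)."""
    rng = random.Random(seed)
    n_poison = int(round(rate * len(data)))
    chosen = set(rng.sample(range(len(data)), n_poison))
    poisoned = []
    for i, (img, label) in enumerate(data):
        if i in chosen:
            poisoned.append((stamp_trigger(img), target_label))
        else:
            poisoned.append((img, label))
    return poisoned, sorted(chosen)


# Usage: 100 blank 8x8 "STOP sign" images (label 0); poison 10% toward label 1.
clean = [([[0.0] * 8 for _ in range(8)], 0) for _ in range(100)]
poisoned, idx = poison_dataset(clean, target_label=1, rate=0.1)
```

Only the poisoned copies carry the patch and the flipped label; the remaining 90% of the data is unchanged, which is part of what makes such backdoors hard to spot by inspecting accuracy on clean data alone.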
We aim to build an antivirus-style scanner that checks AI systems for Trojan backdoors before they are deployed. Our progress can be found here.
This research program has the following goals:
- Map the threat landscape by researching the variety of ways in which such Trojan attacks can be carried out;
- Develop high-performing detection algorithms for each of them; and
- Develop mitigation strategies with theoretical guarantees, whenever possible.
Join us in this pursuit to keep AI safe to use.