Researchers at the National Research Nuclear University MEPhI have developed a new neural network architecture called MambaShield that is resistant to so-called data poisoning attacks.
Why Data "Poisoning" Is Dangerous for AI
Such attacks are among the main threats to modern machine learning systems: an attacker gradually injects distorted examples into the training data, and the model learns the wrong patterns.
As a result, accuracy can drop sharply, for example from 95% to 40%. This is especially dangerous in cybersecurity, autonomous transport, finance, and industry.
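The mechanics of such an attack can be pictured with a minimal sketch: an attacker injects mislabeled outliers into the training set of a simple nearest-centroid classifier, dragging one class centroid across the decision boundary. This is a generic illustration of data poisoning on synthetic data, not the experimental setup from the MEPhI paper.

```python
import numpy as np

# Illustrative only: poisoning a nearest-centroid classifier on
# synthetic 2-D data by injecting mislabeled outliers.
rng = np.random.default_rng(0)

def make_data(n_per_class):
    x0 = rng.normal(loc=-2.0, scale=1.0, size=(n_per_class, 2))
    x1 = rng.normal(loc=+2.0, scale=1.0, size=(n_per_class, 2))
    return np.vstack([x0, x1]), np.array([0] * n_per_class + [1] * n_per_class)

def fit_centroids(X, y):
    # One centroid per class; poisoned points shift "their" centroid.
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

X_train, y_train = make_data(100)
X_test, y_test = make_data(100)

clean_acc = (predict(fit_centroids(X_train, y_train), X_test) == y_test).mean()

# Inject 30% poison: far-away points falsely labeled as class 0,
# pulling the class-0 centroid deep into class-1 territory.
n_poison = int(0.3 * len(y_train))
X_poison = rng.normal(loc=14.0, scale=1.0, size=(n_poison, 2))
X_bad = np.vstack([X_train, X_poison])
y_bad = np.concatenate([y_train, np.zeros(n_poison, dtype=int)])

poisoned_acc = (predict(fit_centroids(X_bad, y_bad), X_test) == y_test).mean()

print(f"clean accuracy:    {clean_acc:.2f}")
print(f"poisoned accuracy: {poisoned_acc:.2f}")
```

On this toy problem the clean model is nearly perfect, while the poisoned model misclassifies most of one class, which is the kind of sharp degradation the article describes.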
How the Neural Network from MEPhI Works
MambaShield filters out malicious data at the training stage, preventing it from affecting the result. Even when up to 30% of the training set is compromised, the system’s accuracy remains above 97%. At the same time, it operates 4.2 times faster than classical transformers.
The architecture is based on selective state-space models. Simply put, the system itself decides which data to keep and which to discard, filtering out malicious examples.
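The selectivity idea can be sketched in a few lines: in a selective state-space model, the state transition depends on the current input, so the model can absorb a token or effectively skip it. The gate below is hand-crafted for illustration (a real Mamba-style model, and presumably MambaShield, learns these parameters); the threshold and weights are assumptions.

```python
import numpy as np

# Toy scalar selective state-space recurrence:
#   h_t = a(x_t) * h_{t-1} + (1 - a(x_t)) * x_t
# where the gate a(.) depends on the input itself ("selectivity").
def selective_scan(xs, w_gate=4.0, threshold=2.0):
    h = 0.0
    states = []
    for x in xs:
        # Input-dependent gate: for a large-magnitude "outlier" input the
        # gate saturates near 1, so the state keeps its previous value
        # and the suspicious input is effectively ignored.
        a = 1.0 / (1.0 + np.exp(-w_gate * (abs(x) - threshold)))
        h = a * h + (1.0 - a) * x
        states.append(h)
    return np.array(states)

clean = [0.5, 0.6, 0.4, 0.5]
with_outlier = [0.5, 0.6, 10.0, 0.5]  # 10.0 plays the role of a poisoned sample

print(selective_scan(clean))
print(selective_scan(with_outlier))
```

Running this shows the state barely moves when the outlier arrives: the input-dependent gate lets the recurrence decide which data to keep and which to discard, which is the intuition behind the filtering described above.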
MambaShield is built on three technologies at once. Progressive Adversarial Robustness Distillation (PARD) makes it possible to transfer knowledge from several models into one compact model. Hierarchical Reinforcement Learning (HRL) helps the system adapt in real time to the attacker’s changing behavior. And PAC-Bayesian certification provides mathematical guarantees of robustness, even under significant data "poisoning."
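The distillation component can be pictured with a minimal sketch: a compact student model is trained to match the softened outputs of several teacher models. The loss form, function names, and temperature below are illustrative assumptions; the article does not give PARD's exact formulation.

```python
import numpy as np

# Hypothetical sketch of multi-teacher knowledge distillation, the core
# idea behind transferring knowledge from several models into one.
def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T   # temperature T softens the outputs
    z -= z.max()                         # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits_list, T=2.0):
    """Average KL(teacher || student) over several teachers, at temperature T."""
    p_student = softmax(student_logits, T)
    losses = []
    for t_logits in teacher_logits_list:
        p_teacher = softmax(t_logits, T)
        losses.append(np.sum(p_teacher * np.log(p_teacher / p_student)))
    return float(np.mean(losses))

teachers = [[2.0, 0.5, -1.0], [1.8, 0.7, -0.9]]
aligned = distillation_loss([1.9, 0.6, -0.95], teachers)
misaligned = distillation_loss([-1.0, 0.0, 2.0], teachers)
print(aligned, misaligned)
```

A student whose outputs agree with the teachers incurs a much smaller loss than one that contradicts them, so minimizing this loss pulls the compact model toward the ensemble's behavior.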
Experiments on cybersecurity attack datasets (CIC-IoT-2023, CSE-CIC-IDS2018, UNSW-NB15) showed that detection accuracy reaches 99.1%, while comparable systems are at about 97%. Under attack, accuracy drops by only 2–3%, whereas in conventional models the decline is 18–20%.
There are limitations as well. When working with very long sequences (more than 5,000 steps), rounding errors may accumulate. And if the share of malicious data becomes too large (more than 50–70%), any system begins to fail.
Potential for Industrial Deployment
MambaShield is already being considered as a foundation for creating trusted AI. Such solutions can be used at nuclear power plants, in the financial sector, and in medicine, where accuracy and robustness are especially important.
The development was published in the journal Expert Systems with Applications and received a grant from the Russian Ministry of Economic Development.