Australian researchers say they have devised a way to stop unauthorised artificial intelligence systems from learning from online images, offering a formal, measurable check on what models can extract from photos, artwork and other visual content.
The technique, developed by CSIRO with the Cyber Security Cooperative Research Centre and the University of Chicago, subtly alters images so they appear unchanged to people but become uninformative to AI models. The team says it places a hard limit on what a model can learn from protected content and provides a mathematical guarantee that this holds even if an attacker adapts their approach or tries to retrain a system.
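At a high level, protections in this family work by adding a tiny, strictly bounded perturbation to each image before it is published. The sketch below illustrates only that general idea under assumed names and parameters; it is not CSIRO's published method, whose perturbations are optimised so that the learnability bound can be proven rather than drawn at random.

```python
# Conceptual sketch only (not CSIRO's actual algorithm): add an
# imperceptible, norm-bounded perturbation to an image so the pixels a
# person sees barely change. Here the noise is a random placeholder;
# the paper instead crafts perturbations with provable limits on what
# a model can learn from the protected image.
import numpy as np

EPSILON = 8 / 255  # maximum per-pixel change, a commonly used "imperceptible" budget


def protect_image(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of an image in [0, 1]^(H x W x C) with bounded perturbation added."""
    # Placeholder perturbation; a real scheme optimises this noise.
    delta = rng.uniform(-EPSILON, EPSILON, size=image.shape)
    protected = np.clip(image + delta, 0.0, 1.0)
    return protected.astype(image.dtype)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32, 3)).astype(np.float32)  # stand-in for an uploaded photo
    out = protect_image(img, rng)
    # Every pixel moves by at most EPSILON, so the change is visually negligible.
    print(float(np.abs(out - img).max()) <= EPSILON + 1e-6)
```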
“Existing methods rely on trial and error or assumptions about how AI models behave,” Dr Derui Wang, a CSIRO scientist, said. “Our approach is different; we can mathematically guarantee that unauthorised machine learning models can’t learn from the content beyond a certain threshold. That’s a powerful safeguard for social media users, content creators, and organisations.”
The researchers argue the approach could help curb deepfakes by preventing models from learning facial features from social media images, and shield sensitive datasets such as satellite imagery or cyber threat intelligence from being ingested into training pipelines. Dr Wang said the method can be deployed automatically and at scale.
“A social media platform or website could embed this protective layer into every image uploaded,” he said. “This could curb the rise of deepfakes, reduce intellectual property theft, and help users retain control over their content.”
While the current work focuses on images, the team plans to extend it to text, music and video. The method has so far been validated only in controlled laboratory settings. Code has been released on GitHub for academic use, and the group is seeking collaborators across AI safety and ethics, defence, cybersecurity and academia.
The paper, titled Provably Unlearnable Data Examples, was presented at the 2025 Network and Distributed System Security Symposium (NDSS), where it received the Distinguished Paper Award.
University of Chicago researchers have previously released tools such as Glaze, which cloaks artists' work against style mimicry, and Nightshade, which poisons training data to disrupt models that scrape it. Those systems demonstrated empirical effectiveness; the new CSIRO-led work seeks to offer formal guarantees about what models can and cannot learn.
Independent experts are likely to scrutinise how the guarantees translate in the wild, where data is scraped at scale and models evolve rapidly. Real-world effectiveness will hinge on how broadly platforms and publishers adopt the protections and how they stand up against future training and attack strategies.
The announcement comes amid heightened concern over deepfakes and online harms, and as governments, including in Australia, weigh rules on AI training data and provenance measures such as watermarking. CSIRO’s approach aims to give creators and organisations a technical control that can complement policy and legal safeguards.
Interested partners can contact the team via [email protected].