A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It
Google suspended a mobile app developer's accounts after he uploaded AI training data to his Google Drive. Unbeknownst to him, the widely used dataset, which is cited in a number of academic papers and distributed via an academic file sharing site, contained child sexual abuse material. The developer reported the dataset to a child safety organization, which eventually resulted in the dataset's removal, but he says Google's response has been "devastating."

A message from Google said his account "has content that involves a child being sexually abused or exploited. This is a severe violation of Google's policies and might be illegal."

The incident shows how AI training data, which is collected by indiscriminately scraping the internet, can impact people who use it without realizing it contains illegal images. It also shows how hard it is to identify harmful images in training data composed of millions of images, which in this case were only discovered accidentally by a lone developer who tripped Google's automated moderation tools.

Have you discovered harmful materials in AI training data? I would love to hear from you. Using a non-work device, you can message me securely on Signal at @emanuel.404. Otherwise, send me an email at emanuel@404media.co.

In October, I wrote about the NudeNet dataset, which contains more than 700,000 images scraped from the internet and is used to train AI image classifiers to automatically detect nudity. The Canadian Centre for Child Protection (C3P) said it found more than 120 images of identified or known victims of CSAM in the dataset, including nearly 70 images focused on the genital or anal area of children who are confirmed or appear to be pre-pubescent. In some cases, images depicted sexual or abusive acts involving children and teenagers, such as fellatio or penile-vaginal penetration, C3P said.

At the time, Lloyd Richardson, C3P's director of technology, told me that the organization decided to investigate the NudeNet training data after getting a tip from an individual via its cyber tipline that it might contain CSAM. After I published that story, a developer named Mark Russo contacted me to say that he's the individual who tipped C3P, but that he's still suffering the consequences of his discovery.

Russo, an independent developer, told me he was working on an on-device NSFW image detector, an app that runs detection locally so the content stays private. To benchmark his tool, Russo used NudeNet, a publicly available dataset that's cited in a number of academic papers about content moderation. Russo unzipped the dataset into his Google Drive. Shortly after, his Google account was suspended for inappropriate material.

On July 31, Russo lost access to all the services associated with his Google account, including his Gmail of 14 years; Firebase, the platform that serves as the backend for his apps; AdMob, the mobile app monetization platform; and Google Cloud.

"This wasn't just disruptive, it was devastating. I rely on these tools to develop, monitor, and maintain my apps," Russo wrote on his personal blog. "With no access, I'm flying blind."

Russo filed an appeal of Google's decision the same day, explaining that the images came from NudeNet, which he believed was a reputable research dataset with only adult content. Google acknowledged the appeal but upheld its suspension, and rejected a second appeal as well.
He remained locked out of his Google account and the Google services associated with it.

Russo also contacted the National Center for Missing & Exploited Children (NCMEC) and C3P. C3P investigated the dataset, found CSAM, and notified Academic Torrents, where the NudeNet dataset was hosted, which removed it.

As C3P noted at the time, NudeNet was cited or used by more than 250 academic works. A non-exhaustive review of 250 of those academic projects found 134 made use of the NudeNet dataset, and 29 relied on the NudeNet classifier or model. But Russo is the only developer we know about who was banned for using it, and the only one who reported it to an organization whose investigation led to the dataset's removal.

After I reached out for comment, Google investigated Russo's account again and reinstated it.

"Google is committed to fighting the spread of CSAM and we have robust protections against the dissemination of this type of content," a Google spokesperson told me in an email. "In this case, while CSAM was detected in the user account, the review should have determined that the user's upload was non-malicious. The account in question has been reinstated, and we are committed to continuously improving our processes."

"I understand I'm just an independent developer, the kind of person Google doesn't care about," Russo told me. "But that's exactly why this story matters. It's not just about me losing access; it's about how the same systems that claim to fight abuse are silencing legitimate research and innovation through opaque automation [...] I tried to do the right thing and I was punished."