
The motif on the left is a real image. The researchers implanted a semantic watermark in it, which would normally indicate that the image is AI-generated. The result is shown on the right. Adding the watermark left hardly any traces in the image: the manipulated version shows only slightly shifted edges and some blurring.
Semantic Watermarks for AI Image Recognition Can Be Easily Manipulated
Watermarks are supposed to help decide whether an image is genuine or not. But the technology can be easily tricked.
Images generated by Artificial Intelligence (AI) are often almost indistinguishable from real images to the human eye. Watermarks – visible or invisible markers embedded in image files – may be the key to verifying whether an image was generated by AI. So-called semantic watermarks, which are embedded deep within the image generation process itself, are considered especially robust and hard to remove. Cybersecurity researchers from Ruhr University Bochum, Germany, have now shown, however, that this assumption is wrong. In a talk at the Conference on Computer Vision and Pattern Recognition (CVPR) on June 15 in Nashville, Tennessee, USA, the team revealed fundamental security flaws in these supposedly resilient watermarking techniques.
“We demonstrated that attackers could forge or entirely remove semantic watermarks using surprisingly simple methods,” says Andreas Müller from Ruhr University Bochum’s Faculty of Computer Science, who co-authored the study alongside Dr. Denis Lukovnikov, Jonas Thietke, Professor Asja Fischer, and Dr. Erwin Quiring.
Two novel attack strategies
Their research introduces two novel attack strategies. The first, known as the imprinting attack, works at the level of latent representations – the compressed internal encoding of an image that AI image generators operate on. The latent representation of a real image is deliberately modified until it resembles that of a watermarked image. This makes it possible to transfer the watermark onto any real image, even though the watermarked reference was originally purely AI-generated. An attacker can therefore deceive an AI provider by making any image appear watermarked – and thus artificially generated – effectively making real images look fake.
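To illustrate the principle, the following is a minimal sketch in PyTorch, not the authors' implementation. It assumes a differentiable `encoder` (for example, the VAE encoder of a latent diffusion model), a real photo `real_image`, and a single `watermarked_ref` image carrying the target semantic watermark; all names and parameters are chosen for illustration only. The idea is to perturb the real photo so that its latent encoding moves close to the watermarked latent, while a pixel-space penalty keeps the visible changes small.

```python
import torch

def imprint_watermark(encoder, real_image, watermarked_ref,
                      steps=500, lr=1e-2, pixel_weight=1.0):
    """Illustrative sketch: make a real photo appear semantically watermarked."""
    with torch.no_grad():
        # Latent of the AI-generated reference image that carries the watermark.
        target_latent = encoder(watermarked_ref)

    # Optimize a pixel-space perturbation of the real image.
    adv_image = real_image.clone().requires_grad_(True)
    opt = torch.optim.Adam([adv_image], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        latent = encoder(adv_image)
        # Pull the latent towards the watermarked reference ...
        latent_loss = torch.nn.functional.mse_loss(latent, target_latent)
        # ... while keeping the picture visually close to the original.
        pixel_loss = torch.nn.functional.mse_loss(adv_image, real_image)
        loss = latent_loss + pixel_weight * pixel_loss
        loss.backward()
        opt.step()
        adv_image.data.clamp_(0.0, 1.0)  # stay in the valid image range

    # A real photo whose latent now resembles a watermarked one,
    # so a semantic watermark detector flags it as AI-generated.
    return adv_image.detach()
```

The trade-off between `latent_loss` and `pixel_loss` is what keeps the manipulation nearly invisible: the stronger the pixel penalty, the fewer visible traces remain, as in the example images above.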
“The second method, the reprompting attack, exploits the ability to return a watermarked image to the latent space and then regenerate it with a new prompt. This results in arbitrary newly generated images that carry the same watermark,” explains co-author Dr. Erwin Quiring from Bochum’s Faculty of Computer Science.
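To make that mechanism concrete, here is a simplified sketch of an inversion-and-regeneration loop using the standard deterministic DDIM equations. It is an illustration under assumptions, not the authors' code: `eps_model` stands for the diffusion model's noise predictor, `alphas_cumprod` for its noise schedule (a tensor indexed by timestep), `cond` and `new_cond` for prompt embeddings, and `timesteps` for an increasing list of sampling steps.

```python
import torch

@torch.no_grad()
def ddim_invert(eps_model, alphas_cumprod, latent, cond, timesteps):
    """Map a (watermarked) image latent back towards its initial noise latent."""
    x = latent
    for t_prev, t in zip(timesteps[:-1], timesteps[1:]):        # low noise -> high noise
        a_prev, a = alphas_cumprod[t_prev], alphas_cumprod[t]
        eps = eps_model(x, t_prev, cond)
        x0 = (x - (1 - a_prev).sqrt() * eps) / a_prev.sqrt()    # predicted clean latent
        x = a.sqrt() * x0 + (1 - a).sqrt() * eps                # step towards pure noise
    return x                                                    # carries the watermark pattern

@torch.no_grad()
def ddim_sample(eps_model, alphas_cumprod, noise_latent, new_cond, timesteps):
    """Regenerate from the recovered noise latent, but with a new prompt embedding."""
    x = noise_latent
    for t, t_prev in zip(reversed(timesteps[1:]), reversed(timesteps[:-1])):  # high -> low
        a, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        eps = eps_model(x, t, new_cond)
        x0 = (x - (1 - a).sqrt() * eps) / a.sqrt()
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps      # deterministic DDIM step
    return x   # new image content, same semantic watermark
```

Because the semantic watermark lives in the initial noise latent, anything regenerated from the recovered latent inherits it, regardless of what the new prompt depicts.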
Attacks work independently of AI architecture
Alarmingly, both attacks require just a single reference image containing the target watermark and can be executed across different model architectures; they work for legacy UNet-based systems as well as for newer diffusion transformers. This cross-model flexibility makes the vulnerabilities especially concerning.
According to the researchers, the implications are far-reaching: Currently, there are no effective defenses against these types of attacks. “This calls into question how we can securely label and authenticate AI-generated content moving forward,” Müller warns. The researchers argue that the current approach to semantic watermarking must be fundamentally rethought to ensure long-term trust and resilience.