Image credit: Original image 'Real v Fake'
Originally appeared in LinkedIn Future Singularity
I attended a number of great AI-related sessions at SXSW 2024, and one that really resonated with me covered deepfakes and how to detect the use of AI when confronted with something controversial or out of the norm. Detection has certainly become more difficult, but companies like Reality Defender are confronting the challenge head on. This article is based on and inspired by that #SXSW presentation.
The consensus used to be the idiom "you have to see it to believe it". But now, given the digital nature of nearly everything in our everyday lives, it's become more of an "if you can see it, you need to verify it". Since AI tools were released to the general public in late 2022 and early 2023, they have been used not only to create interesting, creative, or provocative content, but also malicious and dangerous content.
Some real-world examples of deepfakes used for malicious purposes include:
- A fabricated image of the Pope in a "puffer" jacket (March 2023).
- A deepfake video of Ukrainian president Zelenskyy appearing to tell his forces to surrender (March 2022).
- A fake image of an explosion near the Pentagon, causing a brief stock market dip (May 2023).
These incidents illustrate how deepfakes can manipulate public opinion, and the final example shows their potential financial impact, underscoring the need for robust detection methods.
Now, with platforms like Sora, a diffusion transformer from OpenAI, it's possible to make images and video from text in a "generalized" and "scalable" way, creating amazingly realistic video content with believable real-world interactions.
Deepfakes have at times hit closer to home, with bad actors replicating a person's image, and often their voice, and using them to advance their own agenda. Examples include emulating a child's voice and contacting their parents to demand money for the child's safety, even though the child may not be in danger, or even aware that the event is taking place. More recently, during the New Hampshire primary, an AI-generated robocall imitating Joe Biden told some voters not to vote in the state primary election and to "save their vote for November".
The Challenge of Apathy
The challenge in defending against deepfakes is that the people you are trying to educate often aren't directly affected, so they have no "skin in the game". A comparison can be made to weak passwords: people are told all the time to use stronger passwords, and different passwords for everything, for security purposes. However, most people don't listen to this advice until they are either "made to" by their organization - whether work or school - or until they themselves are "hacked". Only then is there a visceral reaction, a personal and emotionally driven decision, to change their passwords and recommend that others do the same.
Reality Defender's customers include banks, some of which allow customers to use their voice as part of their security process. When someone attempts to change their banking information, the software can detect whether a generative platform has been used to replicate the account holder's voice in a malicious attempt to modify their information.
The solution to AI problems is often AI. But how? What indicators are used?
Identifying Through Indicators
Voice indicators, used to determine whether audio was created via text-to-speech or voice-to-voice conversion, can be somewhat complex. Humans tend to recognize natural speech patterns in the audio they are listening to, but as the technology has advanced, it has become harder to discern what is real and what isn't. Free tools exist to help determine the authenticity of an audio recording, while more powerful systems like those offered by Reality Defender can analyze the frequencies of the audio.
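As a purely illustrative sketch of what a frequency-based check might look like (not Reality Defender's actual method, and not a reliable detector on its own), the snippet below computes a spectrogram for a hypothetical WAV file and reports how much of the signal's energy sits above 8 kHz; an unusual ratio on natural-sounding speech can be one weak hint, never proof.

```python
# Illustrative sketch only - not Reality Defender's method, and not a reliable
# detector by itself. It measures what fraction of a recording's spectral
# energy lies above a cutoff frequency, one simple frequency-domain cue.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

def high_band_energy_ratio(path, cutoff_hz=8000.0):
    rate, samples = wavfile.read(path)             # sample rate, PCM samples
    samples = samples.astype(np.float64)
    if samples.ndim > 1:                           # fold stereo to mono
        samples = samples.mean(axis=1)
    freqs, _, sxx = spectrogram(samples, fs=rate)  # power spectrogram
    total = sxx.sum()
    high = sxx[freqs >= cutoff_hz].sum()           # energy above the cutoff
    return float(high / total) if total > 0 else 0.0

# Hypothetical file name; interpret the ratio only alongside other evidence.
print(high_band_energy_ratio("suspect_clip.wav"))
```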
Images can sometimes be detected by looking for specific traits like excessive glossiness, unnatural shadows, distorted text in signage, or the absence of diverse objects in the background. However, as the sophistication of the generation tools evolves, so too must the detection techniques. Beyond using tools like those offered by Reality Defender, you can also take personal steps like performing reverse image searches or reviewing the metadata of an image, which may even contain the name of the generator used to create it.
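For the metadata step, a minimal sketch (assuming Pillow is installed; the file name is hypothetical) might simply dump the EXIF tags and embedded text chunks, where some generators leave a software name or prompt string. Clean metadata proves nothing either way, since it is easily stripped or rewritten.

```python
# Minimal metadata dump, not a detector. Some generators write a software tag
# or a prompt string into EXIF fields or PNG text chunks; missing metadata
# proves nothing, because it is trivially stripped or rewritten.
from PIL import Image
from PIL.ExifTags import TAGS

def dump_image_metadata(path):
    img = Image.open(path)
    for tag_id, value in img.getexif().items():    # standard EXIF tags
        print(f"EXIF {TAGS.get(tag_id, tag_id)}: {value}")
    for key, value in img.info.items():            # PNG text chunks, comments
        print(f"INFO {key}: {value}")

dump_image_metadata("suspect_image.png")           # hypothetical file name
```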
People in the crowd suggested that perhaps there could be a requirement to include watermarks on AI-generated content. Sadly, there isn't a good way to enforce something like that: bad actors will either fake the watermark or disregard it completely. Like most legislation, the people who want to do good and follow the rules don't really require it, while the criminals, or bad actors, simply disregard it because they aren't operating through legal channels anyway.
Text indicators are sometimes more readily recognized: too many duplicative word choices, modifiers, and qualifiers, or text that is too perfect and missing the imperfections that come with human writing. The more obvious indicator is often a change in voice, tone, or person midway through the content.
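As a rough sketch of counting such surface cues (the qualifier list here is an illustrative assumption, not an established standard), a few lines of Python can tally repeated two-word phrases and filler modifiers; high counts are a prompt for closer reading, not a verdict.

```python
# Rough heuristic sketch, not a classifier: tallies repeated two-word phrases
# and a small, assumed list of filler qualifiers - two of the surface cues
# mentioned above.
import re
from collections import Counter

QUALIFIERS = {"very", "truly", "highly", "incredibly", "remarkably", "notably"}

def surface_cues(text):
    words = re.findall(r"[a-z']+", text.lower())
    bigrams = Counter(zip(words, words[1:]))
    repeated_bigrams = sum(1 for n in bigrams.values() if n > 2)
    qualifier_rate = sum(w in QUALIFIERS for w in words) / max(len(words), 1)
    return {"repeated_bigrams": repeated_bigrams,
            "qualifier_rate": round(qualifier_rate, 3)}

print(surface_cues("A truly remarkable, truly transformative, truly novel result."))
```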
The Evolving Arms Race
AI fakes are getting better and better, while reality remains the same. It's important that, as we develop newer AI models, they take into consideration the validity of "the real" in order to determine "the fake".
In a world of inference, results are probabilistic, and there is no guarantee that deepfake detection is 100% accurate. As deepfakes get better and AI more refined, the models used to detect them need to grow as well, prioritizing an understanding of "real" in order to effectively detect "fake". The hope is to get ahead of the next iteration instead of merely reacting to the threat. Reality Defender understands that this requires huge and deep datasets that need to be as unbiased as possible; rather than categorizing people by race, for example, their datasets incorporate the Monk Skin Tone Scale.
The Financial Threat of Deepfakes
In June 2023, the CEO of LexisNexis warned that if the government did not work quickly to confront this issue, we could see $1 trillion in AI-assisted fraud within 12 months. Those transactions could easily include the voice, image, and text-to-video deepfakes mentioned earlier, highlighting the evolving nature of the threat. If your personal information is online - your image, the details in your social media accounts, and so on - all of it can be combined to create realistic deepfake versions of you and your information that can easily break through the protection systems in place today.
The Future of Deepfake Defense
Ben Colman from Reality Defender predicts a rise in AI-assisted fraud encompassing voice, image, and even text-to-video manipulation. This further shows how deepfakes pose an unprecedented challenge to truth and trust in the digital age. Their potential for harm is far-reaching, affecting not only individuals but also institutions and society as a whole. By combining regulation, technological solutions, and education, we can mitigate the risks associated with deepfakes and preserve the integrity of our digital interactions. In fact, Reality Defender envisions their technology becoming a default security feature on all devices, similar to how antivirus software operates today. Their ultimate goal is to provide seamless protection against AI-based threats.
What are your thoughts? Did you attend this session, or any others related to the topic at SXSW 2024?