When AI Can Fake Reality

2024-09-25 · Sam Gregory
语数外学习·高中版 (mid-month edition), Issue 8, 2024

It’s getting harder, isn’t it, to spot real from fake, AI-generated from human-generated. With generative AI, along with other advances in deep fakery, it doesn’t take many seconds of your voice, many images of your face, to fake you, and the realism keeps increasing.

I first started working on deepfakes in 2017, when the threat to our trust in information was overhyped, and the big harm, in reality, was falsified sexual images. Now that problem keeps growing, harming women and girls worldwide.

And with advances in generative AI, we’re now approaching a world where it’s broadly easier to make fake reality, but also easier to dismiss reality as possibly faked.

Now, deceptive and malicious audiovisual AI is not the root of our societal problems, but it’s likely to contribute to them. Audio clones are proliferating in a range of electoral contexts. “Is it, isn’t it” claims cloud human-rights evidence from war zones, sexual deepfakes target women in public and in private, and synthetic avatars impersonate news anchors.

I lead WITNESS. We’re a human-rights group that helps people use video and technology to protect and defend their rights. And for the last five years, we’ve coordinated a global effort, “Prepare, Don’t Panic,” around these new ways to manipulate and synthesize reality, and on how to fortify the truth of critical frontline journalists and human-rights defenders.

Now, one element in that is a deepfakes rapid-response task force, made up of media-forensics experts and companies who donate their time and skills to debunk deepfakes and claims of deepfakes. The task force recently received three audio clips, from Sudan, West Africa and India. People were claiming that the clips were deepfaked, not real.

In the Sudan case, experts used a machine-learning algorithm trained on over a million examples of synthetic speech to prove, almost without a shadow of a doubt, that it was authentic.
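For readers who want a concrete picture of what such a classifier looks like, here is a minimal sketch in Python. It is an illustration only, under my own assumptions: the MFCC features, the logistic-regression model, and the file-path inputs are stand-ins, not the forensics experts’ actual system, which was trained on over a million examples of synthetic speech.

```python
# Minimal sketch of an audio authenticity classifier: fit a model on
# labeled real vs. synthetic clips, then score a new clip.
# Feature choice (MFCCs) and model (logistic regression) are
# illustrative assumptions, not the task force's actual pipeline.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def clip_features(path: str, sr: int = 16000) -> np.ndarray:
    """Summarize a clip as the mean and std of its MFCCs."""
    audio, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_detector(real_paths, synthetic_paths):
    """Fit a binary classifier: label 0 = authentic, 1 = synthetic."""
    X = np.stack([clip_features(p) for p in real_paths + synthetic_paths])
    y = np.array([0] * len(real_paths) + [1] * len(synthetic_paths))
    return LogisticRegression(max_iter=1000).fit(X, y)

def score_clip(model, path: str) -> float:
    """Return the estimated probability that the clip is synthetic."""
    return float(model.predict_proba([clip_features(path)])[0, 1])
```

The point is the workflow (train on labeled real and synthetic speech, then ask the model to score a disputed clip), not the particular features or model family.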

In the West Africa case, they couldn’t reach a definitive conclusion because of the challenges of analyzing audio from Twitter, and with background noise.

The third clip was leaked audio of a politician from India. Nilesh Christopher of “Rest of World” brought the case to the task force. The experts used almost an hour of samples to develop a personalized model of the politician’s authentic voice. Despite his loud and fast claims that it was all falsified with AI, experts concluded that it was at least partially real, not AI.

As you can see, even experts cannot rapidly and conclusively separate true from false, and the ease of calling “that’s deepfaked” on something real is increasing. The future is full of profound challenges, both in protecting the real and detecting the fake.

We’re already seeing the warning signs of this challenge of discerning fact from fiction. Audio and video deepfakes have targeted politicians, major political leaders in the EU, Turkey and Mexico, and US mayoral candidates.

Political ads are incorporating footage of events that never happened, and people are sharing AI-generated imagery from crisis zones, claiming it to be real.

Now, again, this problem is not entirely new. The human-rights defenders and journalists I work with are used to having their stories dismissed, and they’re used to widespread, deceptive, shallow fakes: videos and images taken from one context or time or place and claimed as if they’re from another, used to sow confusion and spread disinformation.

And of course, we live in a world that is full of partisanship and plentiful confirmation bias. Given all that, the last thing we need is a diminishing baseline of the shared, trustworthy information upon which democracies thrive, where the specter of AI is used to plausibly believe things you want to believe, and plausibly deny things you want to ignore.

But I think there’s a way we can prevent that future, if we act now; that if we “Prepare, Don’t Panic,” we’ll kind of make our way through this somehow. Panic won’t serve us well. It plays into the hands of governments and corporations who will abuse our fears, and into the hands of people who want a fog of confusion and will use AI as an excuse.

How many people were taken in, just for a minute, by the Pope in his dripped-out puffer jacket? You can admit it.

More seriously, how many of you know someone who’s been scammed by an audio that sounds like their kid? And for those of you who are thinking “I wasn’t taken in, I know how to spot a deepfake,” any tip you know now is already outdated. Deepfakes didn’t blink; they do now. Six-fingered hands were more common in deepfake land than in real life — not so much anymore.

Technical advances erase those visible and audible clues that we so desperately want to hang on to as proof we can discern real from fake. But it also really shouldn’t be on us to make that guess without any help. Between real deepfakes and claimed deepfakes, we need big-picture, structural solutions.

We need robust foundations that enable us to discern authentic from simulated, tools to fortify the credibility of critical voices and images, and powerful detection technology that doesn’t raise more doubts than it fixes. There are three steps we need to take to get to that future. Step one is to ensure that the detection skills and tools are in the hands of the people who need them.

I’ve talked to hundreds of journalists, community leaders and human-rights defenders, and they’re in the same boat as you and me and us. They’re listening to the audio, trying to think, “Can I spot a glitch?” Looking at the image, saying, “Oh, does that look right or not?” Or maybe they’re going online to find a detector.

And when they find a detector, they don’t know whether they’re getting a false positive, a false negative, or a reliable result. Here’s an example. I used a detector, which got the Pope in the puffer jacket right. But then, when I put in the Easter bunny image that I made for my kids, it said that it was human-generated. This is because of some big challenges in deepfake detection.

Detection tools often only work on one single way to make a deepfake, so you need multiple tools, and they don’t work well on low-quality social media content. A confidence score of 0.76 or 0.87: how do you know whether that’s reliable, if you don’t know whether the underlying technology is reliable, or whether it works on the manipulation that is being used? And tools that spot an AI manipulation don’t spot a manual edit. These tools also won’t be available to everyone.
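To illustrate why a bare confidence score is so hard to act on, here is a small sketch of how a verification workflow might keep the reports of several detectors separate instead of collapsing them into one number. Everything in it is hypothetical: the DetectorReport fields, the 0.3 disagreement threshold, and the detectors themselves are assumptions for illustration, not any real tool’s API.

```python
# Illustrative sketch: each (hypothetical) detector only covers one way
# of making a deepfake, so disagreement is common, and a bare 0.76
# means little without knowing which manipulation the tool was built for.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class DetectorReport:
    name: str               # which detector produced the score
    covers: str             # the manipulation type it was trained on
    synthetic_score: float  # 0.0 = confident real, 1.0 = confident fake

def run_detectors(media_path: str,
                  detectors: List[Callable[[str], DetectorReport]]) -> List[DetectorReport]:
    """Run every available detector and keep the individual reports."""
    return [detect(media_path) for detect in detectors]

def summarize(reports: List[DetectorReport]) -> str:
    """A cautious summary: flag disagreement instead of averaging it away."""
    scores = [r.synthetic_score for r in reports]
    if max(scores) - min(scores) > 0.3:
        return "Detectors disagree; treat the result as inconclusive."
    verdict = "likely synthetic" if sum(scores) / len(scores) > 0.5 else "likely authentic"
    return f"Detectors roughly agree: {verdict} (scores {scores})."
```

The design choice the sketch is making is the one the talk argues for: surface uncertainty and coverage gaps to the person doing the verifying, rather than hand them a single opaque number.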

There’s a trade-off between security and access, which means if we make them available to anyone, they become useless to everybody, because the people designing the new deception techniques will test them on the publicly available detectors and evade them.

But we do need to make sure these are available to the journalists, the community leaders, the election officials, globally, who are our first line of defense, thought through with attention to real-world accessibility and use. Though even in the best circumstances detection tools will be only 85 to 95 percent effective, they have to be in the hands of that first line of defense, and they’re not, right now.

So for step one, I’ve been talking about detection after the fact. Step two — AI is going to be everywhere in our communication: creating, changing, editing. It’s not going to be a simple binary of “yes, it’s AI” or “phew, it’s not.” AI is part of all of our communication, so we need to better understand the recipe of what we’re consuming.

Some people call this content provenance and disclosure. Technologists have been building ways to add invisible watermarking to AI-generated media. They’ve also been designing ways — and I’ve been part of these efforts — within a standard called the C2PA, to add cryptographically signed metadata to files.

This means data that provides details about the content, cryptographically signed in a way that reinforces our trust in that information. It’s an updating record of how AI was used to create or edit it, where humans and other technologies were involved, and how it was distributed. It’s basically a recipe and serving instructions for the mix of AI and human that’s in what you’re seeing and hearing.
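As a rough illustration of the underlying idea (not the actual C2PA manifest format, hashing, or certificate chain), here is a minimal sketch of such a provenance “recipe” signed with a private key, using the Python cryptography package; the manifest fields are my own invented example.

```python
# Minimal sketch of cryptographically signed provenance metadata: a
# small "recipe" describing how AI and humans were involved, signed so
# that later tampering with the record can be detected.
# The manifest fields and the Ed25519 choice are illustrative
# assumptions; the real C2PA standard defines its own formats.
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

manifest = {
    "asset": "interview.mp4",
    "actions": [
        {"step": "captured", "by": "human", "tool": "phone camera"},
        {"step": "color-corrected", "by": "human", "tool": "editing app"},
        {"step": "background generated", "by": "AI", "tool": "gen-AI model"},
    ],
    "distributed_via": ["social platform"],
}

# Sign the serialized manifest with the creator's private key.
signing_key = Ed25519PrivateKey.generate()
payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
signature = signing_key.sign(payload)

# Anyone holding the matching public key can check that the record was
# not altered after signing; verify() raises an exception on tampering.
signing_key.public_key().verify(signature, payload)
```

Note that nothing in the signed record identifies the person who filmed or edited the asset; it describes the how of the media-making, which is the point the talk returns to below.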

And it’s a critical part of a new AI-infused media literacy. And this actually shouldn’t sound that crazy. Our communication is moving in this direction already. If you’re like me — you can admit it — you browse your TikTok “For You” page, and you’re used to seeing videos that have an audio source, an AI filter, a green screen, a background, a stitch with another edit.

This, in some sense, is the alpha version of this transparency in some of the major platforms we use today. It’s just that it does not yet travel across the internet, it’s not reliable or updatable, and it’s not secure. Now, there are also big challenges in this type of infrastructure for authenticity.

As we create these durable signs of how AI and human were mixed, that carry across the trajectory of how media is made, we need to ensure they don’t compromise privacy or backfire globally.

We have to get this right. We can’t oblige a citizen journalist filming in a repressive context or a satirical maker using novel gen-AI tools to parody the powerful ... to have to disclose their identity or personally identifiable information in order to use their camera or ChatGPT.

Because it’s important they be able to retain their ability to have anonymity, at the same time as the tool to create is transparent. This needs to be about the how of AI-human media making, not the who.

This brings me to the final step. None of this works without a pipeline of responsibility that runs from the foundation models and the open-source projects through to the way they are deployed into systems, APIs and apps, to the platforms where we consume media and communicate.

I’ve spent much of the last 15 years fighting, essentially, a rearguard action, like so many of my colleagues in the human-rights world, against the failures of social media. We can’t make those mistakes again in this next generation of technology. What this means is that governments need to ensure that within this pipeline of responsibility for AI, there is transparency, accountability and liability.

Without these three steps — detection for the people who need it most, provenance that is rights-respecting, and that pipeline of responsibility — we’re going to get stuck looking in vain for the six-fingered hand, or the eyes that don’t blink. We need to take these steps. Otherwise, we risk a world where it gets easier and easier to both fake reality and dismiss reality as potentially faked.

And that is a world that the political philosopher Hannah Arendt described in these terms: “A people that no longer can believe anything cannot make up its own mind. It is deprived not only of its capacity to act but also of its capacity to think and to judge. And with such a people you can then do what you please.” That’s a world I know none of us want, and one I think we can prevent.

Thank you.
