Funny thing, as a clueless little kid in the 80s whose mind was shaped by popular fiction, I often suspected this kind of thing already existed back then. One of my 'gotcha' questions for adults was, "I've only ever seen him on TV, so how do I know Ronald Reagan is even real?"

Over 30 years later, while I would've never anticipated smartphones... I really thought impersonation technology through video & audio editing (not dependent upon look-alike actors) would've been here sooner. Another example of wildly underestimating the complexity of what might seem like a simple problem.

Maybe a dumb idea, but I wonder if there's a future in cryptographically signing videos to prove provenance. I'm imagining a G7 meeting, for instance, where each participant signs the video before it's released. Future propagandists, in theory, wouldn't be able to alter the video without invalidating the signatures. And public figures couldn't just use the "altered video" excuse as a get-out-of-jail-free card.
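A minimal sketch of the idea, runnable with only the standard library. A real scheme would use public-key signatures (e.g. Ed25519) so anyone can verify with published public keys; HMAC with per-participant secret keys is used here purely as a stand-in, and the participant names are made up:

```python
import hashlib
import hmac
import secrets

def sign_video(video_bytes: bytes, key: bytes) -> bytes:
    # Sign a hash of the video rather than the raw bytes.
    digest = hashlib.sha256(video_bytes).digest()
    return hmac.new(key, digest, hashlib.sha256).digest()

def verify_all(video_bytes: bytes, signatures: dict, keys: dict) -> bool:
    # Every participant's signature must check out; any edit to the
    # video changes the hash and invalidates all of them at once.
    digest = hashlib.sha256(video_bytes).digest()
    return all(
        hmac.compare_digest(
            sig, hmac.new(keys[who], digest, hashlib.sha256).digest()
        )
        for who, sig in signatures.items()
    )

# Hypothetical participants, each with their own key.
keys = {who: secrets.token_bytes(32) for who in ("delegate_a", "delegate_b")}
video = b"...raw summit footage..."
sigs = {who: sign_video(video, k) for who, k in keys.items()}

print(verify_all(video, sigs, keys))            # untouched footage verifies
print(verify_all(video + b"edit", sigs, keys))  # any alteration fails
```

The "refusing to sign" loophole mentioned below maps directly onto this: verification only tells you the signers endorsed the footage, not that unendorsed footage is fake.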

It wouldn't solve any of the fundamental problems of trust, of course (namely, the issue of people cargo-culting a specific point of view and only trusting the people who reinforce it). But it would at least allow people to opt out of sketchy "unsigned" videos showing up in their feeds.

I guess it would also allow people to get out of embarrassing situations by refusing to sign. But maybe that's a good thing? We already have too much "gotcha" stuff that doesn't advance the discourse.

Hmm, this does make me wonder what kind of effect deepfakes will have on people's general perception of the world.

I might be reaching here, but wouldn't this lead to people being more mindful of what they watch and interact with? I think all it will take is a few "state of the art" deepfakes to cause a ruckus, and the domino effect should do the rest.

Has anyone in the field spent time thinking about this, or had similar notions?

How about first making deepfake faces actually believable?

Seems like every AI project does something halfheartedly, ponders what the world will be like once it’s perfected, and then starts the next project long before the first project is actually useful for anything but meme videos.

The Jennifer Connelly and Henry Cavill demo on that page makes me think of the Scramble Suit from A Scanner Darkly

Now everyone can build their own Star Wars sequel movies! I was wondering about that after the disaster that was TROS.

I didn't think it would be possible to do in this decade, but we seem to be making progress fast now. Very impressive to see. (and scary)

None of the videos on this page really look convincing. In terms of generating static photos existing "photoshops" people have been making for 25 years are far better. I don't see the need to clutch pearls and call for new laws to put people in prison quite yet.

But even the failures at temporal coherence have their own aesthetic appeal. Like all of this stuff has been, it's very "dreamy", the way the clothing subtly shifts forms.

Beyond the coolness, I'm glad that individual people are getting access to digital manipulation capabilities that were previously only available to corporations, institutions, and governments.

I have been wondering if a rendered human 3D model (which can look quite real, but isn't 100% there yet) could be improved by better texturing after the render, for complete immersion. So you use a motion-tracked animation of a 3D model (or a static pose for a picture), and then apply a final pass that makes the last bit more convincing with better texture and lighting.

> But if you want to describe human activities in a text-to-video prompt (instead of using footage of real people as a guideline), and you’re expecting convincing and photoreal results that last more than 2-3 seconds, the system in question is going to need an extraordinary, almost Akashic knowledge about many more things than Stable Diffusion (or any other existing or planned deepfake system) knows anything about.

> These include anatomy, psychology, basic anthropology, probability, gravity, kinematics, inverse kinematics, and physics, to name but a few. Worse, the system will need temporal understanding of such events and concepts...

I wonder if unsupervised learning (as could be achieved by just pointing a video camera at people walking around a mall) will become more useful for these sorts of model; one could imagine training an unsupervised first-pass that simply learns what kind of constraints physics, IK, temporality, and so on will provide. Then given that foundation model, one could layer supervised training of labels to get the "script-to-video" translation.

Basically it seems to me (not a specialist!) that a lot of the "new complexity" involved in going from static to dynamic, and image to video, doesn't necessarily require supervision in the same way that the existing conceptual mappings for text-to-image do.

Combined with the insights from the recent Chinchilla paper[1] from DeepMind (which suggested current models could achieve equal performance if trained with more data and fewer parameters), perhaps we don't actually need multiple OOMs of parameter increases to achieve the leap to video.
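A back-of-envelope version of the Chinchilla trade-off, using the common approximations that training compute is C ≈ 6·N·D (N parameters, D tokens) and that compute-optimal training uses roughly 20 tokens per parameter (the exact constants in the paper vary with the fitting method):

```python
import math

def compute_optimal(compute_flops: float) -> tuple[float, float]:
    # Solve C = 6 * N * D with the compute-optimal ratio D = 20 * N,
    # i.e. C = 120 * N^2, so N = sqrt(C / 120).
    n = math.sqrt(compute_flops / 120)
    return n, 20 * n  # (parameters, tokens)

# Chinchilla itself: ~70B parameters trained on ~1.4T tokens.
c = 6 * 70e9 * 1.4e12  # ~5.9e23 FLOPs
n, d = compute_optimal(c)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

The point being that, under this scaling rule, a fixed compute budget is better spent on more data at constant parameter count than on more parameters, which is exactly why a leap to video might not require multiple OOMs of parameter growth.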

Again, this is not my field, so the above is just idle speculation.


It's interesting to consider the "full body" deepfakes, but wouldn't the limitations of face deepfakes be even more constraining here? The proportions of limb length vs. torso, hip/shoulder ratio, etc. -- it seems like a more effective approach (and something already in commercial use) would be mocap + models, and that's just for still images.

For motion, there's yet another layer of fakery required (and this is something security / identity detection systems tackle nowadays): stuff like gait, typical motions, gestures, or even poses. To deepfake a Tom Cruise clone, you need not just to look like the actor, but to project the same manic energy and signature movements.

That the two splashy examples are hot people in their underwear is pretty telling about what one major use of this will be. Makes me feel weird. I find takes that deepfakes will fray shared epistemology alarmist; people will continue to believe whatever they want to believe, and falsifying evidence is still a crime. But the ability to conjure moving images of whatever human body you want, without that person's permission, feels bad. DALL-E adding protections against sexual or violent imagery is a short-term solution at best, IMO. Maybe I'm being alarmist, too. Perhaps it won't be as easy as toggling a switch next to your friend's photo to take their clothes off.

Reminds me of the Michael Crichton movie "Looker".

Great article, but showcasing this tech by demonstrating that you can have half naked pictures and videos of real people without their (top half) consent is not going to go down well.
The road to realistic full-body deepfakes will be through the adult entertainment industry because of course it will. Some academics may begin the discussion but at the end of the day this is one part of AI image generation that has a clear and extremely large profit motive and won't struggle to find funding in any way.

I'm pretty sure Slashdot is willing to put up the money for thousands of renders of "Natalie Portman pours Hot Grits over <thing>" alone.

What's going on with the scrolling behavior of this page? I'm getting a very annoying "scrolling with inertia" behavior in Chrome for desktop.
TV shows won't need to do casting for extras any more, they'll just have the main cast and then one person who plays all the other characters.
I don't think you need videos with extreme levels of annotations as this article suggests.

If a model is already trained on lots of images and captions, it would probably be possible to just feed it tons of whatever video and let it figure out the rest itself.

In a soon-approaching world where all movies have deep-fake actors, popular music is generated etc. how do you approach the economics of creativity and content generation?

Should Tom Cruise's heirs receive a perpetual rent 200 years from now when Mission Impossible 57 starring their ancestor is airing?

What regulation should be put in place / would be effective in a world where any teen with the latest trending social media app on their phone can realistically impersonate a celebrity in real time for likes?

I love those text-to-video samples with people screaming into phones.
This might be an outlier, but I think the benefit of completely outlawing deepfakes is worth the "but freedom!" harm.

I think deepfakes have the power to do much more real, immediate damage to society vs the "threat" of AGI

The scrolling behavior on this page is horrendous.
The Great Dictator 2023 with Charlie Chaplin would be great!
Why would you want to do that, though?

Honest question. It's going to be a long trip through the Uncanny Valley where everyone will clearly notice the fakery, and then ... what? What is the end goal here? OK, making more Superman movies starring Christopher Reeve, obviously. But then what?

To quote someone who deserves to be in more deepfakes: "Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should."

It's quite frightening to imagine what this could do when weaponised against women, used for harassment and the creation of nonconsensual pornography based on people's likeness. I wonder if this is one of the first things we'll start seeing legislation relating to.

It's also concerning to imagine the social impact this could have on young boys as well, in a climate where pornography addiction issues become more visible each year.