Over 30 years later, while I would've never anticipated smartphones... I really thought impersonation technology through video & audio editing (not dependent upon look-alike actors) would've been here sooner. Another example of wildly underestimating the complexity of what might seem like a simple problem.
It wouldn't solve any of the fundamental problems of trust, of course (namely, the issue of people cargo-culting a specific point of view and only trusting the people that reinforce it). But, it would at least allow people to opt out of sketchy "unsigned" videos showing up on their feeds.
I guess it would also allow people to get out of embarrassing situations by refusing to sign. But maybe that's a good thing? We already have too much "gotcha" stuff that doesn't advance the discourse.
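For what it's worth, the "signed vs. unsigned" check can be sketched in a few lines. This is only a sketch under loud assumptions: it uses a shared-secret HMAC from the Python standard library as a stand-in, whereas a real scheme would need public-key signatures so that viewers can verify a clip without being able to sign one themselves. The names `sign_video` and `is_signed_by` and the key are hypothetical.

```python
import hashlib
import hmac


def sign_video(video_bytes: bytes, key: bytes) -> str:
    """Return a hex tag tying the key holder to these exact bytes."""
    return hmac.new(key, video_bytes, hashlib.sha256).hexdigest()


def is_signed_by(video_bytes: bytes, tag: str, key: bytes) -> bool:
    """True only if the tag matches both the bytes and the key."""
    expected = sign_video(video_bytes, key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, tag)


# Hypothetical publisher key and clip, for illustration only.
key = b"hypothetical-publisher-key"
clip = b"raw bytes of the original clip"
tag = sign_video(clip, key)

print(is_signed_by(clip, tag, key))            # original clip
print(is_signed_by(clip + b"edit", tag, key))  # edited clip
```

Any change to the bytes invalidates the tag, which is the property that lets a feed treat an edited or fabricated clip as "unsigned". It says nothing about whether the signer is trustworthy, which is the cargo-culting problem above.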
This might be far-fetched, but wouldn't this lead to people being more mindful of what they watch and interact with? I think all it will take is a few "state of the art" deepfakes to cause a ruckus, and the domino effect should do the rest.
Has anyone in the field spent time thinking about this or had similar notions?
Seems like every AI project does something halfheartedly, ponders what the world will be like once it’s perfected, and then starts the next project long before the first project is actually useful for anything but meme videos.
I didn't think it would be possible to do in this decade, but we seem to be making progress fast now. Very impressive to see. (and scary)
But even the failures at temporal coherence have their own aesthetic appeal. As with all of this stuff so far, it's very "dreamy" the way the clothing subtly shifts form.
Beyond the coolness, I'm glad that individual people are getting access to digital manipulation capabilities that were previously available only to corporations, institutions, and governments.
> These include anatomy, psychology, basic anthropology, probability, gravity, kinematics, inverse kinematics, and physics, to name but a few. Worse, the system will need temporal understanding of such events and concepts...
I wonder if unsupervised learning (as could be achieved by just pointing a video camera at people walking around a mall) will become more useful for these sorts of models; one could imagine training an unsupervised first pass that simply learns what kinds of constraints physics, IK, temporality, and so on impose. Then, given that foundation model, one could layer supervised training on labels on top to get the "script-to-video" translation.
Basically it seems to me (not a specialist!) that a lot of the "new complexity" involved in going from static to dynamic, and image to video, doesn't necessarily require supervision in the same way that the existing conceptual mappings for text-to-image do.
Combined with the insights from the recent Chinchilla paper from DeepMind (which suggested current models could achieve equal performance if trained with more data and fewer parameters), perhaps we don't actually need multiple OOMs of parameter increases to achieve the leap to video.
Again, this is not my field, so the above is just idle speculation.
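To put rough numbers on the Chinchilla point, here is a back-of-the-envelope sketch. It assumes two common rules of thumb that are not stated in the comment above: training compute is about 6 FLOPs per parameter per token, and the compute-optimal ratio is about 20 tokens per parameter. The function names are hypothetical.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6.0  # assumption: C ~ 6 * N * D
TOKENS_PER_PARAM = 20.0      # assumption: D ~ 20 * N at the optimum


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute for a dense transformer."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


def compute_optimal(flops: float) -> tuple[float, float]:
    """Split a compute budget into (params, tokens).

    With C = 6*N*D and D = 20*N we get C = 120*N^2, so N = sqrt(C / 120).
    """
    params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return params, TOKENS_PER_PARAM * params


# A Gopher-sized run: 280B parameters on 300B tokens.
budget = training_flops(280e9, 300e9)
params, tokens = compute_optimal(budget)
print(f"budget: {budget:.2e} FLOPs")
print(f"compute-optimal: {params / 1e9:.0f}B params, {tokens / 1e12:.2f}T tokens")
```

Under these assumptions the same budget goes to roughly 65 billion parameters and 1.3 trillion tokens, which is in the neighborhood of Chinchilla's 70B parameters and 1.4T tokens. Whether the same ratios carry over to video models is an open question.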
For motion, there's yet another layer of fakery required (and this is something security / identity detection systems tackle nowadays) -- stuff like gait, typical motions or gestures or even poses. To deepfake a Tom Cruise clone, you need to not just look like the actor, but project the same manic energy, and signature movements.
I'm pretty sure Slashdot is willing to put up the money for thousands of renders of "Natalie Portman pours Hot Grits over <thing>" alone.
If a model is already trained on lots of images and captions, it would probably be possible to just feed it tons of whatever video and let it figure out the rest itself.
Should Tom Cruise's heirs receive a perpetual rent 200 years from now when Mission Impossible 57, starring their ancestor, is airing?
What regulation should be put in place / would be effective in a world where any teen with the latest trending social media app on their phone can realistically impersonate a celebrity in real time for likes?
I think deepfakes have the power to do much more real, immediate damage to society than the "threat" of AGI.
Honest question. It’s going to be a long trip through the Uncanny Valley where everyone will clearly notice the fakery and then … what? What is the end goal here? Ok, making more Superman movies starring Christopher Reeve, obviously. But then what?
To quote someone who deserves to be in more deep-fakes, “Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.”
It's also concerning to imagine the social impact this could have on young boys, in a climate where pornography addiction issues become more visible each year.