Real-time digital humans are here! Digital Domain’s Digital Humans Group (DHG) has been making strides within this complex world, leading the industry in crossing the uncanny valley toward truly photorealistic digital beings. But creating believable digital humans is no small feat.
From its award-winning turn on The Curious Case of Benjamin Button in 2008 (which aged Brad Pitt backward as a believable CG human) to Tupac’s Pepper’s-ghost-based “hologram” at Coachella in 2012 and Marvel’s Thanos in Avengers: Infinity War and Endgame, Doug Roble, senior director of software R&D, and DD’s DHG have been working to create the most believable digital characters possible.
Since Benjamin Button, the software developers and artists at Digital Domain have been developing techniques to make it easier to create digital humans that accurately reflect the performance of an actor. Before this initiative, creating a digital character required significant effort from talented artists. The actor’s performance was used only as a reference, and digital characters were manipulated to match the performance—all by hand. With the new technology developed at Digital Domain, an actor’s performance could, nearly automatically, be transferred onto a digital character, while still retaining the ability to be modified by an artist.
At the start of work on Avengers, the team realized that the technology used to animate the villain Thanos could run in real time. Using machine-learning techniques, GPU-accelerated computation and the latest rendering technology, the performance of an actor could be immediately transferred onto a realistic character. There would be absolutely no time for an artist to correct mistakes; the system would have to reproduce every expression accurately and automatically.
“The move from pre-rendered photorealistic characters to real-time digital humans is basically a sea change for us at Digital Domain,” explains Roble. “It’s not only transformed how we work with the assets, the geometry, the animation and the characters, but also how it's rendered. Every part of our pipeline has changed in order to do it in real time.”
Digital Domain is using Epic’s Unreal game engine, powered by Nvidia graphics cards, to replace (or complement) Maya animation and V-Ray rendering. It begins with artists and technicians developing character assets. Once the data is locked, the software takes over to stream the character in real time.
“About two-and-a-half years ago we thought that, with the latest graphics cards and the latest machine learning, we could take the technology behind Thanos and do it in real time,” explains Roble. The DHG set out to do just that and, in April of 2019, Roble and the DHG team created the first digital human ever to give a TED Talk.
Their virtual avatar, affectionately named “DigiDoug,” is a re-creation of Roble himself. It speaks and moves with near-photorealistic, lifelike, real-time precision, with amazingly detailed facial quality down to pores, wrinkles and eyelashes. It even conveys real emotions as the real-life Roble drives the movement and speech onstage while wearing a full-body inertial motion-capture suit and headgear featuring a single camera. His movements are then processed, with help from machine-learning software, which creates the digital representation. Aside from about a sixth of a second of latency, it is clear that they are close to cracking the code on an indistinguishable real-time human character.
Although the DigiDoug performance is in real time, creating the model of DigiDoug was not. First, high-resolution scans of Doug making a variety of expressions were captured. This data even captures how the blood in the face changes as different expressions are made and the delicate way his eyelashes move. Once the team built the model of his face using this data, Doug was captured some more! The team used traditional motion-capture techniques, combined with proprietary software, to build a large database of images of Doug’s face paired with high-resolution 3D meshes that accurately model it. They also captured a large variety of expressions, emotions and dialogue.
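The training setup described above can be sketched in miniature: a database of paired face images and meshes, with a model fit to map one to the other. This is purely illustrative and not Digital Domain's pipeline; the sizes are toy values, the data is synthetic, and ordinary least squares stands in for training a deep network.

```python
import numpy as np

# Toy stand-in for the capture database: each flattened face image is
# paired with the flattened vertex data of a matching 3D mesh.
rng = np.random.default_rng(1)
N_PAIRS, N_PIXELS, N_COORDS = 200, 64, 30   # illustrative sizes only

images = rng.random((N_PAIRS, N_PIXELS))             # capture frames
true_map = rng.random((N_PIXELS, N_COORDS))          # hidden image->mesh map
meshes = images @ true_map                           # matching mesh data

# "Train" by fitting weights so that images @ weights ~= meshes.
weights, *_ = np.linalg.lstsq(images, meshes, rcond=None)

recon = images @ weights
print(np.allclose(recon, meshes, atol=1e-6))  # True: the fit recovers the map
```

A real face model is of course nonlinear and vastly larger, which is why a deep network replaces the single linear fit here; but the shape of the problem, paired images and meshes, is the same.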
Once the system is built, a single camera sends a black-and-white image of Doug’s face to it, and the machine-learning system automatically creates a three-dimensional representation of Doug’s face—all within 16 ms. The system has become an expert at understanding Doug’s facial movements. Even if it hasn’t seen the expression he’s making, it does a very good job of reproducing his face.
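At run time, that step amounts to evaluating one fixed function per camera frame: grayscale pixels in, vertex positions out, cheap enough to fit a 16 ms budget. The sketch below is hypothetical (not Digital Domain's code), with a single "trained" linear map standing in for the deep network and toy sizes throughout.

```python
import numpy as np

H, W_IMG = 64, 64            # toy frame resolution
N_VERTS = 500                # toy vertex count; real face meshes are far denser

rng = np.random.default_rng(0)
# Stand-in for trained network weights: one matrix mapping pixels to vertices.
weights = rng.standard_normal((N_VERTS * 3, H * W_IMG)) * 0.01

def infer_mesh(gray_frame: np.ndarray) -> np.ndarray:
    """Map one grayscale camera frame to (N_VERTS, 3) vertex positions."""
    return (weights @ gray_frame.reshape(-1)).reshape(N_VERTS, 3)

mesh = infer_mesh(rng.random((H, W_IMG)))
print(mesh.shape)  # (500, 3): one full face mesh per camera frame
```

The key property the sketch preserves is that inference is a fixed, bounded amount of arithmetic per frame, which is what makes a hard real-time deadline feasible.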
Deep learning allows the model to grow arbitrarily complex, so the system can be scaled up to improve fidelity and its understanding of an individual’s face. It can then represent staggeringly complicated interpolation algorithms, allowing things that were not possible before, like turning a 2D image into a piece of 3D geometry.
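For context, the classical interpolation such a network generalizes is blendshape-style mixing of captured expressions. The tiny example below is illustrative only; the expression names, mesh sizes and offsets are invented, and a deep network replaces this linear blend with a far richer learned interpolation.

```python
import numpy as np

neutral = np.zeros((4, 3))                 # tiny stand-in face mesh
deltas = {                                 # per-expression vertex offsets
    "smile": np.full((4, 3), 0.2),
    "frown": np.full((4, 3), -0.1),
}

def blend(weights: dict) -> np.ndarray:
    """Linearly interpolate between captured expressions."""
    mesh = neutral.copy()
    for name, w in weights.items():
        mesh += w * deltas[name]
    return mesh

half_smile = blend({"smile": 0.5})
print(half_smile[0])  # [0.1 0.1 0.1]
```

Linear blends like this break down for subtle skin and blood-flow detail; that is exactly the regime where a learned, nonlinear interpolator earns its keep.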
Building on this, the Digital Domain team has taken deep learning a step further, using the previous frame to help with the current frame. With that, the team can see where the pixels are and know where they came from. That makes things even sharper and more lifelike, according to Roble and the team.
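The idea of feeding the previous frame back into the current one can be sketched with a simple temporal blend. To be clear, this filter is only a stand-in: the article describes a learned temporal model that tracks where pixels came from, not a fixed smoothing constant, and ALPHA below is an invented illustrative value.

```python
import numpy as np

ALPHA = 0.7   # illustrative weight on the new frame's raw prediction

def smooth(prev_mesh: np.ndarray, raw_mesh: np.ndarray) -> np.ndarray:
    """Blend the current prediction with the previous frame's output."""
    return ALPHA * raw_mesh + (1.0 - ALPHA) * prev_mesh

prev = np.zeros((3, 3))   # last frame's mesh (toy size)
raw = np.ones((3, 3))     # this frame's raw prediction (toy size)
print(smooth(prev, raw)[0, 0])  # 0.7
```

Even this crude version shows the payoff: each output is anchored to the previous one, so frame-to-frame jitter is damped instead of passed straight through to the render.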
DigiDoug’s rendering is now being done with real-time ray tracing, which allows much better lighting, thanks to Nvidia’s latest RTX cards. “The way the light bounces off the character has taken a jump in terms of fidelity,” says Roble. “The skin looks a lot more lifelike. The way light refracts and reflects off the skin is now much more in tune with the environment that the character is in.”
A 3D DigiDoug now resides in virtual reality space, able to interact with users wearing an Oculus or Vive headset, with the real Doug driving the motion capture from an adjacent room or over an internet connection. The next step, Roble says, will be to let the computer control DigiDoug on its own.
“Technologies being currently developed, like the Light Field Lab true holographic display, will be a perfect application to help enhance the overall experience of interacting with what looks and feels like a real human,” says Roble. “Someone could walk up to their display without a headset and see a digital character that looks real and start having a conversation with him or her. It would be like I was in the room with them. And that is just mind-blowing.”