Recently, Nvidia AI researchers introduced a new type of AI that generates 2D talking heads for video conferencing. The company's research team says the model is capable of a wide range of manipulations, including face rotation, motion transfer, and video reconstruction.
The model takes the first frame of a video as a 2D reference photo and then uses unsupervised learning to extract 3D keypoints.
Additionally, in tests on benchmark datasets, the model outperformed other approaches, achieving H.264-quality video using one-tenth of the bandwidth.
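To see why transmitting keypoints instead of pixels saves so much bandwidth, here is a back-of-the-envelope sketch. All of the numbers (bitrate, keypoint count, bytes per keypoint, reference-frame size) are illustrative assumptions, not figures from Nvidia's paper:

```python
# Back-of-the-envelope bandwidth comparison between streaming H.264 video
# and sending one reference frame plus per-frame keypoint data.
# Every constant below is an assumption for illustration only.

FPS = 30
H264_BITRATE_BPS = 1_000_000       # assumed ~1 Mbps for a typical video call

REFERENCE_FRAME_BYTES = 50_000     # assumed compressed first frame, sent once
NUM_KEYPOINTS = 20                 # assumed number of 3D keypoints per frame
BYTES_PER_KEYPOINT = 24            # hypothetical: two float32 3D vectors

def h264_bytes(seconds: float) -> float:
    """Bytes to stream `seconds` of H.264 video at the assumed bitrate."""
    return H264_BITRATE_BPS / 8 * seconds

def keypoint_bytes(seconds: float) -> float:
    """Bytes for the keypoint scheme: one reference frame, then keypoints."""
    per_frame = NUM_KEYPOINTS * BYTES_PER_KEYPOINT
    return REFERENCE_FRAME_BYTES + per_frame * FPS * seconds

minute = 60
ratio = h264_bytes(minute) / keypoint_bytes(minute)
print(f"H.264:     {h264_bytes(minute) / 1e6:.1f} MB per minute")
print(f"Keypoints: {keypoint_bytes(minute) / 1e6:.2f} MB per minute")
print(f"Advantage: ~{ratio:.0f}x less data")
```

Under these assumed numbers the keypoint scheme uses several times less data per minute; the actual savings reported by the researchers depend on the codec settings and keypoint representation they compared against.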
On Monday, Nvidia research scientists Arun Mallya, Ting-Chun Wang, and Ming-Yu Liu published a paper about the model. Their results show that it outperforms few-shot vid2vid, a GAN detailed at NeurIPS last year.
“By modifying the keypoint transformation only, we are able to generate free-view videos. By transmitting just the keypoint transformations, we can achieve much better compression ratios than existing methods,” the paper reads. “By dramatically reducing the bandwidth and ensuring a more immersive experience, we believe this is an important step toward the future of video conferencing.”
The release of the model follows the October debut of Maxine, Nvidia’s video conferencing service. Maxine will deliver subtle AI-powered features similar to those Zoom is researching, such as face alignment and noise reduction, along with a conversational AI avatar and live translation.
Video calls in Microsoft Teams and Zoom also use forms of AI for multiple purposes, such as blurring backgrounds and powering augmented reality animations and effects.
Today, Microsoft introduced an update to the Teams calling experience, a day after the release of the Nvidia AI paper. Together, these developments could shake up the enterprise communications landscape and fuel the rivalry between Slack and Microsoft Teams.