A production version of Maxine is now available in NVIDIA AI Enterprise; research demo shows how 3D technology can enhance video communications.
NVIDIA Maxine expands to video editing
Wireless connectivity enables people to participate in virtual meetings from more locations than ever before. Often, audio and video quality suffers significantly when callers are on the move or in locations with poor connections.
Advanced real-time Maxine features such as background noise cancellation, super-resolution, and eye contact enable remote users to enhance their interpersonal communication experience.
Additionally, Maxine can now be used for video editing. NVIDIA partners are transforming this professional workflow with the same Maxine features that take video conferencing to the next level. The goal of editing a video (whether it’s a sales pitch or a webinar) is to reach the widest audience possible. With Maxine, professionals can use AI capabilities to enhance both audio and video.
With Maxine, presenters can look away from the screen to view notes or scripts while still appearing to keep eye contact with the camera. Users can also shoot video at a low resolution and increase the quality later. Plus, Maxine allows people to record videos in many different languages and export them in English.
Maxine features that will be released in Early Access this year include:
Interpreter: Translate Simplified Chinese, Russian, French, German, and Spanish into English while animating images of users to show them speaking English.
Speech Fonts: Apply characteristics of a speaker’s voice and map them to the audio output.
Audio Super-Resolution: Improve audio quality by increasing the temporal resolution and expanding the bandwidth of audio signals. It currently supports upsampling from 8,000 Hz to 16,000 Hz and from 16,000 Hz to 48,000 Hz. This feature has also been updated, reducing latency by over 50% and increasing throughput by 2x.
Maxine Client: Brings the AI capabilities of Maxine microservices to video conferencing sessions on PC. The application is optimized for low-latency streaming and will use the cloud for all of its GPU computing requirements. Thin clients will be available on Windows this fall, with support for more operating systems to follow.
Maxine can be deployed in the cloud, on-premises or at the edge, meaning high-quality communications can be conducted from virtually anywhere.
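To make the Audio Super-Resolution figures above concrete, here is a minimal sketch of the arithmetic behind them. It does not call Maxine or any NVIDIA API. It only uses the two sample-rate pairs quoted in the feature list and the standard rule that a signal sampled at a given rate can represent frequencies up to half that rate. The function and variable names are illustrative assumptions.

```python
from fractions import Fraction

# Sample-rate pairs quoted in the feature list: (input Hz, output Hz).
SUPPORTED_UPSAMPLING = [(8_000, 16_000), (16_000, 48_000)]


def nyquist_bandwidth_hz(sample_rate_hz: int) -> float:
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_hz / 2


def describe(input_hz: int, output_hz: int) -> dict:
    """Summarize the upsampling factor and the bandwidth before and after."""
    return {
        "factor": Fraction(output_hz, input_hz),
        "input_bandwidth_hz": nyquist_bandwidth_hz(input_hz),
        "output_bandwidth_hz": nyquist_bandwidth_hz(output_hz),
    }


for src, dst in SUPPORTED_UPSAMPLING:
    info = describe(src, dst)
    print(
        f"{src} Hz -> {dst} Hz: factor {info['factor']}, "
        f"bandwidth {info['input_bandwidth_hz']:.0f} Hz -> "
        f"{info['output_bandwidth_hz']:.0f} Hz"
    )
```

A plain resampler can raise the sample rate, but it cannot restore the high-frequency content that was never captured. Filling in that extra bandwidth is the part the article attributes to AI.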
Take video conferencing to a new level
Many partners and customers are using Maxine for high-quality video conferencing and editing. Two of Maxine’s features, Eye Contact and Live Portrait, are now available in production versions of the NVIDIA AI Enterprise software platform. Eye Contact simulates direct eye contact with the camera by estimating the user’s gaze and aligning it with the camera. Live Portrait animates a portrait photo of a person using a live video feed.
Software company Descript aims to make video a staple of every communicator’s toolkit, along with docs and slides. With NVIDIA Maxine, professionals and beginners who use Descript can apply AI capabilities to improve their video content workflows.
“With the NVIDIA Maxine Eye Contact feature, users no longer need to worry about memorizing scripts or doing tedious video re-recording,” said Jay LeBoeuf, director of business and corporate development at Descript. “They can nail the script every time while maintaining a perfect screen presence.”
Reincubate’s Camo app aims to expand access to great video by making the most of the hardware and devices people already have. It does this by giving users greater control over their image and by providing a powerful, efficient pipeline for processing video effects and transitions. Using technology powered by NVIDIA Maxine, Camo gives users an easier way to create high-quality video.
“Integrating NVIDIA Maxine into Camo couldn’t be easier and it allows us to get high performance out of our users’ RTX GPUs immediately,” said Aidan Fitzpatrick, founder and CEO of Reincubate. “With Maxine, teams can move faster and with more confidence.”
Quicklink’s Cre8 is a powerful video production platform for creating professionally branded virtual and hybrid live events. The user-friendly interface combines an intuitive design with all the tools needed to build, edit, and customize professional-looking productions. Featuring NVIDIA Maxine technology, Cre8 maximizes productivity and video production quality, giving operators complete control.
“Quicklink Cre8 now offers the world’s most advanced video production platform,” said Quicklink CEO Richard Rees. “With NVIDIA Maxine, we were able to add advanced features including automatic framing, video noise removal, noise and echo cancellation, and eye contact simulation.”
Los Angeles-based gemelo.ai offers a platform for creating AI twins that scale users’ voice, content, and interactions. Using Maxine’s Live Portrait feature, the gemelo.ai team can open up new opportunities for large-scale, personalized content and one-on-one engagement.
“The realism of Live Portrait is game-changing and opens up new areas of potential for our AI twins,” said Paul Jaski, CEO of gemelo.ai. “The superpower of infinite scalability in content creation and interaction across apps, websites and mixed reality experiences.”
NVIDIA Research shows how 3D video enhances immersive communication
In addition to powering Maxine’s advanced features, NVIDIA AI also enhances 3D video communications. NVIDIA Research recently published a paper showing how AI can power a 3D videoconferencing system with minimal capture equipment.
3D telepresence systems are typically expensive, require large spaces or production studios, and use high-bandwidth, volumetric video streaming, all of which limit the technology’s accessibility. NVIDIA Research shared a new approach that runs on a novel Vision Transformer-based encoder, which takes 2D video input from a standard webcam and converts it into a 3D video representation. Because no 3D data is passed back and forth between meeting participants, the bandwidth requirements of the call stay the same as for a 2D meeting.
The technology takes a 2D video of the user and automatically creates a 3D representation called a neural radiance field (NeRF), which is displayed using volume rendering. As a result, participants can transmit 2D video as in traditional video conferencing, while decoding a high-quality 3D representation that can be rendered in real time. With Maxine’s Live Portrait, users can bring their portraits to life in 3D.
AI-mediated 3D videoconferencing can significantly reduce 3D capture costs, provide high-fidelity 3D representations, accommodate realistic or stylized avatars, and enable mutual eye contact during videoconferencing. Related research projects demonstrate how AI can help improve communication and virtual interactions, and inform future NVIDIA videoconferencing technologies.