Microsoft’s AI Turns Mona Lisa Into Rapping Sensation, Video Goes Viral | WATCH

This AI can take still photos of people's faces and turn them into animated characters that move and talk just like real humans.

Image source: Microsoft

In a stunning display of technological innovation, a video featuring the Mona Lisa rapping has taken the internet by storm. The viral clip, created using Microsoft's AI technology known as VASA-1, shows the iconic painting performing a rap originally delivered by actor Anne Hathaway.

This AI can take still photos of people's faces and turn them into animated characters that move and talk just like real humans. In this case, it transformed the famous Mona Lisa into a lively rapper, complete with synchronised lip movements and expressive facial features.

The video quickly went viral after being shared on social media platforms. One post featuring the rapping Mona Lisa clip has racked up over seven million views and counting.

Users' comments on the viral video ranged from amusement to confusion. One user said, "This is crazy, weird, and spooky all together🤯", while another wondered, "I'm curious about what this technology will be like in a year and a half 🤯". A third commented, "I think this rapping Mona Lisa has really messed with my brain."

What is Microsoft’s VASA?

According to Microsoft's official website, VASA is a framework for generating lifelike talking faces of virtual characters with appealing visual affective skills, given a single static image and a speech audio clip.

VASA-1, its flagship model, boasts the capability to synchronise lip movements with audio seamlessly while capturing a spectrum of facial nuances and natural head motions, lending authenticity and liveliness to virtual characters.

The core innovations include a holistic model that generates facial dynamics and head movements in a face latent space, and the construction of that expressive, disentangled face latent space from videos.
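The pipeline Microsoft describes, a static appearance code for the photo combined with per-frame facial-dynamics latents decoded into video frames, can be sketched with toy stand-ins. The real VASA-1 uses learned deep networks; the random linear maps below only illustrate the data flow, and every name here is hypothetical, not Microsoft's API:

```python
import numpy as np

rng = np.random.default_rng(0)
IMG, AUDIO_DIM, LAT_APP, LAT_DYN = 64, 32, 128, 64

# Random matrices standing in for trained networks (illustration only).
W_app = rng.normal(size=(IMG * IMG, LAT_APP)) / IMG          # "appearance encoder"
W_dyn = rng.normal(size=(AUDIO_DIM, LAT_DYN)) / AUDIO_DIM    # "dynamics generator"
W_dec = rng.normal(size=(LAT_APP + LAT_DYN, IMG * IMG * 3))  # "frame decoder"

def encode_appearance(image):
    """One static identity/appearance latent, computed once per clip."""
    return image.reshape(-1) @ W_app

def audio_to_dynamics(audio_feats):
    """One facial-dynamics/head-pose latent per output video frame."""
    return np.tanh(audio_feats @ W_dyn)

def decode_frame(appearance, dynamics):
    """Recombine the static and per-frame latents into one RGB frame."""
    z = np.concatenate([appearance, dynamics])
    return (z @ W_dec).reshape(IMG, IMG, 3)

photo = rng.random((IMG, IMG))         # a single still face photo
audio = rng.random((40, AUDIO_DIM))    # 40 frames of audio features (~1 s at 40 FPS)

app = encode_appearance(photo)
dyn = audio_to_dynamics(audio)
frames = [decode_frame(app, d) for d in dyn]  # the animated "video"
```

The point of the disentangled latent space is visible in the structure: the appearance latent is computed once from the photo, while only the small dynamics latent changes per frame, which is what makes real-time generation feasible.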

"Through extensive experiments including evaluation on a set of new metrics, we show that our method significantly outperforms previous methods along various dimensions comprehensively," Microsoft said.

"Our method not only delivers high video quality with realistic facial and head dynamics but also supports the online generation of 512x512 videos at up to 40 FPS with negligible starting latency. It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors," it added.