Google’s generative video model Veo 3 has a subtitles problem

MIT Technology Review - AI
Jul 15, 2025 14:40
Rhiannon Williams

Summary

Google’s new generative video model, Veo 3, introduced the ability to create sounds and dialogue alongside hyperrealistic video clips, quickly attracting attention from creatives. However, the model struggles with generating accurate subtitles, highlighting ongoing challenges in synchronizing audio and text in AI-generated content. This limitation points to the need for further advancements in multimodal AI systems for seamless video production.

As soon as Google launched its latest video-generating AI model at the end of May, creatives rushed to put it through its paces. Released just months after its predecessor, Veo 3 allows users to generate sounds and dialogue for the first time, sparking a flurry of hyperrealistic eight-second clips stitched together into ads, ASMR videos,…