Wednesday, May 20, 2026

Google’s Gemini Omni turns images, audio, and text into video — and that’s just the start

CitrixNews Staff

When Google launched Gemini three years ago, the goal was to build a multimodal large language model: a single neural network trained on text, images, audio, and video that could generate content in any of those formats.

Today, at its Google I/O developer conference, the company took a concrete step toward that goal with Gemini Omni, a new family of multimodal models that Google CEO Sundar Pichai says will be able to “create anything from any input.” 

Originally reported by TechCrunch