Gemini Omni Flash Video Generator

Gemini Omni Flash is the first model in Google's new Omni family. It treats video as the starting point of an 'any input, any output' system: mix text prompts, reference images, audio clips, and short videos in a single prompt, and the model reasons across all of them to produce one cohesive clip. Conversational editing lets you keep refining the same scene turn after turn while characters, physics, and continuity hold.

Model

Coming Soon

Gemini Omni Flash Video

Google's first "create anything from any input" model, starting with video. Mix text, images, audio, and clips in one prompt, then edit by conversation.

Any Input, One Video

Mix text, images, audio, and reference clips in a single prompt. Omni Flash reasons across all of them and renders one cohesive result.

Conversational Editing

Refine the same scene turn after turn. Lighting, characters, camera, action—change any of it while continuity and physics hold across edits.

Real-World Physics & Knowledge

Believable gravity, fluids, and impact, grounded in Gemini's understanding of history, science, and culture for scenes that feel coherent.

Native Synchronized Audio

Dialogue, sound effects, and music are generated together with the picture. Lip-sync and beat-matching land on the frame, no separate audio pass.

Gemini Omni Flash: Any Input, One Video

Google's first 'create-anything' model starts with video. Mix text, images, audio, and existing clips in one prompt, edit by conversation, and let the Gemini knowledge base ground every scene in real-world physics.

Mix Text, Images, Audio, and Video in One Prompt

Gemini Omni Flash reads every modality in the same prompt and produces a single cohesive video. Hand it a paragraph of script, a reference photo, a voice memo, and a 3-second clip whose motion you want to match, and it reasons across all four to render one result. No stitching, no separate audio pass, no juggling tools.

Join the Waitlist

Edit Your Video by Talking to It

Every conversational turn layers on the last. Change the lighting, swap the outfit, move the camera behind the subject, add a second character walking in from the right—the scene remembers what came before. Characters stay consistent, physics holds up, and continuity carries across edits, so you can keep refining the same clip instead of regenerating from scratch.

Scenes That Actually Obey Physics

Omni Flash has an upgraded intuition for gravity, fluids, kinetic energy, and impact. Liquid splashes the way liquid splashes, hair and fabric carry weight, and bouncing balls actually bounce. Combined with Gemini's knowledge of history, science, and culture, this lets the model treat your prompt as direction for one coherent scene rather than a wishlist of disconnected effects.

Native Synchronized Audio Out of the Box

Dialogue, ambient sound, and music are generated together with the picture—not bolted on afterwards. Lip-sync lines up with the action on screen, sound effects hit on impact, and music sits naturally under the cut. The finished MP4 is ready to drop into YouTube Shorts, an ad, or a social post without a separate sound-editing pass.

How To Use Gemini Omni Flash Video

Generate Your First Clip in 3 Steps

Bring whatever inputs you have, describe what you want, then refine through conversation.

Drop In Your Inputs

Start with whatever you have: a written prompt, a reference photo, a voice memo, a short video to use as motion reference—or any combination. You don't have to pick one mode. Omni Flash treats all of them as part of the same instruction and figures out how they fit together.

Describe the Scene You Want

Write the scene in plain language—who is in it, what they're doing, the mood, the camera move, the aspect ratio you want for short-form or widescreen, and whether you want generated dialogue or sound effects. The model uses Gemini's real-world knowledge to fill in the details you don't spell out.

Refine by Talking, Then Download

Watch the first take, then keep talking: 'make it night,' 'add a second person walking in,' 'pull the camera back.' Each turn layers on the previous one without breaking continuity. When you're happy with the clip, download the finished MP4 with synchronized audio baked in.
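
For readers who think in code, the three steps above can be pictured as a small data model. This is only an illustrative sketch: the `Session` class, its method names, and the file names are hypothetical, and it does not call Google's service or reflect any published Gemini API. It simply shows the idea that inputs of any kind are collected together, the scene is described once, and each refinement is appended to the history rather than replacing it.

```python
from dataclasses import dataclass, field

# Hypothetical local model of the three-step workflow described above.
# None of these names come from Google; nothing here calls a real service.
ALLOWED_KINDS = {"text", "image", "audio", "video"}


@dataclass
class Session:
    inputs: list[tuple[str, str]] = field(default_factory=list)  # (kind, source)
    scene: str = ""
    edits: list[str] = field(default_factory=list)

    def add_input(self, kind: str, source: str) -> None:
        # Step 1: any combination of the four input kinds is accepted.
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported input kind: {kind}")
        self.inputs.append((kind, source))

    def describe(self, scene: str) -> None:
        # Step 2: a plain-language description of the scene.
        self.scene = scene

    def refine(self, instruction: str) -> None:
        # Step 3: each turn is appended, so earlier edits are never discarded.
        self.edits.append(instruction)

    def instructions(self) -> list[str]:
        # Full history: the scene description first, then every edit in order.
        return [self.scene, *self.edits]


session = Session()
session.add_input("image", "reference.jpg")      # hypothetical file name
session.add_input("audio", "voice_memo.m4a")     # hypothetical file name
session.describe("A chef plating dessert, warm light, slow push-in, 9:16")
session.refine("make it night")
session.refine("pull the camera back")
print(session.instructions())
```

The design point is the append-only `edits` list: it mirrors the page's claim that each conversational turn layers on the previous one instead of starting over.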

Why Gemini Omni Flash Stands Out

Six reasons Google's new multimodal video model is a different kind of generator.

🎛️ Truly Multimodal Prompts

Text plus image plus audio plus reference video—all in the same prompt, reasoned over together. No more juggling separate tools for each input type or stitching results together by hand.

💬 Conversational Editing

Refine the same scene turn after turn. Lighting, characters, camera, action—change any of it without losing the thread of what you already built.

🌍 Grounded in Real-World Knowledge

Omni inherits the Gemini base model's understanding of history, science, and culture, so prompts about specific places, eras, or styles come back recognizable, not generic.

⚖️ Believable Physics

Gravity, fluids, kinetic energy, and impact behave the way they should. Splashes splash, fabric carries weight, and motion has consequences instead of floating across the frame.

🎵 Native Synchronized Audio

Dialogue, sound effects, and music are rendered together with the picture. Lip-sync, beat-matching, and ambient sound land on the frame without a separate audio pass.

🛡️ SynthID Watermark on Every Output

Every clip carries Google's invisible SynthID watermark, verifiable through the Gemini app, Gemini in Chrome, and Google Search, so the videos you make are traceable as AI-generated without any visible mark on the picture.

FAQ

Gemini Omni Flash Video FAQ

Common questions about Google's new multimodal video model and how it differs from earlier video generators.

What is Gemini Omni Flash?

Gemini Omni Flash is the first model in Google's new Gemini Omni family, announced at Google I/O on May 19, 2026. It is a multimodal model that generates video from any combination of text, image, audio, and video inputs, with conversational editing built in. Google's stated goal for the Omni line is 'any output from any input,' starting with video.

How is Gemini Omni Flash different from Veo?

Veo is Google's earlier standalone video generator. Gemini Omni Flash brings video generation into the core Gemini system and adds true multimodal prompting (text, image, audio, and video reference all in the same prompt) plus conversational editing where each turn builds on the last. Synchronized audio is generated in the same pass as the picture.

How long can a Gemini Omni Flash clip be?

Single clips are currently capped at 10 seconds. Google has described this as a deployment decision—a way to widen access while compute demand is high—rather than a hard model limit. Longer durations are expected as the Omni family expands.

Can I edit a video after it's generated?

Yes—conversational editing is one of the headline features. Watch the first take, then keep talking: change the lighting, the camera angle, the action, add or remove characters, swap the environment. Each instruction layers on the last, and characters, physics, and continuity carry across turns instead of resetting.

Does Gemini Omni Flash generate audio with the video?

Yes. Dialogue, sound effects, ambient sound, and music are rendered together with the picture as part of the same generation. Lip-sync lines up with the action on screen and sound effects hit on impact, so the MP4 is ready to use without a separate audio pass.

Are Gemini Omni Flash videos watermarked?

Yes. Every video generated by Omni carries Google's SynthID—an invisible watermark verifiable through the Gemini app, Gemini in Chrome, and Google Search. This makes outputs traceable as AI-generated without affecting how the clip looks or sounds.

When can I use Gemini Omni Flash on this site?

We're working on direct access right now. Join the waitlist on this page and you'll be among the first to know when Gemini Omni Flash video generation goes live here.