Leaking · Google I/O 2026

Gemini Omni
One model for text, image, video and audio

Surfacing in early May 2026 across multiple leaks, Gemini Omni is Google's upcoming unified multimodal model: native generation of text, image, video and synced audio inside a single Gemini-trained system.

Clip length: 5–10s
Max output: 1080p
Aspect ratios: 16:9 · 9:16 · 1:1
Expected reveal: I/O 2026

Capabilities

The whole pipeline collapses into one model

Unlike specialised video models such as Veo, Sora 2, Seedance 2.0 or Kling, Gemini Omni keeps language reasoning, image generation, video generation and audio synthesis under one architecture.

Native multimodal output

A single prompt produces matching text, keyframes and video, with consistent characters, style and lighting carrying across formats.

One unified Gemini stack

No more chaining of specialised models. Text, image, video and audio share the same weights and the same long context.

Synced native audio

Ambient sound, score and dialogue are aligned with the picture in the same forward pass — footsteps land on the beat, lips match speech on first export.

Direct in-chat editing

Swap an object, change the lighting, adjust a camera move in natural language — no full regeneration, echoing the Nano Banana editing playbook.

Remix and steer

Upload an existing clip and redirect it with prompts. Reference images, videos and audio can be combined in a single instruction.

Templates & styles

Built-in templates for product ads, Reels, music videos and cinematic shorts lower the floor for first-time users while keeping camera language consistent.

Specs

What can be pieced together before the keynote

Numbers below are aggregated from Reddit/X leaks and reporting by TestingCatalog, Programming Insider and OfficeChai.

Dimension	Known signal
Model family	Google Gemini — successor branding for the Veo line
Model ID	bard_eac_video_generation_omni / v3smm-lora-prod
Clip length	5 / 8 / 10 seconds per generation, chainable in-app
Resolution	480p / 720p / 1080p
Aspect ratios	16:9, 9:16, 1:1
Audio	Natively synthesized, synced in a single pass
Inputs	Text / image / video / audio references
Access	Staging inside Gemini app, API expected post I/O
Quota signal	Reports say two Omni generations burn ~86% of an AI Pro daily quota
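
The leaked limits in the table above can be written down as a small validation sketch. Everything in it is illustrative: `OmniRequest`, the `ALLOWED_*` constants and `frame_size` are hypothetical names, not part of any Google API, and the mapping from a resolution label to pixel dimensions assumes the label names the shorter side of the frame, which the leaks do not confirm.

```python
from dataclasses import dataclass

# Values copied from the leaked spec table; none are confirmed by Google.
ALLOWED_SECONDS = (5, 8, 10)
ALLOWED_RESOLUTIONS = {"480p": 480, "720p": 720, "1080p": 1080}
ALLOWED_ASPECTS = {"16:9": (16, 9), "9:16": (9, 16), "1:1": (1, 1)}


@dataclass(frozen=True)
class OmniRequest:
    """Hypothetical request shape, used only to illustrate the leaked limits."""
    prompt: str
    seconds: int = 8
    resolution: str = "720p"
    aspect: str = "16:9"

    def __post_init__(self):
        # Reject anything outside the reported options.
        if self.seconds not in ALLOWED_SECONDS:
            raise ValueError(f"clip length must be one of {ALLOWED_SECONDS}")
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
        if self.aspect not in ALLOWED_ASPECTS:
            raise ValueError(f"aspect ratio must be one of {sorted(ALLOWED_ASPECTS)}")


def frame_size(req: OmniRequest) -> tuple[int, int]:
    """Return (width, height), assuming the resolution label is the shorter side."""
    short = ALLOWED_RESOLUTIONS[req.resolution]
    w, h = ALLOWED_ASPECTS[req.aspect]
    scale = short / min(w, h)
    return round(w * scale), round(h * scale)
```

Under that assumption, `frame_size(OmniRequest("demo", 10, "1080p", "9:16"))` returns `(1080, 1920)`, while `OmniRequest("demo", seconds=7)` raises `ValueError`.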

Architecture

Three product lines collapse into one Omni

Google's generative stack used to be split across Veo for video, Nano Banana / Imagen for image and Gemini for text. Omni rolls those into a single architecture.

Before

Veo 3.1: video + native audio
Nano Banana / Imagen: image generation & editing
Gemini 2.5 / 3.x: reasoning, long context

Now · Omni

Gemini Omni: text, image, video and audio in one model, from one prompt

Use cases

From a single brief to publishable content

A unified model with long context and synced audio means teams can write one coherent brief and walk away with a finished cut.

Product ads

Hero shots, packaging reveals and lifestyle cuts shipped with ambient audio already locked.

Reels & Shorts

Vertical 9:16 clips with on-mic dialogue and beat-synced motion, built for scroll-stopping social.

Music videos

Reference a track and Omni cuts visuals to the beat, keeping a consistent character across shots.

Cinematic shorts

Chain multiple 10-second omni-clips into multi-shot sequences with continuous lighting and audio bed.

Landing-page hero loops

Loopable 16:9 atmospheric clips for SaaS, fashion and DTC sites — branded and silent-friendly.

Explainers & tutorials

Turn a script into a narrated sequence with lip-synced dialogue and matching ambient sound.
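
The leaks say clips can be chained in-app but do not describe how, so the following is only a planning sketch for the "Cinematic shorts" case above: given a target running time, it picks the fewest clips of the reported 5, 8 or 10 second lengths that cover it, with the least overshoot. `plan_clips` is a hypothetical helper, not something from the Gemini app or any Google API.

```python
from itertools import combinations_with_replacement
from math import ceil

CLIP_SECONDS = (5, 8, 10)  # per-generation lengths reported in the leaks


def plan_clips(target_seconds: int) -> list[int]:
    """Pick the fewest clips that cover target_seconds, with the least overshoot."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    # The fewest clips possible is reached by using only the longest length.
    count = ceil(target_seconds / max(CLIP_SECONDS))
    # Try every mix of `count` clips and keep those long enough.
    candidates = [
        combo
        for combo in combinations_with_replacement(CLIP_SECONDS, count)
        if sum(combo) >= target_seconds
    ]
    best = min(candidates, key=sum)
    return sorted(best, reverse=True)
```

For a 23-second sequence, `plan_clips(23)` returns `[10, 8, 5]`. A 45-second sequence needs five generations, `[10, 10, 10, 10, 5]`.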

Compare

Where Omni sits in the 2026 video stack

Aggregated from Artificial Analysis, Looksy AI, Oimi AI and the official keynotes — for orientation, not benchmark scores.

Model	Maker	Architecture	Native audio	Clip length
Gemini Omni	Google	Unified omni (text + image + video + audio)	Synced in one pass	5 / 8 / 10s
Veo 3.1	Google	Specialised video model	Yes	~8s
Seedance 2.0	ByteDance	Specialised multi-modal video	Yes	up to 15s / shot
Sora 2	OpenAI	Specialised video model	Yes	~20s
Kling V3.0	Kuaishou	Specialised video model	Limited	~10s

Timeline

From the first leak to the I/O 2026 stage

Ordered by public report date, still evolving.

2026 · 05 · 02
First "Powered by Omni" string
X user @Thomas16937378 spotted "Start with an idea or try a template. Powered by Omni." inside the Gemini video tab.

2026 · 05 · 11
Full preview card inside Gemini mobile
TestingCatalog and Chetaslua surfaced the "Meet our new video model" card, the full model ID and the 10-second clip cap.

2026 · 05 · 12 – 18
Demos circulate in the wild
A "professor solving trig on a chalkboard" clip showcased text coherence and physical fidelity, sparking heavy comparison with Veo 3.1.

2026 · 05 · 19 – 20
Expected unveil at Google I/O 2026
Main-stage time is widely expected for Omni, possibly alongside Flash / Pro tiering, an API, and reshuffled subscription tiers.

FAQ

The questions people ask most about Gemini Omni

What exactly is Gemini Omni?

It's Google's upcoming unified multimodal model that natively generates text, image, video and synced audio inside one architecture — effectively merging Veo, Imagen and Gemini.

When will it ship?

As of mid-May 2026 Omni is still in the leak phase. The widely expected reveal is the Google I/O 2026 main stage on May 19–20.

How does it relate to Veo 3.1?

Metadata suggests Omni inherits engineering from the Veo stack, but it drops the Veo brand and folds video into Gemini's text and image layers.

Does it really generate sound?

The leaks indicate it does. Ambient sound, score and dialogue are reportedly produced in the same pass as the video, which is a large part of what the 'omni' name signals: text, image, video and audio from one model.

What is the current clip-length limit?

The leaks point to 5, 8 or 10 seconds per generation, with the 10-second cap shown on the preview card and multi-clip chaining handled at the client layer.

How will pricing work?

Unconfirmed. A Reddit screenshot shows two Omni generations burning ~86% of the AI Pro daily quota, so a higher 'Ultra / Pro Plus' tier is plausible.
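
As a rough worked example of what that screenshot would imply, the sketch below turns "two generations burn ~86%" into a per-generation share and a daily count. It assumes every generation costs the same share of quota regardless of length or resolution, which nothing in the leaks confirms. `quota_estimate` is a hypothetical helper.

```python
REPORTED_SHARE = 0.86      # share of the AI Pro daily quota reportedly used
REPORTED_GENERATIONS = 2   # by this many Omni generations


def quota_estimate(share=REPORTED_SHARE, generations=REPORTED_GENERATIONS):
    """Return (share per generation, full generations per day, leftover share)."""
    per_generation = share / generations
    full_generations = int(1 // per_generation)
    leftover = 1 - full_generations * per_generation
    return per_generation, full_generations, leftover
```

With the reported figures this works out to roughly 43% of the daily quota per generation: two full generations a day, with about 14% left over, not enough for a third.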

Sources

Primary reports and public links

Everything on this page is aggregated from the public sources below. Cross-reading is recommended.

programminginsider.com

TestingCatalog · Programming Insider report

Oimi AI · Gemini Omni Leaked roundup

OfficeChai · Gemini Omni Spotted

Looksy AI · Gemini Omni product page

Gemini 2.5 technical report
