Google Releases Gemini Ultra 3 with 5x Faster Inference Speed

Google launched Gemini Ultra 3 in April 2028, with 5x faster inference at equivalent performance and a 70% reduction in API pricing. Multimodal capabilities were also enhanced, with long-video understanding accuracy reaching 98% on internal test sets.

Content

Google officially released Gemini Ultra 3, the third generation of its flagship multimodal LLM, on April 10.

The model introduces a sparse attention mechanism and dynamic routing, which significantly improve inference efficiency in long-context scenarios (up to 20 million tokens). Benchmark tests show that Gemini Ultra 3 performs on par with GPT-6 on MMLU, Math, and Coding tasks, while its inference is approximately 40% faster than GPT-6's.

Gemini Ultra 3's API pricing is 70% lower than the previous generation's, with input costs falling to $0.30 per million tokens and output costs to $1.20 per million tokens. Google attributed the cut to two factors: inference efficiency improvements and economies of scale.
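To make the pricing figures concrete, the following is a minimal sketch of the arithmetic, not an official calculator. The function names `request_cost` and `implied_previous_price` are hypothetical. The sketch also assumes the 70% cut applies uniformly to input and output prices, which the article does not state explicitly. Under that assumption, the previous generation's prices work out to roughly $1.00 (input) and $4.00 (output) per million tokens.

```python
# Prices quoted in the article, in USD per million tokens.
INPUT_PRICE = 0.30
OUTPUT_PRICE = 1.20
PRICE_CUT = 0.70  # stated reduction relative to the previous generation


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD of one request at the quoted prices."""
    total = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
    return total / 1_000_000


def implied_previous_price(new_price: float, cut: float = PRICE_CUT) -> float:
    """Price before the cut. Assumption: the cut applies uniformly."""
    return new_price / (1.0 - cut)


if __name__ == "__main__":
    # A prompt that fills the full 20-million-token context, with a short reply.
    print(f"Full-context request: ${request_cost(20_000_000, 2_000):.4f}")
    print(f"Implied old input price:  ${implied_previous_price(INPUT_PRICE):.2f}")
    print(f"Implied old output price: ${implied_previous_price(OUTPUT_PRICE):.2f}")
```

At these rates, the input side of a prompt that fills the entire 20-million-token context costs about $6.00, so long-context workloads remain the dominant cost driver even after the price cut.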

Video understanding is the highlight of this update. On internal test sets, Gemini Ultra 3's accuracy in extracting key events from long videos (over 2 hours) reached 98%, approximately 12 percentage points higher than the previous generation.

Gemini Ultra 3 is now available to developers, with Google Workspace enterprise users gradually receiving updates this month.

Boundary

This is fictional content for entertainment only.

Disclaimer

Content is AI-generated. Do not use it as a basis for real decisions. Do not cite it as factual reporting.