- Model update and capacity development
- # 1. Gemini 2.5 Flash Preview new model
- 4. Lyria RealTime: Real-time music generation
- # 5. Gemini 2.5 Pro Deep Think
- 6. Gemma 3n (light multimodule model)
- API functionality enhancement and developer tool
- # 1. Thought Summers
- # # 2. Thinking Budgets
- # 5. Video understanding update
- Overall:
- # Gemini API’s new capabilities make it one:
- # # recommended scene:
Google published on I/O 2025 a series of important updates to Gemini API and Google AI Studio covering new features such as model capability expansion, audio and video input support, visualization of the thinking process, and browser automation controls.
These updates have significantly enhanced the ability of developers to build text, images, audio, video and multi-model agents.
Gemini API has been progressively developed as a complete multi-model smart platform for a wide range of scenarios from code generation to audio dialogue, from web-based information to browser operating control. Developers are allowed to call their most advanced text, image, audio and video models. The update focuses on the following:
** New model and voice capability upgrade**
** Real-time music generation**
** Multi-modular input enhancement (video understanding, etc.)**
** Development of tools and API structural updates (e.g., thoughts summary, browser control, astroactivity call)**
** More efficient and economical model access for developers (e.g. batch API)**
Model update and capacity development
# 1. Gemini 2.5 Flash Preview new model
-
** Version identification:** Gemini-2.5-flash-preview-05-20
-
** Performance improvement:** More than the previous generation in logical reasoning, code generation, long context treatment.
-
Assessment of achievement: ranked second in LMarena list, after Gemini 2.5 Pro.
-
** Efficiency gains:** The assessment shows an increase in efficiency in the use of token 22%.
2. Gemini 2.5 Pro & Flash TTS (text to voice)
-
** Support languages: ** More than 24 languages
-
** Type of support: ** Single voicer, multi-sounder (multi-speaker), support** Emotional, verbal control**
-
Application scene: Creation of artificial AI roles, multi-wheel voice dialogue agents, audio content creation, etc.
- Gemini 2.5 Flash Native Voice Dialogue Model
-
Function characteristics: 30 different sound styles available
-
The distinction between automatic background recognition and speaking.
-
Responding to user tone/emotional changes
-
The use of “thinking models” for complex logical processing
Application scenario:
-
Call the center smart agent.
-
Multi-role voice story.
-
A voice-like voice assistant.
4. Lyria RealTime: Real-time music generation
-
** Rationale:** Create real-time current connections through WebSocket, and the model continues to generate music segments.
-
** Control method: ** Texttip control generation style and rhythm.
-
**Application example: **PromptDJ-MIDI example application in Google AI Studio.
# 5. Gemini 2.5 Pro Deep Think
-
Use: ** Experimental function to handle ** complex mathematical and programming problems
-
** Performance:** Longer and more accurate reasoning chain applicable to advanced code generation and logical resolution.
6. Gemma 3n (light multimodule model)
-
** Deployment platform:** for peripheral devices such as mobile phones, notebooks, tablets, etc.
-
**Supported model: ** Text + Audio + Image.
-
Technical architecture: PLE parameter cache: Reduction of reasoning burden by layer cache.
-
MatFormer architecture: reduced calculation and memory costs.
API functionality enhancement and developer tool
# 1. Thought Summers
-
Use: extract model in the middle of reasoning (chain-of-thought) to help developers understand the path of model thinking.
-
** Presentation:** Title Category
-
Tool call chain display.
-
Along with the final answer.
** Code Example (Python):** ♪ From google import ♪ From google. == sync, corrected by elderman == @elder_man “What is the sum of the first 50 prime numbers?” = client.models.generate_content(s) Model = “gemini-2.5-flash-preview-05-20”, It’s not a good idea. Config=types.GenerateContentConfig( Thinking_config=types.ThinkingConfig( Thinking_budget=1024, == sync, corrected by elderman == @elder_man I’m not sure what I’m talking about. I’m not sure what I’m talking about. I’m not sure what I’m talking about. I don’t know, for part in resonse.candidates [0].content.parts: If part.thought: Yeah, it’s a good idea, isn’t it? It’s a good idea, it’s a good idea. Else: It’s not a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea, it’s a good idea.
# # 2. Thinking Budgets
-
Function: Controls the depth of the “thinking” model when generating content to balance accuracy, delay and cost.
-
Application: Token usage can be limited and applied to low-delay scenarios.
#3. URL Context tool
-
Note: Models can automatically access relevant context information from the designated web page.
-
**Portfolio: ** Available in conjunction with GoogleSearch grouping tool to enhance research proxy capacity.
-
** Usage: **
== sync, corrected by elderman == @elder_man Tool (url_content=types.Urcontext), Tool (google_search=types.GoogleSearch) [Original: English] = client.models.generate_content(s) Model = “gemini-2.5-flash-preview-05-20”, “Give me a 3-day scheme based on YOUR_URL…” Config=GenerateContentConfig (tools=tools) I’m not sure what I’m talking about.
##4. Browser Automation Control (Project Mariner)
-
**Function: ** Controls browser behaviour, e.g. click buttons, scroll pages, fill out forms, etc.
-
** Method of deployment:** One key to deploy browser agent in Cloud Run.
-
** Cooperating enterprises:** UiPath, Browserbase, Automation Anywhere etc. have been involved in early detection.
# 5. Video understanding update
-
Input support: YouTube video link, direct upload video.
-
Support function: Video summary, analysis, translation
-
Video clipping (extract snippet analysis)
-
Variable Frame Rate (0.1 ~ 60 FPS) supports high frame content such as games/sports
-
Resolution control: 720p / 480p / 360p
6. Asynchronous function calls (Async Action Calling)
-
**New feature: ** To achieve an aneurysm in Live API, without blocking the main dialogue process.
-
**Setting method: ** Sets the behavior field as NN_BLOCKING in the function definition.
7. Batch API (Batch API)
-
**Function: ** Supports multiple batches to return results up to 24 hours.
-
Success: It’s half the cost of an interactive API.
-
Provide a higher rate limit
**Application scenario: ** Large-scale analysis, bulk document processing, offline assessment, etc.
Overall:
# Gemini API’s new capabilities make it one:
-
Real multi-model unified interface.
-
For light to heavy loads of equipment
-
A complex interactive scene covering audio, video, images, web pages, text, etc.
-
Support for more transparent and controlled model output and mind manipulation
** Developer can:**
-
Fast prototype test.
-
Build commercial smart agents
-
Integrated voice, video, image, etc. input output
-
Create smart workflows using tool scheduling and automated control interfaces
# # recommended scene:
-
To build a voice talk robot.
-
Development of a video content summary tool
-
Music generation creative applications
-
Browser Automation Test Tool
-
Research type AI Information Agent
Original language: https://develators.googleblog.com/en/gemini-api-io-updates/ View developer documents for more example codes and API guides: https://ai.google.dev