Contents
Performance assessment
More stable: the 4/4 reliability standard
Strong showing against mainstream models
Tool integration: not just a language model, but an intelligent assistant
o3 price cut by 80%, now cheaper than GPT-4o

OpenAI has officially launched o3-pro, a new-generation model now available to all ChatGPT Pro and API users, with Enterprise and Edu access coming soon. Compared with earlier models such as o3 and o1-pro, o3-pro improves significantly on several fronts. Expert reviewers rated its capabilities highly in science, education, data analysis, writing and programming, and found it stronger in clarity, comprehensiveness, instruction following and accuracy.

**Expert evaluations show** that o3-pro outperforms o3 in several key areas, including science, education, programming, data analysis and writing.
o3-pro is rated higher for **clarity, comprehensiveness, instruction following and accuracy**.
Like o1-pro, o3-pro excels at mathematics, science and programming in academic evaluations.
OpenAI’s “4/4 reliability” evaluation was used to test model stability: a question counts as passed only if all four attempts are correct.
o3-pro has access to ChatGPT’s advanced tools: web search, file analysis, image recognition, Python, memory-based personalization and more.
o3-pro replaces o1-pro for Pro and Team users starting today; Enterprise and Edu users will get access next week.

Performance assessment

In expert evaluations, reviewers consistently preferred o3-pro over o3, highlighting improvements in key areas such as science, education, programming, data analysis and writing. o3-pro received higher ratings for clarity, comprehensiveness, instruction following and accuracy. Like o1-pro, o3-pro performs strongly in mathematics, science and programming, as confirmed in academic evaluations.

More stable: the 4/4 reliability standard

To verify the model’s stability, OpenAI uses a strict evaluation criterion called “4/4 reliability”: a question counts as solved only if the model answers it correctly in all four of four attempts. A model is considered truly reliable only when it clears this bar. In this evaluation, o3-pro showed that it is not only capable but also consistent.
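The 4/4 criterion is easiest to see next to ordinary accuracy. The sketch below is a hypothetical illustration, not OpenAI’s evaluation code: it assumes each question has exactly four graded attempts stored as booleans, and the function names and sample data are made up for the example.

```python
from typing import Sequence


def average_accuracy(results: Sequence[Sequence[bool]]) -> float:
    """Mean fraction of correct attempts across all questions (pass@1 style)."""
    per_question = [sum(attempts) / len(attempts) for attempts in results]
    return sum(per_question) / len(per_question)


def four_of_four_reliability(results: Sequence[Sequence[bool]]) -> float:
    """Fraction of questions answered correctly in all four attempts."""
    for attempts in results:
        # Assumption: every question was attempted exactly four times.
        if len(attempts) != 4:
            raise ValueError("each question needs exactly 4 attempts")
    solved = sum(1 for attempts in results if all(attempts))
    return solved / len(results)


# Hypothetical grading results for three questions, four attempts each.
results = [
    [True, True, True, True],   # correct every time: passes 4/4
    [True, True, False, True],  # one miss: fails 4/4
    [True, True, True, True],
]

print(f"average accuracy: {average_accuracy(results):.3f}")
print(f"4/4 reliability:  {four_of_four_reliability(results):.3f}")
```

On this sample the average accuracy is about 0.917 while 4/4 reliability is only 0.667: a single wrong attempt disqualifies the whole question, which is why the metric rewards consistency rather than occasional success.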

Strong showing against mainstream models

Performance on par with Gemini 2.5 Pro, with a matching score on the evaluation’s intelligence index
Smarter than Claude 4 Sonnet Thinking, at a lower unit cost

**Less concise than Claude 4 Opus**, but **more concise than Gemini 2.5 Pro and DeepSeek R1**

Tool integration: not just a language model, but an intelligent assistant

o3-pro is far more than a chatbot. It brings together all of ChatGPT’s advanced tools, making it a genuinely versatile intelligent assistant:

**Web search**: quickly finds and integrates real-time information
**File analysis**: reads and understands the contents of uploaded documents
**Image recognition**: processes visual input and understands images
**Python**: runs code directly for data processing, plotting and more
**Personalized memory**: remembers user preferences to keep improving the interaction

o3 price cut by 80%, now cheaper than GPT-4o

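OpenAI cut the price of o3 by 80%: from $10/$40 to $2/$8 per million input/output tokens, with a **75% discount on cached input tokens**.

o3-pro pricing: $20 per million input tokens and $80 per million output tokens, versus $150 and $600 for o1-pro.
o3-pro is better than o1-pro across the board: faster, smarter and stronger. It is also 7.5 times cheaper on both input and output; the “30 times cheaper” figure comes from comparing o1-pro’s $600 output price with o3-pro’s $20 input price.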
o3’s per-token cost now equals GPT-4.1’s: reasoning and non-reasoning models are priced the same per token, but o3 still costs more per request because **it generates about 7 times as many output tokens as GPT-4.1 on average**.
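The price changes above reduce to simple arithmetic. This is a minimal sketch: the per-million-token prices are hand-entered (including o3’s pre-cut $10/$40 list price) rather than read from any official price feed, and the constant names are hypothetical.

```python
# Hand-entered prices in US dollars per 1 million tokens: (input, output).
# These mirror the figures discussed above; they are not fetched from any API.
O3_OLD = (10.00, 40.00)    # o3 before the cut
O3_NEW = (2.00, 8.00)      # o3 after the cut
O1_PRO = (150.00, 600.00)
O3_PRO = (20.00, 80.00)
CACHE_DISCOUNT = 0.75      # cached input billed at 25% of the normal input price


def percent_cut(old: float, new: float) -> float:
    """Percentage reduction going from the old price to the new one."""
    return (old - new) / old * 100


for label, old, new in [
    ("input", O3_OLD[0], O3_NEW[0]),
    ("output", O3_OLD[1], O3_NEW[1]),
]:
    print(f"o3 {label}: ${old:.2f} -> ${new:.2f} ({percent_cut(old, new):.0f}% cut)")

cached_input = O3_NEW[0] * (1 - CACHE_DISCOUNT)
print(f"o3 cached input: ${cached_input:.2f} per 1M tokens")

# Like-for-like ratios: input vs input, output vs output.
print(f"o1-pro / o3-pro, input:  {O1_PRO[0] / O3_PRO[0]:.1f}x")
print(f"o1-pro / o3-pro, output: {O1_PRO[1] / O3_PRO[1]:.1f}x")
# Mixing o1-pro's output price with o3-pro's input price yields the 30x figure.
print(f"o1-pro output / o3-pro input: {O1_PRO[1] / O3_PRO[0]:.0f}x")
```

Both like-for-like ratios come out at 7.5x; only the mixed input/output comparison reaches 30x.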

o3 per-token cost now matches GPT-4.1

The two models are priced the same per token.
Both are 20% cheaper than GPT-4o ($2.50/$10 per million input/output tokens).
However, because o3 generates **about 7 times as many tokens as GPT-4.1**, a complete query still costs more.
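To see why equal per-token prices do not mean equal bills, here is a hedged sketch of a per-request cost estimate. The token counts are hypothetical, chosen only to reflect the roughly 7x output ratio mentioned above; real usage varies by prompt. For reasoning models, reasoning tokens are billed as output tokens, so they are folded into `output_tokens` here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in US dollars."""
    input_per_m: float
    output_per_m: float


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request; output_tokens includes any reasoning tokens."""
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


# Same per-token price for both models, as quoted in this article.
o3 = Pricing(input_per_m=2.00, output_per_m=8.00)
gpt_4_1 = Pricing(input_per_m=2.00, output_per_m=8.00)

# Hypothetical request: identical prompt, but o3 emits ~7x as many output tokens.
prompt_tokens = 1_000
gpt_4_1_output = 500
o3_output = 7 * gpt_4_1_output

cost_gpt_4_1 = request_cost(gpt_4_1, prompt_tokens, gpt_4_1_output)
cost_o3 = request_cost(o3, prompt_tokens, o3_output)

print(f"GPT-4.1: ${cost_gpt_4_1:.4f}")
print(f"o3:      ${cost_o3:.4f}")
print(f"o3 costs {cost_o3 / cost_gpt_4_1:.1f}x as much per request")
```

With these numbers the request costs $0.0060 on GPT-4.1 and $0.0300 on o3, a 5x gap. It is below 7x because the prompt is billed identically for both models; the gap approaches 7x as output tokens dominate the bill.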

📉 The cost of AI intelligence is falling fast:

The cost of GPT-4-level intelligence has fallen **more than 100-fold** since its release
The cost of training models that reach the “advanced intelligence threshold” keeps falling
For users deploying AI, the marginal cost of AI services keeps declining, opening up more real-world applications. This means higher-performance AI models are becoming more accessible and better suited to commercial integration and large-scale use.

OpenAI launches o3-pro with significantly improved performance, and cuts the price of o3 by 80%, making it cheaper than GPT-4o
