- Introduction to the core model
- Claude Opus 4
- New feature bright spot
- Performance comparison versus actual validation
- Claude Code full online
- Four new functions read out in detail
- Security and transparency of thinking
- Product and pricing
Anthropic published the Fourth Generation Series of the Claude Model: Claude Opus 4 and Claude Sonnet 4, which is not only a systematic evolution of the original model but also of the future AI application landscape.
Whether it is code generation, complex reasoning, long-term task processing or intelligent proxy construction, they show significant advantages.
在 Claude 4 的发布会上 Anthropic CPO Mike Krieger 详细阐述了 Agent 底层的三大能力:
-
Contextual Intelligence: no longer simple command execution, but understanding why and how. Your 100th mission with Agent should be much better than the 1st, like the 100th day of the new employee; Claude 4 shows the greatness of this ability. In the tests, it will proactively create a “rememory file” to save key information. When playing with Pokemon, it will even write navigation notes: “Try five times and stay stuck; if stuck, try the opposite direction; go to the other side of the room when the indoor navigation takes place.” This autonomous learning and knowledge accumulation is the core value of the human workforce.
-
Long-runing Exchange: Addressing complex tasks that take hours or even days to coordinate other Agents and humans. This is not only a matter of durability, but also the ability to maintain unity of purpose and consistency in context;
-
** Genuine Collaboration: ** Transparent reasoning process, adapted to human style of work. The key is the balance between “intelligent autonomy” and “human oversight” – AI deals with cumbersome details, humans take control of the big directions. Video by @indigo
Introduction to the core model
Claude Opus 4
-
** Principal: Professional-level programming capacity, continuing mission performance**
-
Lead in two authoritative coding benchmarking tests: SWE-bench score: 72.5%
-
Terminal-bench score: 43.2%
** Capable of running for several consecutive hours, processing hundreds of steps of reasoning**, especially suitable for AI proxy assignments and long-cycle R & D scenarios. Apply feedback (real user authentication):
-
Cursor: Considered to be “a major leap in the ability to understand the code”.
-
Replit: Multi-file code changes are more precise.
-
Block: “A significant increase in quality and stability in code editing and debugging”.
-
Rakuten: its agents operate independently for seven hours and have a stable performance.
-
Cognition: Successfully responding to complex decision-making that the models of the past could not handle.
Claude Sonnet 4 – Balanced and efficient universal model
-
Standing: high performance and efficiency in routine tasks
-
SWE-bench received a score of 72.7 per cent, slightly higher than Opus 4, and was particularly good at code automation and rational reasoning.
-
Although the overall performance is not as high as Opus 4, it is more efficient and responsive as it is appropriate for embedded or immediate response tasks.
Apply feedback:
-
GitHub Copilot: Sonet 4 will be the new engine and will be deployed to the new version of the smart programming assistant.
-
iGent: for multifunctional autonomous development tasks, code navigation error is almost zero.
-
Sourcegraph: The Sonnet 4 is considered to have enhanced code quality and mission continuity.
New feature bright spot
-
** Support for Tools + Long-term Thinking** (Beta version): Models can use tools such as search for alternate reasoning and improve the quality of responses.
-
Support for parallel use of multiple tools and efficiency gains
** The memory capacity has improved considerably**:
-
“Long-term memory” can be created by accessing local documents, extracting and retaining key facts
Example: Opus 4 automatically writes notes to record the strategy when playing a treasured dream, and these are real notes generated by the model itself.
💾 新增“记忆”功能
-
Models can create “rememory files” for the storage of mission-critical data.
-
Be of particular excellence in the application of AI proxy assignments, allowing for contextual consistency in successive sessions.
** Mandate simplification and controlled thinking presentation**
-
Introduction of a “thinking digestor” to refine the reasoning chain over a long period of time, using only about 5% of the cases.
-
Developers can apply for “Developer Mode” to see the full reasoning trail for advanced prompt debugging.
Performance comparison versus actual validation
** Benchmarking lead**
-
Opus 4 and Sonnet 4 are at the top of the SWE-Bench Verified.
-
Opus 4 performed excellently in a number of long, multi-round reasoning benchmarks, significantly exceeding Claude 3.7.
-
Sonnet 4 is slightly less than Opus 4, but much higher than 3.7 stability and accuracy.
User feedback validation**
-
Cursor: state-of-the-art code model, deep understanding of large code repositories.
-
Replit: The accuracy and consistency of multiple document changes have increased significantly.
-
GitHub Copilot: Sonnet 4 will be used to drive its new generation of code agents.
-
iGent/Sourcegraph: Multifunctional autonomous development, error rate close to 0, code quality improvement is significant.
Claude Code full online
Claude Code, a programming assistant designed for developers, is now officially fully open: Functional integration compatible with the platform
-
Support the GitHub Actions backstage tasking.
-
Original integrated VS Code and Jet Brains, model editors appear directly in the document, supporting in-line comment and change tracking.
-
Can run Claude Code in IDE terminal, achieve “local AI programming partner”.
SDK and automation capacity
-
Release Claude Code SDK to build custom AI tools and smart agents.
-
Example project “Claude Code on GitHub” enters the beta test, which can be used in full recall: Automatically respond to evaluation recommendations
-
Fix C.I. Errors
-
Modify Snippets
#Anthropic API Launch AI Agent Build New Capabilities
Anthropic has officially published four new features for AI smart agents in its API. These features are in the public beta phase, working with Claude Opus 4 and Sonnet 4 models to significantly enhance the ability, efficiency and flexibility of developers to build smart agents.
代码执行工具:在 API 层面运行并调试代码。
MCP connector: Connect to multi-component workflows or external services.
** Document API: Interact Claude with data from external file systems.
**Prompt cache: Prompt cache up to one hour to enhance performance and consistency.
Four new functions read out in detail
1 Code Exchange Tool Claude is no longer just a “ writing code “ , but is capable of ** running Python ** with complete analytical execution capability.
-
Run Python in the sandbox environment to generate visualized graphs and analyses.
-
Possibility: Financial modelling (e.g. portfolio analysis, forecasting)
-
Scientific calculations (e.g. simulation and experimental data processing)
-
Business intelligence (e.g. sales analysis, automatic report generation)
-
Document processing (format conversion, data extraction, report generation)
-
Statistical analysis (e.g. regression, hypothetical testing, predictive models)
** Usage policy: 50 hours per day free of charge, exceeding the portion charged at $ 0.05/hour/container. 2 **MCP connector (Model Context Protocol Contractor) This function streamlines Claude ‘ s connection to the external system, leaving developers without the need to manually write their client code.
-
Support the connection to any remote MCP services such as Zapier, Asana etc.
-
Automatically handle the following tasks: Connection management, tool discovery, authentication and bug processing
-
Smart call remote, automatically determines the call order and parameters
** Example : Build a project management agent to read tasks in Asana, assign tasks and perform data analysis in conjunction with code implementation.
📌 无需手动集成 API,开发效率大幅提升。
3 **Document API(Files API)
This capacity addresses the efficiency of handling a large number of documents in multiple rounds of dialogue.
- Support for one upload, multiple quotes: The same documents are not required to be uploaded per round and are suitable for such scenarios as the knowledge base, technical documentation, structured data, etc.
Files can be accessed directly by the code execution tool:
- For example, uploading a CSV data set to analyse, generate charts, generate summaries, etc. on a sustainable basis over multiple tasks.
Reduce the cost of duplicate uploading and contextual construction. Prompt Extended Cache Optimizing long-term tasks or context-rich interactive performance and costs:
-
The original cache TTL (time of survival) is 5 minutes and now provides 1 hour extension options.
-
Bring: Max 90% Cost Decline
-
** Up to 85% Delay Reduction**
The scenes include:
-
Multiple rounds of work stream.
-
Phased analysis or coordination tasks
-
Proxy with complete context for intersession
Fits well for enterprise-level Agent applications that need to maintain contextual consistency.
Security and transparency of thinking
-
Models have reduced 65 per cent of “shortcut” behavior in complex missions.
-
Introduction of a “thinking summing up function” to summarize only about 5 per cent of the tasks, with a clearer reasoning trajectory.
-
Provide Developmenter Mode to support the advanced tip project.
Product and pricing
-
Opus 4: Enter $15 / Output $75 per million token.
-
Sonnet 4: enter $3/output $15 per million token.
-
Both can be obtained through Anthropic API, Amazon Bedrock and Google Cloud Vertex AI.
-
Sonnet 4 is open to free users; Opus 4 is included in Pro/Max/Team/Enterprise.
Official presentation: https://www.anthropic.com/news/claude-4 https://www.anthropic.com/news/agent-capabilites-api