OpenAI releases Codex, a cloud-based AI software development assistant that can work on multiple coding tasks simultaneously

What’s Codex? Codex is a cloud-based AI software development assistant from OpenAI that can handle multiple tasks in parallel. It is designed to **take over, or help programmers with, repetitive, tedious and time-consuming programming tasks**. Beyond writing code and answering questions about a project, it finds and fixes bugs and completes standard development tasks such as generating tests and submitting pull requests. Typical tasks include:

  • Fixing bugs

  • Refactoring code

  • Adding tests

  • Reviewing code

  • Generating code changes from natural-language requests

Each task runs in its own isolated cloud environment preloaded with your code repository, so Codex has enough context to understand and work on the problem. Codex is built on OpenAI's o3 model family and fine-tuned specifically for software development.

What can Codex do?

Codex is not a simple code generator but a cloud-based AI programming agent with **autonomous execution capability**. It works like a remote developer, carrying out specific tasks for you in the cloud. Its key characteristics are:

  • **Cloud execution** (it does not run on your computer)

  • **Task-driven** (you give a clear goal, and Codex carries it out independently)

  • **Traceable** (you can see what each step did)

  • **Bounded autonomy** (Codex does not act freely; it works efficiently toward the assigned task)

  • **Secure** (no web access, so no data leaks and no access to other services)

# How Codex works

Codex uses a model called codex-1, a version of OpenAI's o3 series optimized through additional training. Its agent behavior relies on the following mechanisms:

1. The user initiates a task by entering a request

As a user, you send requests through the ChatGPT sidebar or the Codex CLI tool, for example:

  • “Add user registration to the project”

  • “Explain what this function in the utils.py file does”

  • “Fix a bug in the login flow”

  • “Write test cases for me and run them”

These requests can be written in natural language or as explicit development task specifications.

2. Codex parses the task: understanding intent and planning execution

Using its underlying model, codex-1 (trained with reinforcement learning), Codex will:

  • **Understand your request**

  • **Analyze the project context (code, structure, existing documentation)**

  • **Devise a sound execution strategy**: e.g. creating new files, modifying existing files, running tests, etc.

If you add an AGENTS.md file to the repository, Codex will follow its instructions to perform tasks more accurately, for example:

  • How to run the tests (which commands to use)

  • The project's code style or development conventions

  • The project's key modules or file structure

3. Codex launches an isolated environment (sandboxed container)

Each task starts a **completely isolated execution environment in the cloud**, which is like assigning a dedicated virtual developer to the task:

  • Your project repository is preloaded (imported via GitHub, etc.)

  • Dependencies, test frameworks and toolchains are configured

  • Internet access is disabled for security (no access to external APIs)

In this environment, Codex is like a remote developer working inside your project directory, able to view, run and modify the project's files.

4. Codex executes the task: writing code and running commands

Once the environment is ready, Codex begins working:

  • Reading and writing files: creating, modifying and renaming code files

  • Writing code: turning natural-language requirements into correct functions, classes or modules

  • Running commands: executing tests, linters (code-style tools), type checks, etc.

  • Analyzing feedback: checking whether tests pass or errors are reported and, if necessary, adjusting the code again

This is a **closed-loop execution process** that continues until the task is completed or clearly fails.
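The write, run, check, revise cycle described above can be sketched as a simple loop. This is a minimal illustration, not Codex's actual implementation: `run_closed_loop`, `propose_change` and `run_checks` are hypothetical names, and the toy stand-ins replace the model and the sandboxed test run so the sketch executes end to end.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LoopResult:
    succeeded: bool
    attempts: int
    log: List[str] = field(default_factory=list)


def run_closed_loop(
    propose_change: Callable[[str], str],
    run_checks: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 5,
) -> LoopResult:
    """Write code -> run checks -> read feedback -> revise, until checks pass or attempts run out."""
    log: List[str] = []
    feedback = ""  # nothing to react to before the first attempt
    for attempt in range(1, max_attempts + 1):
        code = propose_change(feedback)    # hypothetical: the model writes or revises code
        passed, output = run_checks(code)  # hypothetical: tests/linters run in the sandbox
        log.append(f"attempt {attempt}: {'pass' if passed else 'fail'}")
        if passed:
            return LoopResult(True, attempt, log)
        feedback = output                  # failure output drives the next revision
    return LoopResult(False, max_attempts, log)  # explicit failure once the budget is spent


# Toy stand-ins so the sketch runs end to end.
def toy_propose(feedback: str) -> str:
    # The first draft has an off-by-one bug; any failure feedback triggers the "fix".
    if feedback:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a + b + 1\n"


def toy_checks(code: str) -> Tuple[bool, str]:
    namespace: dict = {}
    exec(code, namespace)  # load the candidate code
    ok = namespace["add"](2, 3) == 5
    return ok, ("" if ok else "AssertionError: add(2, 3) != 5")


result = run_closed_loop(toy_propose, toy_checks)
print(result.succeeded, result.attempts, result.log)
```

The attempt cap models the "clear failure" exit: the loop never runs indefinitely, and the log records every step for later review.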

5. Codex provides verifiable results when the task is complete

Once the task is finished, Codex will commit the changes (just as a developer commits to Git) and provide the following for you to verify:

  • **Code diff** (what was changed)

  • Logs of the commands it executed

  • **Test results** (passed and failed)

  • **Terminal screenshots / an overview of file changes**
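For readers unfamiliar with the unified diff format used to review changes, here is a small illustration built with Python's standard `difflib` module. It is not Codex's own tooling: the file name `auth.py` and the login change are hypothetical examples.

```python
import difflib

# Hypothetical "before" and "after" versions of a login check.
before = [
    "def login(user, password):\n",
    "    if user.password == password:\n",
    "        return True\n",
]
after = [
    "import hmac\n",
    "\n",
    "def login(user, password):\n",
    "    if hmac.compare_digest(user.password, password):\n",
    "        return True\n",
    "    return False\n",
]

# '-' lines were removed, '+' lines were added, unprefixed lines are unchanged context.
diff_text = "".join(
    difflib.unified_diff(before, after, fromfile="a/auth.py", tofile="b/auth.py")
)
print(diff_text)
```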

You can:

  • Ask Codex for further changes

  • Submit the changes to GitHub as a pull request

  • Or download them directly and integrate them into your local project

How do you use Codex?

Users can interact with Codex through the ChatGPT sidebar or the Codex CLI:

  • **Using the “Code” button**: let Codex perform an actual coding task.

  • **Using the “Ask” button**: ask any question about the codebase.

  • **Setting up an AGENTS.md file**: this file works like a set of “operating notes” telling Codex how to test the code, which tools to use, the project's code style, etc.

  • A task usually takes 1 to 30 minutes depending on its complexity, and you can monitor Codex's progress in real time.

Each task runs in an independent environment, and Codex automatically records execution logs and test output, so you can see how each step was done.

# Codex modes

1. Ask mode (read-only analysis)

Suitable for:

  • Answering questions about code structure

  • Visualizing request flows (e.g. generating Mermaid diagrams)

  • Recommending code optimizations (without changing the code)

Example: “Document and create a Mermaid diagram of the full request flow from the client endpoint to the database.”

2. Code mode (automatic modification)

Suitable for:

  • Bug fixing

  • Security audits

  • Automation

  • Test generation

  • Creating PRs to submit code

Example: There’s a memory-safety in .

Performance: how capable is Codex?

  • Training: real-world software engineering tasks, reinforcement learning, and training on human coding-style preferences.

  • Results: even without manually configuring the Codex environment (e.g. with no AGENTS.md file or other supporting documentation), it completes tasks with high quality.

  • On internal software engineering benchmarks, it achieves a 70-80% success rate on complex tasks. On OpenAI's internal SWE task set, it achieved a significantly higher pass rate than o3 and other models.

In addition, Codex supports a context length of up to 192k tokens, allowing it to understand and work with very large codebases.

Safety design and transparency

Codex was designed with safety and reviewability in mind, and several safeguards apply when it runs:

  • **Every action is traceable**: each change comes with logs, test results and a file diff, so you can see exactly what it changed.

  • **Explicit refusal of malicious uses**: Codex was trained to **refuse to generate code for malware, viruses, phishing tools, etc.**, even though some of these techniques may also have legitimate uses.

  • **Sandboxed execution**: Codex cannot access the Internet and only reads the code and dependencies you explicitly provide, so it cannot leak information or download unknown content.

  • **Users are responsible for final code review**: although Codex is highly automated, users remain accountable for the quality and legitimacy of the final code.

Codex CLI and the lightweight codex-mini

Alongside Codex, OpenAI released a terminal tool, Codex CLI, for developers working locally. Its features include:

  • Codex capabilities in your local terminal, without the cloud sandbox;

  • Support for quick Q&A, autocompletion, refactoring, etc.;

  • A new lightweight model, codex-mini-latest: faster, with lower latency;

  • Strong instruction understanding and code quality are retained;

  • Well suited to latency-sensitive tasks.

In addition, CLI users can now sign in with their ChatGPT account and configure API access directly, without manually generating an API token.

Pricing and availability

  • ChatGPT Pro/Enterprise/Team users: available now

  • Plus/Edu users: coming soon

  • Codex CLI / codex-mini: billed through the API

codex-mini model API pricing:

  • Input: $1.50 per million tokens

  • Output: $6.00 per million tokens

  • 75% prompt caching discount
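To make the pricing concrete, here is a small cost estimate. It assumes the 75% discount applies only to cached input (prompt) tokens, which makes cached input cost $0.375 per million tokens; the token counts are a hypothetical request, and `request_cost` is an illustrative helper, not an OpenAI API. With these assumptions, the example works out to $0.12 with caching versus $0.21 without.

```python
# Rates taken from the pricing list above (USD per million tokens).
INPUT_PER_MILLION = 1.50
OUTPUT_PER_MILLION = 6.00
CACHE_DISCOUNT = 0.75  # assumption: applies to cached input (prompt) tokens only


def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a single request."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    cached_rate = INPUT_PER_MILLION * (1 - CACHE_DISCOUNT)  # $0.375 per million
    total = (
        uncached * INPUT_PER_MILLION
        + cached_tokens * cached_rate
        + output_tokens * OUTPUT_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical request: 100k prompt tokens (80k of them cached), 10k output tokens.
cost_with_cache = request_cost(100_000, 80_000, 10_000)
cost_without_cache = request_cost(100_000, 0, 10_000)
print(f"with cache: ${cost_with_cache:.4f}, without: ${cost_without_cache:.4f}")
```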

Real-world use cases (early testers)

The following companies took part in testing and validating Codex:

  • Cisco: exploring how Codex can speed up product development and iteration.

  • Temporal: using Codex to write tests, refactor code and debug problems.

  • Superhuman: handing routine, repetitive tasks (e.g. increasing test coverage) to Codex to lighten engineers' workload.

  • Kodiak Robotics: applying it to highly complex scenarios such as debugging autonomous-driving code and building tools.

Feedback from these teams suggests that Codex is best suited to well-defined, repetitive tasks, and they recommend **assigning multiple tasks to separate Codex instances at the same time to improve efficiency**.
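The pattern of running several tasks side by side, each in its own isolated copy of the repository, can be approximated locally to show the idea. This sketch only illustrates the isolation-plus-parallelism concept: `run_task` is a hypothetical placeholder (it writes a note file instead of doing real agent work), and the temporary directories stand in for Codex's cloud containers.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_task(repo: Path, task: str) -> str:
    """Hypothetical stand-in for one task: it works only on a private copy of the repo."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp) / "repo"
        shutil.copytree(repo, workdir)  # isolation: each task gets its own copy
        # Placeholder for real agent work (editing files, running tests, ...).
        (workdir / "TASK.txt").write_text(task + "\n")
        return f"{task}: done in isolated copy"


tasks = ["fix login bug", "add tests for utils.py", "refactor config loading"]

with tempfile.TemporaryDirectory() as src:
    repo = Path(src)
    (repo / "app.py").write_text("print('hello')\n")
    # Run all tasks concurrently; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        results = list(pool.map(lambda t: run_task(repo, t), tasks))
    # The shared source repository was never modified by any task.
    original_untouched = not (repo / "TASK.txt").exists()

for line in results:
    print(line)
```

Because each task edits only its own copy, tasks cannot interfere with one another or with the original repository, which is what makes handing out several tasks at once safe.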

Future directions

OpenAI’s future plans for Codex include:

  • **Mid-task interaction**: users will be able to adjust goals or view intermediate results at any time.

  • Integration with more tools: beyond GitHub, future connections to your IDE, CI/CD pipelines and issue tracking systems.

  • More complex task decomposition and multi-agent collaboration, modeled on how human teams work.

The ultimate goal is to **let developers focus on core logic and design decisions, while AI efficiently handles the rest.** Official announcement: https://openai.com/index/introduction-codex/