OpenAI /v1/responses
POST /v1/responses is the OpenAI Responses-compatible entry point, available since cc-router v2.3. It lets Codex CLI and any other OpenAI Responses-style client reuse every upstream subscription through cc-router without writing a second adapter.
Applies to cc-router v3.0.0 and later.
Protocol position and translation flow
Client (OpenAI Responses)                  cc-router                              Upstream
────────────────────────                   ─────────────────────────────────      ─────────────
POST /v1/responses ─────────────────────►│ handler::responses                   │
body: OpenAI Responses format            │ ↓ request_to_anthropic               │
                                         │ Anthropic Messages format            │
                                         │ ↓ pipeline::dispatch                 │──► pick sub / rewrite model
                                         │ pipeline returns Anthropic SSE/JSON  │◄── upstream response
                                         │ ↓ translate_*_to_responses           │
                                         │ OpenAI Responses format              │
◄──────────────────── HTTP response──────│                                      │
The dispatch pipeline is untouched — every upstream provider path (including the codex / openai / gemini / kiro providers that already do protocol translation) is reused as-is. /v1/responses just wraps an inbound translator around the same pipeline.
Request
POST /v1/responses
Content-Type: application/json
| Header | Required | Notes |
|---|---|---|
| Content-Type: application/json | Yes | The body must be JSON |
| x-api-key or Authorization: Bearer ... | Per auth settings | /v1/responses is not allowlisted; required once auth is enabled |
| Other OpenAI SDK headers | No | Not consumed; not forwarded upstream (upstream protocol is Anthropic) |
Auth caveat: the same token check as /v1/messages, but the 401 error body is still Anthropic-shaped (auth_layer runs before the handler). Only 4xx/5xx produced inside the responses handler use the OpenAI shape.
The request body is a subset of the OpenAI Responses API. cc-router consumes and translates these fields:
| Field | Type | Required | Behavior |
|---|---|---|---|
| model | string | Yes | Resolved to a virtual model; supports gpt-5.5 / gpt-5.4 / gpt-5.4-mini aliases. A missing model returns 400 |
| stream | boolean | No (defaults to false) | true runs SSE translation; false runs JSON translation |
| instructions | string | No | Translated to Anthropic system; if input also carries developer/system role text, the two are concatenated |
| input | array | Yes | Flat item stream (message / function_call / function_call_output / reasoning), translated into Anthropic messages |
| max_output_tokens | integer | No | Translated to Anthropic max_tokens (defaults to 4096 when omitted, because Anthropic requires the field) |
| reasoning.effort | string | No | Translated to Anthropic thinking { type: enabled, budget_tokens } with budget mapping: minimal → 0 / low → 1024 / medium → 8192 / high → 16384 |
| tools | array | No | Translated to Anthropic tool schema |
| tool_choice | object | No | Translated to Anthropic tool_choice |
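The field mapping in the table can be sketched roughly as follows. This is a minimal illustration, not cc-router's actual code: the function name `request_to_anthropic_sketch`, the blank-line separator used when concatenating system text, and the choice to leave thinking off for a 0 budget are all assumptions, and only plain `message` items are handled (tools, `tool_choice`, and the other item types are left out).

```python
# Hypothetical sketch of the request field mapping; names and edge-case
# behavior are assumptions, not taken from cc-router's source.
DEFAULT_MAX_TOKENS = 4096  # default stated in the table
EFFORT_BUDGETS = {"minimal": 0, "low": 1024, "medium": 8192, "high": 16384}


def request_to_anthropic_sketch(body: dict) -> dict:
    """Translate top-level Responses fields into an Anthropic Messages request."""
    if "model" not in body:
        raise ValueError("missing model")  # the real handler answers 400 here

    system_parts = []
    if body.get("instructions"):
        system_parts.append(body["instructions"])

    messages = []
    for item in body.get("input", []):
        if item.get("type") != "message":
            continue  # function_call / function_call_output / reasoning omitted here
        text = "".join(
            part.get("text", "")
            for part in item.get("content", [])
            if part.get("type") in ("input_text", "output_text")
        )
        if item.get("role") in ("developer", "system"):
            system_parts.append(text)  # merged with instructions
        else:
            messages.append(
                {"role": item["role"], "content": [{"type": "text", "text": text}]}
            )

    out = {
        "model": body["model"],  # virtual-model resolution happens later, in dispatch
        "max_tokens": body.get("max_output_tokens") or DEFAULT_MAX_TOKENS,
        "messages": messages,
        "stream": bool(body.get("stream", False)),
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)  # separator is an assumption

    effort = (body.get("reasoning") or {}).get("effort")
    budget = EFFORT_BUDGETS.get(effort)
    if budget:  # assumption: a 0 budget (minimal) leaves thinking disabled
        out["thinking"] = {"type": "enabled", "budget_tokens": budget}
    return out
```

Calling it with `{"model": "gpt-5.4", "reasoning": {"effort": "low"}, "input": [...]}` would yield a request with `max_tokens` of 4096 and a thinking budget of 1024.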
Non-streaming request example
curl http://127.0.0.1:23456/v1/responses \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-5.4",
    "max_output_tokens": 256,
    "instructions": "You are concise.",
    "input": [
      { "type": "message", "role": "user",
        "content": [{ "type": "input_text", "text": "Explain cc-router in one sentence" }] }
    ]
  }' | jq
Response (non-streaming)
- 200 OK, Content-Type: application/json
- The body is the standard OpenAI Responses response JSON
- cc-router translates the Anthropic message returned by the pipeline into OpenAI Responses form
- output[] is flattened: text content_block → message item; tool_use → function_call item; thinking → reasoning item (encrypted_content round-trips through the Anthropic signature)
- usage.input_tokens / output_tokens are translated to OpenAI field names
- stop_reason is translated to status / incomplete_details (e.g. max_tokens → incomplete_details.reason = max_output_tokens)
{
  "id": "resp_xxx",
  "object": "response",
  "created_at": 1767225600,
  "status": "completed",
  "model": "gpt-5.5",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [{ "type": "output_text", "text": "..." }]
    }
  ],
  "usage": { "input_tokens": 42, "output_tokens": 128, "total_tokens": 170 }
}
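The flattening rules above can be illustrated with a small sketch. It assumes the usual Anthropic message layout (`content` blocks, `stop_reason`, `usage`) and the usual Responses item layout; the function name `translate_message_to_response_sketch` and the way the `id` is derived are hypothetical, and cc-router's real translator may differ in details.

```python
import json
import time


def translate_message_to_response_sketch(msg: dict, model: str) -> dict:
    """Flatten an Anthropic message into a Responses-style body (illustrative only)."""
    output = []
    for block in msg.get("content", []):
        kind = block.get("type")
        if kind == "text":
            output.append({
                "type": "message",
                "role": "assistant",
                "content": [{"type": "output_text", "text": block["text"]}],
            })
        elif kind == "tool_use":
            output.append({
                "type": "function_call",
                "call_id": block["id"],
                "name": block["name"],
                # Responses carries arguments as a JSON string
                "arguments": json.dumps(block.get("input", {})),
            })
        elif kind == "thinking":
            output.append({
                "type": "reasoning",
                "summary": [{"type": "summary_text", "text": block.get("thinking", "")}],
                # the Anthropic signature rides in encrypted_content
                "encrypted_content": block.get("signature"),
            })

    usage = msg.get("usage", {})
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)

    response = {
        "id": "resp_" + str(msg.get("id", "")),  # id scheme is an assumption
        "object": "response",
        "created_at": int(time.time()),
        "status": "completed",
        "model": model,
        "output": output,
        "usage": {
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "total_tokens": input_tokens + output_tokens,
        },
    }
    if msg.get("stop_reason") == "max_tokens":
        response["status"] = "incomplete"
        response["incomplete_details"] = {"reason": "max_output_tokens"}
    return response
```

Only the `max_tokens` stop reason is mapped here because it is the one example this page gives; other stop reasons fall through as `completed` in the sketch.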
Response (streaming SSE)
- 200 OK, Content-Type: text/event-stream (only this header is set; cache-control / transfer-encoding are not, so axum manages chunked encoding to avoid IncompleteMessage on HTTPS+rustls paths)
- The stream is the OpenAI Responses SSE protocol, produced by an internal converter translating from upstream Anthropic SSE in real time.
Event mapping
| Anthropic event | Translated to OpenAI Responses event |
|---|---|
| message_start | response.created + response.in_progress |
| content_block_start (text) | response.output_item.added + response.content_part.added |
| content_block_delta (text_delta) | response.output_text.delta |
| content_block_start (thinking) | response.output_item.added (reasoning item) |
| content_block_delta (thinking_delta) | response.reasoning_summary_text.delta |
| content_block_start (tool_use) | response.output_item.added (function_call item) |
| content_block_delta (input_json_delta) | response.function_call_arguments.delta |
| content_block_stop | response.content_part.done + response.output_item.done |
| message_delta | (usage extracted side-channel; no direct frame emitted) |
| message_stop | response.completed |
| Upstream disconnect / missing message_stop | A best-effort response.completed is emitted |
Key differences from Anthropic SSE:
- No data: [DONE]; OpenAI Responses clients terminate on response.completed
- Disconnect safety net: when the upstream transport drops or the upstream omits message_stop, cc-router still emits a response.completed so the OpenAI SDK does not hang waiting for a terminal event
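As a rough illustration of the event mapping and the disconnect safety net, the converter can be thought of as a generator over upstream events. The sketch below is an assumption-laden simplification: `translate_stream_sketch` is a hypothetical name, it yields only event names (no payloads), and it models a dropped transport as a `ConnectionError` raised by the upstream iterator.

```python
# Hypothetical sketch of the Anthropic -> Responses event mapping.
DELTA_EVENTS = {
    "text_delta": "response.output_text.delta",
    "thinking_delta": "response.reasoning_summary_text.delta",
    "input_json_delta": "response.function_call_arguments.delta",
}


def translate_stream_sketch(anthropic_events):
    """Yield Responses event names for a stream of Anthropic event dicts."""
    completed = False
    usage = None  # collected from message_delta, never emitted as its own frame
    try:
        for ev in anthropic_events:
            kind = ev.get("type")
            if kind == "message_start":
                yield "response.created"
                yield "response.in_progress"
            elif kind == "content_block_start":
                yield "response.output_item.added"
                if ev["content_block"]["type"] == "text":
                    yield "response.content_part.added"
            elif kind == "content_block_delta":
                name = DELTA_EVENTS.get(ev["delta"]["type"])
                if name:
                    yield name
            elif kind == "content_block_stop":
                yield "response.content_part.done"
                yield "response.output_item.done"
            elif kind == "message_delta":
                usage = ev.get("usage", usage)
            elif kind == "message_stop":
                completed = True
                yield "response.completed"
    except ConnectionError:
        pass  # upstream transport dropped; fall through to the safety net
    if not completed:
        # best-effort terminal event so the client does not hang
        yield "response.completed"
```

Feeding it a stream that stops after `message_delta` still ends with a single `response.completed`, which is the behavior the safety net describes.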
Streaming request example
curl -N http://127.0.0.1:23456/v1/responses \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-5.4",
    "stream": true,
    "max_output_tokens": 256,
    "input": [
      { "type": "message", "role": "user",
        "content": [{ "type": "input_text", "text": "ping" }] }
    ]
  }'
The stream terminates with event: response.completed (no data: [DONE]).
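On the client side, this means a reader should stop on the terminal event rather than wait for a sentinel. The following is a minimal sketch of such a loop over already-decoded SSE lines; the helper names `iter_sse_events` and `collect_text` are hypothetical, and it assumes each `response.output_text.delta` payload carries its text in a `delta` field, as in the OpenAI Responses protocol.

```python
import json


def iter_sse_events(lines):
    """Parse an iterable of SSE text lines into (event_name, payload) pairs."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line closes the current event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip())
    if data:  # flush a final event that lacks a trailing blank line
        yield event, json.loads("\n".join(data))


def collect_text(lines):
    """Concatenate text deltas and stop at response.completed."""
    parts = []
    for event, payload in iter_sse_events(lines):
        name = event or payload.get("type")
        if name == "response.output_text.delta":
            parts.append(payload.get("delta", ""))
        elif name == "response.completed":
            break  # terminal event; no data: [DONE] will follow
    return "".join(parts)
```

In practice `lines` would come from whatever HTTP client streams the response body line by line.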
Dispatch
Identical to /v1/messages — /v1/responses resolves the client’s model field to a virtual model and runs the same dispatch + retry logic.
That means:
- gpt-5.5 / gpt-5.4 / gpt-5.4-mini and model-sonnet / claude-sonnet-4-6 and any other virtual model alias can all enter via /v1/responses
- Upstreams can be any of the 9 Anthropic providers or codex / openai / gemini / kiro; clients can't tell which one served the request
- /v1/responses cannot bind the OAuth-only codex provider as fallback (same constraint as /v1/messages)
See the full table at Anthropic /v1/messages → Virtual model mapping.
Error responses
Errors produced inside /v1/responses follow the OpenAI Responses shape:
{
  "error": {
    "message": "...",
    "type": "<kind>",
    "code": null
  }
}
| Status | Trigger | Source |
|---|---|---|
| 400 | Request body is not valid JSON | handler::responses |
| 400 | Missing model field | handler::responses |
| 400 | OpenAI → Anthropic translation failed (e.g. malformed input) | request_to_anthropic |
| 401 | Auth failure | Anthropic-shaped (auth_layer runs before the handler) |
| 500 | Internal pipeline error | handler::responses |
| 500 | Failed to read pipeline JSON body / failed to parse upstream response | translate_json_to_responses |
| Upstream status passthrough | The pipeline’s Anthropic error body is auto-translated to OpenAI shape | translate_json_to_responses |
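The passthrough row can be pictured as a small reshaping step. The sketch below assumes the standard Anthropic error envelope (`{"type": "error", "error": {"type", "message"}}`) and reuses the Anthropic error type as the OpenAI `type`; that reuse, the `api_error` fallback, and both function names are assumptions rather than documented cc-router behavior. Because the 401 from auth_layer stays Anthropic-shaped, the second helper shows one way a client might read the message from either shape.

```python
def anthropic_error_to_openai_sketch(body: dict) -> dict:
    """Reshape an Anthropic error body into the OpenAI shape (illustrative only)."""
    err = body.get("error") or {}
    return {
        "error": {
            "message": err.get("message", ""),
            "type": err.get("type", "api_error"),  # fallback value is an assumption
            "code": None,
        }
    }


def error_message(body: dict) -> str:
    """Read the message from either error shape; both nest it under 'error'."""
    return (body.get("error") or {}).get("message", "")
```

A client that only needs the human-readable message can therefore treat the 401 and the handler-produced errors uniformly.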
Unimplemented OpenAI endpoints
The following official OpenAI endpoints are not implemented in cc-router by design:
- POST /v1/chat/completions (Chat Completions API): cc-router's OpenAI compatibility layer only covers the Responses API
- GET /v1/responses/{id}, POST /v1/responses/{id}/cancel, and other Responses state endpoints: cc-router is a stateless proxy and doesn't cache responses, so clients terminate on the streaming response.completed
- OpenAI Assistants / Threads / Files API
cc-router’s intended clients are Codex CLI / Claude Code style stateless conversation clients that only need POST /v1/responses or POST /v1/messages plus GET /v1/models.