OpenAI /v1/responses

POST /v1/responses is the OpenAI Responses-compatible entry point, available in cc-router v2.3+. It is designed to let Codex CLI and any other OpenAI Responses-style client reuse every upstream subscription through cc-router without a second adapter.

Applies to cc-router v3.0.0 and later.

Protocol position and translation flow

Client (OpenAI Responses)                            cc-router                         Upstream
────────────────────────                  ─────────────────────────────────         ─────────────
POST /v1/responses ─────────────────────►│ handler::responses                  │
  body: OpenAI Responses format          │   ↓ request_to_anthropic            │
                                         │ Anthropic Messages format           │
                                         │   ↓ pipeline::dispatch              │──► pick sub / rewrite model
                                         │ pipeline returns Anthropic SSE/JSON │◄── upstream response
                                         │   ↓ translate_*_to_responses        │
                                         │ OpenAI Responses format             │
◄──────────────────── HTTP response──────│                                     │

The dispatch pipeline is untouched — every upstream provider path (including the codex / openai / gemini / kiro providers that already do protocol translation) is reused as-is. /v1/responses just wraps an inbound translator around the same pipeline.


Request

POST /v1/responses
Content-Type: application/json
Header | Required | Notes
Content-Type: application/json | Yes | The body must be JSON
x-api-key or Authorization: Bearer ... | Per auth settings | /v1/responses is not allowlisted, so it is required once auth is enabled
Other OpenAI SDK headers | No | Not consumed; not forwarded upstream (upstream protocol is Anthropic)

Auth caveat: /v1/responses uses the same token check as /v1/messages, but the 401 error body is still Anthropic-shaped because auth_layer runs before the handler. Only 4xx/5xx errors produced inside the responses handler use the OpenAI shape.

The request body is a subset of the OpenAI Responses API. cc-router consumes and translates these fields:

Field | Type | Required | Behavior
model | string | Yes | Resolved to a virtual model; supports gpt-5.5 / gpt-5.4 / gpt-5.4-mini aliases. Missing this returns 400
stream | boolean | No (defaults to false) | true runs SSE translation; false runs JSON translation
instructions | string | No | Translated to Anthropic system; if input also carries developer/system role text, the two are concatenated
input | array | Yes | Flat item stream (message / function_call / function_call_output / reasoning), translated into Anthropic messages
max_output_tokens | integer | No | Translated to Anthropic max_tokens (defaults to 4096 when omitted, because Anthropic requires it)
reasoning.effort | string | No | Translated to Anthropic thinking { type: enabled, budget_tokens } with budget mapping: minimal → 0 / low → 1024 / medium → 8192 / high → 16384
tools | array | No | Translated to Anthropic tool schema
tool_choice | object | No | Translated to Anthropic tool_choice
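
The scalar-field rules in the table can be sketched roughly as follows. This is an illustration only, not cc-router's request_to_anthropic: the function name, the "\n\n" separator used for concatenation, and the choice to omit the thinking block when the budget is 0 are all assumptions, and model resolution, input, tools and tool_choice translation are left out.

```python
# Hypothetical sketch of the top-level field mapping described in the table.
EFFORT_BUDGETS = {"minimal": 0, "low": 1024, "medium": 8192, "high": 16384}
DEFAULT_MAX_TOKENS = 4096  # used when max_output_tokens is omitted


def translate_request_fields(body: dict) -> dict:
    """Map OpenAI Responses scalar fields onto Anthropic Messages fields."""
    if "model" not in body:
        # The table states that a missing model returns 400.
        raise ValueError("missing model")

    out = {
        "model": body["model"],  # real virtual-model resolution is not shown
        "max_tokens": body.get("max_output_tokens", DEFAULT_MAX_TOKENS),
    }

    # instructions and developer/system text found in input are concatenated.
    system_parts = []
    if body.get("instructions"):
        system_parts.append(body["instructions"])
    for item in body.get("input", []):
        if item.get("type") != "message":
            continue
        if item.get("role") not in ("developer", "system"):
            continue
        content = item.get("content", [])
        if isinstance(content, str):
            system_parts.append(content)
            continue
        for part in content:
            if part.get("type") == "input_text":
                system_parts.append(part["text"])
    if system_parts:
        out["system"] = "\n\n".join(system_parts)  # separator is an assumption

    effort = (body.get("reasoning") or {}).get("effort")
    if effort is not None:
        budget = EFFORT_BUDGETS[effort]
        # Assumption: a budget of 0 is treated as "send no thinking block".
        if budget > 0:
            out["thinking"] = {"type": "enabled", "budget_tokens": budget}
    return out
```

For the non-streaming request shown in the next section, this sketch would yield max_tokens 256 and system "You are concise.", with no thinking block because reasoning.effort is absent.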

Non-streaming request example

curl http://127.0.0.1:23456/v1/responses \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-5.4",
    "max_output_tokens": 256,
    "instructions": "You are concise.",
    "input": [
      { "type": "message", "role": "user",
        "content": [{ "type": "input_text", "text": "Explain cc-router in one sentence" }] }
    ]
  }' | jq

Response (non-streaming)

  • 200 OK, Content-Type: application/json
  • The body is the standard OpenAI Responses response JSON
    • cc-router translates the Anthropic message returned by the pipeline into OpenAI Responses form
    • output[] is flattened: text content_block → message item; tool_use → function_call item; thinking → reasoning item (encrypted_content round-trips through the Anthropic signature)
    • usage.input_tokens / output_tokens are translated to OpenAI field names
    • stop_reason is translated to status / incomplete_details (e.g. max_tokens → incomplete_details.reason = max_output_tokens)
{
  "id": "resp_xxx",
  "object": "response",
  "created_at": 1767225600,
  "status": "completed",
  "model": "gpt-5.5",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [{ "type": "output_text", "text": "..." }]
    }
  ],
  "usage": { "input_tokens": 42, "output_tokens": 128, "total_tokens": 170 }
}
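
The flattening rules listed above can be sketched as below. This is a minimal illustration under stated assumptions, not the actual translate_json_to_responses: the function name is hypothetical, created_at is omitted, carrying the Anthropic signature in encrypted_content is one plausible reading of the round-trip note, and only the max_tokens stop reason is handled.

```python
import json


def anthropic_message_to_responses(msg: dict, response_id: str = "resp_xxx") -> dict:
    """Flatten an Anthropic message into an OpenAI Responses-style body."""
    output = []
    for block in msg.get("content", []):
        kind = block.get("type")
        if kind == "text":
            # text content_block -> message item
            output.append({
                "type": "message",
                "role": "assistant",
                "content": [{"type": "output_text", "text": block["text"]}],
            })
        elif kind == "tool_use":
            # tool_use -> function_call item; arguments is a JSON string
            output.append({
                "type": "function_call",
                "call_id": block["id"],
                "name": block["name"],
                "arguments": json.dumps(block.get("input", {})),
            })
        elif kind == "thinking":
            # thinking -> reasoning item (signature placement is an assumption)
            output.append({
                "type": "reasoning",
                "summary": [{"type": "summary_text", "text": block.get("thinking", "")}],
                "encrypted_content": block.get("signature"),
            })

    usage = msg.get("usage", {})
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    result = {
        "id": response_id,
        "object": "response",
        "status": "completed",
        "model": msg.get("model"),
        "output": output,
        "usage": {
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "total_tokens": input_tokens + output_tokens,
        },
    }
    if msg.get("stop_reason") == "max_tokens":
        result["status"] = "incomplete"
        result["incomplete_details"] = {"reason": "max_output_tokens"}
    return result
```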

Response (streaming SSE)

  • 200 OK, Content-Type: text/event-stream (only this header is set — cache-control / transfer-encoding are not, so axum manages chunked encoding to avoid IncompleteMessage on HTTPS+rustls paths)
  • The stream is the OpenAI Responses SSE protocol, produced by an internal converter translating from upstream Anthropic SSE in real time.

Event mapping

Anthropic event | Translated to OpenAI Responses event
message_start | response.created + response.in_progress
content_block_start (text) | response.output_item.added + response.content_part.added
content_block_delta (text_delta) | response.output_text.delta
content_block_start (thinking) | response.output_item.added (reasoning item)
content_block_delta (thinking_delta) | response.reasoning_summary_text.delta
content_block_start (tool_use) | response.output_item.added (function_call item)
content_block_delta (input_json_delta) | response.function_call_arguments.delta
content_block_stop | response.content_part.done + response.output_item.done
message_delta | (usage extracted side-channel; no direct frame emitted)
message_stop | response.completed
Upstream disconnect / missing message_stop | A best-effort response.completed is emitted

Key differences from Anthropic SSE:

  • No data: [DONE] — OpenAI Responses clients terminate on response.completed
  • Disconnect safety net: when the upstream transport drops or the upstream omits message_stop, cc-router still emits a response.completed so the OpenAI SDK does not hang waiting for a terminal event
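
The event mapping and the disconnect safety net can be modelled as a small lookup plus a terminal-event guarantee. The sketch below works on event names only and ignores payloads; EVENT_MAP and translate_event_names are hypothetical names for illustration and do not describe the internal converter's real structure.

```python
# Keys are (anthropic_event, block_or_delta_type); None means "any type".
EVENT_MAP = {
    ("message_start", None): ["response.created", "response.in_progress"],
    ("content_block_start", "text"): [
        "response.output_item.added",
        "response.content_part.added",
    ],
    ("content_block_delta", "text_delta"): ["response.output_text.delta"],
    ("content_block_start", "thinking"): ["response.output_item.added"],
    ("content_block_delta", "thinking_delta"): [
        "response.reasoning_summary_text.delta"
    ],
    ("content_block_start", "tool_use"): ["response.output_item.added"],
    ("content_block_delta", "input_json_delta"): [
        "response.function_call_arguments.delta"
    ],
    ("content_block_stop", None): [
        "response.content_part.done",
        "response.output_item.done",
    ],
    ("message_delta", None): [],  # usage is captured side-channel, no frame
    ("message_stop", None): ["response.completed"],
}


def translate_event_names(events):
    """Translate (event, subtype) pairs into Responses event names.

    Always ends with exactly one response.completed, even when the
    upstream stream stops before message_stop.
    """
    out = []
    completed = False
    for name, subtype in events:
        key = (name, subtype) if (name, subtype) in EVENT_MAP else (name, None)
        for translated in EVENT_MAP.get(key, []):
            if translated == "response.completed":
                if completed:
                    continue
                completed = True
            out.append(translated)
    if not completed:
        # Best-effort terminal event for a dropped or truncated upstream.
        out.append("response.completed")
    return out
```

Feeding it a stream that ends after message_delta (no message_stop) still yields a final response.completed, which is the behaviour the safety net describes.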

Streaming request example

curl -N http://127.0.0.1:23456/v1/responses \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-5.4",
    "stream": true,
    "max_output_tokens": 256,
    "input": [
      { "type": "message", "role": "user",
        "content": [{ "type": "input_text", "text": "ping" }] }
    ]
  }'

The stream terminates with event: response.completed (no data: [DONE]).
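
On the client side, this means a reader should stop on the response.completed event rather than wait for a [DONE] sentinel. A minimal sketch, assuming each frame carries an event: line followed by a JSON data: line and a blank-line terminator, and that text deltas arrive in a "delta" field; collect_stream is a hypothetical helper, not part of any SDK.

```python
import json


def collect_stream(lines):
    """Accumulate output text from SSE lines until response.completed."""
    text_parts = []
    event, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event is not None:
            # A blank line closes the current frame.
            payload = json.loads("\n".join(data_lines)) if data_lines else {}
            if event == "response.output_text.delta":
                text_parts.append(payload.get("delta", ""))
            elif event == "response.completed":
                return "".join(text_parts), payload
            event, data_lines = None, []
    raise RuntimeError("stream ended without response.completed")
```

The lines argument can be any iterable of decoded lines, for example the line iterator of an HTTP client's streaming response.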


Dispatch

Identical to /v1/messages: /v1/responses resolves the client’s model field to a virtual model and runs the same dispatch + retry logic.

That means:

  • gpt-5.5 / gpt-5.4 / gpt-5.4-mini and model-sonnet / claude-sonnet-4-6 and any other virtual model alias can all enter via /v1/responses
  • Upstreams can be any of the 9 Anthropic providers or codex / openai / gemini / kiro — clients can’t tell
  • /v1/responses cannot bind the OAuth-only codex provider as fallback (same constraint as /v1/messages)

See the full table at Anthropic /v1/messages → Virtual model mapping.


Error responses

Errors produced inside /v1/responses follow the OpenAI Responses shape:

{
  "error": {
    "message": "...",
    "type": "<kind>",
    "code": null
  }
}

Status | Trigger | Source
400 | Request body is not valid JSON | handler::responses
400 | Missing model field | handler::responses
400 | OpenAI → Anthropic translation failed (e.g. malformed input) | request_to_anthropic
401 | Auth failure | Anthropic-shaped (auth_layer runs before the handler)
500 | Internal pipeline error | handler::responses
500 | Failed to read pipeline JSON body / failed to parse upstream response | translate_json_to_responses
Upstream status passthrough | The pipeline’s Anthropic error body is auto-translated to OpenAI shape | translate_json_to_responses
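
The passthrough row can be illustrated with a small reshaping function. This is a sketch under assumptions: it takes the Anthropic error body to have the form {"type": "error", "error": {"type": ..., "message": ...}}, reuses the Anthropic error type as the OpenAI type, and falls back to "api_error" when none is present. cc-router's actual type mapping is not specified here, and the function name is hypothetical.

```python
def anthropic_error_to_openai(body: dict) -> dict:
    """Reshape an Anthropic error body into the OpenAI Responses error shape."""
    err = body.get("error") if isinstance(body, dict) else None
    if not isinstance(err, dict):
        err = {}
    return {
        "error": {
            "message": err.get("message", ""),
            # Assumption: the Anthropic error type is passed through as-is.
            "type": err.get("type", "api_error"),
            "code": None,
        }
    }
```

The HTTP status is not touched by this step; per the table, the upstream status is passed through unchanged.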

Unimplemented OpenAI endpoints

The following official OpenAI endpoints are not implemented in cc-router by design:

  • POST /v1/chat/completions (Chat Completions API) — cc-router’s OpenAI compatibility layer only does Responses API
  • GET /v1/responses/{id}, POST /v1/responses/{id}/cancel, and other Responses state endpoints — cc-router is a stateless proxy; it doesn’t cache responses, so clients terminate on the streaming response.completed
  • OpenAI Assistants / Threads / Files API

cc-router’s intended clients are Codex CLI / Claude Code style stateless conversation clients that only need POST /v1/responses or POST /v1/messages plus GET /v1/models.