Anthropic /v1/messages

cc-router starts a local HTTP proxy that exposes two protocol entry points side by side: the Anthropic Messages API (primary) and the OpenAI Responses API (v2.3+ compatibility). This page is the full reference for Anthropic /v1/messages — the preferred entry point for Claude Code and any client that speaks the Anthropic Messages protocol.

Applies to cc-router v3.0.0 and later.

Listening address and ports

| Setting | Default | Notes |
| --- | --- | --- |
| Bind address | 127.0.0.1 | Toggling “listen on all interfaces” switches to 0.0.0.0 (UI shows a red warning) |
| HTTP port | 23456 | If busy, cc-router probes +1 up to 100 times |
| HTTPS port | per the https_port setting | Enabled per the proxy mode |
| Address/port changes | Require an app restart | The proxy does not hot-reload |
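
The "+1 up to 100 times" probing can be pictured with the sketch below. It is only an illustration of the described behavior, not cc-router's actual code: the function name find_free_port is hypothetical, and it assumes the 100 probes come on top of the configured port itself.

```python
import socket


def find_free_port(host: str = "127.0.0.1", start: int = 23456, max_probes: int = 100) -> int:
    """Return the first bindable port in start .. start + max_probes (capped at 65535)."""
    last = min(start + max_probes, 65535)
    for port in range(start, last + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is busy (or not bindable); try the next one
            return port  # the probe socket is closed on return, freeing the port
    raise OSError(f"no free port in {start}..{last}")
```

A client that cannot reach 23456 may therefore find the proxy on one of the following ports; the port actually in use should be checked in the cc-router UI rather than guessed.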

Minimal client configuration

export ANTHROPIC_BASE_URL=http://127.0.0.1:23456
# Skip the next line if authentication is disabled (the default)
export ANTHROPIC_API_KEY='<token from the cc-router settings page>'

# Point Claude Code's three model slots at cc-router's virtual models
export ANTHROPIC_DEFAULT_OPUS_MODEL=model-opus
export ANTHROPIC_DEFAULT_SONNET_MODEL=model-sonnet
export ANTHROPIC_DEFAULT_HAIKU_MODEL=model-haiku

Authentication

  • Disabled by default: any request goes through, no token required.
  • When enabled, cc-router accepts the token in either of two headers:
    • x-api-key: <token> — Claude Code’s ANTHROPIC_API_KEY lands here (preferred)
    • Authorization: Bearer <token> — Claude Code’s ANTHROPIC_AUTH_TOKEN lands here
  • The extracted token must match exactly the auth_token configured in the settings page; otherwise cc-router returns 401.
  • Allowlist: /v1/models, /health, and all OPTIONS preflight requests always pass through, even with authentication enabled.
    • Rationale: clients need to list models at startup; browsers need to probe without auth blocking them. The endpoints that actually consume quota (/v1/messages and /v1/responses) are the ones that require auth.

The token cc-router asks for here is for cc-router itself, unrelated to your upstream providers’ real API keys — those are swapped in by cc-router according to virtual-model dispatch rules.
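
The authentication rules above can be summarized in a few lines of Python. This is a sketch of the documented behavior under stated assumptions, not the proxy's real implementation: the names extract_token and is_authorized are hypothetical, x-api-key is assumed to win when both headers are present, and the constant-time comparison is a choice made for this example.

```python
import hmac
from typing import Mapping, Optional

# Paths that pass through even when authentication is enabled.
ALLOWLISTED_PATHS = {"/v1/models", "/health"}


def extract_token(headers: Mapping[str, str]) -> Optional[str]:
    """Read the token from x-api-key first, then from Authorization: Bearer."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-api-key" in lowered:
        return lowered["x-api-key"]
    auth = lowered.get("authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    return None


def is_authorized(method: str, path: str, headers: Mapping[str, str],
                  auth_enabled: bool, auth_token: str) -> bool:
    """Return True if the request may proceed; False corresponds to a 401."""
    if not auth_enabled:
        return True
    if method.upper() == "OPTIONS" or path in ALLOWLISTED_PATHS:
        return True
    token = extract_token(headers)
    # Exact match required; compare_digest avoids leaking timing information.
    return token is not None and hmac.compare_digest(token.encode(), auth_token.encode())
```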

CORS is on by default: Access-Control-Allow-Origin: *, methods GET / POST / OPTIONS, all headers allowed, preflight returns 204. Even 401 responses carry CORS headers, so a browser fetch can read the response body.


Request

POST /v1/messages
Content-Type: application/json

| Header | Required | Notes |
| --- | --- | --- |
| Content-Type: application/json | Yes | The body must be JSON |
| x-api-key or Authorization: Bearer ... | Per auth settings | One of the two when auth is enabled |
| anthropic-version / anthropic-beta / … | No | cc-router does not consume these; passed verbatim to upstream |

The request body uses the standard Anthropic Messages API format. cc-router reads only two fields for dispatch; every other field (messages / system / tools / temperature / max_tokens / thinking / …) is passed through unchanged to the upstream.

| Field | Type | Required | Behavior |
| --- | --- | --- | --- |
| model | string | Yes | Resolved to a virtual model (see the mapping table below). Missing this returns 400 |
| stream | boolean | No (defaults to false) | true uses SSE; false is non-streaming |

cc-router rewrites the model field in the body:

  • Resolves to model-opus / model-sonnet / model-haiku → rewritten to the real model name bound in that slot (e.g. glm-4.6, qwen3-max)
  • Resolves to fallback → not rewritten; passed through as-is to the upstream
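
As an illustration of the two rewrite rules, the sketch below takes an already-resolved virtual model and a table of slot bindings. The function name, the "fallback" marker string, and the slot_bindings dictionary are assumptions made for this example; the binding to glm-4.6 simply reuses the example name from the text.

```python
import json

FALLBACK = "fallback"  # hypothetical marker for the fallback virtual model


def rewrite_request_model(raw_body: bytes, virtual_model: str, slot_bindings: dict) -> bytes:
    """Swap the body's model for the real model bound to the slot; leave fallback untouched."""
    if virtual_model == FALLBACK:
        return raw_body  # passed through as-is
    body = json.loads(raw_body)
    body["model"] = slot_bindings[virtual_model]
    return json.dumps(body).encode("utf-8")
```

Every field other than model survives the round trip unchanged, which matches the pass-through behavior described for messages, tools, max_tokens and the rest.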

Non-streaming request example

curl http://127.0.0.1:23456/v1/messages \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "model-sonnet",
    "max_tokens": 256,
    "messages": [
      { "role": "user", "content": "Explain cc-router in one sentence" }
    ]
  }' | jq
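
For clients written in Python, roughly the same request can be built with the standard library alone. This is a minimal sketch: build_messages_request is a hypothetical helper, and the token argument only matters when authentication is enabled.

```python
import json
import urllib.request
from typing import Optional


def build_messages_request(base_url: str, model: str, prompt: str,
                           max_tokens: int = 256,
                           token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST /v1/messages request."""
    payload = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Content-Type": "application/json"}
    if token:
        headers["x-api-key"] = token  # only needed when auth is enabled
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending it is then a matter of `with urllib.request.urlopen(req) as resp: message = json.load(resp)`, with the proxy running on the configured port.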

Response (non-streaming)

  • 200 OK, Content-Type: application/json
  • The body is the standard Anthropic message JSON
  • cc-router rewrites message.model back to the virtual model name (fallback mode skips this) so clients can aggregate caching and stats by virtual model
  • usage.* (including cache_creation_input_tokens / cache_read_input_tokens) is passed through; cc-router also extracts a copy for internal accounting
{
  "id": "msg_xxx",
  "type": "message",
  "role": "assistant",
  "model": "model-sonnet",
  "content": [
    { "type": "text", "text": "..." }
  ],
  "stop_reason": "end_turn",
  "usage": { "input_tokens": 42, "output_tokens": 128 }
}
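
The post-processing of a non-streaming response might look like the sketch below: copy usage for accounting, then put the virtual model name back unless the request went through fallback. The function name and return shape are assumptions for illustration only.

```python
def postprocess_message(message: dict, virtual_model: str, is_fallback: bool):
    """Return (message for the client, usage copy for internal accounting)."""
    usage = dict(message.get("usage", {}))  # includes cache_* counters when present
    if not is_fallback:
        message = {**message, "model": virtual_model}  # hide the real upstream model name
    return message, usage
```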

Response (streaming SSE)

  • 200 OK, Content-Type: text/event-stream
  • Upstream SSE frames are passed through byte-for-byte, except that cc-router inspects two event types (and rewrites only message_start):

| Event | cc-router behavior |
| --- | --- |
| message_start | Parses the JSON, rewrites message.model to the virtual model name (fallback skips this), extracts usage.* for accounting, re-serializes, and writes out |
| message_delta | Not rewritten; cc-router only side-channels usage.output_tokens etc. for accounting, and the bytes pass through |
| All other events (content_block_* / message_stop / ping, …) | Passed through unchanged |
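
The per-event handling in the table can be sketched as a single function over one SSE frame. This is an illustrative model rather than the proxy's code: process_frame is a hypothetical name, and the frame is assumed to be a single "event: ...", "data: ..." pair terminated by a blank line.

```python
import json


def event_name(frame: str):
    """Return the value of the frame's event: line, or None if absent."""
    for line in frame.split("\n"):
        if line.startswith("event:"):
            return line[len("event:"):].strip()
    return None


def process_frame(frame: str, virtual_model: str, is_fallback: bool):
    """Return (frame to send to the client, usage dict extracted for accounting)."""
    event = event_name(frame)
    if event not in ("message_start", "message_delta"):
        return frame, {}  # passed through unchanged
    data_line = next(line for line in frame.split("\n") if line.startswith("data:"))
    payload = json.loads(data_line[len("data:"):])
    if event == "message_delta":
        return frame, payload.get("usage", {})  # bytes untouched, usage side-channeled
    usage = payload["message"].get("usage", {})
    if is_fallback:
        return frame, usage  # fallback skips the model rewrite
    payload["message"]["model"] = virtual_model
    return f"event: message_start\ndata: {json.dumps(payload)}\n\n", usage
```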

Streaming request example

curl -N http://127.0.0.1:23456/v1/messages \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "model-sonnet",
    "max_tokens": 256,
    "stream": true,
    "messages": [
      { "role": "user", "content": "ping" }
    ]
  }'

-N disables curl buffering so SSE frames print live.
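
On the client side, the stream can be consumed by splitting it into events and concatenating the text deltas. The sketch below works on any iterable of lines (for example, a decoded HTTP response body); the helper names are hypothetical, and only content_block_delta events carrying a text_delta are collected.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event name, data string) for each blank-line-terminated SSE event."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the text of all text_delta blocks in the stream."""
    parts = []
    for event, data in iter_sse_events(lines):
        if event == "content_block_delta":
            delta = json.loads(data).get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta["text"])
    return "".join(parts)
```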

First-frame lookahead: when the upstream returns 200 but the very first event is actually event: error (typical case: GLM 1302/1308 quota exhaustion disguised as 200), cc-router does not forward that frame. It silently triggers a retry against the next subscription. The client only ever sees one successful completion or one final failure.
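
The lookahead can be modeled as "peek at the first frame before committing to a subscription". The sketch below is an assumption-laden illustration: stream_with_lookahead and open_stream are hypothetical names, subscriptions are plain strings, and an upstream that returns no frames at all is treated as a failure.

```python
from typing import Callable, Iterable, Iterator, List


def stream_with_lookahead(subscriptions: List[str],
                          open_stream: Callable[[str], Iterable[str]]) -> Iterator[str]:
    """Yield frames from the first subscription whose first frame is not an error."""
    last_error = None
    for sub in subscriptions:
        frames = iter(open_stream(sub))
        first = next(frames, None)
        if first is None or first.startswith("event: error"):
            last_error = first  # swallow the frame and move on to the next subscription
            continue
        yield first  # committed: from here on, frames flow straight to the client
        yield from frames
        return
    raise RuntimeError(f"every subscription failed; last error frame: {last_error!r}")
```

Because nothing is yielded until a non-error first frame arrives, the caller never observes the intercepted error frame.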

Mid-stream disconnect: when the upstream connection drops mid-stream, cc-router appends an event: error frame plus data: [DONE] after the frames already sent, so the client can observe the interruption instead of hanging.
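
A generator wrapper is one way to picture the disconnect handling. In this sketch the error payload is an assumption: the text only says that an event: error frame and data: [DONE] are appended, so the upstream_error type and the message wording are placeholders borrowed from the error section, and guard_stream is a hypothetical name.

```python
import json


def guard_stream(frames):
    """Relay frames; if the upstream drops, append an error frame and a [DONE] marker."""
    try:
        for frame in frames:
            yield frame
    except (ConnectionError, TimeoutError) as exc:
        error = {
            "type": "error",
            "error": {"type": "upstream_error", "message": f"upstream disconnected: {exc}"},
        }
        yield f"event: error\ndata: {json.dumps(error)}\n\n"
        yield "data: [DONE]\n\n"
```

Frames already delivered are not retracted; the client should treat a trailing error frame as a truncated completion.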


Virtual model mapping

cc-router maps the request’s model field to one virtual model, then tries the subscriptions bound to that virtual model in order according to its dispatch mode (sequential / round-robin).

| Client-sent model | Resolved virtual model |
| --- | --- |
| model-opus, claude-opus-4-7, gpt-5.5, anthropic/model-opus, anthropic/claude-opus-4-7, openai/gpt-5.5 | model-opus |
| model-sonnet, claude-sonnet-4-6, gpt-5.4, anthropic/model-sonnet, anthropic/claude-sonnet-4-6, openai/gpt-5.4 | model-sonnet |
| model-haiku, claude-haiku-4-5, gpt-5.4-mini, anthropic/model-haiku, anthropic/claude-haiku-4-5, openai/gpt-5.4-mini | model-haiku |
| model-fallback, anthropic/model-fallback | Fallback (explicit) |
| Any other value (custom model names, etc.) | Fallback (implicit; model is passed through verbatim) |

  • The anthropic/ prefix is supported for LiteLLM-style vendor-prefixed naming.
  • The openai/ prefix is the same idea, primarily for clients hitting the OpenAI /v1/responses entry point.
  • gpt-5.5 / gpt-5.4 / gpt-5.4-mini are OpenAI-flavored aliases that deliberately reuse the Opus / Sonnet / Haiku slots; no new virtual model is introduced.
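
The mapping table translates directly into a lookup dictionary. The sketch below assumes exact, case-sensitive matching and uses the string "fallback" for both explicit and implicit fallback; resolve_virtual_model is a hypothetical name, not a cc-router API.

```python
FALLBACK = "fallback"

# (virtual slot, Claude alias, OpenAI-flavored alias), as listed in the mapping table.
_SLOTS = [
    ("model-opus", "claude-opus-4-7", "gpt-5.5"),
    ("model-sonnet", "claude-sonnet-4-6", "gpt-5.4"),
    ("model-haiku", "claude-haiku-4-5", "gpt-5.4-mini"),
]

ALIASES = {"model-fallback": FALLBACK, "anthropic/model-fallback": FALLBACK}
for slot, claude_name, gpt_name in _SLOTS:
    for name in (slot, claude_name, gpt_name,
                 "anthropic/" + slot, "anthropic/" + claude_name, "openai/" + gpt_name):
        ALIASES[name] = slot


def resolve_virtual_model(model: str) -> str:
    """Map a client-sent model name to a virtual model; unknown names fall back."""
    return ALIASES.get(model, FALLBACK)
```

Note that only the combinations listed in the table are registered, so a name such as openai/model-opus would resolve to fallback in this sketch.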

Error responses

Errors from /v1/messages follow the Anthropic shape:

{
  "type": "error",
  "error": {
    "type": "<kind>",
    "message": "<human-readable message>"
  }
}

Errors produced by cc-router itself:

| HTTP status | kind | Trigger |
| --- | --- | --- |
| 400 | invalid_request_error | JSON parse failure / missing model field |
| 401 | authentication_error | Token mismatch when auth is enabled |
| 500 | api_error | Internal pipeline error (e.g. every subscription failed) |
| 503 | overloaded_error | The fallback virtual model has no bound subscriptions |
| 4xx / 5xx | Depends on upstream | When every subscription fails, cc-router forwards the last upstream’s status and error body |
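
For reference, the self-produced errors in the table pair each status with a fixed kind inside the Anthropic error envelope. The helper below is a hypothetical illustration of that pairing, useful for example when writing a test double for the proxy; the message text is made up.

```python
import json

# Status-to-kind pairs for errors produced by cc-router itself (from the table above).
ERROR_KINDS = {
    400: "invalid_request_error",
    401: "authentication_error",
    500: "api_error",
    503: "overloaded_error",
}


def make_error(status: int, message: str):
    """Return (HTTP status, JSON body) in the Anthropic error shape."""
    body = {"type": "error", "error": {"type": ERROR_KINDS[status], "message": message}}
    return status, json.dumps(body)
```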

SSE error frames inside the stream:

  • If the very first event is event: error → cc-router intercepts and automatically retries the next subscription; the client never sees it
  • An error frame appearing mid-stream → passed through to the client; kind is fixed to upstream_error

Unimplemented Anthropic endpoints

The following official Anthropic endpoints are not implemented in cc-router by design:

  • POST /v1/messages/count_tokens
  • POST /v1/messages/batches and every batches-related endpoint
  • POST /v1/files (Files API)
  • Workbench / Admin API

cc-router targets Claude-Code-style real-time conversation proxying; Claude Code only depends on POST /v1/messages and GET /v1/models, so the rest is not implemented. To estimate token counts client-side, use a local tokenizer library such as tiktoken (an approximation only, since it implements OpenAI encodings rather than each upstream model's own tokenizer), or send a single /v1/messages call and read usage.input_tokens from the response.

If your client speaks the OpenAI Responses protocol (e.g. Codex CLI), use the OpenAI /v1/responses entry point instead — cc-router translates the request into Anthropic Messages and runs it through the same dispatch pipeline.