Anthropic /v1/messages

cc-router starts a local HTTP proxy that exposes two protocol entry points side by side: the Anthropic Messages API (primary) and the OpenAI Responses API (v2.3+ compatibility). This page is the full reference for Anthropic /v1/messages — the preferred entry point for Claude Code and any client that speaks the Anthropic Messages protocol.

Applies to cc-router v3.0.0 and later.

Listening address and ports

Setting	Default	Notes
Bind address	`127.0.0.1`	Toggling “listen on all interfaces” switches to `0.0.0.0` (UI shows a red warning)
HTTP port	`23456`	If the port is busy, cc-router tries the next port (`+1`), up to 100 times
HTTPS port	per the `https_port` setting	Enabled per the proxy mode
Address/port changes	Require an app restart	The proxy does not hot-reload
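
The port fallback in the table can be pictured with the sketch below. It is an illustration only, not cc-router's code: `find_free_port` is a hypothetical helper, and whether the 100 attempts include the configured port itself is an assumption.

```python
import socket


def find_free_port(host: str = "127.0.0.1", start: int = 23456, attempts: int = 100) -> int:
    """Return the first port at or above `start` that can be bound.

    Hypothetical helper that mirrors the documented "+1, up to 100 times"
    probing. It is not cc-router's actual implementation.
    """
    # Assumption: the configured port is tried first, then up to `attempts` retries.
    for offset in range(attempts + 1):
        port = start + offset
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy, move on to the next one
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts}")
```

A client that finds nothing listening on `23456` may therefore need to check the cc-router UI for the port that was actually chosen.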

Minimal client configuration

export ANTHROPIC_BASE_URL=http://127.0.0.1:23456
# Skip the next line if authentication is disabled (the default)
export ANTHROPIC_API_KEY="<token from the cc-router settings page>"

# Point Claude Code's three model slots at cc-router's virtual models
export ANTHROPIC_DEFAULT_OPUS_MODEL=model-opus
export ANTHROPIC_DEFAULT_SONNET_MODEL=model-sonnet
export ANTHROPIC_DEFAULT_HAIKU_MODEL=model-haiku

Authentication

Disabled by default: any request goes through, no token required.
When enabled, cc-router reads the token from either header (either one is enough):
- x-api-key: <token> — Claude Code’s ANTHROPIC_API_KEY lands here (preferred)
- Authorization: Bearer <token> — Claude Code’s ANTHROPIC_AUTH_TOKEN lands here
The extracted token must exactly match the auth_token configured on the settings page; otherwise cc-router returns 401.
Allowlist: /v1/models, /health, and all OPTIONS preflight requests always pass through, even with authentication enabled.
- Rationale: clients need to list models at startup; browsers need to probe without auth blocking them. The endpoints that actually consume quota (/v1/messages and /v1/responses) are the ones that require auth.

The token cc-router asks for here is for cc-router itself, unrelated to your upstream providers’ real API keys — those are swapped in by cc-router according to virtual-model dispatch rules.
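
The authentication rules above can be summarized as a small decision function. This is a minimal sketch, not cc-router's implementation: `extract_token` and `is_authorized` are hypothetical names, and checking `x-api-key` before `Authorization` when both headers are present is an assumption.

```python
from typing import Mapping, Optional

# Paths that bypass authentication, per the allowlist described above.
ALLOWLIST_PATHS = {"/v1/models", "/health"}


def extract_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the token from x-api-key, else from Authorization: Bearer."""
    lowered = {name.lower(): value for name, value in headers.items()}
    api_key = lowered.get("x-api-key")
    if api_key:
        return api_key  # assumption: x-api-key wins when both headers are sent
    scheme, _, credentials = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and credentials.strip():
        return credentials.strip()
    return None


def is_authorized(method: str, path: str, headers: Mapping[str, str],
                  auth_enabled: bool, auth_token: str) -> bool:
    """Decide whether a request passes; False corresponds to a 401."""
    if not auth_enabled:
        return True  # default: every request goes through
    if method.upper() == "OPTIONS" or path in ALLOWLIST_PATHS:
        return True  # preflight and allowlisted paths never need a token
    return extract_token(headers) == auth_token
```

For example, with authentication enabled, `is_authorized("GET", "/v1/models", {}, True, "secret")` passes, while the same call against `/v1/messages` does not.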

CORS is on by default: Access-Control-Allow-Origin: *, methods GET / POST / OPTIONS, all headers allowed, preflight returns 204. Even 401 responses carry CORS headers, so a browser fetch can read the response body.

Request

POST /v1/messages
Content-Type: application/json

Header	Required	Notes
`Content-Type: application/json`	Yes	The body must be JSON
`x-api-key` or `Authorization: Bearer ...`	Per auth settings	One of the two when auth is enabled
`anthropic-version` / `anthropic-beta` / …	No	cc-router does not consume these; passed verbatim to upstream

The request body uses the standard Anthropic Messages API format. cc-router reads only two fields for dispatch; every other field (messages / system / tools / temperature / max_tokens / thinking / …) is passed through unchanged to the upstream.

Field	Type	Required	Behavior
`model`	string	Yes	Resolved to a virtual model — see the mapping table below. Missing this returns `400`
`stream`	boolean	No (defaults to `false`)	`true` uses SSE; `false` is non-streaming

cc-router rewrites the model field in the body:

Resolves to model-opus / model-sonnet / model-haiku → rewritten to the real model name bound in that slot (e.g. glm-4.6, qwen3-max)
Resolves to fallback → not rewritten; passed through as-is to the upstream

Non-streaming request example

curl http://127.0.0.1:23456/v1/messages \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "model-sonnet",
    "max_tokens": 256,
    "messages": [
      { "role": "user", "content": "Explain cc-router in one sentence" }
    ]
  }' | jq
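
For clients that are not shell scripts, the same request can be built with Python's standard library. This is a sketch under stated assumptions: `build_messages_request` and `send` are hypothetical helper names, and the helper reads the `ANTHROPIC_BASE_URL` and `ANTHROPIC_API_KEY` variables from the minimal client configuration above.

```python
import json
import os
import urllib.request


def build_messages_request(prompt: str, model: str = "model-sonnet",
                           max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST /v1/messages request."""
    base_url = os.environ.get("ANTHROPIC_BASE_URL", "http://127.0.0.1:23456")
    headers = {"Content-Type": "application/json"}
    token = os.environ.get("ANTHROPIC_API_KEY")
    if token:  # only needed when cc-router authentication is enabled
        headers["x-api-key"] = token
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON body (needs a running proxy)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `send(build_messages_request("Explain cc-router in one sentence"))` against a running proxy returns the message object described in the next section.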

Response (non-streaming)

200 OK, Content-Type: application/json
The body is the standard Anthropic message JSON
cc-router rewrites the top-level model field back to the virtual model name (fallback mode skips this) so clients can aggregate caching and stats by virtual model
usage.* (including cache_creation_input_tokens / cache_read_input_tokens) is passed through; cc-router also extracts a copy for internal accounting

{
  "id": "msg_xxx",
  "type": "message",
  "role": "assistant",
  "model": "model-sonnet",
  "content": [
    { "type": "text", "text": "..." }
  ],
  "stop_reason": "end_turn",
  "usage": { "input_tokens": 42, "output_tokens": 128 }
}

Response (streaming SSE)

200 OK, Content-Type: text/event-stream
Upstream SSE frames are passed through byte-for-byte. Two events get special handling, and only message_start is actually modified:

Event	cc-router behavior
`message_start`	Parses the JSON, rewrites `message.model` to the virtual model name (fallback skips this), extracts `usage.*` for accounting, re-serializes, and writes out
`message_delta`	Not rewritten — cc-router only side-channels `usage.output_tokens` etc. for accounting; bytes pass through
All other events (`content_block_*` / `message_stop` / `ping`, …)	Passed through unchanged
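
The `message_start` handling in the table can be sketched as a per-frame transform. This is an illustration rather than cc-router's code: `rewrite_sse_frame` is a hypothetical name, and the sketch assumes each frame arrives as one complete LF-delimited chunk with a single `data:` line.

```python
import json


def rewrite_sse_frame(frame: bytes, virtual_model: str, is_fallback: bool = False) -> bytes:
    """Rewrite message.model in a message_start frame; return other frames untouched."""
    if is_fallback:
        return frame  # fallback mode never rewrites the model name
    lines = frame.decode("utf-8").split("\n")
    if "event: message_start" not in lines:
        return frame  # every other event passes through byte-for-byte
    rewritten = []
    for line in lines:
        if line.startswith("data:"):
            payload = json.loads(line[len("data:"):])
            payload["message"]["model"] = virtual_model
            # Re-serializing may change whitespace or key spacing, which is
            # why this one frame is not byte-identical to the upstream's.
            line = "data: " + json.dumps(payload, separators=(",", ":"))
        rewritten.append(line)
    return "\n".join(rewritten).encode("utf-8")
```

Usage accounting is left out here; a real proxy would read `payload["message"]["usage"]` at the same point where the model name is replaced.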

Streaming request example

curl -N http://127.0.0.1:23456/v1/messages \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "model-sonnet",
    "max_tokens": 256,
    "stream": true,
    "messages": [
      { "role": "user", "content": "ping" }
    ]
  }'

-N disables curl buffering so SSE frames print live.

First-frame lookahead: when the upstream returns 200 but the very first event is actually event: error (typical case: GLM 1302/1308 quota exhaustion disguised as 200), cc-router does not forward that frame. It silently triggers a retry against the next subscription. The client only ever sees one successful completion or one final failure.
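
The lookahead can be pictured as peeking at one event before committing to a stream. The sketch below is a simplified model, not cc-router's implementation: `stream_with_lookahead` and `AllSubscriptionsFailed` are hypothetical names, each subscription is represented as a callable that returns an iterator of `(event, data)` pairs, and treating an empty stream as a failure is an assumption.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Event = Tuple[str, str]  # (event name, raw data payload)


class AllSubscriptionsFailed(Exception):
    """Raised when every subscription failed before sending a usable frame."""


def stream_with_lookahead(
    subscriptions: Iterable[Callable[[], Iterator[Event]]],
) -> Iterator[Event]:
    """Yield events from the first subscription whose first frame is not an error."""
    last_error: Optional[Event] = None
    for open_stream in subscriptions:
        events = open_stream()
        first = next(events, None)  # peek at the first frame before forwarding anything
        if first is None or first[0] == "error":
            last_error = first
            continue  # swallow the frame and try the next subscription
        yield first
        yield from events  # committed: the rest is forwarded as-is
        return
    raise AllSubscriptionsFailed(last_error)
```

Because nothing is forwarded until the first frame has been inspected, a retry at this stage is invisible to the client. Once the first frame has been sent, switching subscriptions is no longer possible, which is why mid-stream failures are handled differently.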

Mid-stream disconnect: when the upstream connection drops mid-stream, cc-router appends an event: error frame plus data: [DONE] after the frames already sent, so the client can observe the interruption instead of hanging.
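
On the client side, this means a consumer should treat `event: error` and `data: [DONE]` as terminal signals. The following is a minimal sketch of such a consumer. `iter_sse_events` and `collect_text` are hypothetical helpers, and the exact layout of the appended frames (an error frame followed by a separate `data: [DONE]` frame) is an assumption.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Group SSE lines into (event, data) pairs; a blank line ends a frame."""
    event: Optional[str] = None
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if event is not None or data:
                yield event, "\n".join(data)
            event, data = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if event is not None or data:  # flush a final frame with no trailing blank line
        yield event, "\n".join(data)


def collect_text(lines: Iterable[str]) -> Tuple[str, bool]:
    """Return (text received so far, whether the stream was interrupted)."""
    chunks: List[str] = []
    interrupted = False
    for event, data in iter_sse_events(lines):
        if data == "[DONE]":
            break  # terminal marker appended after a mid-stream disconnect
        if event == "error":
            interrupted = True
            continue
        if event == "content_block_delta":
            delta = json.loads(data).get("delta", {})
            if delta.get("type") == "text_delta":
                chunks.append(delta.get("text", ""))
    return "".join(chunks), interrupted
```

Returning the partial text together with an `interrupted` flag lets the caller decide whether to show what arrived or to retry the whole request.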

Virtual model mapping

cc-router maps the request’s model field to one virtual model, then tries the subscriptions bound to that virtual model in order according to its dispatch mode (sequential / round-robin).

Client-sent `model`	Resolved virtual model
`model-opus`, `claude-opus-4-7`, `gpt-5.5`, `anthropic/model-opus`, `anthropic/claude-opus-4-7`, `openai/gpt-5.5`	`model-opus`
`model-sonnet`, `claude-sonnet-4-6`, `gpt-5.4`, `anthropic/model-sonnet`, `anthropic/claude-sonnet-4-6`, `openai/gpt-5.4`	`model-sonnet`
`model-haiku`, `claude-haiku-4-5`, `gpt-5.4-mini`, `anthropic/model-haiku`, `anthropic/claude-haiku-4-5`, `openai/gpt-5.4-mini`	`model-haiku`
`model-fallback`, `anthropic/model-fallback`	Fallback (explicit)
Any other value (custom model names, etc.)	Fallback (implicit; `model` is passed through verbatim)

The anthropic/ prefix is supported for LiteLLM-style vendor-prefixed naming.

The openai/ prefix is the same idea, primarily for clients hitting the OpenAI /v1/responses entry point.

gpt-5.5 / gpt-5.4 / gpt-5.4-mini are OpenAI-flavored aliases that cc-router deliberately maps onto the existing Opus / Sonnet / Haiku slots; no new virtual model is introduced.

Error responses

Errors from /v1/messages follow the Anthropic shape:

{
  "type": "error",
  "error": {
    "type": "<kind>",
    "message": "<human-readable message>"
  }
}

Errors produced by cc-router itself:

HTTP status	`kind`	Trigger
`400`	`invalid_request_error`	JSON parse failure / missing `model` field
`401`	`authentication_error`	Token mismatch when auth is enabled
`500`	`api_error`	Internal pipeline error (e.g. every subscription failed)
`503`	`overloaded_error`	The fallback virtual model has no bound subscriptions
4xx / 5xx	Depends on upstream	When every subscription fails, cc-router forwards the last upstream’s status and error body
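
A client can read the `kind` and message from any of these responses with a few lines of code. The sketch below uses hypothetical names (`ApiError`, `parse_error`) and assumes that a forwarded upstream body may not follow the Anthropic error shape, in which case the raw text is kept and the kind is reported as `unknown`.

```python
import json
from dataclasses import dataclass


@dataclass
class ApiError:
    status: int   # HTTP status code of the response
    kind: str     # error.type from the body, or "unknown"
    message: str  # error.message from the body, or the raw body text


def parse_error(status: int, body: bytes) -> ApiError:
    """Parse an Anthropic-shaped error body, tolerating other shapes."""
    try:
        error = json.loads(body)["error"]
        return ApiError(status, error["type"], error["message"])
    except (ValueError, KeyError, TypeError):
        # Not JSON, or JSON without the expected error object.
        return ApiError(status, "unknown", body.decode("utf-8", errors="replace"))
```

For example, a 401 from cc-router parses to `kind == "authentication_error"`, while an HTML error page forwarded from an upstream gateway falls into the `unknown` branch.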

SSE error frames inside the stream:

If the very first event is event: error → cc-router intercepts and automatically retries the next subscription; the client never sees it
An error frame appearing mid-stream → passed through to the client; kind is fixed to upstream_error

Unimplemented Anthropic endpoints

The following official Anthropic endpoints are not implemented in cc-router by design:

POST /v1/messages/count_tokens
POST /v1/messages/batches and every batches-related endpoint
POST /v1/files (Files API)
Workbench / Admin API

cc-router targets Claude-Code-style real-time conversation proxying. Claude Code only depends on POST /v1/messages and GET /v1/models, so the rest is not implemented. To estimate token counts client-side, use a local tokenizer such as tiktoken (an approximation only, since its vocabularies are OpenAI's and will not match every upstream model), or send a single /v1/messages call and read usage.input_tokens from the response.

If your client speaks the OpenAI Responses protocol (e.g. Codex CLI), use the OpenAI /v1/responses entry point instead — cc-router translates the request into Anthropic Messages and runs it through the same dispatch pipeline.