Anthropic /v1/messages
cc-router starts a local HTTP proxy that exposes two protocol entry points side by side: the Anthropic Messages API (primary) and the OpenAI Responses API (v2.3+ compatibility). This page is the full reference for Anthropic /v1/messages — the preferred entry point for Claude Code and any client that speaks the Anthropic Messages protocol.
Applies to cc-router v3.0.0 and later.
Listening address and ports
| Setting | Default | Notes |
|---|---|---|
| Bind address | 127.0.0.1 | Toggling “listen on all interfaces” switches to 0.0.0.0 (UI shows a red warning) |
| HTTP port | 23456 | If busy, cc-router probes +1 up to 100 times |
| HTTPS port | per the https_port setting | Enabled per the proxy mode |
| Address/port changes | Require an app restart | The proxy does not hot-reload |
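The port fallback in the table can be pictured with a short sketch. It is an illustration only, not cc-router's actual code: the function name `find_free_port` is hypothetical, and it assumes "probes +1 up to 100 times" means the default port plus up to 100 successive ports.

```python
import socket


def find_free_port(host: str = "127.0.0.1", start: int = 23456, max_probes: int = 100) -> int:
    """Return the first bindable port in start, start+1, ..., start+max_probes.

    Hypothetical helper: the probing range is an assumption about what
    "probes +1 up to 100 times" means.
    """
    for offset in range(max_probes + 1):
        port = start + offset
        if port > 65535:
            break  # ran past the valid TCP port range
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is busy or not permitted; try the next one
            return port  # the probe socket is closed when the with-block exits
    raise RuntimeError(f"no free port found starting at {start}")
```

A probe like this is inherently racy (another process can take the port between the check and the real bind), so a client that finds nothing on 23456 should read the actual port from the cc-router UI rather than guess.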
Minimal client configuration
export ANTHROPIC_BASE_URL=http://127.0.0.1:23456
# Skip the next line if authentication is disabled (the default)
export ANTHROPIC_API_KEY=<token from the cc-router settings page>
# Point Claude Code's three model slots at cc-router's virtual models
export ANTHROPIC_DEFAULT_OPUS_MODEL=model-opus
export ANTHROPIC_DEFAULT_SONNET_MODEL=model-sonnet
export ANTHROPIC_DEFAULT_HAIKU_MODEL=model-haiku
Authentication
- Disabled by default: any request goes through, no token required.
- When enabled, cc-router reads the token from either header (either one is enough):
  - x-api-key: <token> (preferred): Claude Code’s ANTHROPIC_API_KEY lands here
  - Authorization: Bearer <token>: Claude Code’s ANTHROPIC_AUTH_TOKEN lands here
- The extracted token must exactly match the auth_token configured in the settings page; otherwise cc-router returns 401.
- Allowlist: /v1/models, /health, and all OPTIONS preflight requests always pass through, even with authentication enabled.
  - Rationale: clients need to list models at startup; browsers need to probe without auth blocking them. The endpoints that actually consume quota (/v1/messages and /v1/responses) are the ones that require auth.
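The rules above can be condensed into a small decision function. This is a sketch under stated assumptions, not cc-router's implementation: the names `extract_token` and `is_authorized` are hypothetical, checking x-api-key before Authorization is a guess based on it being marked "preferred", and the constant-time comparison is a common hardening choice rather than something the source states.

```python
import hmac
from typing import Mapping, Optional

# Paths that bypass authentication, per the allowlist above.
AUTH_EXEMPT_PATHS = {"/v1/models", "/health"}


def extract_token(headers: Mapping[str, str]) -> Optional[str]:
    """Pull the client token from x-api-key, else from Authorization: Bearer."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    api_key = lowered.get("x-api-key")
    if api_key:
        return api_key  # assumption: x-api-key wins when both headers are present
    scheme, _, credential = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and credential:
        return credential
    return None


def is_authorized(method: str, path: str, headers: Mapping[str, str],
                  auth_enabled: bool, auth_token: str) -> bool:
    """Return True if the request may proceed, False if it should get a 401."""
    if not auth_enabled:
        return True  # default: no token required
    if method.upper() == "OPTIONS" or path in AUTH_EXEMPT_PATHS:
        return True  # allowlisted even with authentication enabled
    token = extract_token(headers)
    if token is None:
        return False
    # Exact match, compared in constant time (a hardening assumption).
    return hmac.compare_digest(token.encode(), auth_token.encode())
```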
The token cc-router asks for here is for cc-router itself, unrelated to your upstream providers’ real API keys — those are swapped in by cc-router according to virtual-model dispatch rules.
CORS is on by default: Access-Control-Allow-Origin: *, methods GET / POST / OPTIONS, all headers allowed, preflight returns 204. Even 401 responses carry CORS headers, so a browser fetch can read the response body.
Request
POST /v1/messages
Content-Type: application/json
| Header | Required | Notes |
|---|---|---|
| Content-Type: application/json | Yes | The body must be JSON |
| x-api-key or Authorization: Bearer ... | Per auth settings | One of the two when auth is enabled |
| anthropic-version / anthropic-beta / … | No | cc-router does not consume these; passed verbatim to upstream |
The request body uses the standard Anthropic Messages API format. cc-router reads only two fields for dispatch; every other field (messages / system / tools / temperature / max_tokens / thinking / …) is passed through unchanged to the upstream.
| Field | Type | Required | Behavior |
|---|---|---|---|
| model | string | Yes | Resolved to a virtual model (see the mapping table below). A missing model field returns 400 |
| stream | boolean | No (defaults to false) | true uses SSE; false is non-streaming |
cc-router rewrites the model field in the body:
- Resolves to model-opus / model-sonnet / model-haiku → rewritten to the real model name bound in that slot (e.g. glm-4.6, qwen3-max)
- Resolves to fallback → not rewritten; passed through as-is to the upstream
Non-streaming request example
curl http://127.0.0.1:23456/v1/messages \
-H 'Content-Type: application/json' \
-d '{
"model": "model-sonnet",
"max_tokens": 256,
"messages": [
{ "role": "user", "content": "Explain cc-router in one sentence" }
]
}' | jq
Response (non-streaming)
- 200 OK, Content-Type: application/json
- The body is the standard Anthropic message JSON
- cc-router rewrites message.model back to the virtual model name (fallback mode skips this) so clients can aggregate caching and stats by virtual model
- usage.* (including cache_creation_input_tokens / cache_read_input_tokens) is passed through; cc-router also extracts a copy for internal accounting
{
"id": "msg_xxx",
"type": "message",
"role": "assistant",
"model": "model-sonnet",
"content": [
{ "type": "text", "text": "..." }
],
"stop_reason": "end_turn",
"usage": { "input_tokens": 42, "output_tokens": 128 }
}
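The same non-streaming round trip can be driven from Python with only the standard library. The sketch below is illustrative: `build_messages_request` and `extract_text_and_usage` are hypothetical helper names, and the base URL assumes the default bind address and port.

```python
import json
import urllib.request
from typing import Optional, Tuple


def build_messages_request(prompt: str, model: str = "model-sonnet",
                           base_url: str = "http://127.0.0.1:23456",
                           token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /v1/messages request; the token is only needed when auth is enabled."""
    body = {
        "model": model,
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Content-Type": "application/json"}
    if token:
        headers["x-api-key"] = token
    return urllib.request.Request(
        base_url + "/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def extract_text_and_usage(message: dict) -> Tuple[str, dict]:
    """Join all text blocks of a message body and return them with its usage object."""
    text = "".join(
        block.get("text", "")
        for block in message.get("content", [])
        if block.get("type") == "text"
    )
    return text, message.get("usage", {})


# Usage (requires a running cc-router):
# with urllib.request.urlopen(build_messages_request("Explain cc-router in one sentence")) as resp:
#     text, usage = extract_text_and_usage(json.load(resp))
```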
Response (streaming SSE)
- 200 OK, Content-Type: text/event-stream
- Upstream SSE frames are passed through byte-for-byte, with two exceptions:
| Event | cc-router behavior |
|---|---|
| message_start | Parses the JSON, rewrites message.model to the virtual model name (fallback skips this), extracts usage.* for accounting, re-serializes, and writes out |
| message_delta | Not rewritten; cc-router only reads usage.output_tokens etc. on the side for accounting, and the bytes pass through |
| All other events (content_block_* / message_stop / ping, …) | Passed through unchanged |
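To make the message_start row concrete, here is a minimal sketch of the parse, rewrite, and re-serialize step on a single SSE data line. It is an assumption-laden illustration, not cc-router's code: `rewrite_message_start` is a hypothetical name, and passing `None` as the virtual model stands in for fallback mode.

```python
import json
from typing import Optional, Tuple


def rewrite_message_start(data_line: str, virtual_model: Optional[str]) -> Tuple[str, dict]:
    """Rewrite message.model in a message_start data line.

    Returns (line_to_forward, usage_copy). Hypothetical helper; with
    virtual_model=None (fallback) the original line is returned untouched.
    """
    prefix = "data:"
    if not data_line.startswith(prefix):
        raise ValueError("expected an SSE data line")
    payload = json.loads(data_line[len(prefix):].strip())
    message = payload.get("message", {})
    usage = dict(message.get("usage", {}))  # copy kept for accounting
    if virtual_model is None:
        return data_line, usage  # fallback: forward the upstream bytes as-is
    message["model"] = virtual_model
    # Re-serialization may change whitespace relative to the upstream frame.
    return prefix + " " + json.dumps(payload, separators=(",", ":"), ensure_ascii=False), usage
```

Because only this one frame is re-serialized, a client should not rely on the message_start line being byte-identical to what the upstream sent.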
Streaming request example
curl -N http://127.0.0.1:23456/v1/messages \
-H 'Content-Type: application/json' \
-d '{
"model": "model-sonnet",
"max_tokens": 256,
"stream": true,
"messages": [
{ "role": "user", "content": "ping" }
]
}'
-N disables curl buffering so SSE frames print live.
First-frame lookahead: when the upstream returns 200 but the very first event is actually event: error (typical case: GLM 1302/1308 quota exhaustion disguised as 200), cc-router does not forward that frame. It silently triggers a retry against the next subscription. The client only ever sees one successful completion or one final failure.

Mid-stream disconnect: when the upstream connection drops mid-stream, cc-router appends an event: error frame plus data: [DONE] after the frames already sent, so the client can observe the interruption instead of hanging.
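On the client side, the mid-stream behavior means a consumer should watch for an error event and a [DONE] sentinel while accumulating text. The following is a minimal sketch of such a consumer over already-decoded SSE lines; `iter_sse_events` and `collect_text` are hypothetical names, and the error frame's data is returned raw because its exact payload is not specified here.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from decoded SSE lines; a blank line ends an event."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:  # stream ended without a trailing blank line
        yield event, "\n".join(data)


def collect_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Accumulate text deltas; return (text, error_data), error_data is None on a clean stream."""
    chunks = []
    for event, data in iter_sse_events(lines):
        if data == "[DONE]":
            break
        if event == "error":
            return "".join(chunks), data  # partial text plus the raw error payload
        payload = json.loads(data)
        if payload.get("type") == "content_block_delta":
            delta = payload.get("delta", {})
            if delta.get("type") == "text_delta":
                chunks.append(delta.get("text", ""))
    return "".join(chunks), None
```

Returning the partial text together with the error lets the caller decide whether to show what arrived or retry the whole request.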
Virtual model mapping
cc-router maps the request’s model field to one virtual model, then tries the subscriptions bound to that virtual model in order according to its dispatch mode (sequential / round-robin).
| Client-sent model | Resolved virtual model |
|---|---|
| model-opus, claude-opus-4-7, gpt-5.5, anthropic/model-opus, anthropic/claude-opus-4-7, openai/gpt-5.5 | model-opus |
| model-sonnet, claude-sonnet-4-6, gpt-5.4, anthropic/model-sonnet, anthropic/claude-sonnet-4-6, openai/gpt-5.4 | model-sonnet |
| model-haiku, claude-haiku-4-5, gpt-5.4-mini, anthropic/model-haiku, anthropic/claude-haiku-4-5, openai/gpt-5.4-mini | model-haiku |
| model-fallback, anthropic/model-fallback | Fallback (explicit) |
| Any other value (custom model names, etc.) | Fallback (implicit; model is passed through verbatim) |
- The anthropic/ prefix is supported for LiteLLM-style vendor-prefixed naming.
- The openai/ prefix is the same idea, primarily for clients hitting the OpenAI /v1/responses entry point.
- gpt-5.5 / gpt-5.4 / gpt-5.4-mini are OpenAI-flavored aliases that cc-router deliberately maps onto the existing Opus / Sonnet / Haiku slots; no new virtual model is introduced.
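The mapping table can be expressed as a plain lookup, which also shows how the body's model field ends up rewritten. This is a sketch, not cc-router's code: `resolve_virtual_model` and `rewrite_model` are hypothetical names, matching is assumed to be exact and case-sensitive, and `slot_bindings` stands in for whatever real model names are bound in the settings.

```python
from typing import Mapping

FALLBACK = "model-fallback"

# Transcribed from the mapping table; anything not listed resolves to fallback.
_TABLE = {
    "model-opus": ["model-opus", "claude-opus-4-7", "gpt-5.5",
                   "anthropic/model-opus", "anthropic/claude-opus-4-7", "openai/gpt-5.5"],
    "model-sonnet": ["model-sonnet", "claude-sonnet-4-6", "gpt-5.4",
                     "anthropic/model-sonnet", "anthropic/claude-sonnet-4-6", "openai/gpt-5.4"],
    "model-haiku": ["model-haiku", "claude-haiku-4-5", "gpt-5.4-mini",
                    "anthropic/model-haiku", "anthropic/claude-haiku-4-5", "openai/gpt-5.4-mini"],
    FALLBACK: ["model-fallback", "anthropic/model-fallback"],
}
ALIASES = {alias: virtual for virtual, names in _TABLE.items() for alias in names}


def resolve_virtual_model(requested: str) -> str:
    """Map a client-sent model name to a virtual model (exact match is an assumption)."""
    return ALIASES.get(requested, FALLBACK)


def rewrite_model(requested: str, slot_bindings: Mapping[str, str]) -> str:
    """Return the model name to send upstream.

    Slot models are replaced by the real model bound to that slot;
    fallback requests keep the client's value verbatim.
    """
    virtual = resolve_virtual_model(requested)
    if virtual == FALLBACK:
        return requested
    return slot_bindings[virtual]
```

For example, with `slot_bindings = {"model-sonnet": "glm-4.6"}`, a request for `gpt-5.4` would go upstream as `glm-4.6`, while an unlisted custom name would pass through unchanged.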
Error responses
Errors from /v1/messages follow the Anthropic shape:
{
"type": "error",
"error": {
"type": "<kind>",
"message": "<human-readable message>"
}
}
Errors produced by cc-router itself:
| HTTP status | kind | Trigger |
|---|---|---|
| 400 | invalid_request_error | JSON parse failure / missing model field |
| 401 | authentication_error | Token mismatch when auth is enabled |
| 500 | api_error | Internal pipeline error (e.g. every subscription failed) |
| 503 | overloaded_error | The fallback virtual model has no bound subscriptions |
| 4xx / 5xx | Depends on upstream | When every subscription fails, cc-router forwards the last upstream’s status and error body |
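A client can fold the error envelope and the status codes above into one small parser. The sketch below is illustrative: `RouterError` and `parse_error` are hypothetical names, and the "unknown" kind is an invented placeholder for forwarded upstream bodies that do not follow the Anthropic shape.

```python
import json
from typing import NamedTuple


class RouterError(NamedTuple):
    status: int
    kind: str
    message: str


def parse_error(status: int, body: bytes) -> RouterError:
    """Decode an Anthropic-shaped error body, tolerating non-conforming upstream bodies."""
    try:
        error = json.loads(body)["error"]
        return RouterError(status, error["type"], error["message"])
    except (ValueError, KeyError, TypeError):
        # Forwarded upstream errors may be HTML or some other JSON shape.
        return RouterError(status, "unknown", body.decode("utf-8", "replace"))
```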
SSE error frames inside the stream:
- If the very first event is event: error → cc-router intercepts and automatically retries the next subscription; the client never sees it
- An error frame appearing mid-stream → passed through to the client; kind is fixed to upstream_error
Unimplemented Anthropic endpoints
The following official Anthropic endpoints are not implemented in cc-router by design:
- POST /v1/messages/count_tokens
- POST /v1/messages/batches and every batches-related endpoint
- POST /v1/files (Files API)
- Workbench / Admin API
cc-router targets Claude-Code-style real-time conversation proxying; Claude Code only depends on POST /v1/messages and GET /v1/models, so the rest is not implemented. To estimate token counts client-side, use a local library such as tiktoken (a rough approximation only, since it implements OpenAI's encodings and the upstream model may tokenize differently), or send a single /v1/messages call and read usage.input_tokens from the response.
If your client speaks the OpenAI Responses protocol (e.g. Codex CLI), use the OpenAI /v1/responses entry point instead — cc-router translates the request into Anthropic Messages and runs it through the same dispatch pipeline.