refactor: implement the ConversionEngine protocol conversion engine, replacing the old protocol package

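- Add the ConversionEngine core engine, supporting OpenAI and Anthropic protocol conversion
- Add stream decoder/encoder implementations
- Update the provider client to support the new engine
- Add unit tests and integration tests
- Update the specs documents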
2026-04-20 13:01:05 +08:00
parent 1dac347d3b
commit bc1ee612d9
39 changed files with 11177 additions and 995 deletions
--- a/openspec/specs/protocol-adapter-anthropic/spec.md
+++ b/openspec/specs/protocol-adapter-anthropic/spec.md
@@ -0,0 +1,271 @@
+# Protocol Adapter - Anthropic
+
+## ADDED Requirements
+
+### Requirement: Implement the Anthropic ProtocolAdapter
+
+The system SHALL implement a complete ProtocolAdapter for the Anthropic protocol from scratch, following `docs/conversion_anthropic.md`. The old `internal/protocol/anthropic/` code is not reused.
+
+- `protocolName()` SHALL return `"anthropic"`
+- `supportsPassthrough()` SHALL return true
+- `buildHeaders(provider)` SHALL build `x-api-key`, `anthropic-version`, `anthropic-beta`, and `Content-Type`
+- `buildUrl(nativePath, interfaceType)` SHALL map the URL path according to the interface type
+- `supportsInterface()` SHALL return true for CHAT, MODELS, and MODEL_INFO, and false for EMBEDDINGS and RERANK
+
+#### Scenario: Building authentication headers
+
+- **WHEN** buildHeaders(provider) is called
+- **THEN** SHALL set `x-api-key: <provider.api_key>`
+- **THEN** SHALL set `anthropic-version` (default `"2023-06-01"`, overridable via adapter_config)
+- **WHEN** adapter_config contains anthropic_beta
+- **THEN** SHALL join the values with commas into the `anthropic-beta` header
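The header scenario above can be sketched in Go roughly as follows. This is a minimal illustration, not the adapter's actual code: the `Provider` and `AdapterConfig` struct shapes, the `AnthropicVersion` field name, and the `application/json` value for `Content-Type` are assumptions, since the spec only names `provider.api_key`, `adapter_config`, and `anthropic_beta`.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// AdapterConfig and Provider are hypothetical shapes used for illustration only.
type AdapterConfig struct {
	AnthropicVersion string   // optional override (field name is an assumption)
	AnthropicBeta    []string // joined with commas when non-empty
}

type Provider struct {
	APIKey string
	Config AdapterConfig
}

// Default named by the spec.
const defaultAnthropicVersion = "2023-06-01"

// buildHeaders builds the four headers listed in the requirement.
// http.Header canonicalizes key casing; HTTP header names are case-insensitive.
func buildHeaders(p Provider) http.Header {
	h := http.Header{}
	h.Set("x-api-key", p.APIKey)

	version := p.Config.AnthropicVersion
	if version == "" {
		version = defaultAnthropicVersion
	}
	h.Set("anthropic-version", version)

	if len(p.Config.AnthropicBeta) > 0 {
		h.Set("anthropic-beta", strings.Join(p.Config.AnthropicBeta, ","))
	}
	// Assumption: JSON bodies; the spec lists Content-Type without a value.
	h.Set("Content-Type", "application/json")
	return h
}

func main() {
	h := buildHeaders(Provider{
		APIKey: "sk-test",
		Config: AdapterConfig{AnthropicBeta: []string{"beta-a", "beta-b"}},
	})
	fmt.Println(h.Get("anthropic-version"), h.Get("anthropic-beta"))
}
```

Leaving `anthropic-beta` unset when no beta flags are configured avoids sending an empty header value.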
+
+#### Scenario: URL mapping
+
+- **WHEN** interfaceType == CHAT
+- **THEN** SHALL map to `/v1/messages`
+- **WHEN** interfaceType == MODELS
+- **THEN** SHALL map to `/v1/models`
+- **WHEN** interfaceType == EMBEDDINGS or RERANK
+- **THEN** SHALL NOT call buildUrl (supportsInterface returns false, so the engine falls back to passthrough)
+
+### Requirement: Anthropic request decoding (Anthropic → Canonical)
+
+The system SHALL implement complete decoding from an Anthropic MessagesRequest to a CanonicalRequest.
+
+#### Scenario: System message extraction
+
+- **WHEN** the Anthropic request contains a top-level `system` field
+- **THEN** a String value SHALL be extracted directly as canonical.system
+- **THEN** a SystemBlock array SHALL be extracted as canonical.system (Array)
+
+#### Scenario: Splitting tool_result out of user messages
+
+- **WHEN** the content of an Anthropic user message contains tool_result blocks
+- **THEN** SHALL split the tool_result blocks into a standalone CanonicalMessage{role: "tool"}
+- **THEN** non-tool_result blocks SHALL be kept as a standalone CanonicalMessage{role: "user"}
+- **THEN** when only tool_result blocks are present, SHALL produce only the tool-role message
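A rough Go sketch of the split described above follows. The `Block` and `CanonicalMessage` types are simplified stand-ins. Two choices here are assumptions the spec does not state: all tool_result blocks are grouped into a single tool message, and the tool message is emitted before the user message.

```go
package main

import "fmt"

// Block is a simplified content block; real blocks carry more fields.
type Block struct {
	Type string // e.g. "text", "tool_result"
	Text string
}

// CanonicalMessage is a hypothetical, reduced form of the canonical message.
type CanonicalMessage struct {
	Role    string
	Content []Block
}

// splitUserMessage partitions one Anthropic user message's content into a
// tool-role message (tool_result blocks) and a user-role message (the rest).
// Assumption: tool message first, and one tool message for all tool results.
func splitUserMessage(content []Block) []CanonicalMessage {
	var toolBlocks, userBlocks []Block
	for _, b := range content {
		if b.Type == "tool_result" {
			toolBlocks = append(toolBlocks, b)
		} else {
			userBlocks = append(userBlocks, b)
		}
	}

	var out []CanonicalMessage
	if len(toolBlocks) > 0 {
		out = append(out, CanonicalMessage{Role: "tool", Content: toolBlocks})
	}
	if len(userBlocks) > 0 {
		out = append(out, CanonicalMessage{Role: "user", Content: userBlocks})
	}
	return out
}

func main() {
	msgs := splitUserMessage([]Block{
		{Type: "tool_result", Text: "42"},
		{Type: "text", Text: "thanks, now continue"},
	})
	for _, m := range msgs {
		fmt.Println(m.Role, len(m.Content))
	}
}
```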
+
+#### Scenario: Parameter mapping
+
+- **WHEN** decoding Anthropic request parameters
+- **THEN** max_tokens SHALL map directly
+- **THEN** temperature/top_p/top_k SHALL map directly
+- **THEN** stop_sequences SHALL map directly
+
+#### Scenario: ThinkingConfig decoding
+
+- **WHEN** decoding the Anthropic thinking field
+- **THEN** type="enabled" SHALL map to Canonical thinking.type="enabled"
+- **THEN** type="disabled" SHALL map to Canonical thinking.type="disabled"
+- **THEN** type="adaptive" SHALL map to Canonical thinking.type="adaptive"
+- **THEN** budget_tokens and output_config.effort SHALL map directly
+
+#### Scenario: Common field extraction
+
+- **WHEN** decoding Anthropic common fields
+- **THEN** metadata.user_id SHALL be extracted as user_id
+- **THEN** output_config.format (json_schema type) SHALL be extracted as output_format
+- **THEN** disable_parallel_tool_use SHALL map inversely to parallel_tool_use (true → false)
+
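+#### Scenario: Handling protocol-specific fields
+
+- **WHEN** decoding encounters redacted_thinking
+- **THEN** SHALL discard it and not retain it in the intermediate layer
+- **WHEN** decoding encounters cache_control
+- **THEN** SHALL ignore it and not promote it to a common field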
+
+### Requirement: Anthropic request encoding (Canonical → Anthropic)
+
+The system SHALL implement complete encoding from a CanonicalRequest to an Anthropic MessagesRequest.
+
+#### Scenario: System message injection
+
+- **WHEN** canonical.system is not empty
+- **THEN** SHALL encode it as the Anthropic top-level `system` field
+
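+#### Scenario: Merging tool-role messages into user messages
+
+- **WHEN** a CanonicalMessage{role: "tool"} appears in the message sequence
+- **THEN** SHALL merge its tool_result blocks into the content array of the adjacent Anthropic user message
+- **WHEN** the immediately preceding message is not a user message
+- **THEN** SHALL create a new user message to carry the tool_result blocks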
+
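+#### Scenario: First message must be a user message
+
+- **WHEN** the first message of the encoded Anthropic messages array is not a user-role message
+- **THEN** SHALL automatically inject an empty user message at the head
+
+#### Scenario: Role alternation constraint
+
+- **WHEN** consecutive messages with the same role exist after encoding
+- **THEN** SHALL merge them into a single message (concatenating the content arrays)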
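The two normalization rules above (first message is a user message, no consecutive same-role messages) can be sketched together. This is an illustrative sketch under stated assumptions: `Message` reduces content blocks to strings, and running the head injection before the merge pass is a choice made here, not an order stated in the spec.

```go
package main

import "fmt"

// Message is a simplified Anthropic message; content blocks are plain strings here.
type Message struct {
	Role    string
	Content []string
}

// normalize applies the first-message guarantee, then merges consecutive
// same-role messages by concatenating their content arrays.
func normalize(msgs []Message) []Message {
	// Inject an empty user message when the sequence starts with another role.
	if len(msgs) > 0 && msgs[0].Role != "user" {
		msgs = append([]Message{{Role: "user"}}, msgs...)
	}

	var out []Message
	for _, m := range msgs {
		if n := len(out); n > 0 && out[n-1].Role == m.Role {
			out[n-1].Content = append(out[n-1].Content, m.Content...)
			continue
		}
		// Copy the content slice so the caller's input is never mutated.
		out = append(out, Message{Role: m.Role, Content: append([]string(nil), m.Content...)})
	}
	return out
}

func main() {
	out := normalize([]Message{
		{Role: "assistant", Content: []string{"hello"}},
		{Role: "user", Content: []string{"part 1"}},
		{Role: "user", Content: []string{"part 2"}},
	})
	for _, m := range out {
		fmt.Println(m.Role, m.Content)
	}
}
```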
+
+#### Scenario: Parameter encoding
+
+- **WHEN** encoding CanonicalRequest parameters
+- **THEN** parameters.max_tokens SHALL map directly (required by Anthropic)
+- **THEN** parameters.top_k SHALL map directly
+- **THEN** canonical.thinking.type="enabled" SHALL map to thinking{type: "enabled", budget_tokens}
+- **THEN** canonical.thinking.type="adaptive" SHALL map to thinking{type: "adaptive"}
+
+#### Scenario: Common field encoding
+
+- **WHEN** canonical.user_id is not empty
+- **THEN** SHALL encode it as metadata.user_id
+- **WHEN** canonical.parallel_tool_use == false
+- **THEN** SHALL encode it as disable_parallel_tool_use: true
+- **WHEN** canonical.output_format is present
+- **THEN** SHALL encode it as output_config.format
+
+#### Scenario: Downgrade handling
+
+- **WHEN** canonical.output_format.type == "json_object"
+- **THEN** SHALL downgrade it to output_config.format{type: "json_schema", schema: {type: "object"}}
+- **WHEN** canonical.output_format.type == "text"
+- **THEN** SHALL discard it and not set output_config
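The output_format downgrade rules can be illustrated with a small Go function. The `OutputFormat` type and the `encodeOutputFormat` name are hypothetical. Returning nil for unrecognized types is also an assumption, since the spec only covers "json_schema", "json_object", and "text".

```go
package main

import "fmt"

// OutputFormat is a hypothetical shape shared by the canonical and Anthropic sides.
type OutputFormat struct {
	Type   string
	Schema map[string]any
}

// encodeOutputFormat returns the value for output_config.format,
// or nil when output_config should not be set at all.
func encodeOutputFormat(f *OutputFormat) *OutputFormat {
	if f == nil {
		return nil
	}
	switch f.Type {
	case "json_schema":
		return &OutputFormat{Type: "json_schema", Schema: f.Schema}
	case "json_object":
		// Downgrade: the least restrictive schema that still forces a JSON object.
		return &OutputFormat{Type: "json_schema", Schema: map[string]any{"type": "object"}}
	case "text":
		return nil
	default:
		// Assumption: unknown types are dropped like "text".
		return nil
	}
}

func main() {
	out := encodeOutputFormat(&OutputFormat{Type: "json_object"})
	fmt.Println(out.Type, out.Schema)
	fmt.Println(encodeOutputFormat(&OutputFormat{Type: "text"}) == nil)
}
```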
+
+### Requirement: Anthropic response decoding (Anthropic → Canonical)
+
+The system SHALL implement decoding from an Anthropic MessagesResponse to a CanonicalResponse.
+
+#### Scenario: Content block decoding
+
+- **WHEN** the Anthropic response contains a text block
+- **THEN** SHALL decode it as a TextBlock
+- **WHEN** it contains a tool_use block
+- **THEN** SHALL decode it as a ToolUseBlock
+- **WHEN** it contains a thinking block
+- **THEN** SHALL decode it as a ThinkingBlock
+- **WHEN** it contains a redacted_thinking block
+- **THEN** SHALL discard it
+
+#### Scenario: Stop reason mapping
+
+- **WHEN** decoding stop_reason
+- **THEN** "end_turn" SHALL map to "end_turn"
+- **THEN** "max_tokens" SHALL map to "max_tokens"
+- **THEN** "tool_use" SHALL map to "tool_use"
+- **THEN** "stop_sequence" SHALL map to "stop_sequence"
+- **THEN** "refusal" SHALL map to "refusal"
+- **THEN** "pause_turn" SHALL map to "pause_turn"
+
+#### Scenario: Usage mapping
+
+- **WHEN** decoding Anthropic usage
+- **THEN** input_tokens SHALL map directly
+- **THEN** output_tokens SHALL map directly
+- **THEN** cache_read_input_tokens SHALL map to cache_read_tokens
+- **THEN** cache_creation_input_tokens SHALL map to cache_creation_tokens
+
+### Requirement: Anthropic response encoding (Canonical → Anthropic)
+
+The system SHALL implement encoding from a CanonicalResponse to an Anthropic MessagesResponse.
+
+#### Scenario: Downgrade handling
+
+- **WHEN** canonical.stop_reason is "content_filter"
+- **THEN** SHALL downgrade it to "end_turn"
+- **WHEN** canonical.reasoning_tokens is not empty
+- **THEN** SHALL discard it (Anthropic has no such field)
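A hedged sketch of the stop_reason side of response encoding follows. Passing the six Anthropic-native reasons through unchanged is inferred from the decoding table rather than stated for encoding, and the fallback for unknown values is an assumption.

```go
package main

import "fmt"

// encodeStopReason maps a canonical stop reason to an Anthropic stop_reason.
func encodeStopReason(canonical string) string {
	switch canonical {
	case "end_turn", "max_tokens", "tool_use", "stop_sequence", "refusal", "pause_turn":
		// Inferred: these values are identical on both sides.
		return canonical
	case "content_filter":
		// Downgrade named by the spec.
		return "end_turn"
	default:
		// Assumption: unknown reasons fall back to "end_turn".
		return "end_turn"
	}
}

func main() {
	for _, r := range []string{"tool_use", "content_filter"} {
		fmt.Println(r, "->", encodeStopReason(r))
	}
}
```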
+
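+### Requirement: Anthropic stream decoder
+
+The system SHALL implement AnthropicStreamDecoder, which converts Anthropic named SSE events into CanonicalStreamEvent values.
+
+The decoder is an almost 1:1 mapping and maintains a minimal state machine:
+- messageStarted: whether the MessageStartEvent has already been emitted
+- redactedBlocks: the set of block indexes that must be discarded
+- utf8Remainder: a buffer for safely handling UTF-8 sequences split across chunks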
+
+#### Scenario: 1:1 mapping of named events
+
+- **WHEN** `event: message_start` is received
+- **THEN** SHALL emit MessageStartEvent
+- **WHEN** `event: content_block_start` is received
+- **THEN** SHALL emit ContentBlockStartEvent
+- **WHEN** `event: content_block_delta` is received
+- **THEN** SHALL emit ContentBlockDeltaEvent
+- **WHEN** `event: content_block_stop` is received
+- **THEN** SHALL emit ContentBlockStopEvent
+- **WHEN** `event: message_delta` is received
+- **THEN** SHALL emit MessageDeltaEvent
+- **WHEN** `event: message_stop` is received
+- **THEN** SHALL emit MessageStopEvent
+- **WHEN** `event: ping` is received
+- **THEN** SHALL emit PingEvent
+- **WHEN** `event: error` is received
+- **THEN** SHALL emit ErrorEvent
+
+#### Scenario: Discarding redacted_thinking blocks
+
+- **WHEN** content_block.type in a content_block_start event is "redacted_thinking"
+- **THEN** SHALL add the index to redactedBlocks
+- **THEN** subsequent delta and stop events for that index SHALL be discarded
+
+#### Scenario: Discarding protocol-specific deltas
+
+- **WHEN** the delta type is citations_delta or signature_delta
+- **THEN** SHALL discard it without affecting the block lifecycle
+
+#### Scenario: Discarding server-side tool blocks
+
+- **WHEN** the type in a content_block_start event is server_tool_use / web_search_tool_result or similar
+- **THEN** SHALL discard the entire block
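The three discard scenarios above amount to a small filter over the event stream. The sketch below models only that filter. The `StreamEvent` struct and `Keep` method are hypothetical. Tracking server-side tool blocks in the same index set as redacted blocks is an assumption, and only the two block types named in the spec are listed.

```go
package main

import "fmt"

// StreamEvent is a reduced, hypothetical view of one decoded SSE event.
type StreamEvent struct {
	Type      string // SSE event name, e.g. "content_block_start"
	Index     int    // content block index
	BlockType string // content_block.type (content_block_start only)
	DeltaType string // delta.type (content_block_delta only)
}

// Decoder holds the set of block indexes whose events must be discarded.
type Decoder struct {
	dropped map[int]bool
}

func NewDecoder() *Decoder { return &Decoder{dropped: map[int]bool{}} }

// Keep reports whether an event should be forwarded as a canonical event.
func (d *Decoder) Keep(ev StreamEvent) bool {
	switch ev.Type {
	case "content_block_start":
		switch ev.BlockType {
		case "redacted_thinking", "server_tool_use", "web_search_tool_result":
			d.dropped[ev.Index] = true
			return false
		}
	case "content_block_delta":
		if d.dropped[ev.Index] {
			return false
		}
		// Protocol-specific deltas are dropped, but the block itself stays open.
		if ev.DeltaType == "citations_delta" || ev.DeltaType == "signature_delta" {
			return false
		}
	case "content_block_stop":
		if d.dropped[ev.Index] {
			delete(d.dropped, ev.Index) // the block lifecycle is finished
			return false
		}
	}
	return true
}

func main() {
	d := NewDecoder()
	events := []StreamEvent{
		{Type: "content_block_start", Index: 0, BlockType: "redacted_thinking"},
		{Type: "content_block_stop", Index: 0},
		{Type: "content_block_start", Index: 1, BlockType: "text"},
		{Type: "content_block_delta", Index: 1, DeltaType: "text_delta"},
	}
	for _, ev := range events {
		fmt.Println(ev.Type, ev.Index, d.Keep(ev))
	}
}
```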
+
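+### Requirement: Anthropic stream encoder
+
+The system SHALL implement AnthropicStreamEncoder, which encodes CanonicalStreamEvent values as Anthropic named SSE events.
+
+#### Scenario: Direct mapping, no buffering
+
+- **WHEN** any CanonicalStreamEvent is received
+- **THEN** SHALL encode it directly as the corresponding Anthropic SSE event
+- **THEN** SHALL NOT buffer and wait (unlike the OpenAI encoder)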
+
+#### Scenario: SSE format
+
+- **WHEN** encoding output
+- **THEN** SHALL use the `event: <type>\ndata: <json>\n\n` format
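The SSE frame format above can be produced with the Go standard library alone. The `encodeSSE` helper name is hypothetical. `json.Marshal` emits compact JSON without raw newlines, so the payload fits on a single `data:` line.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeSSE renders one named SSE event as "event: <type>\ndata: <json>\n\n".
func encodeSSE(eventType string, payload any) (string, error) {
	data, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("event: %s\ndata: %s\n\n", eventType, data), nil
}

func main() {
	frame, err := encodeSSE("ping", map[string]string{"type": "ping"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", frame)
}
```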
+
+#### Scenario: Delta type encoding
+
+- **WHEN** delta.type == "text_delta"
+- **THEN** SHALL encode it as an Anthropic text_delta
+- **WHEN** delta.type == "input_json_delta"
+- **THEN** SHALL encode it as an Anthropic input_json_delta
+- **WHEN** delta.type == "thinking_delta"
+- **THEN** SHALL encode it as an Anthropic thinking_delta
+
+### Requirement: Anthropic error encoding
+
+The system SHALL implement error encoding for the Anthropic protocol.
+
+#### Scenario: Error response format
+
+- **WHEN** encodeError(conversionError) is called
+- **THEN** SHALL return `{type: "error", error: {type: <error_code>, message: <message>}}`
+
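+### Requirement: Anthropic extension-layer interface encoding and decoding
+
+The system SHALL implement extension-layer interface encoding and decoding for the Anthropic protocol (Models only).
+
+#### Scenario: /models list interface
+
+- **WHEN** decoding an Anthropic models response
+- **THEN** data[].display_name SHALL map to models[].name
+- **THEN** data[].created_at (RFC 3339) SHALL be converted to models[].created (Unix timestamp)
+- **WHEN** encoding a CanonicalModelList into the Anthropic format
+- **THEN** SHALL output a structure containing has_more, first_id, and last_id
+- **THEN** models[].created (Unix timestamp) SHALL be converted to an RFC 3339 string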
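The timestamp conversion in both directions can be sketched with Go's `time` package. The helper names are hypothetical. Emitting UTC on the encoding side is an assumption, since a Unix timestamp carries no time zone.

```go
package main

import (
	"fmt"
	"time"
)

// rfc3339ToUnix converts an Anthropic created_at string to Unix seconds.
func rfc3339ToUnix(s string) (int64, error) {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		return 0, err
	}
	return t.Unix(), nil
}

// unixToRFC3339 converts Unix seconds to an RFC 3339 string in UTC.
func unixToRFC3339(sec int64) string {
	return time.Unix(sec, 0).UTC().Format(time.RFC3339)
}

func main() {
	ts, err := rfc3339ToUnix("2024-01-01T00:00:00Z")
	if err != nil {
		panic(err)
	}
	fmt.Println(ts, unixToRFC3339(ts))
}
```

Sub-second precision in `created_at` is lost in a round trip, because the canonical field holds whole seconds.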
+
+#### Scenario: /models detail interface
+
+- **WHEN** decoding or encoding model details
+- **THEN** SHALL handle the display_name ↔ name and RFC 3339 ↔ Unix timestamp conversions
+
+#### Scenario: EMBEDDINGS and RERANK are not supported
+
+- **WHEN** interfaceType is EMBEDDINGS or RERANK
+- **THEN** supportsInterface SHALL return false
+- **THEN** the engine SHALL fall back to passthrough or return an empty response