AI应用接入大模型：直连、代理、网关三种方案的工程权衡

发布时间：2026/7/2 5:22:20

调用大模型API时直连走代理自建网关三条路各有取舍。本文不讲哪个最好只讲每种方案在什么场景下合理、什么场景下踩坑以及工程上需要额外处理什么。一、三条路径的本质区别先厘清概念。假设你要调用GPT-4o方案A应用 → api.openai.com 直连方案B应用 → 自建代理 → api.openai.com 代理转发方案C应用 → 自建网关 → 多个上游网关路由三者的核心差异不在于中间多了几跳而在于你把复杂度放在了哪里维度直连代理网关复杂度位置应用代码内代理服务网关服务多模型支持需自己在代码里切换代理可做协议转换网关做路由协议转换故障切换应用层重试代理可做上游切换网关做多通道负载均衡计费统计需自己实现代理可统一统计网关内置计费模块运维成本最低中等最高没有银弹。下面逐个拆解。二、方案A直连——最简单但天花板最低适用场景服务器能稳定访问目标API比如部署在海外只用一个厂商的模型调用量不大不需要复杂的计费和监控代码示例pythonfrom openai import OpenAI client OpenAI( api_keyyour-key, # base_url 不填默认指向 api.openai.com ) response client.chat.completions.create( modelgpt-4o, messages[{role: user, content: 你好}] )直连需要自己处理的工程问题1. 网络不稳定直连海外API国内服务器延迟波动大。需要在应用层做超时和重试pythonimport asyncio from openai import AsyncOpenAI, APITimeoutError, APIConnectionError client AsyncOpenAI(api_keyyour-key, timeout30.0) async def robust_chat(model, messages, max_retries3): for attempt in range(max_retries): try: return await client.chat.completions.create( modelmodel, messagesmessages ) except (APITimeoutError, APIConnectionError) as e: if attempt max_retries - 1: raise wait 2 ** attempt # 1s, 2s, 4s await asyncio.sleep(wait)2. 多模型切换如果同时用GPT和Claude需要维护两套客户端pythonfrom openai import AsyncOpenAI import anthropic openai_client AsyncOpenAI(api_keyopenai-key) anthropic_client anthropic.AsyncAnthropic(api_keyanthropic-key) async def chat(model, messages): if model.startswith(gpt): return await openai_client.chat.completions.create( modelmodel, messagesmessages ) elif model.startswith(claude): # Anthropic的system是独立字段需要转换 system user_messages [] for msg in messages: if msg[role] system: system msg[content] else: user_messages.append(msg) return await anthropic_client.messages.create( modelmodel, systemsystem, messagesuser_messages, max_tokens2000 )注意Anthropic和OpenAI的请求格式差异system字段独立、max_tokens必填这个适配逻辑在直连方案下需要自己写。3. 计费统计直连不提供统一的用量看板需要自己记录pythonfrom collections import defaultdict usage_log defaultdict(lambda: {input: 0, output: 0}) async def tracked_chat(model, messages): response await client.chat.completions.create( modelmodel, messagesmessages ) usage_log[model][input] response.usage.prompt_tokens usage_log[model][output] response.usage.completion_tokens return response def get_usage_report(): return {m: dict(u) for m, u in usage_log.items()}直连的局限网络问题只能靠重试兜底不能换通道多模型适配代码侵入业务逻辑没有统一的计费和监控入口调用量增长后这些局限会推动你往代理或网关方案迁移。三、方案B自建代理——中间层做协议适配适用场景需要访问多个厂商的API想把网络适配、协议转换从业务代码中剥离不需要复杂的多通道负载均衡架构应用 → 自建代理服务 → OpenAI / Anthropic / Google │ ├─ 协议转换统一为OpenAI格式 ├─ 超时重试 └─ 用量日志代理服务示例用FastAPI写一个轻量代理pythonfrom fastapi import FastAPI, Request from fastapi.responses import JSONResponse from openai import AsyncOpenAI import anthropic import logging app FastAPI() logger logging.getLogger(__name__) # 各厂商客户端 clients { openai: AsyncOpenAI(api_keyopenai-key), anthropic: anthropic.AsyncAnthropic(api_keyanthropic-key), } app.post(/v1/chat/completions) async def proxy_chat(request: Request): 统一入口按model字段路由到不同厂商 body await request.json() model body[model] messages body[messages] start time.time() if model.startswith(gpt): # OpenAI格式直接转发 response await clients[openai].chat.completions.create( modelmodel, messagesmessages ) result response.model_dump() elif model.startswith(claude): # 转换为Anthropic格式 system user_msgs [] for msg in messages: if msg[role] system: system msg[content] else: user_msgs.append(msg) response await clients[anthropic].messages.create( modelmodel, systemsystem, messagesuser_msgs, max_tokensbody.get(max_tokens, 2000) ) # 转换回OpenAI格式 result convert_to_openai_format(response) # 记录用量 latency time.time() - start logger.info(fmodel{model} latency{latency:.2f}s tokens{result.get(usage)}) return JSONResponse(result)代理方案的优势业务代码变成统一的OpenAI格式不再关心后端是哪个厂商python# 应用层代码——不关心后端是GPT还是Claude import httpx async def chat(model, messages): async with httpx.AsyncClient() as client: resp await client.post( http://your-proxy:8000/v1/chat/completions, json{model: model, messages: messages} ) return resp.json()代理方案的踩坑点坑1流式响应转发代理转发SSE流时要注意不能缓冲。FastAPI的StreamingResponse默认不缓冲但如果前面套了Nginx需要关闭proxy_bufferingnginxlocation /v1/chat/completions { proxy_pass http://proxy:8000; proxy_buffering off; # 关键 proxy_cache off; proxy_read_timeout 300s; chunked_transfer_encoding on; }坑2错误格式不统一OpenAI返回的错误格式和Anthropic不同。代理层需要统一错误响应pythonapp.exception_handler(Exception) async def error_handler(request, exc): if isinstance(exc, anthropic.APIStatusError): return JSONResponse( status_codeexc.status_code, content{error: {message: exc.message, type: api_error}} ) return JSONResponse( status_code500, content{error: {message: str(exc), type: internal_error}} )四、方案C自建网关——完整的API管理平台适用场景多团队、多应用共享AI能力需要多通道负载均衡和自动故障切换需要按应用/用户分别计费调用量大对可用性要求高网关的完整模块┌─ 认证API Key验证权限 ├─ 限流令牌桶 / 滑动窗口请求 → 网关 ────────├─ 路由model → 上游映射 ├─ 熔断错误率超阈值自动切断 ├─ 负载均衡多通道加权轮询 ├─ 计费按应用/用户统计Token ├─ 审核输入/输出内容安全 ├─ 缓存相同请求复用结果 └─ 监控延迟/错误率/用量看板核心模块代码路由负载均衡pythonimport random from collections import defaultdict class GatewayRouter: 网关路由模型→多上游通道映射 def __init__(self): # model → [通道列表] self.routes defaultdict(list) def add_route(self, model, channel_name, client, weight1): self.routes[model].append({ name: channel_name, client: client, weight: weight, health: 1.0 # 健康分0-1 }) async def route(self, model, messages, **kwargs): channels self.routes.get(model, []) if not channels: raise ValueError(f模型 {model} 无可用通道) # 按健康分过滤 healthy [c for c in channels if c[health] 0.3] if not healthy: healthy channels # 全不健康也得试 # 加权随机选择 total_weight sum(c[weight] * c[health] for c in healthy) r random.uniform(0, total_weight) cumulative 0 for channel in healthy: cumulative channel[weight] * channel[health] if r cumulative: return channel return healthy[0]熔断器pythonimport time from collections import deque class CircuitBreaker: 滑动窗口错误率熔断 def __init__(self, threshold0.3, window60, min_calls20): self.threshold threshold self.window window self.min_calls min_calls self.records deque() # [(timestamp, success)] self.state closed # closed / open / half_open self.opened_at 0 def record(self, success): now time.time() self.records.append((now, success)) # 清理过期记录 while self.records and self.records[0][0] now - self.window: self.records.popleft() self._evaluate() def _evaluate(self): if len(self.records) self.min_calls: return errors sum(1 for _, s in self.records if not s) error_rate errors / len(self.records) if error_rate self.threshold: self.state open self.opened_at time.time() elif self.state half_open and error_rate self.threshold / 2: self.state closed def allow(self): if self.state closed: return True if self.state open: if time.time() - self.opened_at 10: # 10秒后试探 self.state half_open return True return False return True # half_open放行计费模块pythonclass BillingTracker: 按应用模型统计Token用量 def __init__(self): self.usage defaultdict(lambda: defaultdict(lambda: { input_tokens: 0, output_tokens: 0, requests: 0 })) def record(self, app_id, model, input_tokens, output_tokens): stats self.usage[app_id][model] stats[input_tokens] input_tokens stats[output_tokens] output_tokens stats[requests] 1 def get_report(self, app_idNone): if app_id: return dict(self.usage.get(app_id, {})) return {app: dict(models) for app, models in self.usage.items()}开源方案自建网关不用从零写有成熟的开源方案项目语言特点one-apiGo多渠道管理、负载均衡、计费new-apiGoone-api增强版UI更好LiteLLMPython轻量级100模型支持FastGPTTypeScript带知识库和Agent能力基于开源方案做二次开发比自己从零搭快10倍。网关方案的代价运维成本网关本身需要监控、备份、升级延迟增加多一跳网络增加10-50ms单点风险网关挂了所有应用都受影响需要做网关的高可用五、三种方案的对比总结维度直连代理网关开发成本低中高可用开源方案降低运维成本低中高多模型支持需自己适配代理层适配网关层适配故障切换应用层重试代理可切换多通道自动切换计费统计自己实现代理日志内置计费模块延迟最低5-20ms10-50ms适合阶段原型/小规模中等规模大规模/多团队六、迁移路径大部分团队的演进路线是直连 → 代理 → 网关。阶段10-1万次/天直连快速验证业务 ↓ 网络不稳定、多模型需求出现阶段21-10万次/天加代理统一协议和重试 ↓ 多团队共享、计费需求、高可用需求出现阶段310万次/天上网关完整的管理平台不要一开始就上网关——过早的架构复杂度比技术债更危险。先直连跑通业务等痛点出现了再迁移。迁移时可以灰度新请求走网关老请求继续走直连逐步切换。七、选型检查清单服务器能稳定访问目标API吗用几个厂商的模型日调用量多少需要按应用/用户计费吗有专职运维吗对可用性要求多高99% / 99.9% / 99.99%有多团队共享需求吗前两个问题决定了直连够不够用后面的问题决定了是否需要代理或网关。八、总结三种方案不是优劣关系是适用阶段不同直连适合起步阶段简单直接复杂度在应用代码内代理适合中等规模把协议适配和网络处理从业务中剥离网关适合大规模和多团队场景提供完整的管理能力工程选型的核心原则用当前阶段最简单的方案保留迁移的可能性。不要为了未来可能需要而过早引入复杂架构。

AI应用接入大模型：直连、代理、网关三种方案的工程权衡

相关新闻

一篇文章帮你彻底搞清楚“I/O多路复用”和“异步I/O”的前世今生

什么是云拨测？有哪些专业的云拨测产品？

Python 开发教程

AI 时代大龄程序员的优势凸显：从技术执行者到系统编排者的历史性跃迁

拒绝盲目踩坑！6款经过市场验证的高性价比新手吉他推荐

基于LENA-R8与STM32的全球物联网高精度定位方案

企业级文本到SQL技术：CSR-RAG高效检索系统解析

MeEdu开源网校系统：如何构建高可用、低成本的视频点播平台架构

HP ProLiant Gen10 Plus 支持 PCIe4.0 吗？扩展插槽规格完整说明

管理者的六个层次

AI Coding 六个月真实ROI账本：产品经理的血泪教训，研发的冷静忠告

审计来了，数据权限全开——审计走了，怎么确保权限全部关掉？

告别 AccessKey：多云平台 CLI OAuth 免密认证完全指南

基于13DOF传感器与PIC32MZ的高精度嵌入式导航系统设计

UnblockNeteaseMusic终极教程：3分钟解锁网易云音乐灰色歌曲的完整方案

Coze与Dify对比指南：低代码AI应用开发从入门到实战

AI生图工具怎么选？2026年6月版实测对比

国产DSP FT-M6678 DDR3配置避坑指南：从PLL时钟到PHY寄存器，手把手调通你的第一块板