What Is an AI Agent: From Concept to Architecture

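What Is an AI Agent

An AI Agent is an AI system that can perceive its environment, make decisions, and take actions on its own. It is more than a question-and-answer chatbot: it is a "digital employee" that can plan proactively, use tools, and complete complex tasks.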

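Chatbot:
  User: "Check tomorrow's weather in Beijing for me"
  Bot:  "Sorry, I can't access real-time weather data"

AI Agent:
  User: "Check tomorrow's weather in Beijing, and if it's going to rain, cancel my outdoor meeting tomorrow"
  Agent: think → call the weather API → see that rain is forecast → call the calendar API → cancel the meeting → send a notification

The key difference: an Agent decides "what to do next" on its own instead of waiting for the user to direct each step.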

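Agent vs Chatbot vs Copilot

Feature	Chatbot	Copilot	Agent
Interaction style	Question and answer	Assistive suggestions	Autonomous execution
Decision-making	None	Limited	Strong
Tool use	None	Limited	Extensive
Task complexity	Simple Q&A	Single-step assistance	Complex multi-step tasks
Autonomy	Reactive	Semi-proactive	Plans and executes proactively
Typical products	Customer-service bots	GitHub Copilot	Claude Code, Devin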

Core Components of an Agent

A complete AI Agent is made up of four core parts:

┌─────────────────────────────────────┐
│              AI Agent               │
│                                     │
│  ┌──────────┐    ┌──────────────┐   │
│  │  Brain   │    │    Memory    │   │
│  │  (LLM)   │◄──►│              │   │
│  └────┬─────┘    └──────────────┘   │
│       │                             │
│  ┌────▼─────┐    ┌──────────────┐   │
│  │ Planning │───►│    Tools     │   │
│  └──────────┘    └──────────────┘   │
│                                     │
└─────────────────────────────────────┘

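1. Brain (LLM)

The LLM is the Agent's core reasoning engine. It is responsible for:

Understanding the user's intent
Analyzing the current state
Deciding the next action
Producing the final answer

Not every LLM is suited to be an Agent's brain. A good Agent LLM needs:

Strong reasoning ability
Reliable instruction following
Stable tool-calling behavior
A sufficiently large context window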

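2. Tools

Tools are the interface through which an Agent interacts with the outside world. An Agent without tools is like a person without hands: it can think but cannot act.

Common tool types:

Type	Examples	Purpose
Search	Google Search, Bing	Fetch up-to-date information
Code execution	Python REPL, Shell	Run code and execute commands
API calls	Weather API, calendar API	Access external services
File operations	Read/write files, create directories	Work with the local file system
Database	SQL queries	Query and modify data
Browser	Playwright, Selenium	Browse web pages, fill in forms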
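One way to expose tools like these to an Agent is a small registry that maps a tool name to a callable. The sketch below is only an illustration: `Tool`, `ToolRegistry`, and the `echo` tool are hypothetical names invented here, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str  # shown to the LLM so it knows when to use the tool
    func: Callable[..., str]


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, **kwargs) -> str:
        # Return errors as text so the LLM can read them and recover.
        if name not in self._tools:
            return f"unknown tool: {name}"
        return self._tools[name].func(**kwargs)


registry = ToolRegistry()
registry.register(Tool("echo", "Return the input text unchanged", lambda text: text))
result = registry.call("echo", text="hello")
```

Returning an error string for an unknown tool, rather than raising, keeps the Agent loop alive and lets the model pick a different tool.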

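3. Memory

An Agent has to remember what happened earlier in order to make coherent decisions.

Short-term Memory:
  - The context of the current conversation
  - The most recent actions and their results
  - Implementation: placed directly in the LLM's context window

Long-term Memory:
  - User preferences and history
  - Experience from earlier tasks
  - Implementation: vector database + retrieval

Working Memory:
  - Intermediate state of the current task
  - To-do list
  - Implementation: a structured state object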
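The three kinds of memory can be pictured as three fields on one object. This is a minimal sketch under simplifying assumptions: `AgentMemory` is a hypothetical class, the short-term window is capped at 4 turns purely for illustration, and long-term memory is a plain list standing in for a vector store.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    # Short-term: only the most recent turns are kept (oldest are dropped).
    short_term: deque = field(default_factory=lambda: deque(maxlen=4))
    # Long-term: a plain list here; a real system would use a vector store.
    long_term: list = field(default_factory=list)
    # Working: structured state for the task in progress.
    working: dict = field(default_factory=lambda: {"todo": [], "done": []})


memory = AgentMemory()
for i in range(6):
    memory.short_term.append(f"turn {i}")
memory.long_term.append("user prefers concise answers")
memory.working["todo"].append("draft outline")
```

After six appends, only the last four turns remain in `short_term`, which mirrors how old context falls out of a limited window.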

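4. Planning

When faced with a complex task, an Agent needs to break it down into small steps and then carry them out one by one.

User: "Write a technical blog post about RAG and publish it to my website"

The Agent's plan:
  1. Research recent developments in RAG (search tool)
  2. Draft an outline
  3. Write the article
  4. Write descriptions for the illustrations
  5. Format the article as Markdown
  6. Publish it to the website through an API
  7. Verify that publishing succeeded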
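A plan like the one above can be represented as an ordered list of steps with a status each. The sketch below is one possible shape, not a prescribed design: `Step`, `execute_plan`, and the `run_step` callback are hypothetical, and the executor simply pretends that publishing fails.

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    status: str = "pending"  # pending -> done | failed


def execute_plan(steps, run_step):
    """Run steps in order; stop at the first failure so that later steps
    do not act on missing results."""
    for step in steps:
        step.status = "done" if run_step(step) else "failed"
        if step.status == "failed":
            break
    return steps


plan = [Step("research RAG"), Step("write outline"), Step("publish"), Step("verify")]
# Hypothetical executor: every step succeeds except "publish".
execute_plan(plan, run_step=lambda s: s.description != "publish")
statuses = [s.status for s in plan]
```

Here `statuses` ends as done, done, failed, pending: the verification step is never attempted because the step it depends on failed. A real Agent would typically replan at that point.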

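The ReAct Pattern

ReAct (Reasoning + Acting) is one of the most widely used execution patterns for Agents. It has the LLM alternate between "thinking" and "acting":

Loop:
  Thought: What do I need to do? What is the current state?
  Action:  Call a tool
  Observation: The result returned by the tool
  ... (repeat until the task is complete)
  Final Answer: The final answer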
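The loop can be sketched in a few lines of Python. To keep the example runnable without an API key, the LLM is replaced by `fake_llm`, a scripted stand-in that replays fixed decisions; `react_loop`, the step dictionary shape, and the `add` tool are all assumptions made for this illustration.

```python
def react_loop(llm, tools, question, max_steps=5):
    """Alternate between model decisions and tool calls until a final answer."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        # Expected shape: {"thought", "action", "input"} or {"final"}.
        step = llm(transcript)
        if "final" in step:
            transcript.append(f"Final Answer: {step['final']}")
            return step["final"], transcript
        transcript.append(f"Thought: {step['thought']}")
        transcript.append(f"Action: {step['action']}({step['input']!r})")
        observation = tools[step["action"]](step["input"])
        transcript.append(f"Observation: {observation}")
    return None, transcript  # step budget exhausted


# Scripted stand-in for an LLM: replays fixed decisions in order.
script = iter([
    {"thought": "I need the sum first.", "action": "add", "input": "2,3"},
    {"final": "2 + 3 = 5"},
])


def fake_llm(transcript):
    return next(script)


tools = {"add": lambda s: str(sum(int(x) for x in s.split(",")))}
answer, transcript = react_loop(fake_llm, tools, "What is 2 + 3?")
```

The `max_steps` cap matters in practice: without it, a model that never produces a final answer would loop forever.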

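A Concrete Example

User: "What is Apple's latest market cap? How much is that in CNY?"

Thought: I need to look up Apple's latest market cap, then the USD to CNY exchange rate, and finally do the calculation.

Action: search("Apple Inc market cap 2026")
Observation: Apple's market cap is approximately $3.8 trillion as of March 2026.

Thought: Now I know the market cap is 3.8 trillion USD. Next I need the exchange rate.

Action: search("USD to CNY exchange rate today")
Observation: 1 USD = 7.23 CNY

Thought: Now I can calculate. 3.8 trillion × 7.23 = 27.474 trillion CNY.

Final Answer: Apple's latest market cap is about 3.8 trillion USD. At the current exchange rate (1 USD ≈ 7.23 CNY), that is about 27.47 trillion CNY.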
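When ReAct is implemented over plain text like the trace above, the program has to extract the tool name and arguments from each `Action:` line. A minimal parser might look like the following; `parse_action` and its regular expression are assumptions for illustration and only handle literal arguments such as strings and numbers.

```python
import ast
import re

ACTION_RE = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$")


def parse_action(line: str):
    """Parse 'Action: tool("arg", ...)' into (tool_name, args_tuple).
    Returns None when the line is not a well-formed action."""
    match = ACTION_RE.match(line.strip())
    if not match:
        return None
    name, raw_args = match.groups()
    if not raw_args.strip():
        return name, ()
    try:
        # literal_eval accepts only literals, so no code in the text is executed.
        args = ast.literal_eval(f"({raw_args},)")
    except (ValueError, SyntaxError):
        return None
    return name, args


parsed = parse_action('Action: search("Apple Inc market cap 2026")')
```

Using `ast.literal_eval` instead of `eval` is a deliberate choice: model output is untrusted input and should never be executed as code.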

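Tool Calling (Function Calling)

Most modern LLM APIs support Function Calling (also called tool use), and it has become the standard way for an Agent to use tools.

Defining Tools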

# Define the tool schemas
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'Beijing'"
                    },
                    "date": {
                        "type": "string",
                        "description": "Date in YYYY-MM-DD format"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "send_email",
            "description": "Send an email",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {"type": "string", "description": "Recipient email address"},
                    "subject": {"type": "string", "description": "Email subject"},
                    "body": {"type": "string", "description": "Email body"}
                },
                "required": ["to", "subject", "body"]
            }
        }
    }
]
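Because the model generates the arguments, it is worth checking them against the schema before running a tool. The sketch below is a deliberately small validator written for this article's schema shape; `validate_args` is a hypothetical helper that covers only `required` and primitive `type` checks, not full JSON Schema.

```python
def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments pass.
    Covers only 'required' and primitive 'type' checks, not full JSON Schema."""
    params = schema["function"]["parameters"]
    type_map = {"string": str, "number": (int, float), "integer": int, "boolean": bool}
    problems = [f"missing required argument: {name}"
                for name in params.get("required", []) if name not in args]
    for name, value in args.items():
        spec = params.get("properties", {}).get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif spec.get("type") in type_map and not isinstance(value, type_map[spec["type"]]):
            problems.append(f"{name} should be of type {spec['type']}")
    return problems


weather_schema = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "date": {"type": "string"}},
            "required": ["city"],
        },
    },
}
ok = validate_args(weather_schema, {"city": "Beijing"})
bad = validate_args(weather_schema, {"date": 20260312})
```

Returning the problems as text makes it easy to send them back to the model as a tool result so it can correct the call.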

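Calling a Tool from the LLM

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What's the weather in Beijing tomorrow?"}
    ],
    tools=tools,
    tool_choice="auto"  # let the model decide whether to call a tool
)

# With tool_choice="auto" the model may answer directly, so tool_calls can be None
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    print(f"Tool called: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
    # Typical output (the date is only reliable if the model is told today's date):
    # Tool called: get_weather
    # Arguments: {"city": "Beijing", "date": "2026-03-12"}
else:
    print(message.content)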
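The model only requests the call; the program still has to run the tool and send the result back in a follow-up request. The sketch below builds that follow-up history with plain dictionaries in the Chat Completions message format. `append_tool_result` is a hypothetical helper, and the id and weather data are made up for the example.

```python
import json


def append_tool_result(messages, tool_call_id, name, arguments, result):
    """Append the assistant's tool call and the tool's result to the history.
    Uses plain dicts in the Chat Completions message format."""
    messages.append({
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": tool_call_id,
            "type": "function",
            "function": {"name": name, "arguments": json.dumps(arguments)},
        }],
    })
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call_id,  # must match the id of the call above
        "content": json.dumps(result, ensure_ascii=False),
    })
    return messages


history = [{"role": "user", "content": "What's the weather in Beijing tomorrow?"}]
append_tool_result(
    history,
    tool_call_id="call_123",                     # hypothetical id
    name="get_weather",
    arguments={"city": "Beijing"},
    result={"condition": "rain", "high_c": 12},  # made-up tool output
)
```

Passing `history` to another `chat.completions.create` call would then let the model turn the tool output into a natural-language answer.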

Tool Calling with Claude

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get the weather for a given city",
            "input_schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "date": {"type": "string", "description": "Date"}
                },
                "required": ["city"]
            }
        }
    ],
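    messages=[
        {"role": "user", "content": "What's the weather in Beijing tomorrow?"}
    ]
)

# response.content is a list of blocks; tool requests have type "tool_use"
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool called: {block.name}, arguments: {block.input}")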
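With Claude, tool results go back as `tool_result` blocks inside a user message, each referring to the `id` of the `tool_use` block it answers. The sketch below shows that conversion on plain dictionaries. `build_tool_results` and the stub `run_tool` are hypothetical, and the assistant content is hand-written rather than captured API output.

```python
def build_tool_results(content_blocks, run_tool):
    """Turn tool_use blocks from an assistant message into one user message
    of tool_result blocks. Blocks are plain dicts here; the SDK returns
    objects that expose the same fields as attributes."""
    results = []
    for block in content_blocks:
        if block["type"] != "tool_use":
            continue  # skip text blocks
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": run_tool(block["name"], block["input"]),
        })
    return {"role": "user", "content": results}


# Hand-written example of assistant content, not captured API output.
assistant_content = [
    {"type": "text", "text": "Let me check the forecast."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"city": "Beijing"}},
]
reply = build_tool_results(
    assistant_content,
    run_tool=lambda name, args: f"{name} -> rain in {args['city']}",  # stub
)
```

The resulting `reply` would be appended to `messages` after the assistant turn before calling `client.messages.create` again.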

Hands-On: Building a Simple Agent

Let's build a minimal ReAct-style Agent from scratch, without relying on any framework:

"""
A minimal ReAct-style Agent
"""
import json
from openai import OpenAI

client = OpenAI()

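# ---- Tool definitions ----

def calculator(expression: str) -> str:
    """A restricted calculator: digits, + - * / . ( ) and spaces only"""
    allowed = set("0123456789+-*/.() ")
    if not all(c in allowed for c in expression):
        return "Error: only arithmetic expressions are supported"
    if "**" in expression:
        # The character filter alone lets '**' through, and huge powers can hang the process
        return "Error: exponentiation is not supported"
    try:
        # No names survive the filter above; builtins are removed as an extra precaution
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Calculation error: {e}"

def get_current_time() -> str:
    """Return the current local time"""
    from datetime import datetime
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")

# Tool registry: maps tool names to Python functions
TOOLS = {
    "calculator": calculator,
    "get_current_time": get_current_time,
}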

# Tool schemas
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate an arithmetic expression",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Arithmetic expression, e.g. '2 + 3 * 4'"
                    }
                },
                "required": ["expression"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Get the current date and time",
            "parameters": {
                "type": "object",
                "properties": {}
            }
        }
    }
]

# ---- Agent main loop ----

def run_agent(user_message: str, max_steps: int = 5):
    """Run the Agent until it answers or hits the step limit"""
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant. You can use tools to help answer questions."
        },
        {"role": "user", "content": user_message}
    ]

    for step in range(max_steps):
        print(f"\n--- Step {step + 1} ---")

        # Call the LLM
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOL_SCHEMAS,
            tool_choice="auto"
        )

        message = response.choices[0].message

        # No tool calls means the Agent has reached its answer
        if not message.tool_calls:
            print(f"Final answer: {message.content}")
            return message.content

        # Keep the assistant message that requested the tools in the history
        messages.append(message)

        for tool_call in message.tool_calls:
            func_name = tool_call.function.name
            raw_args = tool_call.function.arguments or "{}"
            func_args = json.loads(raw_args)

            print(f"Calling tool: {func_name}({raw_args})")

            # Run the tool
            if func_name in TOOLS:
                result = TOOLS[func_name](**func_args)
            else:
                result = f"Unknown tool: {func_name}"

            print(f"Tool result: {result}")

            # Add the tool result to the conversation
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })

    return "Reached the maximum number of steps"

# ---- Usage ----
run_agent("What time is it now? If it is afternoon, work out how many minutes are left until 18:00")

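Sample run (the exact steps and wording depend on the model):

--- Step 1 ---
Calling tool: get_current_time({})
Tool result: 2026-03-11 15:30:00

--- Step 2 ---
Calling tool: calculator({"expression": "(18 - 15) * 60 - 30"})
Tool result: 150

--- Step 3 ---
Final answer: It is now 15:30 in the afternoon, and there are 150 minutes (2 hours 30 minutes) left until 18:00.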
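A loop like this is fragile if a tool call fails: malformed JSON arguments or a wrong parameter name would raise and end the run. One option is to wrap tool execution so that every failure becomes a text observation the model can react to. `safe_call_tool` below is a hypothetical helper sketched under that assumption.

```python
import json


def safe_call_tool(tools: dict, name: str, raw_arguments: str) -> str:
    """Run a tool and always return a string, so a failure becomes an
    observation the model can react to instead of crashing the loop."""
    if name not in tools:
        return f"Error: unknown tool '{name}'"
    try:
        args = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError as e:
        return f"Error: arguments are not valid JSON ({e.msg})"
    if not isinstance(args, dict):
        return "Error: arguments must be a JSON object"
    try:
        return str(tools[name](**args))
    except TypeError as e:
        # Typically a missing or unexpected keyword argument.
        return f"Error: bad arguments for '{name}': {e}"
    except Exception as e:
        return f"Error: '{name}' failed: {e}"


demo_tools = {"double": lambda x: x * 2}
doubled = safe_call_tool(demo_tools, "double", '{"x": 21}')
```

Because the error text is fed back as the tool result, the model often corrects its own call on the next step.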

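Memory Mechanisms in Detail

Short-Term Memory: Conversation Context

The simplest form of memory is to keep the conversation history in messages:

messages = [
    {"role": "user", "content": "My name is Zhang San"},
    {"role": "assistant", "content": "Hello, Zhang San!"},
    {"role": "user", "content": "What is my name?"},
    # The LLM can find the answer in the context
]

The problem: the context window is limited, and a long conversation will exceed it.

One solution:

def estimate_tokens(messages):
    """Rough estimate: about 4 characters per token for English text (a heuristic, not a tokenizer)"""
    return sum(len(str(m.get("content") or "")) for m in messages) // 4

def manage_context(messages, max_tokens=4000):
    """Simple context management: keep system messages and the most recent turns"""
    system_msg = [m for m in messages if m["role"] == "system"]
    other_msgs = [m for m in messages if m["role"] != "system"]

    # Drop the oldest messages until the rest fits, always keeping the latest one
    while len(other_msgs) > 1 and estimate_tokens(other_msgs) > max_tokens:
        other_msgs.pop(0)

    return system_msg + other_msgs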
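Trimming from the front has one pitfall in an Agent loop: it can cut the assistant message that requested a tool while leaving the tool's result behind, and chat APIs generally reject a `tool` message with no matching call before it. A possible follow-up step, applied to the non-system messages after trimming, is sketched below; `drop_orphan_tool_messages` is a hypothetical helper that assumes OpenAI-style dict messages.

```python
def drop_orphan_tool_messages(messages: list) -> list:
    """After trimming, remove leading 'tool' messages whose matching
    assistant tool call was cut off. Assumes OpenAI-style dict messages."""
    trimmed = list(messages)
    while trimmed and trimmed[0]["role"] == "tool":
        trimmed.pop(0)
    return trimmed


window = [
    {"role": "tool", "tool_call_id": "call_1", "content": "15:30"},
    {"role": "assistant", "content": "It is 15:30."},
    {"role": "user", "content": "Thanks"},
]
cleaned = drop_orphan_tool_messages(window)
```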

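Long-Term Memory: Vector Database

Store important conversations and experience in a vector database and retrieve them when needed:

import uuid

import chromadb

memory_db = chromadb.Client()  # in-memory client; data is lost when the process exits
memory_collection = memory_db.create_collection("agent_memory")

def save_memory(content, metadata=None):
    """Save a memory"""
    memory_collection.add(
        documents=[content],
        # Some chromadb versions reject an empty metadata dict, so pass None instead
        metadatas=[metadata] if metadata else None,
        ids=[f"mem_{uuid.uuid4().hex}"]  # unique even if entries are deleted later
    )

def recall_memory(query, top_k=3):
    """Recall the memories most relevant to the query"""
    results = memory_collection.query(
        query_texts=[query],
        n_results=top_k
    )
    return results["documents"][0]

# Usage
save_memory("User preference: likes a concise answer style", {"type": "preference"})
save_memory("Last time, fixed a Python import error for the user", {"type": "experience"})

# Later in the conversation
relevant_memories = recall_memory("What style does the user like")
# → up to top_k documents, most similar first; with only two memories stored,
#   both come back, and the style preference is expected to rank first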
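To make the retrieval step less of a black box, here is a dependency-free sketch of what a vector store does conceptually: turn texts into vectors, then rank stored texts by similarity to the query. The word-count "embedding" is a toy stand-in for a learned embedding model, and `embed`, `cosine`, and `recall` are names invented for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding: word counts. Real systems use a learned embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def recall(query: str, memories: list, top_k: int = 1) -> list:
    """Return the top_k stored texts most similar to the query."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:top_k]


memories = [
    "user prefers a concise answer style",
    "fixed a python import error last time",
]
best = recall("what answer style does the user like", memories)
```

The query shares the words "user", "answer", and "style" with the first memory and none with the second, so the first memory ranks highest. A learned embedding would also match paraphrases that share no words at all, which is the main reason to use one.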

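Multi-Agent Systems

When a task is complex enough, a single Agent may not be sufficient. A multi-Agent system has several specialized Agents collaborate on the task.

Common Architectures

1. Orchestrator pattern
      ┌──────────────┐
      │ Orchestrator │ ← assigns tasks and aggregates results
      └───────┬──────┘
     ┌────────┼────────┐
     ▼        ▼        ▼
 Research  Coding   Review   ← each handles its own specialty
   Agent    Agent    Agent

2. Peer-to-peer pattern
   Agent A ◄──► Agent B
      ▲            ▲
      └────► Agent C ◄┘
   Agents collaborate as equals and talk to each other directly

3. Pipeline pattern
   Agent A → Agent B → Agent C → final result
   Each Agent handles one stage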
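Two of these architectures are easy to sketch with plain functions. In the example below each "agent" is just a function that returns a string; in practice each would wrap its own LLM call, prompt, and tools. All function names here are hypothetical.

```python
# Each "agent" is a plain function here; in practice each would wrap an LLM call.
def researcher(task: str) -> str:
    return f"notes on {task}"


def writer(notes: str) -> str:
    return f"draft based on {notes}"


def reviewer(draft: str) -> str:
    return f"approved: {draft}"


def run_pipeline(task: str, stages) -> str:
    """Pipeline: the output of each stage is the input of the next."""
    result = task
    for stage in stages:
        result = stage(result)
    return result


def run_orchestrator(task: str, workers: dict) -> dict:
    """Orchestrator: one coordinator hands the task to every worker and
    collects the results by worker name."""
    return {name: worker(task) for name, worker in workers.items()}


pipeline_out = run_pipeline("RAG", [researcher, writer, reviewer])
orchestrator_out = run_orchestrator("RAG", {"research": researcher, "review": reviewer})
```

The pipeline is simple but strictly sequential, while the orchestrator can fan work out to independent workers and leaves the merging logic in one place.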

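Practical Applications

Software development team:
  - PM Agent: analyzes requirements and breaks down tasks
  - Developer Agent: writes code
  - Reviewer Agent: reviews code
  - Tester Agent: writes and runs tests

Research team:
  - Researcher Agent: searches for and organizes sources
  - Analyst Agent: analyzes data
  - Writer Agent: writes the report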

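Popular Agent Frameworks

Framework	Characteristics	Best suited for
LangGraph	Part of the LangChain ecosystem, graph-structured workflows	Complex multi-step Agents
CrewAI	Multi-Agent collaboration with role-playing	Team collaboration scenarios
AutoGen	From Microsoft, multi-Agent conversations	Research and experimentation
Semantic Kernel	From Microsoft, enterprise-oriented	.NET, Python, and Java ecosystems
Claude Code	From Anthropic, a ready-made coding Agent rather than a library	Software development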

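LangGraph Example

from typing import TypedDict

from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    messages: list
    next_action: str

def reasoning_node(state: AgentState) -> dict:
    """Reasoning node: decide what to do next"""
    # Placeholder for the LLM call: use a tool once, then finish
    has_tool_result = any(m.get("role") == "tool" for m in state["messages"])
    return {"next_action": "finish" if has_tool_result else "use_tool"}

def tool_node(state: AgentState) -> dict:
    """Tool node: run the requested tool"""
    # Placeholder for real tool execution
    tool_result = {"role": "tool", "content": "stub tool result"}
    return {"messages": state["messages"] + [tool_result]}

def should_continue(state: AgentState) -> str:
    """Decide whether to keep going"""
    if state["next_action"] == "finish":
        return END
    return "tool_node"

# Build the graph
graph = StateGraph(AgentState)
graph.add_node("reasoning", reasoning_node)
graph.add_node("tool_node", tool_node)
graph.add_edge("tool_node", "reasoning")
graph.add_conditional_edges("reasoning", should_continue)
graph.set_entry_point("reasoning")

agent = graph.compile()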

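Challenges and Limitations of Agents

1. Reliability

Every step an Agent takes can go wrong, and the more steps a task has, the higher the chance of failure. Assuming each step succeeds independently:

Per-step success rate: 95%
5-step task success rate: 0.95^5 ≈ 77%
10-step task success rate: 0.95^10 ≈ 60%
20-step task success rate: 0.95^20 ≈ 36%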
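The figures above can be reproduced, and turned around into a step budget, with a few lines of Python. The function names are hypothetical, and the independence assumption is a simplification: in real runs, errors are often correlated and Agents can sometimes recover from a failed step.

```python
import math


def task_success_rate(step_success: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps."""
    return step_success ** steps


def max_steps_for(target: float, step_success: float) -> int:
    """Largest number of steps that keeps overall success at or above target."""
    return math.floor(math.log(target) / math.log(step_success))


rates = {n: round(task_success_rate(0.95, n), 2) for n in (5, 10, 20)}
budget = max_steps_for(0.5, 0.95)
```

With a 95% per-step success rate, `budget` comes out to 13: beyond 13 steps, the task is more likely to fail than to succeed.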

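2. Cost

An Agent calls the LLM many times per task, and each call usually re-sends the growing conversation history, so a task can cost several times to tens of times as much as an ordinary chat turn.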
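A back-of-the-envelope estimate shows where the multiplier comes from. The sketch assumes every call re-sends the full history and that each step adds a fixed number of tokens; the token counts are made-up illustrative values, not measurements.

```python
def estimate_input_tokens(base_tokens: int, tokens_per_step: int, steps: int) -> int:
    """Total input tokens across all LLM calls when every call re-sends
    the whole history. Call k (0-based) sends base + k * tokens_per_step."""
    return sum(base_tokens + k * tokens_per_step for k in range(steps))


single_turn = estimate_input_tokens(base_tokens=500, tokens_per_step=300, steps=1)
ten_steps = estimate_input_tokens(base_tokens=500, tokens_per_step=300, steps=10)
ratio = ten_steps / single_turn
```

Under these assumptions a 10-step run consumes 18,500 input tokens against 500 for a single turn, a factor of 37. Input cost grows roughly with the square of the step count, which is why context trimming and caching matter for long-running Agents.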

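3. Security

An Agent can perform real operations (deleting files, sending email, executing code), so it needs strict permission controls:

# Good practice: require user confirmation for dangerous operations
# (`action` is an illustrative object with `is_destructive` and `execute()`)
def execute_action(action):
    if action.is_destructive:
        confirm = input(f"The Agent wants to run: {action}. Confirm? (y/n) ")
        if confirm.strip().lower() != "y":
            return "The user cancelled the operation"
    return action.execute()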
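Confirmation prompts work best on top of a default-deny policy, where any tool that is not explicitly listed cannot run at all. The sketch below shows one way to express that; `ToolPolicy` and the tool names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    allowed: set = field(default_factory=set)         # run without asking
    needs_approval: set = field(default_factory=set)  # run only after a human says yes

    def check(self, tool_name: str) -> str:
        if tool_name in self.allowed:
            return "allow"
        if tool_name in self.needs_approval:
            return "ask"
        return "deny"  # default-deny anything not listed


policy = ToolPolicy(allowed={"search", "read_file"}, needs_approval={"send_email"})
decisions = [policy.check(t) for t in ("search", "send_email", "delete_file")]
```

An allowlist fails safe: a newly added or misspelled tool is blocked until someone decides which category it belongs to.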

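4. Evaluation Is Hard

An Agent's execution path is not deterministic. The same task can have several correct ways of being carried out, which makes it hard to evaluate with traditional metrics.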
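One common workaround is to score the outcome rather than the path: check whether the end state matches what the task required, and whether the run stayed within its step budget. The sketch below illustrates the idea with made-up runs; `evaluate_run` and the state keys are hypothetical.

```python
def evaluate_run(final_state: dict, expected: dict, steps_taken: int, max_steps: int) -> dict:
    """Score a run by its end state, not by the path taken to reach it."""
    mismatches = {k: (final_state.get(k), v)
                  for k, v in expected.items() if final_state.get(k) != v}
    return {
        "success": not mismatches,
        "within_budget": steps_taken <= max_steps,
        "mismatches": mismatches,
    }


expected_state = {"meeting": "cancelled", "notified": True}
# Two hypothetical runs: one reaches the expected end state, one does not.
run_a = evaluate_run({"meeting": "cancelled", "notified": True},
                     expected_state, steps_taken=3, max_steps=5)
run_b = evaluate_run({"meeting": "cancelled", "notified": False},
                     expected_state, steps_taken=6, max_steps=5)
```

Running the same task many times and reporting the share of successful runs then gives a success rate that does not depend on which valid path the Agent happened to take.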

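Summary

AI Agents represent the next stage of AI applications: the move from "conversation" to "action". The core idea is that the LLM does not just answer questions but can plan on its own, use tools, and complete complex tasks.

Keys to building an Agent:

Choose an LLM with strong reasoning ability as the brain
Design clear, reliable tool interfaces
Implement a sensible memory mechanism
Add safety guards and human confirmation steps

Agents turn AI from an "advisor" into a "doer". We are witnessing the start of a new era in which AI no longer just explains "how to do it" but actually gets it done for you. In the final article of this series, we will look at how AI understands the world beyond text: multimodal AI.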
