# 追踪与可观测性

AutoGen 内置了[追踪功能支持](https://microsoft.github.io/autogen/dev/user-guide/core-user-guide/framework/telemetry.html),可收集应用程序执行的全面记录。此功能对调试、性能分析和理解应用流程非常有用。

该能力由[OpenTelemetry](https://opentelemetry.io/)库提供支持,这意味着您可以使用任何兼容OpenTelemetry的后端来收集和分析追踪数据。

## 设置

首先需要安装OpenTelemetry Python包。可通过pip安装:

```bash
pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc
```

安装SDK后,在AutoGen中设置追踪的最简单方法是:

1. 配置OpenTelemetry追踪提供程序
2. 设置导出器将追踪数据发送到后端
3. 将追踪提供程序连接到AutoGen运行时

## 遥测后端

要收集和查看追踪数据,需要设置遥测后端。有多个开源选项可用,包括Jaeger、Zipkin。本示例我们将使用Jaeger作为遥测后端。

快速启动方式是通过Docker本地运行Jaeger:

```bash
docker run -d --name jaeger \
 -e COLLECTOR_OTLP_ENABLED=true \
 -p 16686:16686 \
 -p 4317:4317 \
 -p 4318:4318 \
 jaegertracing/all-in-one:latest
```

此命令启动的Jaeger实例会在16686端口监听Jaeger UI,4317端口监听OpenTelemetry收集器。可通过`http://localhost:16686`访问Jaeger UI。

## 为AgentChat团队添加追踪

接下来我们将了解如何为AutoGen GroupChat团队启用追踪。AutoGen运行时已支持OpenTelemetry(自动记录消息元数据)。首先创建一个追踪服务用于检测AutoGen运行时。


In [None]:
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

otel_exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
tracer_provider = TracerProvider(resource=Resource({"service.name": "autogen-test-agentchat"}))
span_processor = BatchSpanProcessor(otel_exporter)
tracer_provider.add_span_processor(span_processor)
trace.set_tracer_provider(tracer_provider)

# 稍后我们将通过服务名引用此追踪器 tracer = trace.get_tracer("autogen-test-agentchat")
# 

创建[团队](./tutorial/teams.ipynb)的所有代码您应该都已熟悉。需要注意的是,所有AgentChat代理和团队都使用AutoGen核心API运行时运行。而该运行时已内置检测功能,可记录[运行时消息事件(元数据)](https://github.com/microsoft/autogen/blob/main/python/packages/autogen-core/src/autogen_core/_telemetry/_tracing_config.py),包括:

- **create**:消息创建时
- **send**:消息发送时
- **publish**:消息发布时
- **receive**:消息接收时
- **intercept**:消息拦截时
- **process**:消息处理时
- **ack**:消息确认时


In [2]:
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import MaxMessageTermination, TextMentionTermination
from autogen_agentchat.teams import SelectorGroupChat
from autogen_agentchat.ui import Console
from autogen_core import SingleThreadedAgentRuntime
from autogen_ext.models.openai import OpenAIChatCompletionClient


def search_web_tool(query: str) -> str:
 if "2006-2007" in query:
 return """Here are the total points scored by Miami Heat players in the 2006-2007 season:
 Udonis Haslem: 844 points
 Dwayne Wade: 1397 points
 James Posey: 550 points
 ...
 """
 elif "2007-2008" in query:
 return "The number of total rebounds for Dwayne Wade in the Miami Heat season 2007-2008 is 214."
 elif "2008-2009" in query:
 return "The number of total rebounds for Dwayne Wade in the Miami Heat season 2008-2009 is 398."
 return "No data found."


def percentage_change_tool(start: float, end: float) -> float:
 return ((end - start) / start) * 100


async def main() -> None:
 model_client = OpenAIChatCompletionClient(model="gpt-4o")

 planning_agent = AssistantAgent(
 "PlanningAgent",
 description="An agent for planning tasks, this agent should be the first to engage when given a new task.",
 model_client=model_client,
 system_message="""
 You are a planning agent.
 Your job is to break down complex tasks into smaller, manageable subtasks.
 Your team members are:
 WebSearchAgent: Searches for information
 DataAnalystAgent: Performs calculations

 You only plan and delegate tasks - you do not execute them yourself.

 When assigning tasks, use this format:
 1. : 

 After all tasks are complete, summarize the findings and end with "TERMINATE".
 """,
 )

 web_search_agent = AssistantAgent(
 "WebSearchAgent",
 description="An agent for searching information on the web.",
 tools=[search_web_tool],
 model_client=model_client,
 system_message="""
 You are a web search agent.
 Your only tool is search_tool - use it to find information.
 You make only one search call at a time.
 Once you have the results, you never do calculations based on them.
 """,
 )

 data_analyst_agent = AssistantAgent(
 "DataAnalystAgent",
 description="An agent for performing calculations.",
 model_client=model_client,
 tools=[percentage_change_tool],
 system_message="""
 You are a data analyst.
 Given the tasks you have been assigned, you should analyze the data and provide results using the tools provided.
 If you have not seen the data, ask for it.
 """,
 )

 text_mention_termination = TextMentionTermination("TERMINATE")
 max_messages_termination = MaxMessageTermination(max_messages=25)
 termination = text_mention_termination | max_messages_termination

 selector_prompt = """Select an agent to perform task.

 {roles}

 Current conversation context:
 {history}

 Read the above conversation, then select an agent from {participants} to perform the next task.
 Make sure the planner agent has assigned tasks before other agents start working.
 Only select one agent.
 """

 task = "Who was the Miami Heat player with the highest points in the 2006-2007 season, and what was the percentage change in his total rebounds between the 2007-2008 and 2008-2009 seasons?"

 tracer = trace.get_tracer("autogen-test-agentchat")
 with tracer.start_as_current_span("runtime"):
 team = SelectorGroupChat(
 [planning_agent, web_search_agent, data_analyst_agent],
 model_client=model_client,
 termination_condition=termination,
 selector_prompt=selector_prompt,
 allow_repeated_speaker=True,
 )
 await Console(team.run_stream(task=task))

 await model_client.close()


# asyncio.run(main())

In [3]:
await main()

---------- user ----------
Who was the Miami Heat player with the highest points in the 2006-2007 season, and what was the percentage change in his total rebounds between the 2007-2008 and 2008-2009 seasons?
---------- PlanningAgent ----------
To accomplish this, we can break down the tasks as follows:

1. WebSearchAgent: Search for the Miami Heat player with the highest points during the 2006-2007 NBA season.
2. WebSearchAgent: Find the total rebounds for the identified player in both the 2007-2008 and 2008-2009 NBA seasons.
3. DataAnalystAgent: Calculate the percentage change in total rebounds for the player between the 2007-2008 and 2008-2009 seasons.

Once these tasks are complete, I will summarize the findings.
---------- WebSearchAgent ----------
[FunctionCall(id='call_PUhxZyR0CTlWCY4uwd5Zh3WO', arguments='{"query":"Miami Heat highest points scorer 2006-2007 season"}', name='search_web_tool')]
---------- WebSearchAgent ----------
[FunctionExecutionResult(content='Here are the tot

然后您可以使用Jaeger UI查看从上述应用运行中收集的追踪数据。

![Jaeger界面](jaeger.png)


## 自定义追踪

到目前为止,我们仅记录了由AutoGen运行时生成的默认事件(消息创建、发布等)。但您也可以创建自定义跨度来记录应用程序中的特定事件。

在下面的示例中,我们将展示如何通过围绕团队添加自定义跨度来记录运行时事件,以及记录团队生成的消息跨度,从而记录来自`RoundRobinGroupChat`团队的消息。


In [None]:
from autogen_agentchat.base import TaskResult
from autogen_agentchat.conditions import ExternalTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_core import CancellationToken


async def run_agents() -> None:
 # 创建一个OpenAI模型客户端。
 model_client = OpenAIChatCompletionClient(model="gpt-4o-2024-08-06")

 # 创建主代理。
 primary_agent = AssistantAgent(
 "primary_agent",
 model_client=model_client,
 system_message="You are a helpful AI assistant.",
 )

 # 创建评论代理。
 critic_agent = AssistantAgent(
 "critic_agent",
 model_client=model_client,
 system_message="Provide constructive feedback. Respond with 'APPROVE' to when your feedbacks are addressed.",
 )

 # 定义一个终止条件,当评论者批准时停止任务。
 text_termination = TextMentionTermination("APPROVE")

 tracer = trace.get_tracer("autogen-test-agentchat")
 with tracer.start_as_current_span("runtime_round_robin_events"):
 team = RoundRobinGroupChat([primary_agent, critic_agent], termination_condition=text_termination)

 response_stream = team.run_stream(task="Write a 2 line haiku about the fall season")
 async for response in response_stream:
 async for response in response_stream:
 if not isinstance(response, TaskResult):
 print(f"\n-- {response.source} -- : {response.to_text()}")
 with tracer.start_as_current_span(f"agent_message.{response.source}") as message_span:
 message_span.set_attribute("agent.name", response.source)
 message_span.set_attribute("message.content", response.to_text())
 print(f"{response.source}: {response.to_text()}")

 await model_client.close()


await run_agents()


-- primary_agent -- : Leaves cascade like gold, 
Whispering winds cool the earth.
primary_agent: Leaves cascade like gold, 
Whispering winds cool the earth.

-- critic_agent -- : Your haiku beautifully captures the essence of the fall season with vivid imagery. However, it appears to have six syllables in the second line, which should traditionally be five. Here's a revised version keeping the 5-7-5 syllable structure:

Leaves cascade like gold, 
Whispering winds cool the air. 

Please adjust the second line to reflect a five-syllable count. Thank you!
critic_agent: Your haiku beautifully captures the essence of the fall season with vivid imagery. However, it appears to have six syllables in the second line, which should traditionally be five. Here's a revised version keeping the 5-7-5 syllable structure:

Leaves cascade like gold, 
Whispering winds cool the air. 

Please adjust the second line to reflect a five-syllable count. Thank you!

-- primary_agent -- : Leaves cascade like gol

在上面的代码中,我们为代理发送的每条消息创建了一个新的跨度。我们在跨度上设置了属性,包括代理名称和消息内容。这使我们能够追踪消息在应用程序中的流动,并理解它们是如何被处理的。
