🚀 Quick Install
Copy and run the following command to install this Skill:

```shell
npx skills add https://skills.sh/supercent-io/skills-template/langsmith
```

💡 Tip: requires Node.js and npm
langsmith — LLM Observability, Evaluation & Prompt Management
Keywords: langsmith · llm tracing · llm evaluation · @traceable · langsmith evaluate

LangSmith is a framework-agnostic platform for developing, debugging, and deploying LLM applications. It provides end-to-end tracing, quality evaluation, prompt version control, and production monitoring.
When to use this skill
- Add tracing to any LLM pipeline (OpenAI, Anthropic, LangChain, custom models)
- Run offline evaluations against curated datasets with evaluate()
- Set up production monitoring and online evaluation
- Manage and version prompts in the Prompt Hub
- Create datasets for regression testing and benchmarking
- Attach human or automated feedback to traces
- Score with LLM-as-judge graders from openevals
- Debug agent failures by inspecting end-to-end traces
Instructions
- Install the SDK: pip install -U langsmith (Python) or npm install langsmith (TypeScript)
- Set environment variables: LANGSMITH_TRACING=true and LANGSMITH_API_KEY=lsv2_...
- Instrument your code with the @traceable decorator or the wrap_openai() wrapper
- View traces at smith.langchain.com
- For evaluation setup, see references/python-sdk.md
- For CLI commands, see references/cli.md
- Run bash scripts/setup.sh to configure the environment automatically

API Key: get one at smith.langchain.com → Settings → API Keys
Docs: https://docs.langchain.com/langsmith
Quick Start

Python

```shell
pip install -U langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
export OPENAI_API_KEY="sk-..."
```

```python
from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = wrap_openai(OpenAI())

@traceable
def rag_pipeline(question: str) -> str:
    """Automatically traced in LangSmith."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content

result = rag_pipeline("What is LangSmith?")
```
TypeScript

```shell
npm install langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
```

```typescript
import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";
import { OpenAI } from "openai";

const client = wrapOpenAI(new OpenAI());

const pipeline = traceable(async (question: string): Promise<string> => {
  const res = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: question }],
  });
  return res.choices[0].message.content ?? "";
}, { name: "RAG Pipeline" });

await pipeline("What is LangSmith?");
```
Core Concepts

| Concept | Description |
|---|---|
| Run | A single operation (LLM call, tool call, retrieval). The basic unit. |
| Trace | All runs from a single user request, linked by trace_id. |
| Thread | Multiple traces in one conversation, linked by session_id or thread_id. |
| Project | A container that groups related traces (set via LANGSMITH_PROJECT). |
| Dataset | A collection of {inputs, outputs} examples for offline evaluation. |
| Experiment | The result set produced by running evaluate() against a dataset. |
| Feedback | Scores or labels attached to runs: numeric, categorical, or free-form. |
Tracing

The @traceable decorator (Python)

```python
from langsmith import traceable

@traceable(
    run_type="chain",  # llm | chain | tool | retriever | embedding
    name="My Pipeline",
    tags=["production", "v2"],
    metadata={"version": "2.1", "env": "prod"},
    project_name="my-project"
)
def pipeline(question: str) -> str:
    return generate_answer(question)
```
Selective tracing context

```python
import langsmith as ls

# Enable tracing for this block only
with ls.tracing_context(enabled=True, project_name="debug"):
    result = chain.invoke({"input": "..."})

# Disable tracing even though LANGSMITH_TRACING=true
with ls.tracing_context(enabled=False):
    result = chain.invoke({"input": "..."})
```
Wrapping provider clients

```python
from langsmith.wrappers import wrap_openai, wrap_anthropic
from openai import OpenAI
import anthropic

openai_client = wrap_openai(OpenAI())  # All calls auto-traced
anthropic_client = wrap_anthropic(anthropic.Anthropic())
```
Distributed tracing (microservices)

```python
import langsmith
from langsmith.run_helpers import get_current_run_tree

@langsmith.traceable
def service_a(inputs):
    rt = get_current_run_tree()
    headers = rt.to_headers()  # Pass to the child service
    return call_service_b(headers=headers)

@langsmith.traceable
def service_b(x, headers):
    with langsmith.tracing_context(parent=headers):
        return process(x)
```
Evaluation

Basic evaluation with evaluate()

```python
from langsmith import Client
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = Client()
oai = wrap_openai(OpenAI())

# 1. Create a dataset
dataset = client.create_dataset("Geography QA")
client.create_examples(
    dataset_id=dataset.id,
    examples=[
        {"inputs": {"q": "Capital of France?"}, "outputs": {"a": "Paris"}},
        {"inputs": {"q": "Capital of Germany?"}, "outputs": {"a": "Berlin"}},
    ]
)

# 2. Target function
def target(inputs: dict) -> dict:
    res = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": inputs["q"]}]
    )
    return {"a": res.choices[0].message.content}

# 3. Evaluator
def exact_match(inputs, outputs, reference_outputs):
    return outputs["a"].strip().lower() == reference_outputs["a"].strip().lower()

# 4. Run the experiment
results = client.evaluate(
    target,
    data="Geography QA",
    evaluators=[exact_match],
    experiment_prefix="gpt-4o-mini-v1",
    max_concurrency=4
)
```
LLM-as-judge with openevals

```shell
pip install -U openevals
```

```python
from openevals.llm import create_llm_as_judge
from openevals.prompts import CORRECTNESS_PROMPT

judge = create_llm_as_judge(
    prompt=CORRECTNESS_PROMPT,
    model="openai:o3-mini",
    feedback_key="correctness",
)

results = client.evaluate(target, data="my-dataset", evaluators=[judge])
```
Evaluation types

| Type | When to use |
|---|---|
| Code/Heuristic | Exact match, format checks, rule-based |
| LLM-as-judge | Subjective quality, safety, reference-free scoring |
| Human | Annotation queues, pairwise comparison |
| Pairwise | Comparing two app versions |
| Online | Production traces, real traffic |
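A code/heuristic evaluator from the first row is just a plain function: evaluate() calls it with inputs, outputs, and reference_outputs and records the returned score as feedback. A minimal sketch (the evaluator name and the "a" output key are illustrative, matching the dataset shape used above):

```python
import json

def valid_json(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """Heuristic check: is the model's answer parseable JSON?"""
    try:
        json.loads(outputs["a"])
        ok = True
    except (ValueError, KeyError, TypeError):
        ok = False
    # Returning a dict lets you name the feedback key explicitly
    return {"key": "valid_json", "score": int(ok)}

print(valid_json({}, {"a": '{"city": "Paris"}'}, {}))
# → {'key': 'valid_json', 'score': 1}
```

It would be passed like any other evaluator: client.evaluate(target, data="...", evaluators=[valid_json]).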
Prompt Hub

```python
from langsmith import Client
from langchain_core.prompts import ChatPromptTemplate

client = Client()

# Push a prompt
prompt = ChatPromptTemplate([
    ("system", "You are a helpful assistant."),
    ("user", "{question}"),
])
client.push_prompt("my-assistant-prompt", object=prompt)

# Pull and use
prompt = client.pull_prompt("my-assistant-prompt")

# Pull a specific version:
prompt = client.pull_prompt("my-assistant-prompt:abc123")
```
Feedback

```python
import uuid
from langsmith import Client

client = Client()

# Use a custom run ID so feedback can be linked later
my_run_id = str(uuid.uuid4())
result = chain.invoke({"input": "..."}, {"run_id": my_run_id})

# Attach feedback
client.create_feedback(
    key="correctness",
    score=1,  # 0-1 numeric or categorical
    run_id=my_run_id,
    comment="Accurate and concise"
)
```
References
- Python SDK Reference: full Client API, @traceable signature, evaluate()
- TypeScript SDK Reference: Client, traceable, wrappers, evaluate
- CLI Reference: langsmith CLI commands
- Official Docs: langchain.com/langsmith
- SDK GitHub: MIT License, v0.7.17
- openevals: prebuilt LLM evaluators