🚀 Quick Install

Copy and run the following command to install this Skill:

npx skills add https://skills.sh/aradotso/trending-skills/agent-browser-automation

💡 Tip: npx requires Node.js and npm.

agent-browser

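Skill by ara.so, from the Daily 2026 Skills collection.

agent-browser is a headless browser automation CLI built in Rust and designed for AI agents. It wraps Chrome through the Chrome DevTools Protocol (CDP) and provides a fast, ergonomic command-line interface for navigation, interaction, accessibility snapshots, screenshots, network interception, and more, with no Node.js or Playwright runtime required.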

Installation

Recommended: npm global install

npm install -g agent-browser
agent-browser install  # Download Chrome for Testing (first time only)

macOS (Homebrew)

brew install agent-browser
agent-browser install

Rust / Cargo

cargo install agent-browser
agent-browser install

Local project dependency

npm install agent-browser
# Add to package.json scripts or invoke via npx

Linux (with system dependencies)

agent-browser install --with-deps

Quick Start

agent-browser open https://example.com
agent-browser snapshot                        # Accessibility tree with @refs (best for AI)
agent-browser click @e2                       # Click by ref from snapshot
agent-browser fill @e3 "hello@example.com"   # Fill by ref
agent-browser get text @e1                    # Get text content
agent-browser screenshot page.png
agent-browser close

Core Commands

Navigation

agent-browser open <url>           # Navigate (aliases: goto, navigate)
agent-browser get url              # Get current URL
agent-browser get title            # Get page title
agent-browser close                # Close browser (aliases: quit, exit)

Accessibility Snapshot (recommended for AI agents)

agent-browser snapshot             # Returns accessibility tree with @ref IDs
agent-browser snapshot -i          # Interactive / compact mode

Snapshot output includes ready-to-use @eN refs:

@e1 [button] "Submit"
@e2 [textbox] "Email" value=""
@e3 [link] "Sign in"

Then act on them:

agent-browser fill @e2 "user@example.com"
agent-browser click @e1
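
A script can pull the ref for a given element out of the snapshot text before acting on it. The sketch below assumes every snapshot line has the shape shown above (@eN, then the role in square brackets, then the quoted name); find_ref is a hypothetical helper, not part of agent-browser, and real output may differ between versions.

```shell
# find_ref ROLE NAME: read snapshot text on stdin and print the first ref
# (e.g. @e2) whose line has role ROLE and a quoted name equal to NAME.
# Hypothetical helper; assumes lines look like: @e2 [textbox] "Email" value=""
find_ref() {
  awk -v role="$1" -v name="$2" '
    $1 ~ /^@e[0-9]+$/ && $2 == ("[" role "]") {
      # The accessible name is the text between the first pair of double quotes.
      if (split($0, parts, "\"") >= 3 && parts[2] == name) {
        print $1
        exit
      }
    }
  '
}

# Sample snapshot text in the format shown above.
sample='@e1 [button] "Submit"
@e2 [textbox] "Email" value=""
@e3 [link] "Sign in"'

printf '%s\n' "$sample" | find_ref textbox "Email"   # prints @e2

# With a live browser the helper could be fed from the CLI instead:
#   ref=$(agent-browser snapshot | find_ref button "Submit")
```

Matching on role and name rather than hard-coding @e1 keeps a script working when the numbering shifts between snapshots.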

Interaction

agent-browser click <sel>                     # Click element
agent-browser dblclick <sel>                  # Double-click
agent-browser fill <sel> <text>               # Clear and fill input
agent-browser type <sel> <text>               # Type into element
agent-browser press <key>                     # Press key (Enter, Tab, Control+a)
agent-browser keyboard type <text>            # Type at current focus (real keystrokes)
agent-browser keyboard inserttext <text>      # Insert text without key events
agent-browser hover <sel>                     # Hover element
agent-browser select <sel> <value>            # Select dropdown option
agent-browser check <sel>                     # Check checkbox
agent-browser uncheck <sel>                   # Uncheck checkbox
agent-browser scroll down 500                 # Scroll (up/down/left/right, optional px)
agent-browser scroll down --selector "#feed"  # Scroll within element
agent-browser scrollintoview <sel>            # Scroll element into view
agent-browser drag <src> <target>             # Drag and drop
agent-browser upload <sel> /path/file.pdf     # Upload file

Screenshots & PDF

agent-browser screenshot                          # Save to temp dir, print path
agent-browser screenshot page.png                 # Save to path
agent-browser screenshot --full page.png          # Full-page screenshot
agent-browser screenshot --annotate               # Overlay numbered element labels
agent-browser screenshot --screenshot-dir ./shots # Custom output directory
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
agent-browser pdf output.pdf                      # Save page as PDF

Getting Element Info

agent-browser get text <sel>           # Text content
agent-browser get html <sel>           # innerHTML
agent-browser get value <sel>          # Input value
agent-browser get attr <sel> <attr>    # Attribute value
agent-browser get count <sel>          # Count matching elements
agent-browser get box <sel>            # Bounding box
agent-browser get styles <sel>         # Computed styles
agent-browser get cdp-url              # CDP WebSocket URL

State Checks

agent-browser is visible <sel>
agent-browser is enabled <sel>
agent-browser is checked <sel>

Semantic Locators (find)

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@example.com"
agent-browser find placeholder "Search..." fill "rust"
agent-browser find testid "login-btn" click
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
agent-browser find role textbox fill "hello" --name "Username"

Actions: click, fill, type, hover, focus, check, uncheck, text

Waiting

agent-browser wait "#modal"                          # Wait for element to be visible
agent-browser wait 2000                              # Wait N milliseconds
agent-browser wait --text "Welcome back"             # Wait for text to appear
agent-browser wait --url "**/dashboard"              # Wait for URL pattern
agent-browser wait --load networkidle                # Wait for load state
agent-browser wait --fn "window.appReady === true"   # Wait for JS condition
agent-browser wait "#spinner" --state hidden         # Wait for element to disappear

Load states: load, domcontentloaded, networkidle

JavaScript Eval

agent-browser eval "document.title"
agent-browser eval "JSON.stringify(window.__STATE__)"
agent-browser eval -b "BASE64_ENCODED_JS"
echo "return document.body.innerHTML" | agent-browser eval --stdin
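
The -b form is useful when a script contains quotes or newlines that are awkward to pass through the shell. A minimal sketch of producing the payload, assuming -b expects standard Base64 of the script source (the section above does not spell out the encoding, so treat this as an assumption); js_b64 is a hypothetical helper name.

```shell
# js_b64: read JavaScript on stdin and print it as single-line Base64.
# Hypothetical helper; assumes `eval -b` takes standard Base64 of the source.
js_b64() {
  # Some base64 implementations wrap long output, so strip the newlines.
  base64 | tr -d '\n'
}

encoded=$(printf '%s' 'document.querySelectorAll("a").length' | js_b64)
printf '%s\n' "$encoded"

# With a live browser session the payload could then be passed along as:
#   agent-browser eval -b "$encoded"
```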

Batch Execution (efficient multi-step)

echo '[
  ["open", "https://example.com"],
  ["snapshot", "-i"],
  ["fill", "@e2", "user@example.com"],
  ["click", "@e1"],
  ["screenshot", "result.png"]
]' | agent-browser batch --json

# Stop on first failure
agent-browser batch --bail < commands.json
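
Hand-writing the JSON array gets error-prone once arguments contain quotes or backslashes. The sketch below builds the same array-of-arrays shape shown above from ordinary shell arguments; json_str, batch_cmd, and batch_json are hypothetical helper names, and the escaping only covers backslashes and double quotes, not control characters.

```shell
# json_str: print one argument as a JSON string, escaping backslashes and
# double quotes (control characters are not handled in this sketch).
json_str() {
  printf '"%s"' "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
}

# batch_cmd ARG...: print one command as a JSON array of strings, one per line.
# The body runs in a subshell so its variables do not leak to the caller.
batch_cmd() (
  out=""
  for arg in "$@"; do
    out="${out:+$out,}$(json_str "$arg")"
  done
  printf '[%s]\n' "$out"
)

# batch_json: read one encoded command per line and wrap them in an outer array.
batch_json() {
  awk 'BEGIN { printf "[" } NR > 1 { printf "," } { printf "%s", $0 } END { print "]" }'
}

{
  batch_cmd open "https://example.com"
  batch_cmd fill "@e2" 'say "hi"'
  batch_cmd click "@e1"
} | batch_json

# The result could be piped onward instead of printed:
#   { batch_cmd ...; batch_cmd ...; } | batch_json | agent-browser batch --json
```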

Tabs & Frames

agent-browser tab                    # List tabs
agent-browser tab new https://...    # New tab with URL
agent-browser tab 2                  # Switch to tab 2
agent-browser tab close              # Close current tab
agent-browser frame "#my-iframe"     # Switch into iframe
agent-browser frame main             # Return to main frame

Cookies & Storage

agent-browser cookies
agent-browser cookies set session_id "abc123"
agent-browser cookies clear

agent-browser storage local
agent-browser storage local set theme dark
agent-browser storage local clear
agent-browser storage session set cart '{"items":[]}'

Network

agent-browser network route "**/api/users" --body '{"users":[]}'  # Mock response
agent-browser network route "**/ads/**" --abort                    # Block requests
agent-browser network unroute                                       # Remove all routes
agent-browser network requests --filter api                        # View requests
agent-browser network har start
agent-browser network har stop recording.har

Browser Settings

agent-browser set viewport 1280 800
agent-browser set viewport 375 812 2        # With device pixel ratio (retina)
agent-browser set device "iPhone 14"
agent-browser set geo 37.7749 -122.4194
agent-browser set offline on
agent-browser set headers '{"X-Custom":"value"}'
agent-browser set credentials admin secret
agent-browser set media dark

Auth State

agent-browser state save ./auth.json    # Save cookies + localStorage
agent-browser state load ./auth.json    # Restore auth state
agent-browser state list                # List saved states
agent-browser state show auth.json      # Show summary of saved state

Dialogs

agent-browser dialog accept             # Accept alert/confirm/prompt
agent-browser dialog accept "My input"  # Accept prompt with text
agent-browser dialog dismiss

Clipboard

agent-browser clipboard read
agent-browser clipboard write "Hello, World!"
agent-browser clipboard copy           # Copy current selection (Ctrl+C)
agent-browser clipboard paste          # Paste (Ctrl+V)

Diff & Visual Testing

agent-browser diff snapshot                                  # Compare with last snapshot
agent-browser diff snapshot --baseline before.txt            # Compare with saved file
agent-browser diff snapshot --selector "#main" --compact
agent-browser diff screenshot --baseline before.png
agent-browser diff screenshot --baseline b.png -o diff.png
agent-browser diff url https://v1.example.com https://v2.example.com
agent-browser diff url https://v1.example.com https://v2.example.com --screenshot
agent-browser diff url https://v1.example.com https://v2.example.com --selector "#content"

Debug & Profiling

agent-browser trace start trace.zip
agent-browser trace stop
agent-browser profiler start
agent-browser profiler stop profile.json
agent-browser console                  # View console messages
agent-browser errors                   # View uncaught JS exceptions
agent-browser highlight "#button"      # Visually highlight element
agent-browser inspect                  # Open Chrome DevTools
agent-browser connect 9222             # Connect to existing browser via CDP port

Common Patterns

Login flow and save session

#!/bin/bash
agent-browser open https://app.example.com/login
agent-browser fill "#email" "$LOGIN_EMAIL"
agent-browser fill "#password" "$LOGIN_PASSWORD"
agent-browser click "[type=submit]"
agent-browser wait --url "**/dashboard"
agent-browser state save ./session.json

AI agent loop with snapshot-driven interaction

#!/bin/bash
agent-browser open https://app.example.com
agent-browser state load ./session.json

# Get snapshot, parse @refs, act
SNAPSHOT=$(agent-browser snapshot)
echo "$SNAPSHOT"

# Agent determines @e5 is the search box
agent-browser fill @e5 "quarterly report"
agent-browser press Enter
agent-browser wait --load networkidle
agent-browser snapshot
agent-browser screenshot results.png

Batch commands from a script (JSON)

cat > commands.json << 'EOF'
[
  ["open", "https://news.ycombinator.com"],
  ["wait", "--load", "networkidle"],
  ["get", "title"],
  ["snapshot"],
  ["screenshot", "hn.png"]
]
EOF

agent-browser batch --json < commands.json

Scrape with mocked network

agent-browser open https://api-heavy-app.example.com
agent-browser network route "**/api/slow-endpoint" --body '{"data":"mocked"}'
agent-browser snapshot
agent-browser network unroute

Full-page screenshot with annotations

agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser screenshot --full --annotate annotated.png

Connect to already-running Chrome

# Start Chrome with remote debugging
google-chrome --remote-debugging-port=9222 &

agent-browser connect 9222
agent-browser open https://example.com
agent-browser snapshot

Emulate mobile device

agent-browser set device "iPhone 14"
agent-browser open https://example.com
agent-browser screenshot mobile.png

HAR recording for network analysis

agent-browser open https://example.com
agent-browser network har start
agent-browser click "#load-data"
agent-browser wait --load networkidle
agent-browser network har stop session.har

Selector Reference

Format | Example                  | Notes
@ref   | @e1, @e12                | From snapshot output; preferred for AI
CSS    | #id, .class, [attr=val]  | Standard CSS selectors
Text   | "Sign In"                | Exact text match
XPath  | //button[@type='submit'] | Full XPath
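
When selectors come from a model rather than a person, a cheap sanity check before calling the CLI can catch malformed refs early. The sketch below classifies a string by the formats in the table; selector_kind is a hypothetical helper and does not reflect how agent-browser itself parses selectors (CSS and text selectors cannot be told apart by shape alone, so they share one label).

```shell
# selector_kind SELECTOR: print which format from the table a string looks like.
# Hypothetical helper for validating agent-generated selectors before use.
selector_kind() {
  case "$1" in
    @e*[!0-9]*|@e) echo invalid-ref ;;   # "@e" alone, or non-digits after "@e"
    @e*)           echo ref ;;           # "@e" followed only by digits
    /*)            echo xpath ;;         # starts with "/", e.g. //button
    *)             echo css-or-text ;;
  esac
}

selector_kind "@e12"
selector_kind "@e1x"
selector_kind "//button[@type='submit']"
selector_kind "#id"
```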

Troubleshooting

Chrome not found

agent-browser install              # Downloads Chrome for Testing
agent-browser install --with-deps  # Linux: also installs system libs

Element not found / timing issues

agent-browser wait "#my-element"              # Wait for visibility first
agent-browser wait --load networkidle         # Wait for page to settle
agent-browser wait --fn "!!document.querySelector('#app')"
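
The wait commands above cover most timing issues. For an action that still fails intermittently, a generic retry wrapper is one option; the sketch below is plain shell and makes no assumptions about agent-browser beyond a non-zero exit status on failure. The retry function name and its argument order are choices of this example.

```shell
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds, sleeping
# DELAY seconds between tries. Returns 0 on success, otherwise the exit
# status of the last failed attempt.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while :; do
    "$@" && return 0
    retry_status=$?
    [ "$retry_n" -ge "$retry_attempts" ] && return "$retry_status"
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Example use with a live browser (up to 5 tries, 1 second apart):
#   retry 5 1 agent-browser click "#my-element"
```

Retrying hides the underlying cause, so it works best as a fallback after an explicit wait rather than a replacement for one.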

Selector issues: use snapshot refs instead

# Instead of fragile CSS:
agent-browser click ".btn.btn-primary.submit-form"

# Use snapshot refs:
agent-browser snapshot  # Find @e7 = [button] "Submit"
agent-browser click @e7

Debug what's on the page

agent-browser screenshot debug.png        # Visual check
agent-browser snapshot                    # Accessibility tree
agent-browser console                     # JS console output
agent-browser errors                      # Uncaught exceptions
agent-browser eval "document.readyState"

Auth issues between sessions

agent-browser state save ./auth.json   # After successful login
agent-browser state load ./auth.json   # At start of next session
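
The save and load steps can be combined so a script logs in only when no saved state exists. This is a minimal sketch under stated assumptions: the AB variable and the restore_or_login function are hypothetical names, the login script is assumed to contain a login flow like the one under Common Patterns, and the check only tests that the state file is present, not that its cookies are still valid.

```shell
# AB is the command used to reach the CLI; overriding it (e.g. AB=echo) gives
# a dry run that prints the commands. The variable name is an assumption.
AB="${AB:-agent-browser}"

# restore_or_login STATE_FILE LOGIN_SCRIPT: load saved state if the file
# exists and is non-empty; otherwise run the login script and save new state.
restore_or_login() {
  if [ -s "$1" ]; then
    $AB state load "$1"
  else
    sh "$2" && $AB state save "$1"
  fi
}

# Example use (both paths are placeholders):
#   restore_or_login ./auth.json ./login.sh
```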

Handling alerts/dialogs

# Set up the handler BEFORE the action that triggers the dialog
agent-browser dialog accept
agent-browser click "#delete-button"

Performance: use batch for multi-step workflows

# Slow: one process per command
agent-browser open https://example.com
agent-browser fill "#q" "search"
agent-browser click "#submit"

# Fast: single process, multiple commands
echo '[["open","https://example.com"],["fill","#q","search"],["click","#submit"]]' \
  | agent-browser batch --json

📄 Original Documentation

Full documentation:

https://skills.sh/aradotso/trending-skills/agent-browser-automation
