🚀 Quick install
Copy and run the following command to install this Skill:
npx skills add https://skills.sh/aradotso/trending-skills/agent-browser-automation
💡 Tip: the npx skills command requires Node.js and npm.
agent-browser
Skill from ara.so: Daily 2026 Skills collection
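agent-browser is a headless browser automation CLI built in Rust and designed for AI agents. It drives Chrome through the Chrome DevTools Protocol (CDP) and provides a fast, ergonomic command-line interface for navigation, interaction, accessibility snapshots, screenshots, network interception, and more, with no Node.js or Playwright runtime required.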
Installation
Recommended (npm global)
npm install -g agent-browser
agent-browser install # Download Chrome for Testing (first time only)
macOS (Homebrew)
brew install agent-browser
agent-browser install
Rust / Cargo
cargo install agent-browser
agent-browser install
Local project dependency
npm install agent-browser
# Add to package.json scripts or invoke via npx
Linux (with system dependencies)
agent-browser install --with-deps
Quick Start
agent-browser open https://example.com
agent-browser snapshot # Accessibility tree with @refs (best for AI)
agent-browser click @e2 # Click by ref from the snapshot
agent-browser fill @e3 "hello@example.com" # Fill by ref
agent-browser get text @e1 # Get text content
agent-browser screenshot page.png
agent-browser close
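agent-browser appears to keep the browser running between commands (which is why close exists), so a script that fails halfway may leave it open. A common shell remedy is an EXIT trap. The sketch below uses a hypothetical with_browser helper (not part of agent-browser) that runs the given command in a subshell and calls agent-browser close when that subshell exits, whether the steps succeed or fail.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of agent-browser): run "$@" in a subshell and
# always call `agent-browser close` when that subshell exits, even on failure.
with_browser() (
  # The EXIT trap fires when this subshell ends; bash keeps the exit status
  # of the last command that ran before the trap.
  trap 'agent-browser close' EXIT
  "$@"
)
```

For example, define steps() { agent-browser open https://example.com && agent-browser snapshot; } and run with_browser steps; the helper returns the exit status of steps.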
Core Commands
Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser get url # Get current URL
agent-browser get title # Get page title
agent-browser close # Close browser (aliases: quit, exit)
Accessibility Snapshot (recommended for AI agents)
agent-browser snapshot # Returns accessibility tree with @ref IDs
agent-browser snapshot -i # Interactive / compact mode
The snapshot output includes @eN refs you can use directly:
@e1 [button] "Submit"
@e2 [textbox] "Email" value=""
@e3 [link] "Sign in"
Then act on them:
agent-browser fill @e2 "user@example.com"
agent-browser click @e1
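An agent script usually has to turn snapshot text into a ref before acting on it. A minimal sketch, assuming each snapshot line has the @eN [role] "name" shape shown above (real output may be indented or carry extra attributes): the hypothetical find_ref helper reads snapshot text on stdin and prints the first ref whose role and accessible name match exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of agent-browser): read snapshot text on stdin
# and print the first @ref whose role and quoted name match exactly.
# Assumes lines shaped like: @e2 [textbox] "Email" value=""
find_ref() {
  local role="$1" name="$2"
  awk -v prefix="[$role] \"$name\"" '
    {
      sub(/^[ \t]+/, "")                     # tolerate indentation
      # index() is a plain substring search, so "[" and quotes are literal.
      if ($1 ~ /^@e[0-9]+$/ && index($0, $1 " " prefix) == 1) {
        print $1                             # e.g. @e2
        exit
      }
    }'
}
```

Usage: ref="$(agent-browser snapshot | find_ref textbox Email)", then act only if a ref was found: [ -n "$ref" ] && agent-browser fill "$ref" "user@example.com". An empty result means no line matched.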
Interaction
agent-browser click <sel> # Click element
agent-browser dblclick <sel> # Double-click
agent-browser fill <sel> <text> # Clear and fill input
agent-browser type <sel> <text> # Type into element
agent-browser press <key> # Press key (Enter, Tab, Control+a)
agent-browser keyboard type <text> # Type at current focus (real keystrokes)
agent-browser keyboard inserttext <text> # Insert text without key events
agent-browser hover <sel> # Hover element
agent-browser select <sel> <value> # Select dropdown option
agent-browser check <sel> # Check checkbox
agent-browser uncheck <sel> # Uncheck checkbox
agent-browser scroll down 500 # Scroll (up/down/left/right, optional px)
agent-browser scroll down --selector "#feed" # Scroll within element
agent-browser scrollintoview <sel> # Scroll element into view
agent-browser drag <src> <target> # Drag and drop
agent-browser upload <sel> /path/file.pdf # Upload file
Screenshots & PDF
agent-browser screenshot # Save to temp dir, print path
agent-browser screenshot page.png # Save to path
agent-browser screenshot --full page.png # Full-page screenshot
agent-browser screenshot --annotate # Overlay numbered element labels
agent-browser screenshot --screenshot-dir ./shots # Custom output directory
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
agent-browser pdf output.pdf # Save page as PDF
Getting Element Info
agent-browser get text <sel> # Text content
agent-browser get html <sel> # innerHTML
agent-browser get value <sel> # Input value
agent-browser get attr <sel> <attr> # Attribute value
agent-browser get count <sel> # Count matching elements
agent-browser get box <sel> # Bounding box
agent-browser get styles <sel> # Computed styles
agent-browser get cdp-url # CDP WebSocket URL
State Checks
agent-browser is visible <sel>
agent-browser is enabled <sel>
agent-browser is checked <sel>
Semantic Locators (find)
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@example.com"
agent-browser find placeholder "Search..." fill "rust"
agent-browser find testid "login-btn" click
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
agent-browser find role textbox fill "hello" --name "Username"
Actions: click, fill, type, hover, focus, check, uncheck, text
Waiting
agent-browser wait "#modal" # Wait for element to be visible
agent-browser wait 2000 # Wait N milliseconds
agent-browser wait --text "Welcome back" # Wait for text
agent-browser wait --url "**/dashboard" # Wait for URL pattern
agent-browser wait --load networkidle # Wait for load state
agent-browser wait --fn "window.appReady === true" # Wait for JS condition
agent-browser wait "#spinner" --state hidden # Wait for element to disappear
Load states: load, domcontentloaded, networkidle
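The wait variants above handle most timing problems. When a step is still occasionally flaky, a generic retry wrapper is a common shell fallback. This is a hypothetical helper, not an agent-browser feature, and the attempt count and delay in the usage line are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of agent-browser): run "$@" up to $1 times,
# sleeping $2 seconds between failed attempts. Returns 0 on the first
# success, or 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

Usage: retry 3 1 agent-browser click "#submit". Prefer an explicit wait when you know what the page is waiting for; retrying only hides the symptom.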
JavaScript Eval
agent-browser eval "document.title"
agent-browser eval "JSON.stringify(window.__STATE__)"
agent-browser eval -b "BASE64_ENCODED_JS"
echo "return document.body.innerHTML" | agent-browser eval --stdin
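eval -b takes the script as base64, which avoids shell-quoting trouble when the JavaScript itself contains quotes. A minimal sketch, assuming -b expects standard base64 of the raw script text; b64_js is a hypothetical helper. GNU base64 wraps long output at 76 characters, so newlines are stripped to keep the argument on one line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: base64-encode a JavaScript snippet on a single line
# (GNU base64 wraps output at 76 columns, so newlines are removed).
b64_js() {
  printf '%s' "$1" | base64 | tr -d '\n'
}
```

Usage: agent-browser eval -b "$(b64_js 'document.querySelectorAll("a[href]").length')".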
Batch Execution (efficient multi-step)
echo '[
["open", "https://example.com"],
["snapshot", "-i"],
["fill", "@e2", "user@example.com"],
["click", "@e1"],
["screenshot", "result.png"]
]' | agent-browser batch --json
# Stop on first failure
agent-browser batch --bail < commands.json
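Hand-written batch JSON breaks as soon as a value comes from a variable containing a quote or backslash. A minimal sketch of hypothetical json_str and json_cmd helpers that escape backslashes and double quotes with sed. They assume values contain no newlines, tabs, or other control characters; a real JSON tool such as jq is safer when it is available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of agent-browser) for building batch input.
# json_str: quote one value as a JSON string, escaping \ and ".
# Assumes the value contains no newlines or other control characters.
json_str() {
  printf '"%s"' "$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
}

# json_cmd: turn its arguments into one JSON array, e.g. ["fill","@e2","x"].
json_cmd() {
  local out="[" sep="" arg
  for arg in "$@"; do
    out+="$sep$(json_str "$arg")"
    sep=","
  done
  printf '%s]' "$out"
}
```

Usage: printf '[%s,%s,%s]' "$(json_cmd open https://example.com)" "$(json_cmd fill @e2 "$EMAIL")" "$(json_cmd click @e1)" | agent-browser batch --json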
Tabs & Frames
agent-browser tab # List tabs
agent-browser tab new https://... # New tab with URL
agent-browser tab 2 # Switch to tab 2
agent-browser tab close # Close current tab
agent-browser frame "#my-iframe" # Switch into iframe
agent-browser frame main # Return to main frame
Cookies & Storage
agent-browser cookies
agent-browser cookies set session_id "abc123"
agent-browser cookies clear
agent-browser storage local
agent-browser storage local set theme dark
agent-browser storage local clear
agent-browser storage session set cart '{"items":[]}'
Network
agent-browser network route "**/api/users" --body '{"users":[]}' # Mock response
agent-browser network route "**/ads/**" --abort # Block requests
agent-browser network unroute # Remove all routes
agent-browser network requests --filter api # View requests
agent-browser network har start
agent-browser network har stop recording.har
Browser Settings
agent-browser set viewport 1280 800
agent-browser set viewport 375 812 2 # With device pixel ratio (retina)
agent-browser set device "iPhone 14"
agent-browser set geo 37.7749 -122.4194
agent-browser set offline on
agent-browser set headers '{"X-Custom":"value"}'
agent-browser set credentials admin secret
agent-browser set media dark
Auth State
agent-browser state save ./auth.json # Save cookies + localStorage
agent-browser state load ./auth.json # Restore auth state
agent-browser state list # List saved states
agent-browser state show auth.json # Show a summary of a saved state
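A common way to combine these commands is to reuse a saved state file when it exists and log in only when it does not. A minimal sketch: with_session and the login command passed to it are hypothetical, and only the state load and state save subcommands listed above are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of agent-browser): restore the session from
# $1 if that file exists; otherwise run the login command given in the
# remaining arguments and, only if it succeeds, save the new state to $1.
with_session() {
  local state_file="$1"
  shift
  if [ -f "$state_file" ]; then
    agent-browser state load "$state_file"
  else
    "$@" && agent-browser state save "$state_file"
  fi
}
```

Usage: with_session ./auth.json ./login.sh, where login.sh is a hypothetical script that performs the login steps. Delete the state file to force a fresh login once the saved session expires.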
Dialogs
agent-browser dialog accept # Accept alert/confirm/prompt
agent-browser dialog accept "My input" # Accept prompt with text
agent-browser dialog dismiss
Clipboard
agent-browser clipboard read
agent-browser clipboard write "Hello, World!"
agent-browser clipboard copy # Ctrl+C the current selection
agent-browser clipboard paste # Ctrl+V
Diff & Visual Testing
agent-browser diff snapshot # Compare with the last snapshot
agent-browser diff snapshot --baseline before.txt # Compare with a saved file
agent-browser diff snapshot --selector "#main" --compact
agent-browser diff screenshot --baseline before.png
agent-browser diff screenshot --baseline b.png -o diff.png
agent-browser diff url https://v1.example.com https://v2.example.com
agent-browser diff url https://v1.example.com https://v2.example.com --screenshot
agent-browser diff url https://v1.example.com https://v2.example.com --selector "#content"
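diff screenshot --baseline needs a baseline image to compare against. A minimal sketch that captures one screenshot per URL using only the open, wait and screenshot commands shown in this document; url_to_name and capture_baselines are hypothetical helpers, and the file-naming scheme is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a filesystem-safe name from a URL by dropping
# the scheme and collapsing every run of other characters into "_".
url_to_name() {
  printf '%s' "$1" | sed -E 's#^[A-Za-z][A-Za-z0-9+.-]*://##; s#[^A-Za-z0-9._-]+#_#g'
}

# Hypothetical helper: open each URL, wait for the network to go idle, and
# save <dir>/<name>.png for later use as a diff baseline. Stops at the first
# failing command.
capture_baselines() {
  local dir="$1" url
  shift
  mkdir -p "$dir"
  for url in "$@"; do
    agent-browser open "$url" &&
      agent-browser wait --load networkidle &&
      agent-browser screenshot "$dir/$(url_to_name "$url").png" || return 1
  done
}
```

Usage: capture_baselines ./baseline https://v1.example.com, then later open the updated page and run agent-browser diff screenshot --baseline ./baseline/v1.example.com.png.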
Debug & Profiling
agent-browser trace start trace.zip
agent-browser trace stop
agent-browser profiler start
agent-browser profiler stop profile.json
agent-browser console # View console messages
agent-browser errors # View uncaught JS exceptions
agent-browser highlight "#button" # Visually highlight element
agent-browser inspect # Open Chrome DevTools
agent-browser connect 9222 # Connect to an existing browser via CDP port
Common Patterns
Login flow and save session
#!/bin/bash
set -euo pipefail
# Fail early if the credentials are missing from the environment.
: "${LOGIN_EMAIL:?LOGIN_EMAIL is not set}"
: "${LOGIN_PASSWORD:?LOGIN_PASSWORD is not set}"
agent-browser open https://app.example.com/login
agent-browser fill "#email" "$LOGIN_EMAIL"
agent-browser fill "#password" "$LOGIN_PASSWORD"
agent-browser click "[type=submit]"
agent-browser wait --url "**/dashboard"
agent-browser state save ./session.json
AI agent loop with snapshot-driven interaction
#!/bin/bash
agent-browser open https://app.example.com
agent-browser state load ./session.json
# Get a snapshot, parse the @refs, then act
SNAPSHOT=$(agent-browser snapshot)
echo "$SNAPSHOT"
# The agent determines that @e5 is the search box
agent-browser fill @e5 "quarterly report"
agent-browser press Enter
agent-browser wait --load networkidle
agent-browser snapshot
agent-browser screenshot results.png
Batch commands from a script (JSON)
cat > commands.json << 'EOF'
[
["open", "https://news.ycombinator.com"],
["wait", "--load", "networkidle"],
["get", "title"],
["snapshot"],
["screenshot", "hn.png"]
]
EOF
agent-browser batch --json < commands.json
Scrape with mocked network
agent-browser open https://api-heavy-app.example.com
agent-browser network route "**/api/slow-endpoint" --body '{"data":"mocked"}'
agent-browser snapshot
agent-browser network unroute
Full-page screenshot with annotations
agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser screenshot --full --annotate annotated.png
Connect to already-running Chrome
# Start Chrome with remote debugging (recent Chrome versions ignore the port unless a non-default --user-data-dir is given)
google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug &
agent-browser connect 9222
agent-browser open https://example.com
agent-browser snapshot
Emulate mobile device
agent-browser set device "iPhone 14"
agent-browser open https://example.com
agent-browser screenshot mobile.png
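The same three steps can be repeated for several devices. A minimal sketch with a hypothetical screenshot_devices helper; "iPhone 14" is the only device name taken from this document, and any other name must be one that agent-browser actually recognizes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each device name, emulate it, load the URL and save
# <dir>/<device>.png (spaces in the device name become "-").
# Stops at the first failing command.
screenshot_devices() {
  local url="$1" dir="$2" device
  shift 2
  mkdir -p "$dir"
  for device in "$@"; do
    agent-browser set device "$device" &&
      agent-browser open "$url" &&
      agent-browser screenshot "$dir/${device// /-}.png" || return 1
  done
}
```

Usage: screenshot_devices https://example.com ./shots "iPhone 14" writes ./shots/iPhone-14.png.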
HAR recording for network analysis
agent-browser open https://example.com
agent-browser network har start
agent-browser click "#load-data"
agent-browser wait --load networkidle
agent-browser network har stop session.har
Selector Reference
| Format | Example | Notes |
|---|---|---|
| @ref | @e1, @e12 | From snapshot output; preferred for AI |
| CSS | #id, .class, [attr=val] | Standard CSS selectors |
| Text | "Sign In" | Exact text match |
| XPath | //button[@type='submit'] | Full XPath |
Troubleshooting
Chrome not found
agent-browser install # Downloads Chrome for Testing
agent-browser install --with-deps # Linux: also installs system libraries
Element not found / timing issues
agent-browser wait "#my-element" # Wait for visibility first
agent-browser wait --load networkidle # Wait for the page to settle
agent-browser wait --fn "!!document.querySelector('#app')"
Selector issues: use snapshot refs instead
# Instead of fragile CSS:
agent-browser click ".btn.btn-primary.submit-form"
# Use snapshot refs:
agent-browser snapshot # Find @e7 = [button] "Submit"
agent-browser click @e7
Debug what's on the page
agent-browser screenshot debug.png # Visual check
agent-browser snapshot # Accessibility tree
agent-browser console # JS console output
agent-browser errors # Uncaught exceptions
agent-browser eval "document.readyState"
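When a run fails unattended, it helps to capture all of these checks at once. A minimal sketch with a hypothetical dump_debug helper, assuming snapshot, console, errors and eval write their results to stdout; it prints only the output folder on stdout and sends the screenshot command's own output to stderr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the debugging commands above and store their output
# in one folder (default: a timestamped debug-YYYYmmdd-HHMMSS directory).
# Prints the folder path on stdout.
dump_debug() {
  local dir="${1:-debug-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$dir"
  agent-browser screenshot "$dir/page.png" >&2
  agent-browser snapshot > "$dir/snapshot.txt"
  agent-browser console > "$dir/console.txt"
  agent-browser errors > "$dir/errors.txt"
  agent-browser eval "document.readyState" > "$dir/ready-state.txt"
  printf '%s\n' "$dir"
}
```

Usage: call dump_debug from an error handler, for example trap 'dump_debug >&2' ERR, so the artifacts exist before the browser is closed.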
Auth issues between sessions
agent-browser state save ./auth.json # After a successful login
agent-browser state load ./auth.json # At the start of the next session
Handling alerts/dialogs
# Set up the handler BEFORE the action that triggers the dialog
agent-browser dialog accept
agent-browser click "#delete-button"
Performance: use batch for multi-step workflows
# Slow: one process per command
agent-browser open https://example.com
agent-browser fill "#q" "search"
agent-browser click "#submit"
# Fast: single process, multiple commands
echo '[["open","https://example.com"],["fill","#q","search"],["click","#submit"]]' \
| agent-browser batch --json
📄 Original documentation
Full documentation (English):
https://skills.sh/aradotso/trending-skills/agent-browser-automation