Browser automation has a new pecking order.
Five tools now define how software teams and AI agents control browsers in 2026: Microsoft WebWright, browser-harness, Playwright, Puppeteer, and Selenium. The right choice depends on whether you are writing tests, building AI agents, or doing both. This report maps each tool's architecture, AI integration depth, benchmarks, and enterprise readiness so your team can make the call.
Five tools, three paradigms, one decision.
Browser automation has fractured into three distinct paradigms in 2026. The first is the scripted testing layer, where Playwright and Selenium live: deterministic, developer-authored, and optimized for CI/CD pipelines. The second is the AI agent layer, where browser-harness and browser-use live: LLM-driven, self-healing, and optimized for autonomous task completion. The third is the coding-agent layer, where Microsoft WebWright lives: the agent writes and executes code, treats the browser as a disposable tool, and leaves reusable scripts as the persistent artifact.
The paradigm shift matters because many enterprise teams still standardize on Playwright or Selenium for use cases that now belong to the AI agent layer. Both frameworks are open source, so the cost of that mismatch is not licensing. It is the engineering time spent maintaining brittle selectors, the QA cycles that do not scale, and the AI initiatives that stall because the automation layer was not built for LLM control.
The procurement decision in one sentence: Use Playwright for scripted cross-browser testing. Use browser-harness or browser-use for LLM-driven autonomous web tasks. Use WebWright when you want an AI coding agent to write and maintain the automation scripts themselves. Selenium is the right choice only when legacy language bindings or regulatory constraints require it.
The five tools, profiled.
Each tool occupies a distinct position in the automation stack. Understanding the position is more important than memorizing the feature list, because the position determines what the tool is optimized for and where it will fail.
WebWright gives LLMs a terminal where they can launch multiple browser sessions, write Python/Playwright scripts, execute them, inspect screenshots and logs, and iterate until the task is complete. The persistent artifact is code, not a browser session. The browser is disposable. The workspace is the state.
Architecture
The codebase is intentionally minimal at approximately 1,500 lines total. The agent loop is 450 lines. The Playwright environment wrapper is 570 lines. The CLI is 150 lines. Model backends for OpenAI, Anthropic, and OpenRouter are 150 to 200 lines each. Dependencies are limited to httpx, pydantic, playwright, and typer. There are no hidden frameworks, no graph engines, no multi-agent orchestration layers.
The design philosophy is explicit: "just a terminal, a browser, and a model." The agent writes free-form Python scripts using the Playwright API, executes them, reads the output and screenshots, and repairs the code as needed. This is fundamentally different from tools that predict discrete clicks or coordinate-based actions. The agent reasons at the code level, not the pixel level.
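The write, execute, inspect, repair cycle described above can be sketched in a few lines. This is a minimal illustration and not WebWright's actual agent loop: `propose` is a hypothetical stand-in for the model call, `fake_model` is a scripted substitute for an LLM, and the generated scripts here are plain Python rather than Playwright code so the sketch runs without a browser.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_script(source: str, workdir: Path, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Write the generated script into the workspace and execute it in a fresh process."""
    script = workdir / "task.py"
    script.write_text(source, encoding="utf-8")
    return subprocess.run(
        [sys.executable, str(script)],
        capture_output=True, text=True, timeout=timeout, cwd=workdir,
    )


def write_execute_repair(
    task: str,
    propose: Callable[[str, Optional[str]], str],
    workdir: Path,
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask the model for a script, run it, and feed failures back until it succeeds.

    `propose(task, feedback)` is a hypothetical model call: it returns Python
    source, given the task and the stderr of the previous attempt (or None).
    """
    feedback = None
    for _ in range(max_attempts):
        source = propose(task, feedback)
        result = run_script(source, workdir)
        if result.returncode == 0:
            return result.stdout      # the working script stays behind in workdir
        feedback = result.stderr      # the traceback becomes the repair prompt
    return None


def fake_model(task: str, feedback: Optional[str]) -> str:
    """Scripted stand-in for an LLM: the first draft has a bug, the second fixes it."""
    if feedback is None:
        return "print(undefined_name)"   # raises NameError when executed
    return "print('done')"


workdir = Path(tempfile.mkdtemp())
print(write_execute_repair("demo task", fake_model, workdir))
```

The point of the sketch is the artifact: after the loop finishes, `task.py` in the workspace is the repaired script, which matches the text's claim that code, not a browser session, is what persists.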
Benchmarks
On the Online-Mind2Web benchmark (300 real-world web tasks), WebWright with GPT-5.4 achieves 86.7%, described by the authors as the highest score among open-sourced harnesses in the AutoEval category. Claude Opus 4.7 achieves 84.7%, with stronger performance on the hard split at 80.5%. On the Odysseys benchmark (200 long-horizon tasks), WebWright achieves 60.1% with GPT-5.4 at an average of 76.1 steps per task. This represents a 15.6-point improvement over the prior state of the art and a 26.6-point improvement over the GPT-5.4 coordinate-prediction baseline. Even Qwen-3.5-9B, a small open-source model, performs well on familiar sites when five or more reusable tools are available in the workspace.
AI Integration
WebWright is natively AI-first. The LLM is the driver. It ships with plugin manifests for Claude Code, OpenAI Codex, OpenClaw, and Hermes Agent. In Claude Code, it installs as a skill via the plugin marketplace and exposes two slash commands: /webwright:run for one-shot task scripts and /webwright:craft for parameterized, reusable CLI tools. The Task2UI mode added in May 2026 renders task results into an HTML web app for easy review and reuse.
browser-harness is a minimal Python tool that gives AI agents direct Chrome control via the Chrome DevTools Protocol, bypassing framework abstractions entirely. The core is four Python files totaling approximately 600 lines. What makes it distinctive is its self-healing architecture: when an agent encounters a task it cannot complete, it dynamically edits the helper files mid-execution, writing new browser control tools into the harness that are immediately available on subsequent runs.
Architecture
The harness connects to Chrome via a single persistent WebSocket to the CDP endpoint. There is no abstraction layer between the agent and the browser. All browser actions are issued as raw CDP commands: Page.navigate, Input.dispatchMouseEvent, DOM.querySelector. The daemon layer manages session lifecycle, handles crashes and disconnects, buffers CDP events, and orchestrates multiple browser sessions including remote endpoints.
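To make "raw CDP commands" concrete, the sketch below builds the JSON frames that would travel over the WebSocket for a navigate, query, and click sequence. The method names and parameter names come from the Chrome DevTools Protocol; the `CdpMessageBuilder` class is hypothetical and is not part of browser-harness, and the `nodeId` and coordinates are placeholder values (in a live session the node id would come from an earlier `DOM.getDocument` response).

```python
import itertools
import json


class CdpMessageBuilder:
    """Builds JSON command frames in the shape the Chrome DevTools Protocol expects."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)   # every command carries a unique id

    def command(self, method: str, **params) -> str:
        return json.dumps({"id": next(self._ids), "method": method, "params": params})


cdp = CdpMessageBuilder()
frames = [
    cdp.command("Page.navigate", url="https://example.com"),
    # nodeId=1 is a placeholder; a real session obtains it from DOM.getDocument.
    cdp.command("DOM.querySelector", nodeId=1, selector="button[type=submit]"),
    # A click is a press followed by a release at the same coordinates.
    cdp.command("Input.dispatchMouseEvent", type="mousePressed", x=120, y=240,
                button="left", clickCount=1),
    cdp.command("Input.dispatchMouseEvent", type="mouseReleased", x=120, y=240,
                button="left", clickCount=1),
]
for frame in frames:
    print(frame)
```

Responses arrive on the same socket and are matched to commands by `id`, which is why a single persistent connection is enough for the whole session.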
The agent workspace stores all helper functions and site-specific skills in user-editable files. Agents build up a persistent knowledge base over time. Domain-specific and interaction-specific skill folders maintain reusable modules that can be shared across teams. This is the key differentiator from Playwright and WebWright: the harness grows smarter with use.
Self-Healing
When an agent encounters a task that requires a capability not yet in helpers.py, it writes the new helper function and immediately uses it. This means the harness adapts to novel workflows, strange OAuth flows, dynamic forms, and custom navigation patterns without requiring a human developer to update the automation scripts. The risk of this approach is real: an agent that edits and then executes its own helper files opens the door to remote code execution (RCE) vulnerabilities, and the project's security posture is still evolving.
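The "write a missing helper, then use it immediately" mechanism might look roughly like the sketch below. It is an assumption-laden illustration, not browser-harness code: `load_helpers`, `call_helper`, and `price_to_cents` are hypothetical names, the "generated" source is hard-coded where a model would normally author it, and the helper file lives in a temporary directory.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_helpers(path: Path) -> ModuleType:
    """Load the helper file fresh from disk so newly appended functions are visible."""
    spec = importlib.util.spec_from_file_location("helpers", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def call_helper(path: Path, name: str, new_source: str, *args):
    """Call helpers.<name>; if it is missing, append `new_source` and reload first."""
    helpers = load_helpers(path)
    if not hasattr(helpers, name):
        with path.open("a", encoding="utf-8") as fh:
            fh.write("\n\n" + new_source.strip() + "\n")
        helpers = load_helpers(path)   # the new function is now importable
    return getattr(helpers, name)(*args)


helpers_path = Path(tempfile.mkdtemp()) / "helpers.py"
helpers_path.write_text(
    "def normalize_url(url):\n    return url.strip().lower()\n", encoding="utf-8"
)

# Hard-coded here; in a self-healing harness the model would write this source.
generated = '''
def price_to_cents(text):
    """Convert a string such as "$12.50" to integer cents."""
    return round(float(text.replace("$", "").replace(",", "")) * 100)
'''
print(call_helper(helpers_path, "price_to_cents", generated, "$12.50"))
```

Note that `exec_module` runs whatever was appended with the full privileges of the process. That is exactly the RCE exposure the paragraph above warns about, so a sketch like this belongs inside a sandbox.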
AI Integration
browser-harness is designed from the ground up for LLM control. It supports real Chrome sessions including all open tabs and logged-in sessions, enabling true last-mile automation where API access is not possible. It connects to the browser-use Cloud for multi-browser concurrency, proxy rotation, stealth mode, and parallel deployments. The MCP server wraps launch_chrome, stop_chrome, doctor, run_code, and open_research for integration with Claude Code and other MCP-compatible agents.
Playwright is the dominant cross-browser end-to-end testing framework for modern web applications. It bundles a test runner, assertions, browser isolation via BrowserContext, parallelization, trace viewer, and code generation. It supports Chromium, Firefox, and WebKit across Windows, Linux, and macOS, with language bindings for Node.js, Python, Java, and .NET.
Architecture
Playwright communicates with browsers via WebSockets, maintaining a persistent connection that eliminates the per-command HTTP round-trip overhead that slows Selenium. Tests run in parallel by default across multiple browsers. BrowserContext provides lightweight isolation: fifty agent workers can share one Chromium process with distinct contexts, significantly reducing resource consumption per agent. This makes Playwright the most efficient foundation for large-scale parallel automation.
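BrowserContext isolation is easiest to see in code. The sketch below uses Playwright's synchronous Python API to open each URL in its own context inside a single Chromium process. It assumes Playwright for Python and a Chromium build are installed (`pip install playwright`, then `playwright install chromium`); the function name `titles_in_isolated_contexts` is hypothetical, and the contexts are visited one after another for simplicity rather than in parallel.

```python
def titles_in_isolated_contexts(urls):
    """Open each URL in its own BrowserContext inside one shared Chromium process."""
    # Imported lazily so this module still loads where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    titles = {}
    with sync_playwright() as p:
        browser = p.chromium.launch()            # one browser process for all workers
        try:
            for url in urls:
                context = browser.new_context()  # separate cookies and storage per context
                try:
                    page = context.new_page()
                    page.goto(url)
                    titles[url] = page.title()
                finally:
                    context.close()              # discards that worker's state only
        finally:
            browser.close()
    return titles
```

A call such as `titles_in_isolated_contexts(["https://example.com"])` launches one browser and one context. Because each context keeps its own cookies and storage, a login performed in one worker does not leak into another, even though they share a process.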
Performance
In 2025 benchmarks, Playwright completes a standard test suite in 42.3 seconds versus Selenium Grid's 68.9 seconds, roughly 39% less run time. CPU usage is approximately 18.4% versus Selenium's 27.6%. Memory consumption is approximately 412 MB versus Selenium's 583 MB. Parallel test stability exceeds 98% for Playwright versus approximately 92% for Selenium Grid. (Source: Markaicode, 2025.)
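The relative differences follow directly from the reported figures. The short calculation below only restates the numbers quoted above as percentage reductions; it adds no new measurements, and `percent_reduction` is simply a helper defined for this example.

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` versus `baseline`, in percent."""
    return (baseline - improved) / baseline * 100


# Figures as reported in the text: (Selenium Grid, Playwright).
reported = {
    "suite time (s)": (68.9, 42.3),
    "CPU (%)": (27.6, 18.4),
    "memory (MB)": (583, 412),
}
for metric, (selenium, playwright) in reported.items():
    print(f"{metric}: {percent_reduction(selenium, playwright):.1f}% lower with Playwright")
```

Worked by hand, the suite-time figure is (68.9 − 42.3) / 68.9 ≈ 38.6%, the CPU figure is about 33.3%, and the memory figure is about 29.3%.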
AI Integration
Playwright added an MCP interface and an Agent CLI in 2025, signaling Microsoft's intent to position it as the automation layer for AI agent workflows. The MCP interface allows LLM-based agents to control Playwright programmatically. However, Playwright itself is still fundamentally a scripted tool: it requires explicit instructions and does not have an autonomous reasoning loop. It is the engine that agent tools such as WebWright run on top of, not a competitor to them. browser-harness is the exception: as described above, it bypasses Playwright and speaks CDP directly.
Puppeteer is Google's high-level JavaScript API for controlling Chrome and Firefox via the Chrome DevTools Protocol and WebDriver BiDi. It is the original modern browser automation library and remains the default choice for Chrome-specific workloads, stealth automation, and scraping tasks where deep DevTools access matters.
Architecture
Puppeteer communicates via CDP and the newer WebDriver BiDi standard. It ships as two packages: puppeteer, which downloads a bundled Chrome binary on install, and puppeteer-core, which installs without one. The one-browser-per-agent model is less efficient than Playwright's BrowserContext isolation at scale, but provides a simpler mental model for lightweight, single-agent tasks. Puppeteer added Bun support in 2025 alongside npm, Yarn, and pnpm.
AI Integration
Puppeteer added MCP support via chrome-devtools-mcp in 2025, enabling AI agent integration. It also supports the experimental WebMCP API. However, like Playwright, Puppeteer is fundamentally a scripted tool. The AI integration is at the control layer, not the reasoning layer. Puppeteer is commonly used as the browser engine inside legacy AI scraping bots, but is gradually losing share to Playwright for new agent frameworks due to its Chrome-centric focus.
Position in 2026
Puppeteer's 85,000 GitHub stars and 4 million weekly npm downloads reflect its deep entrenchment in the JavaScript ecosystem. It remains the best choice for Chrome-only workloads, stealth automation, and scenarios requiring deep DevTools access. For new projects requiring cross-browser support, massive parallelization, or AI agent integration, Playwright is the more natural choice. Puppeteer's market share is declining for new projects but stable in legacy deployments.
Selenium is the original browser automation standard and remains the most widely deployed framework in regulated enterprise environments. It is not a single tool but an umbrella project encompassing WebDriver (the automation API), IDE (a Chrome/Firefox recording extension), and Grid (distributed multi-browser execution). Its language bindings cover Java, Python, C#, Ruby, and JavaScript.
Architecture
Selenium WebDriver communicates with browsers via HTTP/WebDriver, introducing per-command latency that modern WebSocket-based tools eliminate. The Grid enables distributed execution across different machines and platforms, supporting multi-browser and multi-OS combinations at scale. The IDE records user actions as a starting point for test development. The architecture is non-intrusive: the API does not require compilation with application code.
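The per-command latency comes from the shape of the protocol: every action is its own HTTP request and response. The sketch below lists the W3C WebDriver requests behind "open a page, find a button, click it". The endpoint paths and body fields follow the WebDriver specification; the `HttpRequest` type and `webdriver_click_flow` function are hypothetical, and the session and element ids are passed in as placeholders (a real client receives them from the New Session and Find Element responses).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpRequest:
    method: str
    path: str
    body: str


def webdriver_click_flow(session_id: str, element_id: str, url: str, css: str) -> list[HttpRequest]:
    """The WebDriver requests behind 'open a page, find a button, click it'."""
    base = f"/session/{session_id}"
    return [
        # Navigate To
        HttpRequest("POST", f"{base}/url", json.dumps({"url": url})),
        # Find Element (the response would carry the element id used below)
        HttpRequest("POST", f"{base}/element",
                    json.dumps({"using": "css selector", "value": css})),
        # Element Click
        HttpRequest("POST", f"{base}/element/{element_id}/click", json.dumps({})),
    ]


for request in webdriver_click_flow("SESSION", "ELEMENT", "https://example.com", "#submit"):
    print(request.method, request.path, request.body)
```

Three user-level steps become three blocking round trips to the driver, whereas a WebSocket-based tool sends the equivalent commands over one open connection.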
Performance Gap
The performance gap with Playwright has widened in 2025. Selenium Grid uses 27.6% CPU versus Playwright's 18.4%. Memory consumption is 583 MB versus Playwright's 412 MB. Parallel test stability is approximately 92% versus Playwright's 98%. Setup complexity is higher: Selenium Grid requires separate driver configuration and management that Playwright handles automatically. (Source: Markaicode, 2025.)
Position in 2026
Selenium remains heavily used in legacy and regulated enterprise environments, particularly in banking, healthcare, and government, due to its maturity, broad language support, and extensive documentation. Fortune 500 companies are increasingly running dual frameworks: Playwright for new web applications and Selenium for legacy suites that are too costly to migrate. New enterprise projects almost universally select Playwright over Selenium. Selenium's market share is declining for new projects but its installed base is enormous and will persist for years.
The numbers that matter.
The benchmarks fall into two categories: AI agent task completion (where WebWright and browser-harness compete) and scripted test execution (where Playwright and Selenium compete). These are different competitions. Comparing WebWright's Mind2Web score to Playwright's test execution speed is a category error.
[Charts: AI agent task completion on Online-Mind2Web (300 tasks); scripted test execution, suite completion time; resource usage, CPU and memory.]
Key insight: WebWright's +15.6-point improvement over prior SOTA on Odysseys is not a marginal gain. It represents the difference between an agent that can reliably complete long-horizon web tasks and one that cannot. The coding-agent paradigm (write and execute scripts) is demonstrably more capable than the click-prediction paradigm for complex, multi-step tasks.
The procurement decision in one table.
The right tool depends on the use case. This table maps the five tools across the dimensions that matter most for enterprise procurement decisions.
| Dimension | WebWright | browser-harness | Playwright | Puppeteer | Selenium |
|---|---|---|---|---|---|
| Primary use case | AI coding agent writes automation scripts | LLM-driven autonomous web tasks, self-healing | Scripted cross-browser testing and CI/CD | Chrome-specific automation and scraping | Legacy enterprise testing, regulated environments |
| AI/LLM integration | Native: LLM is the driver | Native: self-healing, CDP direct | Added: MCP interface, Agent CLI | Added: chrome-devtools-mcp | Minimal: ecosystem plugins only |
| Browser support | Chromium (via Playwright) | Chrome (CDP direct) | Chromium, Firefox, WebKit | Chrome, Firefox | Chrome, Edge, Firefox, IE, Safari |
| Language | Python | Python | Node.js, Python, Java, .NET | JavaScript / TypeScript | Java, Python, C#, Ruby, JS |
| Setup complexity | Low (pip install + API key) | Low (4 Python files) | Low (3 CLI commands) | Low (npm install) | High (Grid, drivers, config) |
| Parallel execution | Sequential per agent run | Cloud: up to 3 free, more paid | Native: BrowserContext isolation | One browser per agent | Grid: distributed multi-machine |
| Enterprise readiness | Early: Microsoft Research project, MIT-licensed | Growing: cloud tier available | High: Microsoft-backed, CI/CD native | High: Google-backed, mature | Very high: 20-year track record |
| Self-healing | Yes: agent repairs its own scripts | Yes: agent edits helper files mid-run | No: requires human maintenance | No: requires human maintenance | No: requires human maintenance |
| License | MIT | Open source | Apache 2.0 | Apache 2.0 | Apache 2.0 |
| GitHub stars (2026) | 1.5k | 95.7k (browser-use) | 60k+ | 85k+ | 32k+ |
Three paradigms, not five tools.
The most useful way to understand the landscape is not to compare five tools but to understand the three paradigms they represent. Each paradigm has a different answer to the question: who or what decides what the browser does next?
Paradigm 1: Scripted Automation (Playwright, Puppeteer, Selenium)
A human developer writes explicit instructions. The tool executes them deterministically. The browser does exactly what the script says. When the web application changes, the script breaks and a human must fix it. This paradigm is optimized for repeatability, CI/CD integration, and regression testing. It is the right choice when the task is well-defined, the application is under the team's control, and the goal is to verify that known behavior still works.
Playwright is the best implementation of this paradigm in 2026. It is faster, more resource-efficient, and easier to maintain than Selenium. Puppeteer is the right choice within this paradigm when Chrome-specific features or deep DevTools access are required. Selenium is the right choice when legacy language bindings, regulatory constraints, or an existing test suite make migration too costly.
Paradigm 2: LLM-Driven Agent (browser-harness, browser-use)
An LLM decides what the browser does next based on the current page state. The tool provides the LLM with a control surface: CDP commands, helper functions, screenshots, DOM snapshots. The LLM reasons about the task, issues commands, observes the result, and continues until the task is complete or fails. When the web application changes, the agent adapts without requiring a human to update scripts. This paradigm is optimized for autonomous task completion, novel workflows, and scenarios where the task cannot be fully specified in advance.
browser-harness is the most minimal implementation of this paradigm. Its self-healing architecture, where the agent writes new helper functions mid-execution, makes it uniquely capable for novel workflows. The risk is the security surface created by agents editing their own code. browser-use (the parent project) adds a higher-level agent API and cloud infrastructure on top of the same paradigm.
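The observe, decide, act loop at the heart of this paradigm can be sketched without a real browser or model. Everything in the block is a stand-in chosen for illustration: `FakePage` replaces the browser, `scripted_policy` replaces the LLM, and the action vocabulary (`navigate`, `fill`, `submit`, `done`) is hypothetical rather than taken from browser-harness or browser-use.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakePage:
    """In-memory stand-in for a browser page so the loop runs without Chrome."""
    url: str = "about:blank"
    fields: dict = field(default_factory=dict)
    submitted: bool = False

    def observe(self) -> dict:
        return {"url": self.url, "fields": dict(self.fields), "submitted": self.submitted}

    def act(self, action: dict) -> None:
        kind = action["type"]
        if kind == "navigate":
            self.url = action["url"]
        elif kind == "fill":
            self.fields[action["name"]] = action["value"]
        elif kind == "submit":
            self.submitted = True
        else:
            raise ValueError(f"unknown action: {kind}")


def run_agent(page: FakePage, decide: Callable[[dict], dict], max_steps: int = 10) -> int:
    """Observe, ask the policy for one action, apply it; stop on 'done' or the step cap."""
    for step in range(max_steps):
        action = decide(page.observe())
        if action["type"] == "done":
            return step               # number of actions that were applied
        page.act(action)
    raise RuntimeError("step budget exhausted")


def scripted_policy(state: dict) -> dict:
    """Stand-in for the LLM: chooses the next action from the observed state alone."""
    if state["url"] == "about:blank":
        return {"type": "navigate", "url": "https://example.com/signup"}
    if "email" not in state["fields"]:
        return {"type": "fill", "name": "email", "value": "user@example.com"}
    if not state["submitted"]:
        return {"type": "submit"}
    return {"type": "done"}


page = FakePage()
print(run_agent(page, scripted_policy), page.observe())
```

Because the policy decides from the current state rather than from a fixed script, a change in the page changes the next action instead of breaking the run. That is the adaptation property this paradigm trades determinism for.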
Paradigm 3: Coding Agent (WebWright)
An LLM writes Python/Playwright scripts, executes them in a terminal, inspects the output, and iterates until the task is complete. The persistent artifact is code, not a browser session. The agent reasons at the code level, not the pixel or DOM level. This paradigm is optimized for tasks that benefit from reusable, composable scripts: the agent builds a library of tools over time that can be applied to similar tasks. The coding-agent paradigm achieves higher benchmark scores on complex, long-horizon tasks than the click-prediction paradigm because code is a more expressive action space than discrete clicks.
WebWright is the only open-source implementation of this paradigm with published benchmark results. Its 86.7% score on Online-Mind2Web and 60.1% on Odysseys represent the current state of the art for open-sourced browser agents.
The architectural insight for enterprise teams: The three paradigms are not mutually exclusive. A production AI agent stack might use Playwright as the browser engine (Paradigm 1), browser-harness for autonomous task completion (Paradigm 2), and WebWright for generating and maintaining the automation scripts themselves (Paradigm 3). The tools are complementary, not competitive, when used at the right layer of the stack.
What enterprise teams need to know.
Security
The self-healing architectures of browser-harness and WebWright introduce security considerations that scripted tools do not. When an agent can edit its own helper files mid-execution, the attack surface includes the agent's reasoning process, the LLM provider's outputs, and the file system. Enterprise deployments should run these tools in isolated environments with restricted file system access, network egress controls, and audit logging of all agent-generated code before execution. The browser-harness project has open discussions about RCE vulnerabilities and is actively evolving its security posture.
Playwright and Selenium have mature security postures appropriate for enterprise CI/CD pipelines. They do not execute LLM-generated code and do not have self-modification capabilities.
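One way to approach "audit logging of all agent-generated code before execution" is to record a hash of each script and run a cheap static check before anything executes. The sketch below is illustrative only: the function names and the `BLOCKED_MODULES` list are assumptions, and an import deny-list is easy to bypass, so it complements the isolation and egress controls described above rather than replacing them.

```python
import ast
import hashlib
import json
import time

BLOCKED_MODULES = {"os", "subprocess", "socket", "shutil"}   # illustrative, not exhaustive


def imported_modules(source: str) -> set[str]:
    """Top-level names of every module imported anywhere in the source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found


def audit_record(source: str, agent: str) -> dict:
    """Build the log entry to persist before the code is allowed to run."""
    blocked = sorted(imported_modules(source) & BLOCKED_MODULES)
    return {
        "ts": time.time(),
        "agent": agent,
        "sha256": hashlib.sha256(source.encode("utf-8")).hexdigest(),
        "blocked_imports": blocked,
        "allowed": not blocked,
    }


record = audit_record("import subprocess\nsubprocess.run(['ls'])", agent="demo-agent")
print(json.dumps(record))
```

Writing each record to an append-only log before execution gives compliance teams a trail of exactly which code ran, keyed by content hash, even when the agent later overwrites the file.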
Compliance and Auditability
Selenium's 20-year track record and broad language support make it the default choice in regulated industries where compliance teams require documented, audited test suites. Playwright is increasingly accepted in regulated environments as its enterprise adoption grows. WebWright and browser-harness are too new for most regulated environments and lack the audit trail documentation that compliance teams require.
Vendor Support
Playwright is backed by Microsoft with a dedicated team, regular releases, and enterprise support options. Puppeteer is backed by Google with similar support. Selenium is maintained by the Selenium HQ organization with broad community support. WebWright is a Microsoft Research project released under the MIT license, with no formal enterprise support. browser-harness is an open-source project with a commercial cloud tier via browser-use.
Total Cost of Ownership
The TCO comparison favors Playwright for scripted testing: lower maintenance due to auto-waits and built-in features, faster execution reducing CI/CD costs, and simpler setup reducing DevOps overhead. Selenium's higher maintenance burden is well-documented: manual wait handling, driver update management, and more brittle selectors increase the ongoing engineering cost. For AI agent use cases, browser-harness and WebWright reduce the human engineering cost of maintaining automation scripts by shifting that work to the LLM, but introduce LLM API costs and the security overhead described above.
Which tool for which job.
Use WebWright when:
You want an AI coding agent to write, maintain, and iterate on browser automation scripts. You are building a research or prototyping environment where the agent needs to explore unfamiliar web applications. You want the highest benchmark scores on complex, long-horizon web tasks. You are already using Claude Code or another agent platform that supports the WebWright plugin. You accept the early-stage maturity and lack of enterprise support.
Use browser-harness when:
You need an LLM to control a real Chrome session including logged-in state and open tabs. You are building autonomous agents that need to adapt to novel workflows without human script maintenance. You want the thinnest possible abstraction between the LLM and the browser. You are comfortable with the security implications of a self-healing architecture. You are using Claude Code and want the MCP server integration.
Use Playwright when:
You are building a scripted end-to-end test suite for a web application. You need cross-browser coverage across Chromium, Firefox, and WebKit. You need the best performance, lowest resource usage, and highest parallel stability among scripted tools. You are integrating with a CI/CD pipeline. You want Microsoft enterprise support and a mature, well-documented API. You are building the browser automation layer that an AI agent will run on top of.
Use Puppeteer when:
You need Chrome-specific features or deep DevTools access. You are building stealth automation or anti-bot evasion scenarios. You have an existing JavaScript/TypeScript codebase and want the simplest possible Chrome automation API. You do not need cross-browser coverage. You are maintaining a legacy scraping or automation system built on Puppeteer.
Use Selenium when:
You are in a regulated industry where Selenium is the documented, audited standard. You have a large existing Selenium test suite that is too costly to migrate. You need Ruby language bindings, which Playwright does not offer (it does cover Java and .NET). You need Internet Explorer support. You are maintaining a legacy enterprise test infrastructure that is not worth replacing.
Source notes.
All benchmarks and statistics in this report are sourced from the references below. No statistics are invented or extrapolated beyond what the sources report.
01. Microsoft Research · WebWright (2026). Lu, Yadong; Xu, Lingrui; Huang, Chao; Awadallah, Ahmed. "Webwright: A Terminal Is All You Need For Web Agents." GitHub repository: github.com/microsoft/webwright. Benchmark results: 86.7% on Online-Mind2Web (GPT-5.4), 84.7% (Claude Opus 4.7), 60.1% on Odysseys (GPT-5.4), +15.6 points over prior SOTA.
02. browser-use / browser-harness (2025-2026). GitHub: github.com/browser-use/browser-harness. Architecture details from: Medium (Dr. Fadi Shaar), Pyshine, NeuralStackly, Flowtivity, TheMenonLab, Knightli. Community stats: 95.7k GitHub stars (browser-use parent project) as of May 2026.
03. Playwright (2025-2026). playwright.dev. Performance benchmarks from Markaicode (2025): 42.3s suite completion, 18.4% CPU, 412 MB memory, 98.2% parallel stability. Community stats: 60k+ GitHub stars, 7M npm weekly downloads.
04. Puppeteer (2025-2026). pptr.dev. Community stats: 85k+ GitHub stars, 4M npm weekly downloads. MCP integration via chrome-devtools-mcp. WebDriver BiDi support added 2025.
05. Selenium WebDriver (2025-2026). selenium.dev. Performance benchmarks from Markaicode (2025): 68.9s suite completion, 27.6% CPU, 583 MB memory, 92.1% parallel stability. Community stats: 32k+ GitHub stars, 354k+ repositories.
06. Markaicode · Playwright MCP vs. Selenium Grid: 2025 Performance Benchmarks. markaicode.com/vs/playwright-mcp-vs-selenium-grid/. Cross-browser performance comparison with CPU, memory, and stability metrics.
07. MorphLLM · Playwright vs Puppeteer (2026). morphllm.com/comparisons/playwright-vs-puppeteer. GitHub star counts and adoption trends for Playwright and Puppeteer as of early 2026.
08. NXCode · Stagehand vs Browser Use vs Playwright: AI Browser Automation Compared (2026). nxcode.io/resources/news/stagehand-vs-browser-use-vs-playwright-ai-browser-automation-2026. AI agent adoption trends and browser-use community growth.
09. Microsoft Azure Tech Community · Give Your AI Agent Eyes: Browser-Harness Meets Playwright Workspaces (2026). techcommunity.microsoft.com. Remote browser and cloud support details for browser-harness.
10. DEV Community · Playwright vs. Selenium in 2025: Key Differences for Test Automation. dev.to/dmitrybaraishuk. Enterprise adoption trends and repository count comparisons.