Browser Automation Tools: WebWright vs browser-harness vs Playwright vs Puppeteer vs Selenium Review

Executive Summary

Five tools, three paradigms, one decision.

Browser automation has fractured into three distinct paradigms in 2026. The first is the scripted automation layer, where Playwright, Puppeteer, and Selenium live: deterministic, developer-authored, and optimized for CI/CD pipelines. The second is the AI agent layer, where browser-harness and browser-use live: LLM-driven, self-healing, and optimized for autonomous task completion. The third is the coding-agent layer, where Microsoft WebWright lives: the agent writes and executes code, treats the browser as a disposable tool, and leaves reusable scripts as the persistent artifact.

The paradigm shift matters because many enterprise teams are still standardizing on Playwright or Selenium for use cases that now belong to the AI agent layer. Both tools are free and open source under Apache 2.0, so the cost of that mismatch is not licensing. It is the engineering time spent maintaining brittle selectors, the QA cycles that do not scale, and the AI initiatives that stall because the automation layer was not built for LLM control.

86.7%

WebWright on Online-Mind2Web

Highest score among open-sourced browser agent harnesses on the Online-Mind2Web benchmark (300 tasks, GPT-5.4). Microsoft Research, 2026.

95.7k

browser-use GitHub stars

browser-use (the parent project of browser-harness) has become the fastest-growing browser automation project on GitHub as of May 2026.

60k+

Playwright GitHub stars

Playwright is the dominant framework for scripted cross-browser testing, with 7 million npm weekly downloads and growing enterprise adoption. GitHub, 2026.

38%

Playwright faster than Selenium

Playwright completes the same test suite 38% faster than Selenium Grid in 2025 benchmarks, with lower CPU and memory usage. Markaicode, 2025.

The procurement decision in brief: Use Playwright for scripted cross-browser testing. Use Puppeteer for Chrome-specific automation that needs deep DevTools access. Use browser-harness or browser-use for LLM-driven autonomous web tasks. Use WebWright when you want an AI coding agent to write and maintain the automation scripts themselves. Selenium is the right choice only when an existing suite, legacy language bindings, or regulatory constraints require it.

Tool Profiles

The five tools, profiled.

Each tool occupies a distinct position in the automation stack. Understanding the position is more important than memorizing the feature list, because the position determines what the tool is optimized for and where it will fail.

Microsoft WebWright

github.com/microsoft/webwright · MIT License · 2026

AI Coding Agent

WebWright gives LLMs a terminal where they can launch multiple browser sessions, write Python/Playwright scripts, execute them, inspect screenshots and logs, and iterate until the task is complete. The persistent artifact is code, not a browser session. The browser is disposable. The workspace is the state.

Architecture

The codebase is intentionally minimal at approximately 1,500 lines total. The agent loop is 450 lines. The Playwright environment wrapper is 570 lines. The CLI is 150 lines. Model backends for OpenAI, Anthropic, and OpenRouter are 150 to 200 lines each. Dependencies are limited to httpx, pydantic, playwright, and typer. There are no hidden frameworks, no graph engines, no multi-agent orchestration layers.

The design philosophy is explicit: "just a terminal, a browser, and a model." The agent writes free-form Python scripts using the Playwright API, executes them, reads the output and screenshots, and repairs the code as needed. This is fundamentally different from tools that predict discrete clicks or coordinate-based actions. The agent reasons at the code level, not the pixel level.
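The write, execute, inspect, repair cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not WebWright's actual agent loop: `stub_model`, `run_script`, and `agent_loop` are hypothetical names, the "model" is a hard-coded stand-in for an LLM call, and the generated scripts do not touch a browser.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def stub_model(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM call: returns a Python script as text.

    The first draft contains a deliberate bug. Once error output is fed
    back, the stub returns a repaired script.
    """
    if feedback is None:
        return "print(undefined_name)\n"        # first draft raises NameError
    return f"print('done: ' + {task!r})\n"      # repaired draft


def run_script(source: str, workdir: Path) -> subprocess.CompletedProcess:
    """Write the script into the workspace and execute it in a subprocess."""
    path = workdir / "task.py"
    path.write_text(source)
    return subprocess.run(
        [sys.executable, str(path)],
        capture_output=True, text=True, timeout=30,
    )


def agent_loop(task: str, max_iters: int = 3) -> Tuple[bool, int, str]:
    """Write, execute, inspect, repair until the script exits cleanly."""
    feedback = None
    with tempfile.TemporaryDirectory() as tmp:
        for attempt in range(1, max_iters + 1):
            result = run_script(stub_model(task, feedback), Path(tmp))
            if result.returncode == 0:
                return True, attempt, result.stdout.strip()
            feedback = result.stderr            # the traceback drives the repair
    return False, max_iters, feedback or ""


ok, attempts, output = agent_loop("check title")
```

The point of the sketch is the control flow: the traceback from a failed run is the only signal the model needs to repair its own script, and the script file is what persists between attempts.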

Benchmarks

On the Online-Mind2Web benchmark (300 real-world web tasks), WebWright with GPT-5.4 achieves 86.7%, described by the authors as the highest score among open-sourced harnesses in the AutoEval category. Claude Opus 4.7 achieves 84.7%, with stronger performance on the hard split at 80.5%. On the Odysseys benchmark (200 long-horizon tasks), WebWright achieves 60.1% with GPT-5.4 at an average of 76.1 steps per task. This represents a 15.6-point improvement over the prior state of the art and a 26.6-point improvement over the GPT-5.4 coordinate-prediction baseline. Even Qwen-3.5-9B, a small open-source model, performs well on familiar sites when five or more reusable tools are available in the workspace.

AI Integration

WebWright is natively AI-first. The LLM is the driver. It ships with plugin manifests for Claude Code, OpenAI Codex, OpenClaw, and Hermes Agent. In Claude Code, it installs as a skill via the plugin marketplace and exposes two slash commands: /webwright:run for one-shot task scripts and /webwright:craft for parameterized, reusable CLI tools. The Task2UI mode added in May 2026 renders task results into an HTML web app for easy review and reuse.

Python · MIT License · Playwright dependency · OpenAI / Anthropic / OpenRouter · Claude Code plugin · 1,500 lines · 1.5k GitHub stars

browser-harness

github.com/browser-use/browser-harness · Open Source · 2025-2026

Self-Healing Agent

browser-harness is a minimal Python tool that gives AI agents direct Chrome control via the Chrome DevTools Protocol, bypassing framework abstractions entirely. The core is four Python files totaling approximately 600 lines. What makes it distinctive is its self-healing architecture: when an agent encounters a task it cannot complete, it dynamically edits the helper files mid-execution, writing new browser control tools into the harness that are immediately available on subsequent runs.

Architecture

The harness connects to Chrome via a single persistent WebSocket to the CDP endpoint. There is no abstraction layer between the agent and the browser. All browser actions are issued as raw CDP commands: Page.navigate, Input.dispatchMouseEvent, DOM.querySelector. The daemon layer manages session lifecycle, handles crashes and disconnects, buffers CDP events, and orchestrates multiple browser sessions including remote endpoints.
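To make "raw CDP commands" concrete: each command sent over the WebSocket is a JSON object with an `id`, a `method`, and `params`. The sketch below only builds those messages and does not open a connection. The helper names `cdp_message` and `click_at` are hypothetical, and the `nodeId` value is a placeholder; in a real session it would come from an earlier `DOM.getDocument` response.

```python
import itertools
import json

_ids = itertools.count(1)   # CDP replies are matched to requests by id


def cdp_message(method: str, **params) -> str:
    """Serialize one CDP command as a JSON object with id, method, params."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def click_at(x: float, y: float) -> list:
    """A click is a mouse press followed by a release at the same point."""
    return [
        cdp_message("Input.dispatchMouseEvent", type=kind, x=x, y=y,
                    button="left", clickCount=1)
        for kind in ("mousePressed", "mouseReleased")
    ]


navigate = cdp_message("Page.navigate", url="https://example.com")
# nodeId=1 is a placeholder for the document root returned by DOM.getDocument.
find = cdp_message("DOM.querySelector", nodeId=1, selector="button[type=submit]")
clicks = click_at(120, 240)
```

Because every message carries its own `id`, many commands can be in flight on the one persistent socket, which is what removes the per-command connection overhead.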

The agent workspace stores all helper functions and site-specific skills in user-editable files. Agents build up a persistent knowledge base over time. Domain-specific and interaction-specific skill folders maintain reusable modules that can be shared across teams. This is the key differentiator from Playwright and WebWright: the harness grows smarter with use.

Self-Healing

When an agent encounters a task that requires a capability not yet in helpers.py, it writes the new helper function and immediately uses it. This means the harness adapts to novel workflows, strange OAuth flows, dynamic forms, and custom navigation patterns without requiring a human developer to update the automation scripts. The risk of this approach is real: agents editing their own helper files creates potential for RCE vulnerabilities, and the project's security posture is still evolving.
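The self-extension mechanism can be illustrated with the standard library alone. This is a sketch under assumptions, not browser-harness's actual code: `add_helper` and `load_helpers` are hypothetical names, the helper functions are toy string utilities, and the only safeguard shown is a syntax check, which is far short of a security control.

```python
import ast
import importlib.util
import tempfile
from pathlib import Path


def add_helper(helpers_path: Path, source: str) -> None:
    """Append agent-written source to the helpers file after a syntax check."""
    ast.parse(source)                      # raises SyntaxError on malformed code
    with helpers_path.open("a") as fh:
        fh.write("\n" + source + "\n")


def load_helpers(helpers_path: Path):
    """Import the helpers module from the file's current contents."""
    spec = importlib.util.spec_from_file_location("helpers", helpers_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


workspace = Path(tempfile.mkdtemp())
helpers_file = workspace / "helpers.py"
helpers_file.write_text("def normalize_url(u):\n    return u.strip().lower()\n")

helpers = load_helpers(helpers_file)
missing_before = not hasattr(helpers, "strip_query")   # capability not there yet

# Hypothetical agent output: a new helper written mid-run.
add_helper(helpers_file, "def strip_query(u):\n    return u.split('?', 1)[0]\n")
helpers = load_helpers(helpers_file)                   # reload picks it up
clean = helpers.strip_query("https://example.com/a?b=1")
```

The same mechanism that makes the harness adaptable is the one that creates the RCE concern: whatever text the model emits becomes importable code on the next load.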

AI Integration

browser-harness is designed from the ground up for LLM control. It supports real Chrome sessions including all open tabs and logged-in sessions, enabling true last-mile automation where API access is not possible. It connects to the browser-use Cloud for multi-browser concurrency, proxy rotation, stealth mode, and parallel deployments. The MCP server wraps launch_chrome, stop_chrome, doctor, run_code, and open_research for integration with Claude Code and other MCP-compatible agents.

Python · CDP direct · Self-healing helpers · MCP server · Cloud support · 600 lines core · 95.7k stars (browser-use)

Microsoft Playwright

playwright.dev · Apache 2.0 · Microsoft · 2020-2026

Scripted Testing

Playwright is the dominant cross-browser end-to-end testing framework for modern web applications. It bundles a test runner, assertions, browser isolation via BrowserContext, parallelization, trace viewer, and code generation. It supports Chromium, Firefox, and WebKit across Windows, Linux, and macOS, with language bindings for Node.js, Python, Java, and .NET.

Architecture

Playwright communicates with browsers via WebSockets, maintaining a persistent connection that eliminates the per-command HTTP round-trip overhead that slows Selenium. Tests run in parallel by default across multiple browsers. BrowserContext provides lightweight isolation: fifty agent workers can share one Chromium process with distinct contexts, significantly reducing resource consumption per agent. This makes Playwright the most efficient foundation for large-scale parallel automation.
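A short sketch of the BrowserContext pattern follows. The helper `titles_in_isolated_contexts` is a hypothetical name; it uses only the Playwright calls `new_context()`, `new_page()`, `goto()`, `title()`, and `close()`. It visits URLs one after another for brevity, whereas a real worker pool would run contexts concurrently. `main()` assumes Playwright and a Chromium build are installed and is defined but not called here.

```python
from typing import Dict, List


def titles_in_isolated_contexts(browser, urls: List[str]) -> Dict[str, str]:
    """Visit each URL in its own BrowserContext on one shared browser.

    `browser` is expected to behave like a Playwright Browser. Each context
    has its own cookies and storage, so sessions do not leak between URLs.
    """
    results = {}
    for url in urls:
        context = browser.new_context()     # fresh, isolated session
        try:
            page = context.new_page()
            page.goto(url)
            results[url] = page.title()
        finally:
            context.close()                 # releases the context, not the browser
    return results


def main() -> None:
    # Imported lazily so the helper above can be exercised without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()       # one browser process for all contexts
        print(titles_in_isolated_contexts(browser, ["https://example.com"]))
        browser.close()
```

The design choice worth noting is that the expensive object (the browser process) is created once, while the cheap object (the context) is created and discarded per task.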

Performance

In 2025 benchmarks, Playwright completes a standard test suite in 42.3 seconds versus Selenium Grid's 68.9 seconds, a 38% improvement. CPU usage is approximately 18.4% versus Selenium's 27.6%. Memory consumption is approximately 412 MB versus Selenium's 583 MB. Parallel test stability exceeds 98% for Playwright versus approximately 92% for Selenium Grid. (Source: Markaicode, 2025.)
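The percentages follow directly from the quoted figures. The snippet below is a worked check, with `percent_reduction` as a hypothetical helper name; it uses only the numbers cited above.

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` versus `baseline`, in percent."""
    return round((baseline - improved) / baseline * 100, 1)


# Figures quoted from Markaicode (2025), as (Selenium Grid, Playwright).
metrics = {
    "suite_time_s": (68.9, 42.3),
    "cpu_percent": (27.6, 18.4),
    "memory_mb": (583, 412),
}

reductions = {name: percent_reduction(sel, pw) for name, (sel, pw) in metrics.items()}
speedup = round(68.9 / 42.3, 2)   # the same time gap expressed as a speed ratio
print(reductions, speedup)
```

Suite time falls by about 38.6%, CPU by about 33.3%, and memory by about 29.3%. Note that the 38% figure is a reduction in wall-clock time; expressed as a speed ratio, the same gap is roughly 1.63x.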

AI Integration

Playwright added an MCP interface and an Agent CLI in 2025, signaling Microsoft's intent to position it as the automation layer for AI agent workflows. The MCP interface allows LLM-based agents to control Playwright programmatically. However, Playwright itself is still fundamentally a scripted tool: it requires explicit instructions and does not have an autonomous reasoning loop. It is the engine that WebWright runs on top of, not a competitor to it. browser-harness, by contrast, bypasses Playwright and drives Chrome directly over CDP.

Node.js / Python / Java / .NET · Apache 2.0 · Chromium / Firefox / WebKit · MCP interface · 60k+ GitHub stars · 7M npm weekly downloads

Puppeteer

pptr.dev · Apache 2.0 · Google · 2017-2026

Chrome-First Automation

Puppeteer is Google's high-level JavaScript API for controlling Chrome and Firefox via the Chrome DevTools Protocol and WebDriver BiDi. It is the original modern browser automation library and remains the default choice for Chrome-specific workloads, stealth automation, and scraping tasks where deep DevTools access matters.

Architecture

Puppeteer communicates via CDP and the newer WebDriver BiDi standard. It installs with or without a bundled Chrome binary via the puppeteer versus puppeteer-core packages. The one-browser-per-agent model is less efficient than Playwright's BrowserContext isolation at scale, but provides a simpler mental model for lightweight, single-agent tasks. Puppeteer added Bun support in 2025 alongside npm, Yarn, and pnpm.

AI Integration

Puppeteer added MCP support via chrome-devtools-mcp in 2025, enabling AI agent integration. It also supports the experimental WebMCP API. However, like Playwright, Puppeteer is fundamentally a scripted tool. The AI integration is at the control layer, not the reasoning layer. Puppeteer is commonly used as the browser engine inside legacy AI scraping bots, but is gradually losing share to Playwright for new agent frameworks because of its Chrome-first focus.

Position in 2026

Puppeteer's 85,000 GitHub stars and 4 million weekly npm downloads reflect its deep entrenchment in the JavaScript ecosystem. It remains the best choice for Chrome-only workloads, stealth automation, and scenarios requiring deep DevTools access. For new projects requiring cross-browser support, massive parallelization, or AI agent integration, Playwright is the more natural choice. Puppeteer's market share is declining for new projects but stable in legacy deployments.

JavaScript / TypeScript · Apache 2.0 · Chrome / Firefox · CDP + WebDriver BiDi · 85k GitHub stars · 4M npm weekly downloads

Selenium WebDriver

selenium.dev · Apache 2.0 · Selenium HQ · 2004-2026

Legacy Enterprise

Selenium is the original browser automation standard and remains the most widely deployed framework in regulated enterprise environments. It is not a single tool but an umbrella project encompassing WebDriver (the automation API), IDE (a Chrome/Firefox recording extension), and Grid (distributed multi-browser execution). Its language bindings cover Java, Python, C#, Ruby, and JavaScript.

Architecture

Selenium WebDriver communicates with browsers via HTTP/WebDriver, introducing per-command latency that modern WebSocket-based tools eliminate. The Grid enables distributed execution across different machines and platforms, supporting multi-browser and multi-OS combinations at scale. The IDE records user actions as a starting point for test development. The architecture is non-intrusive: the API does not require compilation with application code.
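The per-command latency is easier to see when the traffic is written out. Under the W3C WebDriver protocol, navigating, locating an element, and clicking it are three separate HTTP requests, each waiting for its response. The sketch below only constructs the requests and sends nothing; `Request` and `click_flow` are hypothetical names, and `element_id` is a placeholder for the value a real server would return in the response to the second request.

```python
import json
from typing import List, NamedTuple


class Request(NamedTuple):
    method: str
    path: str
    body: str


def click_flow(session_id: str, element_id: str, url: str, css: str) -> List[Request]:
    """HTTP requests for navigate, find element, click under W3C WebDriver."""
    base = f"/session/{session_id}"
    return [
        # 1. Navigate to the page.
        Request("POST", f"{base}/url", json.dumps({"url": url})),
        # 2. Locate the element; the response carries its element reference.
        Request("POST", f"{base}/element",
                json.dumps({"using": "css selector", "value": css})),
        # 3. Click it, addressing the element by that reference.
        Request("POST", f"{base}/element/{element_id}/click", json.dumps({})),
    ]


flow = click_flow("abc123", "el-1", "https://example.com", "button[type=submit]")
for req in flow:
    print(req.method, req.path)
```

Three user-visible steps cost three request and response cycles. A WebSocket-based tool sends the equivalent commands over one already-open connection.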

Performance Gap

The performance gap with Playwright has widened in 2025. Selenium Grid uses 27.6% CPU versus Playwright's 18.4%. Memory consumption is 583 MB versus Playwright's 412 MB. Parallel test stability is approximately 92% versus Playwright's 98%. Setup complexity is higher: Selenium Grid requires separate driver configuration and management that Playwright handles automatically. (Source: Markaicode, 2025.)

Position in 2026

Selenium remains heavily used in legacy and regulated enterprise environments, particularly in banking, healthcare, and government, due to its maturity, broad language support, and extensive documentation. Fortune 500 companies are increasingly running dual frameworks: Playwright for new web applications and Selenium for legacy suites that are too costly to migrate. New enterprise projects almost universally select Playwright over Selenium. Selenium's market share is declining for new projects but its installed base is enormous and will persist for years.

Java / Python / C# / Ruby / JS · Apache 2.0 · Chrome / Edge / Firefox / IE / Safari · Grid distributed execution · 32k+ GitHub stars · 354k+ repositories

Performance Benchmarks

The numbers that matter.

The benchmarks fall into two categories: AI agent task completion (where WebWright and browser-harness compete) and scripted test execution (where Playwright and Selenium compete). These are different competitions. Comparing WebWright's Online-Mind2Web score to Playwright's test execution speed is a category error.

AI Agent Task Completion: Online-Mind2Web (300 tasks)

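WebWright (GPT-5.4) · Microsoft Research, 2026

86.7%

WebWright (Claude Opus 4.7) · Microsoft Research, 2026

84.7%

AI Agent Task Completion: Odysseys (200 long-horizon tasks)

WebWright (GPT-5.4) · Microsoft Research, 2026

60.1%

Prior SOTA (vision-based) · Odysseys baseline, 2025

44.5%

Coordinate prediction baseline · GPT-5.4 base, 2026

33.5%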

Scripted Test Execution: Suite Completion Time

Playwright MCP · Markaicode, 2025

42.3s

Selenium Grid · Markaicode, 2025

68.9s

Resource Usage: CPU and Memory

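Playwright CPU usage · Markaicode, 2025

18.4%

Selenium CPU usage · Markaicode, 2025

27.6%

Playwright memory usage · Markaicode, 2025

412 MB

Selenium memory usage · Markaicode, 2025

583 MB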

Key insight: WebWright's +15.6-point improvement over prior SOTA on Odysseys is not a marginal gain. It represents the difference between an agent that can reliably complete long-horizon web tasks and one that cannot. The coding-agent paradigm (write and execute scripts) is demonstrably more capable than the click-prediction paradigm for complex, multi-step tasks.
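The baseline figures can be recovered from the reported gains, which also confirms that the 44.5% and 33.5% values belong to the Odysseys benchmark rather than Online-Mind2Web. The snippet is a worked check using only numbers quoted in this review; the variable names are illustrative.

```python
webwright_odysseys = 60.1            # WebWright with GPT-5.4 on Odysseys, percent
gain_over_prior_sota = 15.6          # reported improvement, percentage points
gain_over_coordinate_baseline = 26.6

prior_sota = round(webwright_odysseys - gain_over_prior_sota, 1)
coordinate_baseline = round(webwright_odysseys - gain_over_coordinate_baseline, 1)

# The same gain expressed relative to the prior state of the art.
relative_gain_percent = round(gain_over_prior_sota / prior_sota * 100, 1)
print(prior_sota, coordinate_baseline, relative_gain_percent)
```

In relative terms, a 15.6-point gain on a 44.5% baseline is an improvement of about 35%.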

Decision Matrix

The procurement decision in one table.

The right tool depends on the use case. This table maps the five tools across the dimensions that matter most for enterprise procurement decisions.

Dimension	WebWright	browser-harness	Playwright	Puppeteer	Selenium
Primary use case	AI coding agent writes automation scripts	LLM-driven autonomous web tasks, self-healing	Scripted cross-browser testing and CI/CD	Chrome-specific automation and scraping	Legacy enterprise testing, regulated environments
AI/LLM integration	Native: LLM is the driver	Native: self-healing, CDP direct	Added: MCP interface, Agent CLI	Added: chrome-devtools-mcp	Minimal: ecosystem plugins only
Browser support	Chromium (via Playwright)	Chrome (CDP direct)	Chromium, Firefox, WebKit	Chrome, Firefox	Chrome, Edge, Firefox, IE, Safari
Language	Python	Python	Node.js, Python, Java, .NET	JavaScript / TypeScript	Java, Python, C#, Ruby, JS
Setup complexity	Low (pip install + API key)	Low (4 Python files)	Low (3 CLI commands)	Low (npm install)	High (Grid, drivers, config)
Parallel execution	Sequential per agent run	Cloud: up to 3 free, more paid	Native: BrowserContext isolation	One browser per agent	Grid: distributed multi-machine
Enterprise readiness	Early: Microsoft Research project, MIT-licensed	Growing: cloud tier available	High: Microsoft-backed, CI/CD native	High: Google-backed, mature	Very high: 20-year track record
Self-healing	Yes: agent repairs its own scripts	Yes: agent edits helper files mid-run	No: requires human maintenance	No: requires human maintenance	No: requires human maintenance
License	MIT	Open source	Apache 2.0	Apache 2.0	Apache 2.0
GitHub stars (2026)	1.5k	95.7k (browser-use)	60k+	85k+	32k+

Architecture Deep Dive

Three paradigms, not five tools.

The most useful way to understand the landscape is not to compare five tools but to understand the three paradigms they represent. Each paradigm has a different answer to the question: who or what decides what the browser does next?

Paradigm 1: Scripted Automation (Playwright, Puppeteer, Selenium)

A human developer writes explicit instructions. The tool executes them deterministically. The browser does exactly what the script says. When the web application changes, the script breaks and a human must fix it. This paradigm is optimized for repeatability, CI/CD integration, and regression testing. It is the right choice when the task is well-defined, the application is under the team's control, and the goal is to verify that known behavior still works.

Playwright is the best implementation of this paradigm in 2026. It is faster, more resource-efficient, and easier to maintain than Selenium. Puppeteer is the right choice within this paradigm when Chrome-specific features or deep DevTools access are required. Selenium is the right choice when legacy language bindings, regulatory constraints, or an existing test suite make migration too costly.

Paradigm 2: LLM-Driven Agent (browser-harness, browser-use)

An LLM decides what the browser does next based on the current page state. The tool provides the LLM with a control surface: CDP commands, helper functions, screenshots, DOM snapshots. The LLM reasons about the task, issues commands, observes the result, and continues until the task is complete or fails. When the web application changes, the agent adapts without requiring a human to update scripts. This paradigm is optimized for autonomous task completion, novel workflows, and scenarios where the task cannot be fully specified in advance.
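The observe, decide, act cycle can be shown with a toy environment. Everything here is a stand-in under stated assumptions: `FakeBrowser` replaces a real browser, `stub_policy` replaces the LLM, and the action tuples are an invented format rather than any tool's actual command set.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeBrowser:
    """Toy stand-in for a real browser: a page name plus form fields."""
    page: str = "login"
    fields: Dict[str, str] = field(default_factory=dict)

    def observe(self) -> dict:
        return {"page": self.page, "fields": dict(self.fields)}

    def act(self, action: Tuple[str, ...]) -> None:
        if action[0] == "type":
            self.fields[action[1]] = action[2]
        elif action[0] == "submit" and self.fields.get("user"):
            self.page = "dashboard"


def stub_policy(goal: str, observation: dict) -> Tuple[str, ...]:
    """Hypothetical stand-in for the LLM: picks the next action from state."""
    if observation["page"] == goal:
        return ("done",)
    if "user" not in observation["fields"]:
        return ("type", "user", "alice")
    return ("submit",)


def run_agent(browser: FakeBrowser, goal: str, max_steps: int = 10) -> List[Tuple[str, ...]]:
    """Observe, decide, act; stop when the policy reports done."""
    trace = []
    for _ in range(max_steps):
        action = stub_policy(goal, browser.observe())
        trace.append(action)
        if action[0] == "done":
            break
        browser.act(action)
    return trace


trace = run_agent(FakeBrowser(), goal="dashboard")
```

Nothing in the loop encodes the steps in advance. The policy chooses each action from the current observation, which is why a changed page changes the agent's behavior rather than breaking a script.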

browser-harness is the most minimal implementation of this paradigm. Its self-healing architecture, where the agent writes new helper functions mid-execution, makes it uniquely capable for novel workflows. The risk is the security surface created by agents editing their own code. browser-use (the parent project) adds a higher-level agent API and cloud infrastructure on top of the same paradigm.

Paradigm 3: Coding Agent (WebWright)

An LLM writes Python/Playwright scripts, executes them in a terminal, inspects the output, and iterates until the task is complete. The persistent artifact is code, not a browser session. The agent reasons at the code level, not the pixel or DOM level. This paradigm is optimized for tasks that benefit from reusable, composable scripts: the agent builds a library of tools over time that can be applied to similar tasks. The coding-agent paradigm achieves higher benchmark scores on complex, long-horizon tasks than the click-prediction paradigm because code is a more expressive action space than discrete clicks.

WebWright is the only open-source implementation of this paradigm with published benchmark results. Its 86.7% score on Online-Mind2Web and 60.1% on Odysseys represent the current state of the art for open-sourced browser agents.

The architectural insight for enterprise teams: The three paradigms are not mutually exclusive. A production AI agent stack might use Playwright as the browser engine (Paradigm 1), browser-harness for autonomous task completion (Paradigm 2), and WebWright for generating and maintaining the automation scripts themselves (Paradigm 3). The tools are complementary, not competitive, when used at the right layer of the stack.

Enterprise Readiness

What enterprise teams need to know.

Security

The self-healing architectures of browser-harness and WebWright introduce security considerations that scripted tools do not. When an agent can edit its own helper files mid-execution, the attack surface includes the agent's reasoning process, the LLM provider's outputs, and the file system. Enterprise deployments should run these tools in isolated environments with restricted file system access, network egress controls, and audit logging of all agent-generated code before execution. The browser-harness project has open discussions about RCE vulnerabilities and is actively evolving its security posture.
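One way to implement "audit logging of all agent-generated code before execution" is a gate that hashes the code, records it, and rejects scripts that import modules outside a policy. The sketch below is an illustration, not a sandbox: `BLOCKED_MODULES`, `blocked_imports`, and `audit_record` are hypothetical, and a static import check is easy to bypass (for example via `__import__`), so it complements process isolation rather than replacing it.

```python
import ast
import hashlib
import time
from typing import List

# Hypothetical policy: modules that agent-generated code may not import.
BLOCKED_MODULES = {"os", "subprocess", "socket", "shutil"}


def blocked_imports(source: str) -> List[str]:
    """Return the blocked top-level modules that the source imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return sorted(found & BLOCKED_MODULES)


def audit_record(source: str) -> dict:
    """Build a log entry for the code and decide whether it may run."""
    violations = blocked_imports(source)
    return {
        "sha256": hashlib.sha256(source.encode()).hexdigest(),
        "timestamp": time.time(),
        "violations": violations,
        "approved": not violations,
    }


rejected = audit_record("import os\nos.remove('x')\n")
accepted = audit_record("import json\nprint(json.dumps({}))\n")
```

Storing the hash alongside the decision gives reviewers a stable identifier for exactly what the agent tried to run, even if the helper file is later rewritten.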

Playwright and Selenium have mature security postures appropriate for enterprise CI/CD pipelines. They do not execute LLM-generated code and do not have self-modification capabilities.

Compliance and Auditability

Selenium's 20-year track record and broad language support make it the default choice in regulated industries where compliance teams require documented, audited test suites. Playwright is increasingly accepted in regulated environments as its enterprise adoption grows. WebWright and browser-harness are too new for most regulated environments and lack the audit trail documentation that compliance teams require.

Vendor Support

Playwright is backed by Microsoft with a dedicated team, regular releases, and enterprise support options. Puppeteer is backed by Google with similar support. Selenium is maintained by the Selenium HQ organization with broad community support. WebWright is a Microsoft Research project with MIT license and no formal enterprise support. browser-harness is an open-source project with a commercial cloud tier via browser-use.

Total Cost of Ownership

The TCO comparison favors Playwright for scripted testing: lower maintenance due to auto-waits and built-in features, faster execution reducing CI/CD costs, and simpler setup reducing DevOps overhead. Selenium's higher maintenance burden is well-documented: manual wait handling, driver update management, and more brittle selectors increase the ongoing engineering cost. For AI agent use cases, browser-harness and WebWright reduce the human engineering cost of maintaining automation scripts by shifting that work to the LLM, but introduce LLM API costs and the security overhead described above.

Decision Framework

Which tool for which job.

Use WebWright when:

You want an AI coding agent to write, maintain, and iterate on browser automation scripts. You are building a research or prototyping environment where the agent needs to explore unfamiliar web applications. You want the highest benchmark scores on complex, long-horizon web tasks. You are already using Claude Code or another agent platform that supports the WebWright plugin. You accept the early-stage maturity and lack of enterprise support.

Use browser-harness when:

You need an LLM to control a real Chrome session including logged-in state and open tabs. You are building autonomous agents that need to adapt to novel workflows without human script maintenance. You want the thinnest possible abstraction between the LLM and the browser. You are comfortable with the security implications of a self-healing architecture. You are using Claude Code and want the MCP server integration.

Use Playwright when:

You are building a scripted end-to-end test suite for a web application. You need cross-browser coverage across Chromium, Firefox, and WebKit. You need the best performance, lowest resource usage, and highest parallel stability among scripted tools. You are integrating with a CI/CD pipeline. You want Microsoft enterprise support and a mature, well-documented API. You are building the browser automation layer that an AI agent will run on top of.

Use Puppeteer when:

You need Chrome-specific features or deep DevTools access. You are building stealth automation or anti-bot evasion scenarios. You have an existing JavaScript/TypeScript codebase and want the simplest possible Chrome automation API. You do not need cross-browser coverage. You are maintaining a legacy scraping or automation system built on Puppeteer.

Use Selenium when:

You are in a regulated industry where Selenium is the documented, audited standard. You have a large existing Selenium test suite that is too costly to migrate. You need Ruby language bindings, which Playwright does not officially provide (Playwright already covers Java and C# through its Java and .NET bindings). You need Internet Explorer support. You are maintaining a legacy enterprise test infrastructure that is not worth replacing.

Sources

Source notes.

All benchmarks and statistics in this report are sourced from the references below. No statistics are invented or extrapolated beyond what the sources report.

01
Microsoft Research · WebWright (2026). Lu, Yadong; Xu, Lingrui; Huang, Chao; Awadallah, Ahmed. "Webwright: A Terminal Is All You Need For Web Agents." GitHub repository: github.com/microsoft/webwright. Benchmark results: 86.7% on Online-Mind2Web (GPT-5.4), 84.7% (Claude Opus 4.7), 60.1% on Odysseys (GPT-5.4), +15.6 points over prior SOTA.
02
browser-use / browser-harness (2025-2026). GitHub: github.com/browser-use/browser-harness. Architecture details from: Medium (Dr. Fadi Shaar), Pyshine, NeuralStackly, Flowtivity, TheMenonLab, Knightli. Community stats: 95.7k GitHub stars (browser-use parent project) as of May 2026.
03
Playwright (2025-2026). playwright.dev. Performance benchmarks from Markaicode (2025): 42.3s suite completion, 18.4% CPU, 412 MB memory, 98.2% parallel stability. Community stats: 60k+ GitHub stars, 7M npm weekly downloads.
04
Puppeteer (2025-2026). pptr.dev. Community stats: 85k+ GitHub stars, 4M npm weekly downloads. MCP integration via chrome-devtools-mcp. WebDriver BiDi support added 2025.
05
Selenium WebDriver (2025-2026). selenium.dev. Performance benchmarks from Markaicode (2025): 68.9s suite completion, 27.6% CPU, 583 MB memory, 92.1% parallel stability. Community stats: 32k+ GitHub stars, 354k+ repositories.
06
Markaicode · Playwright MCP vs. Selenium Grid: 2025 Performance Benchmarks. markaicode.com/vs/playwright-mcp-vs-selenium-grid/. Cross-browser performance comparison with CPU, memory, and stability metrics.
07
MorphLLM · Playwright vs Puppeteer (2026). morphllm.com/comparisons/playwright-vs-puppeteer. GitHub star counts and adoption trends for Playwright and Puppeteer as of early 2026.
08
NXCode · Stagehand vs Browser Use vs Playwright: AI Browser Automation Compared (2026). nxcode.io/resources/news/stagehand-vs-browser-use-vs-playwright-ai-browser-automation-2026. AI agent adoption trends and browser-use community growth.
09
Microsoft Azure Tech Community · Give Your AI Agent Eyes: Browser-Harness Meets Playwright Workspaces (2026). techcommunity.microsoft.com. Remote browser and cloud support details for browser-harness.
10
DEV Community · Playwright vs. Selenium in 2025: Key Differences for Test Automation. dev.to/dmitrybaraishuk. Enterprise adoption trends and repository count comparisons.