Kernel browsers expose four ways to drive a session. For agents, we recommend computer use or playwright execution: both run co-located with the browser and avoid the bot-detection surface a direct CDP connection introduces.
- Computer Use
- Playwright Execution
- CDP
- WebDriver BiDi
Kernel’s Computer Controls API exposes OS-level mouse, keyboard, and screen primitives — the surface a computer-use model already knows how to drive (screenshot, click, type, key, scroll, drag). No CDP or WebDriver connection required, so there’s no protocol fingerprint to leak. Ideal for Claude, OpenAI, or Gemini computer-use loops.
Why computer use for agents
Kernel’s computer controls are built to match how computer-use models were trained: the primitives the model emits (screenshot, click at coords, type, key, scroll, drag) map 1:1 onto the API. There’s no harness translating model output into framework calls.
- Native fit. Screenshot, click, type, key, scroll, drag: the primitives the model already speaks.
- Faster screenshots. Captures bypass CDP, which removes the largest source of latency in a vision loop.
- Better against bot detection. No CDP connection means no CDP fingerprint to leak. Pairs naturally with stealth mode and residential proxies.
- Human-like input. OS-level events with Bézier-curve mouse paths, variable typing speed, and configurable mistype rate.
- Not DOM-limited. Screenshots capture the full VM, so the agent can see and interact with native dialogs, canvas elements, iframes, and PDFs — not just things you can address with a selector.
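To make the 1:1 mapping concrete, here is a minimal sketch of a dispatcher that forwards a model-emitted action straight to a controls client. The action schema, the method names (`click`, `type_text`, `press_key`, and so on), and the `RecordingControls` stand-in are all assumptions made for illustration, not Kernel's actual API; see the Computer Controls reference for the real calls.

```python
from typing import Any

# Hypothetical mapping: model action type -> (client method, required fields).
ACTION_TABLE = {
    "screenshot": ("screenshot", ()),
    "click": ("click", ("x", "y")),
    "type": ("type_text", ("text",)),
    "key": ("press_key", ("keys",)),
    "scroll": ("scroll", ("x", "y", "delta_y")),
    "drag": ("drag", ("path",)),
}


class RecordingControls:
    """Stand-in client that only records calls (not a real SDK class)."""

    def __init__(self) -> None:
        self.calls: list = []

    def _record(self, name: str, **kwargs: Any) -> dict:
        self.calls.append((name, kwargs))
        return {"ok": True}

    def screenshot(self) -> dict:
        return self._record("screenshot")

    def click(self, x: int, y: int) -> dict:
        return self._record("click", x=x, y=y)

    def type_text(self, text: str) -> dict:
        return self._record("type_text", text=text)

    def press_key(self, keys: list) -> dict:
        return self._record("press_key", keys=keys)

    def scroll(self, x: int, y: int, delta_y: int) -> dict:
        return self._record("scroll", x=x, y=y, delta_y=delta_y)

    def drag(self, path: list) -> dict:
        return self._record("drag", path=path)


def dispatch(action: dict, controls: Any) -> Any:
    """Forward one model action to the matching primitive, with no translation layer."""
    kind = action.get("type")
    if kind not in ACTION_TABLE:
        raise ValueError(f"unsupported action type: {kind!r}")
    method_name, fields = ACTION_TABLE[kind]
    missing = [f for f in fields if f not in action]
    if missing:
        raise ValueError(f"action {kind!r} is missing fields: {missing}")
    # Pass only the fields the primitive needs; extra keys are ignored.
    return getattr(controls, method_name)(**{f: action[f] for f in fields})
```

In a real loop, `controls` would be a thin wrapper over the Computer Controls API, and each screenshot result would be fed back to the model as the next observation.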
Why playwright execution over a direct CDP connection
If you’re reaching for Playwright, prefer the execution API over `connectOverCDP`. It’s the same Playwright API you already know, with none of the setup.
- Run from anywhere. No `playwright` package to version-pin, no Chromium download, no CDP connection to manage. Send the code, get the result.
- Co-located with the browser. Code runs in the same VM as the browser, so there is no network hop between your script and the page and fewer flakes.
- Patchright by default. Hardened against bot detection out of the box.
- Full Playwright API. `page`, `context`, and `browser` are all in scope. Anything Playwright can do (DOM queries, file uploads, full-page screenshots) works here.
- Returns values. `return` from your code and the result comes back in the response. Easy to use as an agent tool.
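As a rough illustration of "send the code, get the result", the sketch below wraps an executor as an agent tool. It assumes the execution API accepts a JavaScript snippet as a string. The `Executor` type, `make_playwright_tool`, and `fake_execute` are hypothetical names introduced here, and the real request shape is documented in the Playwright Execution reference.

```python
import json
from typing import Any, Callable

# Snippet to run next to the browser. Per the list above, `page` is in scope
# and the value passed to `return` comes back in the response.
READ_TITLE = """
await page.goto('https://example.com');
return await page.title();
"""

# Hypothetical executor signature: takes the code string, returns its result.
Executor = Callable[[str], Any]


def make_playwright_tool(execute: Executor) -> Callable[[str], str]:
    """Wrap an executor as an agent tool that always answers with a JSON string."""

    def tool(code: str) -> str:
        try:
            return json.dumps({"ok": True, "result": execute(code)})
        except Exception as exc:
            # Report the failure to the model instead of crashing the agent loop.
            return json.dumps({"ok": False, "error": str(exc)})

    return tool


def fake_execute(code: str) -> Any:
    """Local stand-in; a real executor would call the execution API."""
    return "Example Domain" if "page.title()" in code else None
```

Swapping `fake_execute` for a function that posts the snippet to the execution API keeps the tool's contract with the model unchanged.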
Computer use + playwright execution
Computer controls drive the browser the way a person would; they don’t speak the programmatic API surface. Anything you would normally do through the DOM or a Playwright client (reading text and attributes, `page.goto`, file uploads, cookie or storage access, switching tabs) belongs on the playwright execution side. The recommended pattern for agents is computer controls for interaction, with playwright execution as a tool the agent can call when it needs structured data or a programmatic action.
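One way to wire up this split is a small router that sends interaction to computer controls and everything programmatic to playwright execution. This is a sketch under assumptions: the tool names (`computer`, `playwright`), the call shape, and the two handler callables are hypothetical, not part of Kernel's API.

```python
from typing import Any, Callable

# Interaction primitives named in this page; anything else is not a computer action.
INTERACTION_ACTIONS = {"screenshot", "click", "type", "key", "scroll", "drag"}


def route(
    call: dict,
    computer: Callable[[dict], Any],
    playwright: Callable[[str], Any],
) -> Any:
    """Route one tool call from the model to the side that should handle it."""
    name = call.get("tool")
    if name == "computer":
        action = call["action"]
        if action.get("type") not in INTERACTION_ACTIONS:
            raise ValueError(f"not an interaction primitive: {action.get('type')!r}")
        return computer(action)
    if name == "playwright":
        # Structured reads and programmatic actions travel as code.
        return playwright(call["code"])
    raise ValueError(f"unknown tool: {name!r}")
```

For example, an agent might click through a login form with `computer` calls, then issue a single `playwright` call such as `return await context.cookies();` to read the resulting session state.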
Going deeper
- Computer Controls reference — every mouse, keyboard, and screen primitive.
- Playwright Execution reference — the full execution surface, return values, and timeouts.
- Computer use integrations — drop-in examples for Anthropic, Gemini, OpenAI, and more.