# Control

> Drive the browser with computer use, Playwright execution, CDP, or WebDriver BiDi

Kernel browsers expose four ways to drive a session. For agents, we recommend [computer use](/browsers/computer-controls) or [Playwright execution](/browsers/playwright-execution). Both run co-located with the browser and avoid the bot-detection surface that a direct CDP connection introduces.

<Tabs>
  <Tab title="Computer Use">
    Kernel's [Computer Controls](/browsers/computer-controls) API exposes OS-level mouse, keyboard, and screen primitives — the surface a computer-use model already knows how to drive (screenshot, click, type, key, scroll, drag). No CDP or WebDriver connection required, so there's no protocol fingerprint to leak. Ideal for [Claude](/integrations/computer-use/anthropic), [OpenAI](/integrations/computer-use/openai), or [Gemini](/integrations/computer-use/gemini) computer-use loops.

    <CodeGroup>
      ```typescript Typescript/Javascript theme={null}
      import Kernel from '@onkernel/sdk';

      const kernel = new Kernel();
      const kernelBrowser = await kernel.browsers.create();

      const screenshot = await kernel.browsers.computer.captureScreenshot(kernelBrowser.session_id);

      await kernel.browsers.computer.clickMouse(kernelBrowser.session_id, {
        x: 420,
        y: 280,
      });

      await kernel.browsers.computer.typeText(kernelBrowser.session_id, {
        text: 'kernel cloud browsers',
      });
      ```

      ```python Python theme={null}
      from kernel import Kernel

      kernel = Kernel()
      kernel_browser = kernel.browsers.create()

      screenshot = kernel.browsers.computer.capture_screenshot(id=kernel_browser.session_id)

      kernel.browsers.computer.click_mouse(
          id=kernel_browser.session_id,
          x=420,
          y=280,
      )

      kernel.browsers.computer.type_text(
          id=kernel_browser.session_id,
          text="kernel cloud browsers",
      )
      ```

      ```go Go theme={null}
      package main

      import (
      	"context"

      	"github.com/kernel/kernel-go-sdk"
      )

      func main() {
      	ctx := context.Background()
      	client := kernel.NewClient()

      	kernelBrowser, err := client.Browsers.New(ctx, kernel.BrowserNewParams{})
      	if err != nil {
      		panic(err)
      	}

      	screenshot, err := client.Browsers.Computer.CaptureScreenshot(
      		ctx,
      		kernelBrowser.SessionID,
      		kernel.BrowserComputerCaptureScreenshotParams{},
      	)
      	if err != nil {
      		panic(err)
      	}
      	defer screenshot.Body.Close()

      	if err := client.Browsers.Computer.ClickMouse(
      		ctx,
      		kernelBrowser.SessionID,
      		kernel.BrowserComputerClickMouseParams{
      			X: 420,
      			Y: 280,
      		},
      	); err != nil {
      		panic(err)
      	}

      	if err := client.Browsers.Computer.TypeText(
      		ctx,
      		kernelBrowser.SessionID,
      		kernel.BrowserComputerTypeTextParams{
      			Text: "kernel cloud browsers",
      		},
      	); err != nil {
      		panic(err)
      	}
      }
      ```
    </CodeGroup>
  </Tab>

  <Tab title="Playwright Execution">
    Run any Playwright code from anywhere — no local Playwright install, no Chromium download, no CDP connection to manage. Your code executes inside the browser's VM with the full Playwright API in scope and returns structured data back to your agent. Ships with [Patchright](/browsers/bot-detection/stealth) by default.

    <CodeGroup>
      ```typescript Typescript/Javascript theme={null}
      const response = await kernel.browsers.playwright.execute(
        kernelBrowser.session_id,
        {
          code: `
            await page.goto('https://example.com');
            return await page.title();
          `,
        },
      );

      console.log(response.result);
      ```

      ```python Python theme={null}
      response = kernel.browsers.playwright.execute(
          id=kernel_browser.session_id,
          code="""
            await page.goto('https://example.com')
            return await page.title()
          """,
      )

      print(response.result)
      ```

      ```go Go theme={null}
      response, err := client.Browsers.Playwright.Execute(
      	ctx,
      	kernelBrowser.SessionID,
      	kernel.BrowserPlaywrightExecuteParams{
      		Code: `
      			await page.goto('https://example.com');
      			return await page.title();
      		`,
      	},
      )
      if err != nil {
      	panic(err)
      }

      fmt.Println(response.Result)
      ```
    </CodeGroup>
  </Tab>

  <Tab title="CDP">
    Chrome DevTools Protocol — the wire format Playwright, Puppeteer, and most browser frameworks speak. Use `cdp_ws_url` from the created browser session for deterministic, scripted automation driven from your own infra.

    <CodeGroup>
      ```typescript Typescript/Javascript theme={null}
      import { chromium } from 'playwright';

      const browser = await chromium.connectOverCDP(kernelBrowser.cdp_ws_url);
      const context = browser.contexts()[0];
      const page = context.pages()[0];

      await page.goto('https://example.com');
      const title = await page.title();
      console.log(title);
      ```

      ```python Python theme={null}
      import asyncio

      from playwright.async_api import async_playwright


      async def main():
          # Attach to the existing Kernel browser over CDP instead of launching a local one.
          async with async_playwright() as playwright:
              browser = await playwright.chromium.connect_over_cdp(kernel_browser.cdp_ws_url)
              context = browser.contexts[0]
              page = context.pages[0]

              await page.goto('https://example.com')
              title = await page.title()
              print(title)


      asyncio.run(main())
      ```
    </CodeGroup>
  </Tab>

  <Tab title="WebDriver BiDi">
    W3C-standard browser control. Use `webdriver_ws_url` with [Vibium](/integrations/vibium) or any other BiDi client.

    <CodeGroup>
      ```typescript Typescript/Javascript theme={null}
      import { browser } from 'vibium';

      const bro = await browser.start(kernelBrowser.webdriver_ws_url);
      const page = await bro.page();

      await page.goto('https://example.com');
      const title = await page.title();
      console.log(title);
      ```

      ```python Python theme={null}
      from vibium.sync_api import browser

      bro = browser.start(kernel_browser.webdriver_ws_url)
      page = bro.page()

      page.goto('https://example.com')
      title = page.title()
      print(title)
      ```
    </CodeGroup>
  </Tab>
</Tabs>

## Why computer use for agents

Kernel's computer controls are built to match how computer-use models were trained — the same primitives the model emits (screenshot, click at coords, type, key, scroll, drag) map 1:1 onto the API. There's no harness translating model output into framework calls.

* **Native fit.** Screenshot, click, type, key, scroll, drag — the primitives the model already speaks.
* **Faster screenshots.** Captures bypass CDP, which removes the largest source of latency in a vision loop.
* **Better against bot detection.** No CDP connection means no CDP fingerprint to leak. Pairs naturally with [stealth mode](/browsers/bot-detection/stealth) and [residential proxies](/proxies/residential).
* **Human-like input.** OS-level events with Bézier-curve mouse paths, variable typing speed, and configurable mistype rate.
* **Not DOM-limited.** Screenshots capture the full VM, so the agent can see and interact with native dialogs, canvas elements, iframes, and PDFs — not just things you can address with a selector.
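
As a rough illustration of that 1:1 mapping, the sketch below routes a single model-emitted action onto the computer controls calls shown earlier on this page. The action dictionary shape (`type`, `x`, `y`, `text`) and the `dispatch_action` helper are assumptions made for illustration, not part of the Kernel SDK or any model vendor's schema. Only `capture_screenshot`, `click_mouse`, and `type_text` are taken from the examples above.

```python
from typing import Any


def dispatch_action(computer: Any, session_id: str, action: dict) -> Any:
    """Route one model-emitted action to a computer controls call.

    `computer` is expected to be `kernel.browsers.computer`. The action
    shape is a hypothetical example, not a Kernel or model-vendor schema.
    """
    kind = action["type"]
    if kind == "screenshot":
        return computer.capture_screenshot(id=session_id)
    if kind == "click":
        return computer.click_mouse(id=session_id, x=action["x"], y=action["y"])
    if kind == "type":
        return computer.type_text(id=session_id, text=action["text"])
    # key, scroll, and drag are omitted: their method names are not shown
    # on this page, so look them up in the Computer Controls reference.
    raise ValueError(f"unsupported action type: {kind!r}")
```

In an agent loop this might be called as `dispatch_action(kernel.browsers.computer, kernel_browser.session_id, action)` for each action the model returns, followed by a fresh screenshot.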

## Why Playwright execution over a direct CDP connection

If you're reaching for Playwright, prefer the execution API over `connectOverCDP`. Same Playwright API you already know, none of the setup.

* **Run from anywhere.** No `playwright` package to version-pin, no Chromium download, no CDP connection to manage. Send the code, get the result.
* **Co-located with the browser.** Code runs in the same VM as the browser — no network hop between your script and the page, fewer flakes.
* **Patchright by default.** Hardened against bot detection out of the box.
* **Full Playwright API.** `page`, `context`, and `browser` are all in scope. Anything Playwright can do — DOM queries, file uploads, full-page screenshots — works here.
* **Returns values.** `return` from your code and the result comes back in the response. Easy to use as an agent tool.
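
To make the "agent tool" point concrete, here is a minimal sketch of a tool wrapper around the execution call shown earlier. The `run_playwright` name and the `{"ok": ..., "result": ...}` envelope are assumptions chosen for illustration. How the API reports errors raised inside the executed code is not covered on this page, so the sketch only handles exceptions raised by the SDK call itself.

```python
from typing import Any


def run_playwright(kernel: Any, session_id: str, code: str) -> dict:
    """Run Playwright code in the browser's VM and wrap the outcome for an agent.

    `kernel` is expected to be a `Kernel()` client. The returned envelope
    is a hypothetical shape, not something the SDK defines.
    """
    try:
        response = kernel.browsers.playwright.execute(id=session_id, code=code)
    except Exception as exc:
        # Hand the failure back to the model as data rather than crashing the loop.
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "result": response.result}
```

Returning failures as data lets the model read the error text and retry with adjusted code.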

## Computer use + Playwright execution

Computer controls drive the browser the way a person would; they don't expose the programmatic API surface. Anything you would normally do through the DOM or a Playwright client (reading text and attributes, `page.goto`, file uploads, cookie or storage access, switching tabs) belongs on the [Playwright execution](/browsers/playwright-execution) side. The recommended pattern for agents is to use computer controls for interaction and to expose Playwright execution as a tool the agent can call when it needs structured data or a programmatic action.

<CodeGroup>
  ```typescript Typescript/Javascript theme={null}
  const response = await kernel.browsers.playwright.execute(
    kernelBrowser.session_id,
    {
      code: `
        const rows = await page.$$eval('table tr', (trs) =>
          trs.map((tr) => Array.from(tr.querySelectorAll('td')).map((td) => td.textContent))
        );
        return rows;
      `,
    },
  );

  console.log(response.result);
  ```

  ```python Python theme={null}
  response = kernel.browsers.playwright.execute(
      id=kernel_browser.session_id,
      code="""
        const rows = await page.$$eval('table tr', (trs) =>
          trs.map((tr) => Array.from(tr.querySelectorAll('td')).map((td) => td.textContent))
        );
        return rows;
      """,
  )

  print(response.result)
  ```

  ```go Go theme={null}
  response, err := client.Browsers.Playwright.Execute(
  	ctx,
  	kernelBrowser.SessionID,
  	kernel.BrowserPlaywrightExecuteParams{
  		Code: `
  			const rows = await page.$$eval('table tr', (trs) =>
  				trs.map((tr) => Array.from(tr.querySelectorAll('td')).map((td) => td.textContent))
  			);
  			return rows;
  		`,
  	},
  )
  if err != nil {
  	panic(err)
  }

  fmt.Println(response.Result)
  ```
</CodeGroup>
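
When the agent supplies the selector at call time, the JavaScript sent as `code` has to be assembled from a string. The sketch below shows one way to do that without letting quotes in the selector break the snippet. The `table_rows_code` helper is hypothetical and only builds the string; it does not call the Kernel API.

```python
import json


def table_rows_code(row_selector: str = "table tr") -> str:
    """Build the JavaScript sent as `code`, with the selector safely quoted."""
    # json.dumps emits a valid JavaScript string literal, so quotes or
    # backslashes in the selector cannot break out of the snippet.
    selector = json.dumps(row_selector)
    return f"""
      const rows = await page.$$eval({selector}, (trs) =>
        trs.map((tr) => Array.from(tr.querySelectorAll('td')).map((td) => td.textContent))
      );
      return rows;
    """
```

The returned string can then be passed as the `code` argument in the Python example above.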

## Going deeper

* [Computer Controls reference](/browsers/computer-controls) — every mouse, keyboard, and screen primitive.
* [Playwright Execution reference](/browsers/playwright-execution) — the full execution surface, return values, and timeouts.
* [Computer use integrations](/integrations/computer-use/anthropic) — drop-in examples for Anthropic, Gemini, OpenAI, and more.
