open source

virtual desktops
for AI agents

Real desktops with a real browser and real desktop applications. Full isolation, full automation, MCP-native. Give Claude, Cursor, and Copilot their own screen.


view source on github · agpl-3.0 · self-hosted

screenbox — live orchestration

An AI agent creates a desktop, opens Chromium, navigates websites, takes screenshots, and extracts data — all through MCP tools.

knowledge compilation

agents that learn between sessions

Agents lose everything they learned when a session ends. Screenbox fixes this. Action logs from past sessions are compiled into declarative knowledge facts and auto-injected into future interactions. Your agent gets smarter every time it works.

session logs

action history from past work

↓

compile

LLM extracts declarative facts

↓

merge

preview diff, then apply

↓

auto-inject

knowledge flows into every screenshot and look response
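The compile, merge, and auto-inject steps above could look roughly like the sketch below. It is an illustration only: `Fact`, `preview_merge`, `apply_merge`, and `inject` are hypothetical names, the store is a plain dict, and the LLM extraction step is assumed to have already produced the `compiled` list. Screenbox's actual data model may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One declarative fact extracted from a session log (hypothetical shape)."""
    key: str   # what the fact is about, e.g. a site or an application
    text: str  # the fact itself


def preview_merge(store: dict[str, str], compiled: list[Fact]) -> dict[str, list[Fact]]:
    """Return the diff a merge would produce, without touching the store."""
    diff: dict[str, list[Fact]] = {"added": [], "changed": [], "unchanged": []}
    for fact in compiled:
        if fact.key not in store:
            diff["added"].append(fact)
        elif store[fact.key] != fact.text:
            diff["changed"].append(fact)
        else:
            diff["unchanged"].append(fact)
    return diff


def apply_merge(store: dict[str, str], diff: dict[str, list[Fact]]) -> None:
    """Apply a previously previewed diff to the store in place."""
    for fact in diff["added"] + diff["changed"]:
        store[fact.key] = fact.text


def inject(response: dict, store: dict[str, str]) -> dict:
    """Return a copy of a tool response with the known facts attached."""
    return {**response, "knowledge": sorted(store.values())}


# Example: one existing fact is corrected and one new fact is learned.
store = {"example.com/login": "Login button is top right."}
compiled = [
    Fact("example.com/login", "Login button is in the page footer."),
    Fact("example.com/search", "Search needs Enter; there is no submit button."),
]
diff = preview_merge(store, compiled)   # inspect before committing
apply_merge(store, diff)                # commit the reviewed diff
enriched = inject({"tool": "look"}, store)
```

Keeping the preview separate from the apply step means the diff can be reviewed, or rejected, before any stored knowledge changes.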

capabilities

21 MCP tools covering 100+ actions

mcp-native

Connect from any MCP client in one line of config. Claude Desktop, Claude Code, Cursor, or anything that speaks MCP. Supports stdio, streamable HTTP, and SSE transports.

real chromium

Not headless. Not Playwright. A real Chromium browser with DevTools and extensions. Plus real desktop apps — file managers, terminals, office tools. Your agent works in a full desktop environment, not a browser sandbox.

semantic page map

Structured DOM map with element coordinates. Headings, links, forms, buttons — all with viewport positions. Faster and cheaper than vision-only agents. No AI model needed to find a button.
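To make the idea concrete, here is a minimal sketch of how an agent might use such a map to locate a button without a vision model. The `Element` fields and the `find` helper are assumptions made for illustration; the real page map format may carry different or additional fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """One entry in a page map (hypothetical shape)."""
    role: str    # "heading", "link", "button", ...
    text: str    # visible label
    x: int       # left edge, in viewport pixels
    y: int       # top edge, in viewport pixels
    width: int
    height: int

    def center(self) -> tuple[int, int]:
        """Viewport point to click in order to hit the middle of the element."""
        return (self.x + self.width // 2, self.y + self.height // 2)


def find(page_map: list[Element], role: str, text: str) -> Optional[Element]:
    """Return the first element of the given role whose label contains text."""
    needle = text.casefold()
    for element in page_map:
        if element.role == role and needle in element.text.casefold():
            return element
    return None


page_map = [
    Element("heading", "Sign in", x=40, y=20, width=300, height=32),
    Element("button", "Log in", x=100, y=200, width=80, height=30),
]
button = find(page_map, "button", "log in")
click_at = button.center() if button else None  # (140, 215)
```

The lookup is a plain text match over structured data, so it costs no model call and returns coordinates that can be passed straight to a click.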

docker isolation

Every desktop is a throwaway container. Memory-limited, network-isolated, no bind mounts. When the task is done, the sandbox disappears. Nothing leaks to the host.

full automation

Screenshot, OCR, click, type, key combos, shell, file I/O, clipboard. 8 core tools + 4 dispatchers covering 100+ actions. Batch multiple operations in a single call to reduce round-trips.
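Batching can be pictured as one request carrying an ordered list of operations. The sketch below is an assumption-laden illustration, not Screenbox's wire format: `run_batch`, the `action`/`args` keys, and the stub handlers are all hypothetical. It runs the operations in order and stops at the first unknown action so later steps never run against an unexpected state.

```python
from typing import Any, Callable

Handler = Callable[..., Any]


def run_batch(handlers: dict[str, Handler], ops: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Run operations in order; stop at the first action with no handler."""
    results: list[dict[str, Any]] = []
    for op in ops:
        handler = handlers.get(op["action"])
        if handler is None:
            results.append({"action": op["action"], "ok": False, "error": "unknown action"})
            break
        value = handler(**op.get("args", {}))
        results.append({"action": op["action"], "ok": True, "result": value})
    return results


# Stub handlers standing in for real desktop actions.
handlers: dict[str, Handler] = {
    "click": lambda x, y: f"clicked {x},{y}",
    "type": lambda text: len(text),
}

# Three steps that would otherwise be three round-trips.
batch = [
    {"action": "click", "args": {"x": 140, "y": 215}},
    {"action": "type", "args": {"text": "hello"}},
    {"action": "click", "args": {"x": 10, "y": 10}},
]
results = run_batch(handlers, batch)
```

The caller gets one result per executed step, in order, from a single call.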

multi-agent

Agent registration, API keys, desktop ownership. Each agent sees only its own desktops. Admin sees everything. Built for teams running multiple agents in parallel.

human oversight

Watch agents work live via RDP or noVNC. Take mouse and keyboard control at any moment. Help the agent, then release. Share desktop links with external viewers.

snapshot/restore

Save full desktop state — files, browser sessions, everything. Restore later. Clone desktops from templates. Perfect for repeatable workflows and environment provisioning.

dashboard

Web UI for managing everything. Live desktop view, agent management, snapshot controls, knowledge browser, session logs. Pure frontend — all operations proxy through the MCP API.

auto-management

Idle desktops pause automatically. Acquired desktops are released after a configurable TTL. Resources are freed when not in use. No babysitting required.
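One way to picture this housekeeping is a periodic sweep over all desktops. The sketch below is hypothetical: `Desktop`, `sweep`, and the default thresholds (`idle_after`, `acquire_ttl`) are illustrative assumptions, and the two rules are applied independently of each other, which may not match the real policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Desktop:
    name: str
    last_activity: float                 # timestamp of the last action, in seconds
    acquired_at: Optional[float] = None  # when an agent acquired it, if at all
    paused: bool = False


def sweep(desktops: list[Desktop], now: float,
          idle_after: float = 300.0, acquire_ttl: float = 900.0) -> list[str]:
    """Release expired acquisitions and pause idle desktops; return what changed."""
    events: list[str] = []
    for desktop in desktops:
        # Rule 1: an acquisition older than the TTL is released.
        if desktop.acquired_at is not None and now - desktop.acquired_at >= acquire_ttl:
            desktop.acquired_at = None
            events.append(f"released {desktop.name}")
        # Rule 2: a running desktop with no recent activity is paused.
        if not desktop.paused and now - desktop.last_activity >= idle_after:
            desktop.paused = True
            events.append(f"paused {desktop.name}")
    return events


desktops = [
    Desktop("a", last_activity=0.0, acquired_at=0.0),  # stale and still acquired
    Desktop("b", last_activity=950.0),                 # recently active
]
events = sweep(desktops, now=1000.0)  # ["released a", "paused a"]
```

Running the sweep again with the same clock changes nothing, since both rules only act on desktops that are still acquired or still running.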

cross-platform

Linux, macOS, Windows. Native Docker on Linux. Docker Desktop on macOS. WSL2 on Windows. ~2 GB RAM per desktop, no GPU needed.

quick start

up and running in 60 seconds

bash — screenbox setup

$ git clone https://github.com/dklymentiev/screenbox
$ cd screenbox
$ ./setup.sh
# generates .env, builds images

$ docker compose up -d
# MCP endpoint ready on :8080
# your agent now has its own desktop
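
To confirm that something is listening on port 8080 before pointing a client at it, a small check like the one below can help. It only tests that a TCP connection can be opened; it does not speak MCP. The `port_open` helper and the `localhost` host are assumptions for a local install.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    ready = port_open("localhost", 8080)
    print("MCP endpoint reachable" if ready else "nothing listening on :8080 yet")
```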

architecture

how it fits together


  Agent (Claude / Cursor / Copilot)
       |
       | MCP Protocol (Streamable HTTP / stdio)
       v
  +--------------------+
  |  Screenbox MCP     |   Tools: screenshot, look, click,
  |  Server (:8080)    |   type, key, shell, chrome, file...
  +--------------------+
       |
       | Docker exec
       v
  +----------------+  +----------------+
  | Desktop 1      |  | Desktop 2      |  ...
  | XFCE + Chrome  |  | XFCE + Chrome  |
  | Xvnc :99       |  | Xvnc :99       |
  | noVNC / RDP    |  | noVNC / RDP    |
  +----------------+  +----------------+
