open source

virtual desktops
for AI agents

Real desktops with a real browser and real desktop applications. Full isolation, full automation, MCP-native. Give Claude, Cursor, and Copilot their own screen.

view source on github  ·  agpl-3.0  ·  self-hosted
screenbox — live orchestration

An AI agent creates a desktop, opens Chromium, navigates websites, takes screenshots, and extracts data — all through MCP tools.

knowledge compilation

agents that learn between sessions

Agents lose everything they learned when a session ends. Screenbox fixes this. Action logs from past sessions are compiled into declarative knowledge facts and auto-injected into future interactions. Your agent gets smarter every time it works.

session logs
action history from past work
compile
LLM extracts declarative facts
merge
preview diff, then apply
auto-inject
knowledge flows into every screenshot and look response
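
The compile, merge, and auto-inject steps above can be sketched roughly as follows. This is a minimal illustration, not Screenbox's implementation: the `Fact` shape, the `knowledge` response field, and all function names are assumptions, and the LLM compile step is represented only by its output (a list of facts).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One declarative fact; `key` names what it is about (hypothetical shape)."""
    key: str
    text: str


def preview_diff(existing: dict[str, str], compiled: list[Fact]) -> dict[str, list[str]]:
    """Classify compiled facts against the existing store without changing it."""
    diff: dict[str, list[str]] = {"added": [], "changed": [], "unchanged": []}
    for fact in compiled:
        if fact.key not in existing:
            diff["added"].append(fact.key)
        elif existing[fact.key] != fact.text:
            diff["changed"].append(fact.key)
        else:
            diff["unchanged"].append(fact.key)
    return diff


def apply_merge(existing: dict[str, str], compiled: list[Fact]) -> dict[str, str]:
    """Return a new store with the compiled facts applied; the input is untouched."""
    merged = dict(existing)
    for fact in compiled:
        merged[fact.key] = fact.text
    return merged


def inject(response: dict, store: dict[str, str]) -> dict:
    """Attach known facts to a screenshot/look response (field name is assumed)."""
    return {**response, "knowledge": sorted(store.values())}
```

Keeping the preview separate from the apply step mirrors the "preview diff, then apply" flow: nothing is written until the diff has been reviewed.
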

capabilities

21 MCP tools covering 100+ actions

mcp-native
Connect from any MCP client in one line of config. Claude Desktop, Claude Code, Cursor, or anything that speaks MCP. Supports stdio, streamable HTTP, and SSE transports.
real chromium
Not headless. Not Playwright. A real Chromium browser with DevTools and extensions. Plus real desktop apps — file managers, terminals, office tools. Your agent works in a full desktop environment, not a browser sandbox.
semantic page map
Structured DOM map with element coordinates. Headings, links, forms, buttons — all with viewport positions. Faster and cheaper than vision-only agents. No AI model needed to find a button.
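
To illustrate why a page map removes the need for a vision model, here is a small sketch that finds a button by text and computes where to click. The entry format (`role`, `text`, `x`, `y`, `width`, `height`) is an assumption for illustration; the real map's field names may differ.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical page map: one entry per element, positions in viewport pixels.
PAGE_MAP = [
    {"role": "heading", "text": "Sign in", "x": 40, "y": 20, "width": 200, "height": 32},
    {"role": "link", "text": "Forgot password?", "x": 40, "y": 180, "width": 120, "height": 16},
    {"role": "button", "text": "Log in", "x": 40, "y": 210, "width": 100, "height": 36},
]


def find_element(page_map: list[dict], role: str, text: str) -> Optional[dict]:
    """Return the first element with the given role whose text contains `text`."""
    needle = text.lower()
    for element in page_map:
        if element["role"] == role and needle in element["text"].lower():
            return element
    return None


def click_point(element: dict) -> tuple[int, int]:
    """Return the center of the element's bounding box."""
    return (
        element["x"] + element["width"] // 2,
        element["y"] + element["height"] // 2,
    )
```

The lookup is a plain string match over structured data, so locating the "Log in" button costs no model call and no screenshot.
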
docker isolation
Every desktop is a throwaway container. Memory-limited, network-isolated, no bind mounts. When the task is done, the sandbox disappears. Nothing leaks to the host.
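
As a rough sketch of what those constraints look like at the Docker level, the function below builds a `docker run` argument list with a memory limit, a dedicated network, and automatic removal on exit. The image name, container name, and network name are hypothetical, and the 2g default is taken from the ~2 GB per desktop figure on this page; how Screenbox actually launches containers is not shown here.

```python
from __future__ import annotations


def docker_run_args(
    image: str,
    name: str,
    memory: str = "2g",                   # assumed default, from ~2 GB per desktop
    network: str = "screenbox-sandbox",   # hypothetical isolated network name
) -> list[str]:
    """Build a `docker run` command for a throwaway desktop container.

    No --volume or --mount flags are emitted, so nothing on the host
    filesystem is bind-mounted into the container.
    """
    return [
        "docker", "run",
        "--detach",            # run in the background
        "--rm",                # remove the container when it stops
        "--name", name,
        "--memory", memory,    # hard memory limit
        "--network", network,  # attach only to the sandbox network
        image,
    ]
```

The list can be passed to `subprocess.run` as-is; building it as a list rather than a string avoids shell quoting issues.
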
full automation
Screenshot, OCR, click, type, key combos, shell, file I/O, clipboard. 8 core tools + 4 dispatchers covering 100+ actions. Batch multiple operations in a single call to reduce round-trips.
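
A batched call might look something like the sketch below. The `tools/call` JSON-RPC envelope is how MCP clients invoke tools in general; the tool name `batch` and the `desktop` and `steps` argument names are assumptions made for illustration, not Screenbox's documented schema.

```python
from __future__ import annotations

import json


def batch_call(request_id: int, desktop: str, steps: list[dict]) -> dict:
    """Wrap several actions in one MCP tools/call request (argument names assumed)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "batch",  # hypothetical tool name
            "arguments": {"desktop": desktop, "steps": steps},
        },
    }


# Three actions, one round-trip instead of three.
request = batch_call(
    1,
    "desktop-1",
    [
        {"tool": "click", "x": 90, "y": 228},
        {"tool": "type", "text": "hello"},
        {"tool": "screenshot"},
    ],
)
payload = json.dumps(request)
```
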
multi-agent
Agent registration, API keys, desktop ownership. Each agent sees only its own desktops. Admin sees everything. Built for teams running multiple agents in parallel.
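
The visibility rule can be stated in a few lines. This is a sketch of the rule as described, with a hypothetical `Agent` type and an owner map standing in for whatever Screenbox stores internally.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    is_admin: bool = False


def visible_desktops(agent: Agent, owners: dict[str, str]) -> list[str]:
    """Return desktops the agent may see; `owners` maps desktop name to owner name."""
    return sorted(
        desktop
        for desktop, owner in owners.items()
        if agent.is_admin or owner == agent.name
    )
```
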
human oversight
Watch agents work live via RDP or noVNC. Take mouse and keyboard control at any moment. Help the agent, then release. Share desktop links with external viewers.
snapshot/restore
Save full desktop state — files, browser sessions, everything. Restore later. Clone desktops from templates. Perfect for repeatable workflows and environment provisioning.
dashboard
Web UI for managing everything. Live desktop view, agent management, snapshot controls, knowledge browser, session logs. Pure frontend — all operations proxy through the MCP API.
auto-management
Idle desktops pause automatically. Acquired desktops release after configurable TTL. Resources freed when not in use. No babysitting required.
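
One way to picture the idle-pause and TTL-release behavior is a periodic sweep over all desktops. The sketch below is an assumption about the shape of that logic, not Screenbox's code: the `Desktop` fields, the `sweep` function, and the default timeouts are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Desktop:
    name: str
    last_activity: float                 # timestamp of the last action, in seconds
    acquired_at: Optional[float] = None  # None when no agent holds the desktop
    paused: bool = False


def sweep(
    desktops: list[Desktop],
    now: float,
    idle_timeout: float = 300.0,   # assumed default
    acquire_ttl: float = 3600.0,   # assumed default; described as configurable
) -> list[str]:
    """Release expired acquisitions and pause idle desktops; return what changed."""
    events = []
    for desktop in desktops:
        if desktop.acquired_at is not None and now - desktop.acquired_at >= acquire_ttl:
            desktop.acquired_at = None
            events.append(f"released {desktop.name}")
        if not desktop.paused and now - desktop.last_activity >= idle_timeout:
            desktop.paused = True
            events.append(f"paused {desktop.name}")
    return events
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the sweep deterministic and easy to test.
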
cross-platform
Linux, macOS, Windows. Native Docker on Linux. Docker Desktop on macOS. WSL2 on Windows. ~2 GB RAM per desktop, no GPU needed.

quick start

up and running in 60 seconds

bash — screenbox setup
$ git clone https://github.com/dklymentiev/screenbox
$ cd screenbox
$ ./setup.sh
# generates .env, builds images

$ docker compose up -d
# MCP endpoint ready on :8080
# your agent now has its own desktop

architecture

how it fits together

  Agent (Claude / Cursor / Copilot)
       |
       | MCP Protocol (Streamable HTTP / stdio)
       v
  +--------------------+
  |  Screenbox MCP     |   Tools: screenshot, look, click,
  |  Server (:8080)    |   type, key, shell, chrome, file...
  +--------------------+
       |
       | Docker exec
       v
  +----------------+  +----------------+
  | Desktop 1      |  | Desktop 2      |  ...
  | XFCE + Chrome  |  | XFCE + Chrome  |
  | Xvnc :99       |  | Xvnc :99       |
  | noVNC / RDP    |  | noVNC / RDP    |
  +----------------+  +----------------+
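
To make the "Docker exec" hop in the diagram concrete, here is a sketch of how a click could be forwarded into a desktop container. The display number `:99` comes from the diagram; the use of `xdotool` inside the container is purely an assumption for illustration, and the container name is hypothetical.

```python
from __future__ import annotations


def click_via_exec(container: str, x: int, y: int, display: str = ":99") -> list[str]:
    """Build a `docker exec` command that moves the pointer and left-clicks.

    Assumes xdotool is installed in the container; Screenbox may use a
    different input mechanism.
    """
    return [
        "docker", "exec",
        "--env", f"DISPLAY={display}",  # target the desktop's X display
        container,
        "xdotool", "mousemove", str(x), str(y), "click", "1",
    ]
```

For example, `click_via_exec("desktop-1", 90, 228)` yields a command that could be handed to `subprocess.run` on the host running the MCP server.
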