A dev environment for AI coding agents is the running system an agent builds against: the language runtime, the database with data in it, a URL the result can be loaded from, and the services — email, queues, HTTPS — the application expects in production. It is distinct from agent orchestration, which decides which agent works on what. Most tools in this space orchestrate; very few provide the environment.
That distinction matters because of where agents actually fail. Claude Code, OpenAI Codex, Cursor and Google Antigravity are all competent at writing code. What they routinely cannot do is verify the code: run the migration, hit the endpoint, load the page, confirm the email got sent. When an agent reports “done” and the feature doesn't work, the gap is almost never the model — it's that the agent was working in a checkout, not an application.
Why git worktrees alone are not enough
The dominant pattern in agent orchestration today is the git worktree: give each agent its own isolated checkout of the repository on its own branch, so parallel agents can't clobber each other. Vibe Kanban, Conductor, Claude Squad, Superset and Kanbots are all built on this model, and for what it solves (merge conflicts between parallel agents) it's a good model.
But a worktree is a filesystem trick, not an environment. A fresh worktree has no database, no running server, no HTTPS certificate, no domain, no SMTP catcher. The agent can write a Laravel migration in a worktree; it cannot run it without a database. It can build a checkout page; it cannot load it, screenshot it, or click through it without a server. It can wire up a password-reset email; it cannot confirm the email renders without somewhere for the mail to land.
Orchestrators know this, which is why they bolt on escape hatches: Superset has setup/teardown scripts per workspace, and Kanbots' branch preview will start your dev server if you have one configured. In every case the runtime, the database, the certificates and the domains remain your problem. The worktree isolates the code; nothing provisions the world around it.
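To make the gap concrete, here is a minimal preflight sketch an agent or setup script could run before starting work in a fresh checkout. It only checks whether anything is accepting TCP connections on the ports the application expects. The service names, hosts and ports in `REQUIRED_SERVICES` are illustrative assumptions, not values any of the tools above guarantee.

```python
import socket

# Hypothetical service map for illustration only: adjust hosts and ports
# to whatever your project actually uses.
REQUIRED_SERVICES = {
    "database": ("127.0.0.1", 3306),      # MySQL's default port
    "mail catcher": ("127.0.0.1", 1025),  # a common local SMTP catcher port
    "https server": ("127.0.0.1", 443),
}


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def missing_services(services: dict[str, tuple[str, int]]) -> list[str]:
    """List the names of services that are not accepting connections."""
    return [
        name
        for name, (host, port) in services.items()
        if not is_listening(host, port)
    ]
```

Calling `missing_services(REQUIRED_SERVICES)` in a bare worktree would typically return every name in the map. An open port only shows that something is listening, not that the schema or seed data is in place, so this is a first check rather than a full verification.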
What an agent actually needs from its environment
Working backwards from the verify step, a coding agent needs roughly what a new human teammate needs on day one:
- A managed runtime: the right PHP or Node version for this project, pinned, without assuming the agent can (or should) run `brew install` on your machine.
- A real database: MySQL or PostgreSQL with the project's schema and seed data, with credentials already in `.env`, so migrations and queries actually execute.
- A loadable URL with trusted HTTPS: not `localhost:3000` with a self-signed warning, but a real local domain like `yourapp.test` with a certificate the browser trusts, so redirects, cookies, OAuth callbacks and service workers behave the way they will in production.
- Email capture: an inbox like Mailpit that catches outgoing mail locally, so “send the welcome email” is checkable.
- A way to show its work: a public tunnel for a shareable preview, or at minimum a URL a human can open before merging.
- A task surface with guardrails: somewhere the work is assigned, claimed and reported, with a lease so two agents can't grab the same task.
Cloud platforms solve this with disposable sandboxes — Google Antigravity provisions remote Linux environments through the Gemini API, and cloud agents like Codex run in containers. That works, with trade-offs: your code executes on someone else's machines, the sandbox doesn't match your local stack, and you pay per minute. The local equivalent — same guarantees, your Mac, no metering — is the gap most orchestrators leave open. (PortBay vs Antigravity covers the local/cloud split in detail.)
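The lease in the last requirement above is easy to sketch. The following is a minimal in-memory illustration of the idea, not how PortBay or any other board implements it: `LeaseBoard`, its method names and the five-minute default are all hypothetical. A task can be claimed by one agent at a time, and the claim lapses after a time-to-live so a crashed agent doesn't hold the card forever.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LeaseBoard:
    """In-memory task leases: one holder per task until the lease expires."""

    ttl_seconds: float = 300.0
    # Maps task id -> (agent name, expiry timestamp).
    _leases: dict[str, tuple[str, float]] = field(default_factory=dict)

    def claim(self, task_id: str, agent: str, now: Optional[float] = None) -> bool:
        """Claim or renew a task; return False if another agent holds a live lease."""
        now = time.monotonic() if now is None else now
        holder = self._leases.get(task_id)
        if holder is not None:
            owner, expires_at = holder
            if expires_at > now and owner != agent:
                return False
        self._leases[task_id] = (agent, now + self.ttl_seconds)
        return True

    def release(self, task_id: str, agent: str) -> None:
        """Give up a lease; only the current holder can release it."""
        holder = self._leases.get(task_id)
        if holder is not None and holder[0] == agent:
            del self._leases[task_id]
```

This version is single-process only. A real board shared by several agents would need the claim to be atomic in whatever store backs it, for example a database row updated inside a transaction.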
Worktree, sandbox, or local environment: the three approaches
Every tool in this space takes one of three approaches to giving an agent somewhere to work. A worktree-only orchestrator isolates the code and nothing else. A cloud AI agent sandbox provisions a disposable remote machine; Warp has started calling the category the “Agentic Development Environment,” and its Oz platform, Google Antigravity and cloud Codex all run this model. A local dev environment for AI agents provisions the same things a sandbox does (runtime, database, server) but on your machine, around your real project. The trade-offs are structural, not a matter of polish:
| | Worktree-only | Cloud agent sandbox | Local agent environment |
|---|---|---|---|
| Where code runs | Your machine, per-task checkout | Provider's cloud, per-task container | Your machine, the real project |
| What the agent gets | Isolated branch; no runtime, DB or server | Disposable Linux env, preconfigured | Running app: runtime, DB, HTTPS, mail |
| Matches your local stack | Only if you wire it up per task | Rarely — it's a generic image | It is your local stack |
| Privacy & cost | Local and free | Code leaves your machine; metered | Local and free |
| Examples | Vibe Kanban, Claude Squad, Superset, Kanbots, Emdash | Antigravity, cloud Codex, Warp Oz | PortBay |
| Best for | Parallel throughput on a running stack | Team scale, agent infra as a service | Verification-heavy work, privacy |
The phrase “AI agent sandbox” usually implies the cloud column, but the sandbox properties developers actually want — isolation, a real runtime, disposability — don't require someone else's servers. A local environment with per-project databases and provisioned runtimes is an agent sandbox that happens to live on your Mac, match your production stack, and cost nothing per minute.
The current landscape: orchestrators vs environments
| Tool | Task surface | Isolation | Provisions the environment? |
|---|---|---|---|
| Vibe Kanban | Kanban board | Git worktrees | No — bring your own stack |
| Conductor | Workspace list | Git worktrees | No |
| Claude Squad | Terminal sessions | tmux + worktrees | No |
| Superset | Branch sidebar | Worktrees + port ranges | No — your setup scripts |
| Kanbots | Kanban board | Worktree per run | No — runs your dev script |
| Emdash | Task list + diff view | Git worktrees | No — bring your own stack |
| Antigravity | Agent Manager | Workspaces | Cloud sandboxes, not local |
| PortBay | Kanban board | One agent per card, leased | Yes — runtime, DB, HTTPS, tunnels |
None of the worktree managers are wrong — they optimize for parallel throughput, and if you run six agents at once on a project whose stack is already humming, they earn their place. The point is that orchestration and environment are different layers, and only one of them makes the agent's output verifiable.
A local environment for Claude Code, concretely
Here is what the environment layer looks like in practice, using Claude Code and PortBay as the example (the flow is identical for Codex, Cursor or Antigravity):
- Add the project folder and press play. PortBay detects the framework, pins the right PHP or Node runtime, issues a trusted mkcert certificate and serves the app at `yourapp.test` over HTTPS.
- Create the database in one click (MySQL or PostgreSQL, per project) and the connection string lands in `.env`.
- Put the task on the board: a card with the description, assigned to Claude Code. Move it to Todo.
- PortBay dispatches the agent inside the running project. It can run the migration against a real database, load `https://yourapp.test/checkout`, and check the confirmation email in the local inbox.
- The agent comments on the card describing what changed and moves it to Done. If you want eyes on it, open a one-click Cloudflare tunnel and share the live URL.
The same loop without the environment layer is: write code in a worktree, hope it runs, find out after merge. The difference is not the agent — it's what the agent could touch while it worked.
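As one illustration of what "check the confirmation email" can mean in code, the sketch below looks for a message in a local Mailpit inbox. It rests on assumptions to confirm against your own setup: that Mailpit's HTTP API is reachable at `localhost:8025`, and that its message list endpoint returns JSON with a `messages` array whose entries carry `Subject` and `To[].Address` fields. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed address of the local Mailpit API; check your own configuration.
MAILPIT_MESSAGES_URL = "http://localhost:8025/api/v1/messages"


def fetch_messages(url: str = MAILPIT_MESSAGES_URL, timeout: float = 5.0) -> dict:
    """Fetch the message list from the mail catcher as parsed JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def mail_was_sent(payload: dict, recipient: str, subject_contains: str) -> bool:
    """Return True if any captured message matches the recipient and subject."""
    wanted_recipient = recipient.lower()
    wanted_subject = subject_contains.lower()
    for message in payload.get("messages", []):
        # "To" is assumed to be a list of {"Name": ..., "Address": ...} objects.
        addresses = {
            to.get("Address", "").lower() for to in message.get("To") or []
        }
        subject = message.get("Subject", "").lower()
        if wanted_recipient in addresses and wanted_subject in subject:
            return True
    return False
```

An agent could call `mail_was_sent(fetch_messages(), "buyer@example.com", "confirmation")` after submitting the checkout form and report the result on the card. Keeping the matching logic separate from the network call means it can be exercised with a canned payload when no inbox is running.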
How to choose
If your bottleneck is parallel throughput (many simple tasks, a stack that's already running, review capacity to burn), a worktree orchestrator is the right tool, and the comparisons linked above lay out which one fits. If your bottleneck is verification (agents that finish tasks which then don't survive contact with a browser), the environment is the missing layer, and it's the layer PortBay was built to provide: a free, open-source macOS app that is both the local dev environment and the task board your agents work from.
The two approaches also compose. Nothing stops you running a worktree manager for wide parallel sweeps and dispatching the verification-heavy cards — the ones that need a database, a real URL and an inbox — from a board that owns the environment.
