What is a dev environment for AI coding agents?

A dev environment for AI coding agents is the running system an agent builds against: the language runtime, the database with data in it, a URL the result can be loaded from, and the services — email, queues, HTTPS — the application expects in production. It is distinct from agent orchestration, which decides which agent works on what. Most tools in this space orchestrate; very few provide the environment.

That distinction matters because of where agents actually fail. Claude Code, OpenAI Codex, Cursor and Google Antigravity are all competent at writing code. What they routinely cannot do is verify the code: run the migration, hit the endpoint, load the page, confirm the email got sent. When an agent reports “done” and the feature doesn't work, the gap is almost never the model — it's that the agent was working in a checkout, not an application.

Why git worktrees alone are not enough

The dominant pattern in agent orchestration today is the git worktree: give each agent its own working tree, a separate checkout of the same repository on its own branch, so parallel agents can't clobber each other. Vibe Kanban, Conductor, Claude Squad, Superset and Kanbots are all built on this model, and for the problem it solves (merge conflicts between parallel agents) it's a good model.

But a worktree is a filesystem trick, not an environment. A fresh worktree has no database, no running server, no HTTPS certificate, no domain, no SMTP catcher. The agent can write a Laravel migration in a worktree; it cannot run it without a database. It can build a checkout page; it cannot load it, screenshot it, or click through it without a server. It can wire up a password-reset email; it cannot confirm the email renders without somewhere for the mail to land.

Orchestrators know this, which is why they bolt on escape hatches: Superset has setup/teardown scripts per workspace, and Kanbots' branch preview will start your dev server if you have one configured. In every case the runtime, the database, the certificates and the domains remain your problem. The worktree isolates the code; nothing provisions the world around it.
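One way to see the gap is to probe for the services a fresh worktree lacks. The Python sketch below is illustrative and not part of any tool named here: `probe_environment` and `DEFAULT_SERVICES` are hypothetical names, and the ports are common defaults (3306 for MySQL, 443 for HTTPS, 1025 for a local SMTP catcher such as Mailpit) that a given project may not use. It only checks that something accepts a TCP connection on each port, not that the service behind it is healthy.

```python
import socket


def probe_environment(services, timeout=0.5):
    """Return {name: bool}: whether each (host, port) accepts a TCP connection."""
    results = {}
    for name, (host, port) in services.items():
        try:
            # create_connection raises OSError (refused, timed out, unreachable).
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:
            results[name] = False
    return results


# Assumed defaults for illustration; adjust to the project's real ports.
DEFAULT_SERVICES = {
    "database": ("127.0.0.1", 3306),
    "https": ("127.0.0.1", 443),
    "smtp": ("127.0.0.1", 1025),
}

if __name__ == "__main__":
    for name, up in probe_environment(DEFAULT_SERVICES).items():
        print(f"{name}: {'reachable' if up else 'missing'}")
```

Run from inside a bare worktree on a machine with nothing else started, a probe like this would likely report every service as missing, which is the point: the checkout exists, the application does not.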

What an agent actually needs from its environment

Working backwards from the verify step, a coding agent needs roughly what a new human teammate needs on day one:

A managed runtime: the right PHP or Node version for this project, pinned, without assuming the agent can (or should) run brew install on your machine.
A real database: MySQL or PostgreSQL with the project's schema and seed data, with credentials already in .env, so migrations and queries actually execute.
A loadable URL with trusted HTTPS: not localhost:3000 with a self-signed warning, but a real local domain like yourapp.test with a certificate the browser trusts, so redirects, cookies, OAuth callbacks and service workers behave the way they will in production.
Email capture: an inbox like Mailpit that catches outgoing mail locally, so “send the welcome email” is checkable.
A way to show its work: a public tunnel for a shareable preview, or at minimum a URL a human can open before merging.
A task surface with guardrails: somewhere the work is assigned, claimed and reported, with a lease so two agents can't grab the same task.
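The lease in the last item can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how PortBay or any other board implements it: `TaskBoard`, `Lease`, `claim` and `release` are hypothetical names, the five-minute default is arbitrary, and a real board would persist leases and make the claim atomic across processes. Time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class Lease:
    agent: str
    expires_at: float


@dataclass
class TaskBoard:
    """In-memory task leases keyed by task id (illustrative only)."""
    lease_seconds: float = 300.0
    leases: dict = field(default_factory=dict)

    def claim(self, task_id, agent, now):
        """Grant the task unless a different agent holds an unexpired lease."""
        current = self.leases.get(task_id)
        if current and current.expires_at > now and current.agent != agent:
            return False
        # New claim, expired lease, or the same agent renewing its own lease.
        self.leases[task_id] = Lease(agent, now + self.lease_seconds)
        return True

    def release(self, task_id, agent):
        """Drop the lease, but only if `agent` is the current holder."""
        current = self.leases.get(task_id)
        if current and current.agent == agent:
            del self.leases[task_id]
            return True
        return False


if __name__ == "__main__":
    board = TaskBoard(lease_seconds=60)
    print(board.claim("card-1", "claude-code", now=0))  # True
    print(board.claim("card-1", "codex", now=30))       # False: lease still held
    print(board.claim("card-1", "codex", now=61))       # True: lease expired
```

The expiry is what keeps a crashed agent from holding a card forever: once the lease lapses, another agent can pick the task up.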

Cloud platforms solve this with disposable sandboxes — Google Antigravity provisions remote Linux environments through the Gemini API, and cloud agents like Codex run in containers. That works, with trade-offs: your code executes on someone else's machines, the sandbox doesn't match your local stack, and you pay per minute. The local equivalent — same guarantees, your Mac, no metering — is the gap most orchestrators leave open. (PortBay vs Antigravity covers the local/cloud split in detail.)

Worktree, sandbox, or local environment: the three approaches

Every tool in this space takes one of three approaches to giving an agent somewhere to work. A worktree-only orchestrator isolates the code and nothing else. A cloud AI agent sandbox provisions a disposable remote machine; Warp has started calling the category the “Agentic Development Environment,” and its Oz platform, Google Antigravity and cloud Codex all run this model. A local dev environment for AI agents provisions the same things a sandbox does (runtime, database, server) but on your machine, around your real project. The trade-offs are structural, not a matter of polish:

	Worktree-only	Cloud agent sandbox	Local agent environment
Where code runs	Your machine, per-task checkout	Provider's cloud, per-task container	Your machine, the real project
What the agent gets	Isolated branch; no runtime, DB or server	Disposable Linux env, preconfigured	Running app: runtime, DB, HTTPS, mail
Matches your local stack	Only if you wire it up per task	Rarely — it's a generic image	It is your local stack
Privacy & cost	Local and free	Code leaves your machine; metered	Local and free
Examples	Vibe Kanban, Claude Squad, Superset, Kanbots, Emdash	Antigravity, cloud Codex, Warp Oz	PortBay
Best for	Parallel throughput on a running stack	Team scale, agent infra as a service	Verification-heavy work, privacy

The phrase “AI agent sandbox” usually implies the cloud column, but the sandbox properties developers actually want — isolation, a real runtime, disposability — don't require someone else's servers. A local environment with per-project databases and provisioned runtimes is an agent sandbox that happens to live on your Mac, match your production stack, and cost nothing per minute.

The current landscape: orchestrators vs environments

Tool	Task surface	Isolation	Provisions the environment?
Vibe Kanban	Kanban board	Git worktrees	No — bring your own stack
Conductor	Workspace list	Git worktrees	No
Claude Squad	Terminal sessions	tmux + worktrees	No
Superset	Branch sidebar	Worktrees + port ranges	No — your setup scripts
Kanbots	Kanban board	Worktree per run	No — runs your dev script
Emdash	Task list + diff view	Git worktrees	No — bring your own stack
Antigravity	Agent Manager	Workspaces	Cloud sandboxes, not local
PortBay	Kanban board	One agent per card, leased	Yes — runtime, DB, HTTPS, tunnels

None of the worktree managers are wrong — they optimize for parallel throughput, and if you run six agents at once on a project whose stack is already humming, they earn their place. The point is that orchestration and environment are different layers, and only one of them makes the agent's output verifiable.

A local environment for Claude Code, concretely

Here is what the environment layer looks like in practice, using Claude Code and PortBay as the example (the flow is identical for Codex, Cursor or Antigravity):

Add the project folder and press play. PortBay detects the framework, pins the right PHP or Node runtime, issues a trusted mkcert certificate and serves the app at yourapp.test over HTTPS.
Create the database in one click — MySQL or PostgreSQL, per project — and the connection string lands in .env.
Put the task on the board: a card with the description, assigned to Claude Code. Move it to Todo.
PortBay dispatches the agent inside the running project. It can run the migration against a real database, load https://yourapp.test/checkout, and check the confirmation email in the local inbox.
The agent comments on the card describing what changed and moves it to Done. If you want eyes on it, open a one-click Cloudflare tunnel and share the live URL.
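The verification in the fourth step can be pictured as a gate the agent runs before it moves the card. The Python sketch below is an assumption-laden illustration, not PortBay's or Claude Code's actual mechanism: `page_loads`, `mail_sent` and `verify` are hypothetical helpers, and the message shape (a list of dicts with a `Subject` string and a `To` list of `{"Address": ...}` entries) is modelled on what a mail catcher's API might return, so check it against the documentation of the inbox you use.

```python
import urllib.request


def page_loads(url, timeout=5):
    """True if the URL answers with a 2xx status; False on network or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False


def mail_sent(messages, recipient, subject_contains):
    """True if any captured message matches the recipient and subject fragment."""
    for message in messages:
        recipients = [to.get("Address") for to in message.get("To", [])]
        if recipient in recipients and subject_contains in message.get("Subject", ""):
            return True
    return False


def verify(checks):
    """Run (name, callable) checks; return (all_passed, names_of_failed_checks)."""
    failed = [name for name, check in checks if not check()]
    return (not failed, failed)


if __name__ == "__main__":
    # Stand-in for messages fetched from the local inbox (assumed shape).
    captured = [
        {"To": [{"Address": "buyer@example.com"}], "Subject": "Order confirmation"},
    ]
    ok, failed = verify([
        ("checkout page", lambda: page_loads("https://yourapp.test/checkout")),
        ("confirmation email", lambda: mail_sent(captured, "buyer@example.com", "confirmation")),
    ])
    print("Done" if ok else f"Not done, failed: {failed}")
```

One caveat for local HTTPS: depending on how Python was installed, it may not trust a locally issued certificate without extra configuration, in which case `page_loads` would report a failure even though the browser loads the page.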

The same loop without the environment layer is: write code in a worktree, hope it runs, find out after merge. The difference is not the agent — it's what the agent could touch while it worked.

How to choose

If your bottleneck is parallel throughput (many simple tasks, a stack that's already running, review capacity to burn), a worktree orchestrator is the right tool, and the comparisons linked above lay out which one fits. If your bottleneck is verification (agents that finish tasks which then don't survive contact with a browser), the environment is the missing layer, and it's the layer PortBay was built to provide: a free, open-source macOS app that is both the local dev environment and the task board your agents work from.

The two approaches also compose. Nothing stops you running a worktree manager for wide parallel sweeps and dispatching the verification-heavy cards — the ones that need a database, a real URL and an inbox — from a board that owns the environment.