How the airgap works
The security model behind Code-Aware AI Triage — why the network boundary lives outside the agent, how to airgap a GitLab runner, and why an attacker-submitted report still can't exfiltrate your code.
Why It Matters
Code-Aware AI Triage feeds an AI agent two things at once: your source code and an attacker-controlled report body. That’s a prompt-injection target. A malicious researcher can write a report whose text tries to hijack the agent into reading secrets and curling them to an external host.
The defense is not “ask the agent nicely not to.” The defense is that even a fully-hijacked agent has nowhere to send data — because the network boundary is enforced outside the agent, at the environment layer. This page explains that boundary and how to set it up.
Danger
Treat every report body as hostile. The triage prompt wraps report fields in delimiters and tells the model never to execute instructions found inside them — but delimiter hygiene is defense-in-depth, not the boundary. The boundary is the egress deny-all described below.
The Core Principle: Containment at the Environment Layer
The single most important design decision: the agent’s own network allowlist is never the security boundary. Anthropic’s containment guidance for agentic systems is explicit — design for containment at the environment layer first, and treat the agent’s self-imposed restrictions as defense-in-depth only.
There are concrete reasons an agent’s own allowlist can’t be trusted as the boundary:
- Allowlists get bypassed. Claude Code has shipped patches for real allowlist-bypass vulnerabilities — for example a misread-allowlist class of bug, and a SOCKS5 parser differential (a null-byte trick) that defeated any wildcard allowlist. Pin recent versions, but don’t rely on the allowlist as your wall.
- Allowlisted domains have exfil-capable sub-APIs. If you allowlist your LLM provider’s domain, that same domain may expose a files/upload or storage API — a perfectly good exfiltration channel that your “allow the model endpoint” rule happily permits.
The deterministic “cannot exfiltrate” guarantee therefore has to live outside the agent, in the network and runner configuration.
The Boundary: Deny-All Egress + a Two-Destination Proxy
The airgap is built from four environment-layer controls. The example repo demonstrates all of them in its .gitlab-ci.yml and infra/ directory.
1. A Self-Managed GitLab Runner
You must run a self-managed runner. GitLab.com SaaS shared runners can’t be host-network-isolated by you, so you can’t enforce the egress boundary on them. The README of the example repo states this as the airgap prerequisite.
2. A Non-Privileged Docker Executor
| Setting | Value | Why |
|---|---|---|
privileged |
false |
Privileged mode is effectively host root. Mandatory for an untrusted-report path. |
| Run as | non-root | Drop SETUID/SETGID; no capabilities you don’t need. |
| Docker socket | not mounted | A mounted socket is a host-takeover path. |
| Host volumes | none | No bind mounts into the job. |
FF_NETWORK_PER_BUILD |
1 |
GitLab creates a dedicated bridge network per job, torn down when the job ends — so the job network is isolated and reproducible. |
# config.toml on your self-managed runner
[[runners]]
executor = "docker"
environment = ["FF_NETWORK_PER_BUILD=1"]
[runners.docker]
privileged = false
3. Host-Level Deny-All Egress
Warning
GitLab Runner has no native egress allowlist. There is no built-in “only let jobs reach these hosts” setting — a proposed feature exists but is not shipped. The egress boundary must be enforced at the host or namespace layer.
Apply a default-DROP egress policy on the job network with iptables/nftables, or run the job in a network namespace with no route to your intranet. Everything outbound is denied by default; you then open exactly the two destinations below — and nothing else, including your own internal network.
4. A TLS-Terminating, Token-Validating Egress Proxy
The only permitted route out is a proxy that allows exactly two destinations:
- Your LLM inference endpoint (e.g.
https://api.deepseek.com/anthropic). - Kit’s scoped MCP host (to read the report and post the triage back).
The proxy does more than pin those two hosts. It:
- Restricts to the inference and MCP paths only — so the LLM provider’s files/storage sub-APIs are blocked even though the host is allowed.
- Validates the per-run session token — an attacker who smuggles their own API key into the agent can’t use it, because the proxy only honors the token minted for this run.
This mirrors the in-VM MITM-proxy pattern Anthropic uses for its own agent containment. A harden-runner-style egress filter is a reasonable conceptual analog for the reference implementation.
Masking Is Hygiene, Not Protection
You’ll pass the scoped MCP token and your model key as masked / protected CI variables. Do it — it keeps secrets out of job logs. But understand the limit:
Important
Masking only keeps a value from appearing in logs ([MASKED]). A compromised job can still read the variable’s value at runtime. GitLab’s own docs say masking “is not a foolproof security measure.” What stops a leaked token from leaving the job is the egress deny-all, not the mask.
That’s why the scoped token model matters: even if a token is read by a hijacked job, it’s scoped to one report for one hour and is single-use on the write side, and it can’t be sent anywhere because egress is denied.
Prove It: The Injection Canary
The example repo ships a fixture report (examples/sample-report) containing an injection canary — report text that tries to curl an external host. Run the pipeline against it and confirm the egress boundary blocks the call. If the canary’s request succeeds, your airgap isn’t airgapped — fix the host firewall before connecting to Kit.
A Note on Your Model Choice
You control the model, which means you also own its risk. Feeding your source and untrusted reports to a hosted endpoint (DeepSeek’s API by default) is a data-residency and trust decision that is yours to make — that’s the entire point of “your AI, your network.” Where prudent, point the agent at a self-hosted endpoint (local vLLM, an Ollama-Anthropic proxy) so inference never leaves your perimeter. The SECURITY.md in the example repo calls this out.
Quick Checklist
- Self-managed GitLab Runner (not SaaS shared runners)
-
Docker executor with
privileged = false, non-root, no Docker socket, no host volumes -
FF_NETWORK_PER_BUILD=1for per-job network isolation - Host-level default-DROP egress (iptables/nftables or a network namespace)
- Egress proxy allowing exactly two hosts + paths: your LLM endpoint and Kit’s MCP host
- Proxy validates the per-run token and blocks the LLM provider’s storage sub-APIs
- Scoped token treated as one-report / one-hour / single-use — masking is hygiene only
- Run the injection-canary fixture and confirm the exfil attempt is blocked
- Pin a current Claude Code / sandbox-runtime version (defense-in-depth, not the boundary)
Next Steps
- Set up the airgapped triage agent — the guided stepper that connects your fork
- Customizing the triage prompt and model — model swap, prompt hygiene, output schema
- Code-Aware AI Triage — what the triage produces and how it surfaces