## Why It Matters

Code-Aware AI Triage feeds an AI agent two things at once: your **source code** and an **attacker-controlled report body**. That combination is a prompt-injection target: a malicious researcher can write a report whose text tries to hijack the agent into reading secrets and sending them to an external host with `curl`.

The defense is not "ask the agent nicely not to." The defense is that **even a fully-hijacked agent has nowhere to send data** — because the network boundary is enforced *outside* the agent, at the environment layer. This page explains that boundary and how to set it up.

> [!CAUTION]
> Treat every report body as hostile. The triage prompt wraps report fields in delimiters and tells the model never to execute instructions found inside them — but delimiter hygiene is defense-in-depth, **not** the boundary. The boundary is the egress deny-all described below.

## The Core Principle: Containment at the Environment Layer

The single most important design decision: **the agent's own network allowlist is never the security boundary.** Anthropic's containment guidance for agentic systems is explicit — design for containment at the environment layer first, and treat the agent's self-imposed restrictions as defense-in-depth only.

There are concrete reasons an agent's own allowlist can't be trusted as the boundary:

- **Allowlists get bypassed.** Claude Code has shipped patches for real allowlist-bypass vulnerabilities — for example a misread-allowlist class of bug, and a SOCKS5 parser differential (a null-byte trick) that defeated *any* wildcard allowlist. Pin recent versions, but don't rely on the allowlist as your wall.
- **Allowlisted domains have exfil-capable sub-APIs.** If you allowlist your LLM provider's domain, that same domain may expose a files/upload or storage API — a perfectly good exfiltration channel that your "allow the model endpoint" rule happily permits.

The deterministic "cannot exfiltrate" guarantee therefore has to live **outside** the agent, in the network and runner configuration.

## The Boundary: Deny-All Egress + a Two-Destination Proxy

The airgap is built from four environment-layer controls. The example repo demonstrates all of them in its `.gitlab-ci.yml` and `infra/` directory.

### 1. A Self-Managed GitLab Runner

You **must** run a self-managed runner. On GitLab.com SaaS shared runners you don't control the host network, so you can't enforce the egress boundary there. The example repo's README states this as the airgap prerequisite.

### 2. A Non-Privileged Docker Executor

| Setting | Value | Why |
|---------|-------|-----|
| `privileged` | `false` | Privileged mode is effectively host root. Mandatory for an untrusted-report path. |
| Run as | non-root | Drop `SETUID`/`SETGID` and any other capability the job doesn't need. |
| Docker socket | not mounted | A mounted socket is a host-takeover path. |
| Host volumes | none | No bind mounts into the job. |
| `FF_NETWORK_PER_BUILD` | `1` | GitLab creates a **dedicated bridge network per job**, torn down when the job ends — so the job network is isolated and reproducible. |

```toml
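# config.toml on your self-managed runner
[[runners]]
  executor = "docker"
  # Per-job bridge network, removed when the job ends
  environment = ["FF_NETWORK_PER_BUILD=1"]
  [runners.docker]
    # Privileged mode is effectively host root, so keep it off
    privileged = false
    # Drop the capabilities listed in the table above
    cap_drop = ["SETUID", "SETGID"]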
```

### 3. Host-Level Deny-All Egress

> [!WARNING]
> **GitLab Runner has no native egress allowlist.** There is no built-in "only let jobs reach these hosts" setting — a proposed feature exists but is not shipped. The egress boundary **must** be enforced at the host or namespace layer.

Apply a default-`DROP` egress policy on the job network with `iptables`/`nftables`, or run the job in a network namespace that has no route to the internet or to your intranet. Everything outbound is denied by default; you then open only the route to the egress proxy, which permits exactly the two destinations below. Nothing else is reachable, including your own internal network.

### 4. A TLS-Terminating, Token-Validating Egress Proxy

The only permitted route out is a proxy that allows exactly **two** destinations:

1. Your **LLM inference endpoint** (e.g. `https://api.deepseek.com/anthropic`).
2. **Kit's scoped MCP host** (to read the report and post the triage back).

The proxy does more than pin those two hosts. It:

- **Restricts to the inference and MCP paths only** — so the LLM provider's files/storage sub-APIs are blocked even though the host is allowed.
- **Validates the per-run session token** — an attacker who smuggles their own API key into the agent can't use it, because the proxy only honors the token minted for this run.

This mirrors the in-VM MITM-proxy pattern Anthropic uses for its own agent containment. A `harden-runner`-style egress filter is a reasonable conceptual analog for the reference implementation.
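
The proxy's decision can be pictured as three checks in sequence: host, path, then token. The sketch below is illustrative only and is not the reference implementation. The `ALLOWED_ROUTES` table, the `mcp.kit.example` hostname, the `/anthropic/v1/messages` and `/mcp` paths, and the `allow_request` function are all assumptions made for the example; substitute the real endpoints and paths from your deployment.

```python
import hmac
from urllib.parse import urlsplit

# Hypothetical routes: allowed host -> allowed path prefixes.
ALLOWED_ROUTES = {
    "api.deepseek.com": ("/anthropic/v1/messages",),
    "mcp.kit.example": ("/mcp",),
}


def allow_request(url: str, presented_token: str, run_token: str) -> bool:
    """Return True only if host, path, and per-run token all check out."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    prefixes = ALLOWED_ROUTES.get(parts.hostname or "")
    if prefixes is None:
        return False  # host is not one of the two destinations
    path = parts.path
    if ".." in path.split("/"):
        return False  # refuse literal path traversal out of an allowed prefix
    if not any(path == p or path.startswith(p + "/") for p in prefixes):
        return False  # allowed host, but not an allowed path
    # Constant-time comparison against the token minted for this run.
    return hmac.compare_digest(presented_token.encode(), run_token.encode())


run_token = "run-token-for-this-job"  # hypothetical value
print(allow_request("https://api.deepseek.com/anthropic/v1/messages", run_token, run_token))
print(allow_request("https://api.deepseek.com/anthropic/v1/files", run_token, run_token))
print(allow_request("https://api.deepseek.com/anthropic/v1/messages", "attacker-key", run_token))
```

The second call shows why a host-only allowlist is not enough: the host is permitted, but the path is not. A production proxy would also need to normalize percent-encoded paths before matching, which this sketch does not attempt.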

## Masking Is Hygiene, Not Protection

You'll pass the scoped MCP token and your model key as **masked / protected** CI variables. Do it — it keeps secrets out of job logs. But understand the limit:

> [!IMPORTANT]
> Masking only keeps a value from appearing in logs (`[MASKED]`). A compromised job **can still read the variable's value** at runtime. GitLab's own docs say masking "is not a foolproof security measure." What stops a leaked token from leaving the job is the **egress deny-all**, not the mask.

That's why the scoped token model matters: even if a hijacked job reads a token, that token is scoped to **one report for one hour** and is single-use on the write side. And because egress is denied, there is nowhere to send it.
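
The limit of masking can be seen in a few lines. This is a simplified model, not GitLab's implementation: `mask_log_line` is a hypothetical stand-in for the log masker, and `MCP_TOKEN` is an assumed variable name with a made-up value.

```python
import os


def mask_log_line(line: str, secrets: list[str]) -> str:
    """Replace each secret value with [MASKED], as a log masker would."""
    for secret in secrets:
        line = line.replace(secret, "[MASKED]")
    return line


# Hypothetical variable name and value, set here only for the demonstration.
os.environ["MCP_TOKEN"] = "tok_example_123"

token = os.environ["MCP_TOKEN"]  # any process in the job can do this
logged = mask_log_line(f"using token {token}", [token])

print(logged)                      # prints: using token [MASKED]
print(token == "tok_example_123")  # prints: True
```

The log line is clean, yet the process still holds the real value in memory. Masking changes what is printed, not what the job can read.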

## Prove It: The Injection Canary

The example repo ships a fixture report (`examples/sample-report`) containing an **injection canary** — report text that tries to curl an external host. Run the pipeline against it and confirm the egress boundary **blocks the call**. If the canary's request succeeds, your airgap isn't airgapped — fix the host firewall before connecting to Kit.
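
Alongside the canary fixture, a direct connectivity probe run inside the job can serve as a quick sanity check of the firewall. The sketch below is an assumption about how such a probe might look, not something the example repo is known to ship; `egress_is_blocked` is a hypothetical helper, and it only tests a direct TCP connection, not traffic through the proxy.

```python
import socket


def egress_is_blocked(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port cannot be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False  # the connection succeeded, so egress is open
    except OSError:
        # Covers DNS failure, refusal, unreachable network, and timeout.
        return True
```

In a CI job you might call `egress_is_blocked("example.com", 443)` (a placeholder external host) and fail the job when it returns `False`. A passing probe complements the canary run; it does not replace it.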

## A Note on Your Model Choice

You control the model, which means you also own its risk. Feeding your source and untrusted reports to a hosted endpoint (DeepSeek's API by default) is a **data-residency and trust decision that is yours to make** — that's the entire point of "your AI, your network." Where prudent, point the agent at a **self-hosted** endpoint (local vLLM, an Ollama-Anthropic proxy) so inference never leaves your perimeter. The `SECURITY.md` in the example repo calls this out.

## Quick Checklist

- [ ] Self-managed GitLab Runner (not SaaS shared runners)
- [ ] Docker executor with `privileged = false`, non-root, no Docker socket, no host volumes
- [ ] `FF_NETWORK_PER_BUILD=1` for per-job network isolation
- [ ] Host-level **default-DROP** egress (iptables/nftables or a network namespace)
- [ ] Egress proxy allowing exactly two hosts + paths: your LLM endpoint and Kit's MCP host
- [ ] Proxy validates the per-run token and blocks the LLM provider's storage sub-APIs
- [ ] Scoped token treated as one-report / one-hour / single-use — masking is hygiene only
- [ ] Run the **injection-canary** fixture and confirm the exfil attempt is blocked
- [ ] Pin a current Claude Code / sandbox-runtime version (defense-in-depth, not the boundary)

## Next Steps

- [Set up the airgapped triage agent](/docs/set-up-triage-agent) — the guided stepper that connects your fork
- [Customizing the triage prompt and model](/docs/customizing-triage-agent) — model swap, prompt hygiene, output schema
- [Code-Aware AI Triage](/docs/code-aware-ai-triage) — what the triage produces and how it surfaces