To triage AI-generated bug bounty reports, run five levers in order: (1) AI-assisted first-pass filtering that auto-flags hallucinated functions and missing proof-of-concept before a human reads the report, (2) rate limiting at intake so one actor can't flood the queue, (3) reputation scoring that penalizes confirmed slop, (4) SLA tracking that protects scarce reviewer time, and (5) bounties that pay for impact, not submission count. The programs that died in 2026 didn't have a slop problem. They had a triage problem.

If you run a `security.txt` inbox, a vulnerability disclosure program (VDP), or a bug bounty, you already feel it. Submission volume is climbing and the valid-report rate is collapsing. Every plausible-but-fake report still costs a senior engineer 30 minutes to three hours to disprove. This is the operations-under-load problem: not "should we have a VDP" (you should, and we covered [how to stand one up](/blog/how-to-set-up-vulnerability-disclosure-program)), but "the queue is drowning, how do we keep it alive without burning out reviewers or losing the real researchers?"

## The 2026 casualties: curl, HackerOne, and Nextcloud

In the first half of 2026, three of the most credible names in vulnerability disclosure visibly buckled under AI-generated noise. These aren't fringe programs. They are the reference implementations of doing disclosure well, and they all hit the same wall: the cost of producing a fake report fell to near zero while the cost of disproving it stayed human and high.

### Why curl pulled the plug

curl shut down its HackerOne bug bounty effective **January 31, 2026**. Founder Daniel Stenberg had been documenting the decline for months. In his July 2025 post "Death by a thousand slops," he reported that **roughly 20% of all 2025 submissions were AI slop** and that **only about 5% of 2025 submissions turned out to be genuine vulnerabilities**, a steep drop from prior years (source: [daniel.haxx.se](https://daniel.haxx.se/blog/2025/07/14/death-by-a-thousand-slops/)).

The economics were brutal at the human level. Each report engaged **three to four of his seven-person security team** for **30 minutes to about three hours** of validation work, almost all of it volunteer time. The opening of 2026 made the decision for him: in the **first 21 days of 2026, curl received around 20 submissions and confirmed zero vulnerabilities**, including **seven HackerOne reports inside a single 16-hour window** (source: [daniel.haxx.se](https://daniel.haxx.se/blog/2026/01/26/the-end-of-the-curl-bug-bounty/); [BleepingComputer](https://www.bleepingcomputer.com/news/security/curl-ending-bug-bounty-program-after-flood-of-ai-slop-reports/)).

From February 1, 2026, curl routes reports to GitHub private reporting and `security@curl.se` with no monetary reward. The explicit aim, in Stenberg's words, is to "remove the incentive for people to submit crap." Since 2019 the program had paid **over $100,000 for 87 confirmed vulnerabilities**. The bounty worked for years. AI slop is what broke it.

### HackerOne's 76% surge and the Internet Bug Bounty pause

HackerOne made two distinct moves. First, its **Internet Bug Bounty (IBB)**, the pooled fund that pays for core open-source dependencies, **paused new submissions in late March 2026**, citing a shift in the balance between AI-assisted discovery and human remediation capacity (source: [Privacy Guides](https://www.privacyguides.org/news/2026/04/17/hackerone-pauses-internet-bug-bounty/); [InfoWorld](https://www.infoworld.com/article/4154210/internet-bug-bounty-program-hits-pause-on-payouts.html)).

Second, in April 2026, alongside launching a paid validation service, HackerOne disclosed the headline macro stat: **vulnerability submissions grew 76% year over year, hitting a record high in March 2026**, while **about 25% of findings were confirmed exploitable**, a rate that held roughly steady (source: [HackerOne press release](https://www.hackerone.com/press-release/hackerone-introduces-h1-validation-help-enterprises-manage-surge-ai-discovered)). Read that carefully: if the confirmed rate holds near 25% while volume grows 76%, real bugs and noise grow together, and roughly three of every four findings still end up not confirmed exploitable, each one costing review time. The share of noise isn't getting worse on that platform. The absolute validation workload is. By May 2026, the IBB **sharply cut reward amounts across every severity tier** (source: [The Register](https://www.theregister.com/security/2026/05/21/hackerone-takes-an-axe-to-its-bug-bounty-rewards/5244458)).

### This is ecosystem-wide

It would be easy to write off curl as an underfunded open-source project and HackerOne as one platform's growing pains. The pattern is broader than that.

| Program | What happened | When |
|---|---|---|
| **Nextcloud** | Ended payouts citing a "massive increase of low-quality reports"; kept intake open until they "figure out how to filter submissions properly" | April 2026 |
| **Google** | Stopped accepting some AI-generated reports | March 2026 |
| **Cosmos Labs (crypto)** | Co-CEO reported a **900% year-over-year jump**, reaching 20 to 50 submissions per day | 2026 |
| **Bugcrowd** | Submissions **more than quadrupled over a three-week period**, mostly false positives or low-quality AI findings | March 2026 |
| **Linux kernel** | Linus Torvalds called the security list "almost entirely unmanageable" under duplicate AI-assisted reports | 2026 |

Sources: [heise](https://www.heise.de/en/news/Due-to-AI-Bug-bounty-programs-without-rewards-now-also-at-Nextcloud-11271443.html); [Computing.co.uk](https://www.computing.co.uk/news/2026/security/bug-bounty-platforms-battle-ai-slop); [Cointelegraph via TradingView](https://www.tradingview.com/news/cointelegraph:f6afb56fe094b:0-ai-drives-surge-in-bug-bounty-reports-but-slop-is-rising-too/).

Nextcloud's line is the most telling. They didn't kill intake. They kept the channel open and stopped paying *until they could filter properly*. That gap, "we want the reports but can't afford to triage them," is the exact problem this article is about.

## Why AI slop breaks bug bounty economics

A bug bounty is a costly-signal mechanism. The original assumption: a human invested real effort to find a bug, so they produce a specific, reproducible report, and triaging it is worth the reviewer's time. The whole model rests on submission effort being expensive enough to filter out noise at the source.

AI collapses that assumption. It makes producing a plausible-looking report nearly free while the cost of disproving one stays stubbornly human. A generated report can cite a real-looking CVE, describe a real-looking function, and lay out confident reproduction steps, all hallucinated. A reviewer still has to open the codebase, trace the claim, and prove the negative. Proving a bug is real is fast. Proving a convincing fake is *not* real is slow.

It gets worse. AI tends to surface many instances of the *same* underlying issue, the same cross-site scripting pattern with twenty different payloads, and submit each as a separate finding. Volume inflates without security improving. The bottleneck moves entirely to validation, which is the one part of the pipeline you can't cheaply automate by default (source: [Pen Test Partners](https://www.pentestpartners.com/security-blog/ai-noise-and-the-effect-its-having-on-vulnerability-disclosure-programs/)).

Worth keeping in perspective: VDPs and bounties historically ran **60 to 80% invalid submissions even before AI** (source: [Yogosha](https://yogosha.com/blog/ai-and-vulnerability-triage-lessons-learned-from-our-automated-assistance-poc/)). The triage tax always existed. AI didn't invent the problem. It removed the natural rate limit that kept it survivable.
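
To make the triage tax concrete, here is a back-of-the-envelope sketch. It combines two figures from this article, the 60 to 80% invalid rate and the 30 minutes to three hours it takes to disprove a plausible fake. The function name and the 100-report volume are illustrative assumptions, and it simplifies by treating every invalid report as one that needs that much effort.

```python
def triage_tax_hours(reports: int, invalid_rate: float, hours_per_invalid: float) -> float:
    """Reviewer hours spent on reports that turn out to be invalid."""
    return reports * invalid_rate * hours_per_invalid


def triage_tax_range(reports: int) -> tuple[float, float]:
    """Best and worst case, using the ranges quoted in the article."""
    best = triage_tax_hours(reports, invalid_rate=0.60, hours_per_invalid=0.5)
    worst = triage_tax_hours(reports, invalid_rate=0.80, hours_per_invalid=3.0)
    return best, worst


# Hypothetical volume: 100 reports in a period.
best, worst = triage_tax_range(100)
print(f"{best:.0f} to {worst:.0f} reviewer hours spent on invalid reports")
```

Under those assumptions, 100 reports cost somewhere between roughly 30 and 240 reviewer hours before anyone fixes a real bug. The cost scales linearly with volume, which is why removing the natural rate limit hurts so much.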

## The triage playbook: five levers that keep a VDP alive

You don't fix this with one tool. You fix it by changing the economics back, raising the cost of submitting slop and lowering the cost of filtering it. Here are the five levers, in order of leverage.

1. **AI-assisted first-pass filtering at intake.** Auto-classify obvious slop (hallucinated functions, fabricated CVEs, no proof-of-concept, template language) before a human spends an hour on it.
2. **Rate limiting and spam controls.** A sliding-window throttle so one actor can't flood the queue in a single burst.
3. **Reputation scoring that penalizes slop.** Make confirmed slop cost the submitter, so repeat offenders self-select out.
4. **SLA tracking that protects reviewer time.** Acknowledge and resolve on a clock so genuine researchers don't go public out of frustration.
5. **Bounties for impact, not volume.** Pay by severity, set informational findings to zero, and require consolidation so a thousand identical findings pay once.

### 1. AI-assisted first-pass filtering at intake

The single highest-leverage move is to stop letting raw submissions hit a human first. An LLM screening pass can catch the tells that almost every slop report shares: references to functions or code elements that don't exist, CVEs cited as new that are years old or fabricated, generic remediation boilerplate, and the absence of any specific, runnable proof-of-concept.

The critical design choice is that this is *assistive*, not auto-reject. The first pass should recommend, not decide: high-confidence pass, needs-review, or flag. A human still adjudicates anything borderline. You're not building a robot gatekeeper. You're building a sorting hat that routes the obvious slop away from your senior engineers' attention.
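
The routing idea can be sketched without any model at all. The example below is a rule-based stand-in for the LLM pass, not a description of any product's screening: it checks two of the tells named above (functions that don't exist, no proof-of-concept) and returns a recommendation. Every name here is hypothetical, the symbol set is something you would build from your own codebase, and the `PoC:` marker is a deliberately crude proxy.

```python
import re
from dataclasses import dataclass, field

# Backticked identifiers followed by "()" are treated as function references.
FUNC_REF = re.compile(r"`([A-Za-z_][A-Za-z0-9_]*)\(\)`")
# Crude proxy for "has a proof-of-concept": a line starting with "PoC:" or similar.
POC_MARKER = re.compile(r"^\s*(?:poc|proof[- ]of[- ]concept)\s*:", re.I | re.M)


@dataclass
class Screening:
    verdict: str  # "pass", "review", or "flag"
    reasons: list[str] = field(default_factory=list)


def screen_report(text: str, known_symbols: set[str]) -> Screening:
    """Recommend a route for a report; a human still makes the final call."""
    reasons = []
    unknown = sorted(set(FUNC_REF.findall(text)) - known_symbols)
    if unknown:
        reasons.append("cites unknown functions: " + ", ".join(unknown))
    if not POC_MARKER.search(text):
        reasons.append("no proof-of-concept section")
    # No tells: pass. One tell: human review. Both tells: flag as likely slop.
    verdict = ("pass", "review", "flag")[len(reasons)]
    return Screening(verdict, reasons)
```

Note that the function only recommends and always returns its reasons, so the reviewer who picks up a `flag` can see why it was flagged and overrule it quickly.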

### 2. Rate limiting and spam controls

curl took seven reports in 16 hours. No human team triages that in real time. A simple sliding-window throttle, say five reports per actor per five minutes before a temporary block, stops scripted floods without touching legitimate researchers, who almost never submit in rapid succession. A five-minute window alone would not trip on a burst spread across 16 hours, so pair it with a longer window (for example, a per-actor daily cap) sized to what your reviewers can absorb. This is the cheapest lever to implement and one of the most effective against the specific 2026 attack shape.
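
A sliding-window throttle fits in a few dozen lines. The sketch below is a minimal in-memory version, assuming a single process and a caller that identifies the actor (account, email, or IP). The class name and the one-hour block duration are assumptions. The five-per-five-minutes limit is the one from the text.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_reports per actor inside a rolling window."""

    def __init__(self, max_reports=5, window_seconds=300, block_seconds=3600):
        self.max_reports = max_reports
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds  # assumed length of the temporary block
        self._events = defaultdict(deque)   # actor -> timestamps of accepted reports
        self._blocked_until = {}            # actor -> time the block lifts

    def allow(self, actor, now=None):
        now = time.monotonic() if now is None else now
        if self._blocked_until.get(actor, 0.0) > now:
            return False
        events = self._events[actor]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_reports:
            # Over the limit: reject and start a temporary block.
            self._blocked_until[actor] = now + self.block_seconds
            return False
        events.append(now)
        return True
```

Passing `now` explicitly keeps the class testable. In production you would leave it out and back the state with a shared store rather than process memory.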

### 3. Reputation scoring that penalizes slop

Karma systems are old news in bounty land, but most only reward valid work. The 2026 environment demands the inverse: confirmed slop and dismissed spam should *subtract* points. When a submitter's reputation drops, you can gate their future submissions or deprioritize them automatically. The goal is to make slop cost something, so the people gaming volume stop finding it worth their time.
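
A minimal sketch of a two-directional karma ledger is below. The point values, outcome names, and gating threshold are all invented for illustration. The only thing taken from the text is the shape: valid work adds, confirmed slop and dismissed spam subtract, and a low score changes how future submissions are handled.

```python
from dataclasses import dataclass

# Hypothetical point values; tune them to your own program.
KARMA_DELTAS = {
    "valid_resolved": 10,
    "not_applicable": -2,
    "spam_dismissed": -5,
    "confirmed_slop": -10,
}
GATE_THRESHOLD = -15  # assumed cutoff below which submissions are gated


@dataclass
class Submitter:
    handle: str
    karma: int = 0

    def record(self, outcome: str) -> None:
        """Apply the karma change for one adjudicated report."""
        self.karma += KARMA_DELTAS[outcome]

    def queue_treatment(self) -> str:
        """Decide how this submitter's next report is handled."""
        if self.karma <= GATE_THRESHOLD:
            return "gated"
        if self.karma < 0:
            return "deprioritized"
        return "normal"
```

Because the penalty only lands after a human confirms the outcome, a single honest mistake costs little, while repeat slop pushes a submitter toward the gate.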

### 4. SLA tracking that protects reviewer time

Here's the connection people miss: SLA tracking isn't only about being responsive. It's about not losing the good researchers. Genuine researchers go public when acknowledgments slip, fixes ship silently, or severity gets quietly downgraded, which is the exact failure mode we covered in [when researchers go public](/blog/when-researchers-go-public-botched-disclosure). Under slop load, ack times are the first thing to slip, and that's precisely when you can least afford to alienate the real reporters buried in the noise. An acknowledgment clock (e.g., 72 hours) plus per-severity resolution targets keeps the people you most want to keep coordinating with you.
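
An SLA clock is mostly date arithmetic. The sketch below uses the 72-hour acknowledgment target from the text. The per-severity resolution targets and all names are placeholder assumptions, not recommended values.

```python
from datetime import datetime, timedelta, timezone

ACK_TARGET = timedelta(hours=72)

# Hypothetical resolution targets per severity.
RESOLUTION_TARGETS = {
    "critical": timedelta(days=7),
    "high": timedelta(days=30),
    "medium": timedelta(days=60),
    "low": timedelta(days=90),
}


def sla_status(received_at, severity, now, acknowledged_at=None, resolved_at=None):
    """Report deadlines and breaches for one report."""
    ack_deadline = received_at + ACK_TARGET
    resolve_deadline = received_at + RESOLUTION_TARGETS[severity]
    # An open clock is measured against "now"; a closed one against when it closed.
    ack_at = now if acknowledged_at is None else acknowledged_at
    done_at = now if resolved_at is None else resolved_at
    return {
        "ack_deadline": ack_deadline,
        "ack_breached": ack_at > ack_deadline,
        "resolve_deadline": resolve_deadline,
        "resolution_breached": done_at > resolve_deadline,
    }


received = datetime(2026, 1, 1, tzinfo=timezone.utc)
status = sla_status(received, "high", now=received + timedelta(hours=80))
print(status["ack_breached"])  # unacknowledged 80 hours in, past the 72-hour target
```

Running this over the open queue on a schedule gives you the list of reports about to breach, which is the list your on-call reviewer should look at first.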

<div class="blog-inline-cta">
  <p><strong>Drowning in your VDP inbox?</strong> Kit's CSIRT module ships AI first-pass screening, intake rate limiting, reputation karma, SLA clocks, and impact-based bounties as configuration, not a six-month build.</p>
  <p><a href="/users/sign_up">Start your free trial</a></p>
</div>

### 5. Bounties for impact, not volume

If your bounty pays per accepted finding regardless of severity, you are paying people to submit volume, and AI is happy to oblige. Tie payouts to a severity matrix where informational findings pay **$0** and only real impact pays real money. Pair that with deduplication so a thousand instances of one root-cause bug resolve to one ticket and one payout. This is exactly the lever curl reached for at the end: removing the monetary incentive to submit crap. You don't have to go all the way to zero like curl did, but you do have to stop rewarding quantity.
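
Here is a minimal sketch of severity-tiered payouts combined with consolidation. The dollar amounts above informational are invented, and the `root_cause` key is assumed to be assigned by a triager or a dedup step upstream. Paying the first reporter at the highest severity demonstrated in the group is one possible policy, not the only one.

```python
SEVERITY_ORDER = ["informational", "low", "medium", "high", "critical"]

# Informational pays $0 as in the text; the other amounts are hypothetical.
PAYOUT_USD = {
    "informational": 0,
    "low": 100,
    "medium": 500,
    "high": 2000,
    "critical": 5000,
}


def consolidate_payouts(findings: list[dict]) -> dict[str, dict]:
    """Group findings by root cause and pay each root cause once."""
    groups: dict[str, list[dict]] = {}
    for finding in findings:  # findings are assumed to be in submission order
        groups.setdefault(finding["root_cause"], []).append(finding)

    payouts = {}
    for root_cause, group in groups.items():
        # Pay at the highest severity any instance demonstrated.
        top = max(group, key=lambda f: SEVERITY_ORDER.index(f["severity"]))
        payouts[root_cause] = {
            "reporter": group[0]["reporter"],  # first valid reporter gets the bounty
            "severity": top["severity"],
            "amount_usd": PAYOUT_USD[top["severity"]],
            "duplicates": len(group) - 1,
        }
    return payouts
```

With this shape, twenty payload variations of one cross-site scripting bug produce one payout and nineteen duplicates, and a pile of informational findings produces a total of $0.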

## Don't throw out the good AI-assisted researchers

The enemy is *unvalidated volume*, not AI. This distinction matters, because the lazy reaction, "ban AI-assisted reports," would discard your best contributors.

The clearest counterexample is Joshua Rogers, whose AI-assisted work surfaced roughly 50 *real* bugs in curl, every one human-validated before submission. Same tooling as the slop merchants, opposite outcome, because a competent human stood between the model and the submit button. Stenberg has been explicit that responsible AI-assisted research is welcome; it's the firehose of unchecked output that he can't sustain.

So the playbook's job is precise: filter on *evidence and validity*, not on *whether AI was involved*. A report with a working proof-of-concept and a real code reference passes whether a human or a model drafted it. A report with hallucinated functions and no PoC fails for the same reason regardless of authorship. Screen the signal, not the tool.

## How Kit operationalizes the playbook

Here's the uncomfortable truth for any team building this in-house: the five levers above are roughly six months of internal tooling. Kit's `Csirt::` vertical implements them as configuration. The mapping is close to one-to-one.

| Playbook lever | Kit CSIRT feature | What it does |
|---|---|---|
| **AI-assisted first-pass triage** | `Csirt::AiScreening` | LLM screening returns `pass` / `review` / `flag` by confidence, and detects explicit slop signals: hallucinated functions, fabricated CVEs, a previous CVE cited as new, generic remediation, no specific PoC, template language, vague reproduction steps, references to nonexistent code |
| **Rate limiting / spam controls** | `Csirt::SpamConfig` | Sliding-window throttle (default 5 reports per 5 minutes, then a temporary block) with auto-block on repeat |
| **Reputation that penalizes slop** | `Csirt::KarmaEvent` | Confirmed AI slop costs the submitter karma; spam dismissal and not-applicable also subtract; valid resolutions and bounties add, with severity bonuses |
| **Deduplication** | `Csirt::TriageConfig` | Dedup enabled by default, so repeated instances of one root cause don't become a hundred tickets |
| **SLA tracking** | `Csirt::SlaConfig` | Acknowledgment clock (default 72h) plus per-severity resolution targets |
| **Impact-based bounties** | `Csirt::BountyMatrixConfig` | Severity-tiered payout bands; informational pays $0, scaling up to critical |
| **Ownership under load** | `Csirt::OnCallConfig` | Auto-assign on-call so acknowledgments don't slip when the queue spikes |
| **Less out-of-scope noise** | `Csirt::ScopeConfig`, `Csirt::SecurityTxtConfig` | Public scope and policy reduce invalid submissions at the source |

A precise note on the AI screening: it's *assistive triage*, a recommendation plus a confidence score, not an auto-reject. Humans still adjudicate every `flag` and `review`. Kit doesn't claim to block all slop. It filters the first pass and re-prices the incentives so your reviewers' hours go to real bugs, and a Joshua-Rogers report with a working PoC sails through while a hallucinated one gets caught.

## The takeaway: a slop problem is really a triage problem

The programs that died in 2026 weren't beaten by AI. They were beaten by an intake pipeline built for a costly-signal world that no longer exists. Slop is just volume. The failure was structural: no automated first pass, no rate limit, no reputation penalty, and a bounty that paid for quantity.

Every one of those is fixable, and every one of them is a setting rather than a research project. If you're standing up a program from scratch, start with [the VDP setup guide](/blog/how-to-set-up-vulnerability-disclosure-program). If you're already underwater, the move is to change the economics back: automate the first pass, throttle the floods, make slop cost something, protect your reviewers' time with SLAs, and only ever pay for real impact. Keep the curl-killer reports out and the Joshua-Rogers reports in.

That's exactly what Kit's CSIRT module is built to do. [Start a free trial](/users/sign_up) and turn a flooded inbox into a filtered, rate-limited, SLA-backed queue.