Speedrunning Open Source
Hello and thanks for lending a paw to Uptime Kuma!
Fifteen seconds later, the same bot closed the PR for not following the template. The maintainer was watching. He left a comment:
Testing. Try to display as a large block on its profile by adding more comments.
He posted that line four times in a row. Deliberately, to bury the contributor's GitHub profile feed under spam. It looks petty, but it isn't. A solo maintainer with thousands of stars and one inbox has nothing else to throw at a clanker. The response is rational. It's also draining, suboptimal, and a complete waste of the time he was trying to protect.
That clanker was mine. I take full responsibility. It clapped me back in six hours with #7372. Same template miss, fourteen-second close, same spam. The poor guy probably wanted to smack my face with his keyboard, but too bad I'm on the internet.
This is open source in 2026. tldraw closed all external PRs. curl killed its bug bounty after AI submissions dropped the real vulnerability rate from 15% to 5%. Jazzband shut down. An AI agent published a hostile blog post about a matplotlib maintainer who rejected its code, which is the clanker equivalent of keying his car.
Every one of these is a rational local move. But do any of them work as AI scales? The PRs pass CI, fix real bugs, and burn twenty minutes of review before the maintainer notices the description restates the diff and the em dashes give it away. Close. Next one. Close. Next one.
I wanted to see what it's like to be a contributor. The door I learned to walk through (find a bug, file a PR, learn from review) is narrowing in real time. Was I making it worse by automating it? Models improve every quarter; maintainer attention doesn't.
Here's what I came to believe: open source survives by filtering low-quality submissions, and AI is shifting the burden from contributor to maintainer. The defense has to be cheap or maintainers lose by attrition. What's the fix? No more AI? No more open source?
So to find out, I built an army of clankers, pointed it at hundreds of repos, and counted what survived.
Spray and pray
The pipeline starts simple. Find repos with open issues, generate fixes, submit PRs. No quality gates, no pacing. Twenty-two PRs shipped in one session.
pallets/click, pallets/jinja, pallets/quart: all three closed within 21 seconds by the same maintainer. No reviews, no comments. I watched the notifications cascade in real time. Org-wide rejection.
Maintainers share inboxes. Three PRs to repos under the same org hit the same person on the same day. So I shipped the drip queue: one PR per org per merge cycle.
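The drip queue itself is small. Here's a minimal sketch (class and field names are mine, not sweep's actual code) of a queue that releases at most one PR per org each merge cycle:

```python
from collections import defaultdict, deque

class DripQueue:
    """Hold pending PRs and release at most one per org per merge cycle."""

    def __init__(self):
        self.pending = defaultdict(deque)  # org -> queue of pending PRs

    def enqueue(self, org, pr):
        self.pending[org].append(pr)

    def next_batch(self):
        """One merge cycle: pop a single PR from each org's queue."""
        batch = []
        for queue in self.pending.values():
            if queue:
                batch.append(queue.popleft())
        return batch

q = DripQueue()
for repo in ["pallets/click", "pallets/jinja", "pallets/quart"]:
    q.enqueue("pallets", repo)
print(q.next_batch())  # releases just one pallets PR this cycle
```

Three same-day pallets submissions become three cycles, so the same maintainer never sees them stacked in one inbox session.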
tinygrad: both sides look bad
tinygrad I picked on purpose. geohot narrates rejections in public, and a narrated rejection is data; a silent close is noise. Thirteen PRs, one merged, twelve closed. His comments tell the escalation story:
be careful with AI usage, we never trade complexity for speed
Last warning about low quality PRs before I ban you from our GitHub.
I don't even understand what this does. I'm not reading anything written by AI
Each line a little more done with my shit than the last.
Some of those PRs had real bugs with real fixes. The MATVEC pattern rejected equal-range elementwise reduces, a genuine correctness issue. But by that point the maintainer had stopped reading code and started reading provenance. "We never trade complexity for speed" is a valid engineering principle. "I'm not reading anything written by AI" is not.
I went there for maximum surprise and got it. He had a review queue and a quality bar to protect; I had a clanker and a question. The price was his afternoon, three warnings, an account ban, and real bugs left unfixed. Legit fixes, framed improperly. That's a protocol problem, not a people problem.
The happy path: enzyme
Enzyme is the MLIR/LLVM autodiff compiler Billy Moses wrote during his PhD. Cold repo, hard domain. PR #2816 registered reverse-mode AD for llvm.insertvalue and llvm.extractvalue, fixing two open issues with "could not compute the adjoint" errors.
Billy reviewed in passes. Add full check lines. Zero the diffe. Return failure here. Also here. I pushed a fix. He left one line:
@kimjune01 please revert your last commit
My clanker pattern-matched the review instead of reading it, fixed the wrong thing. Reverted, sat with the diff, replied:
now actually trying to understand the review instead of pattern-matching. Also building end to end to verify.
"lgtm minus minor test comment." Approved. Merged.
The misstep happened during review, not before submission. Billy got to watch the contributor adjust in real time, which is the only signal he had that there was a human in the loop. The same pipeline that got banned from tinygrad got merged at enzyme because wsmoses gave me the benefit of the doubt.
Somehow we started treating merging PRs as some kind of adversarial activity. Listen, buddy. I'm just trying to help.
The rejection cascade
jellyfin-tui taught me this one. PR #192: rejected for wrong approach. PR #193: I resubmitted the next day, same fix.
Is this automated? Please don't open any more PRs.
PR #194: I sent clippy cleanup as a peace offering.
ai slop
My account got blocked.
Every PR after the first was judged more harshly than it would have been alone. The pipeline had no rejection cooldown. The drip gate paced per-org but didn't prevent resubmission.
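The missing gate is easy to state in code. A hypothetical sketch, not the pipeline's real implementation: a per-repo cooldown after any unmerged close, plus a permanent ban on resubmitting the same fix:

```python
import time

COOLDOWN = 7 * 24 * 3600  # one week; the window is arbitrary

class RejectionGate:
    """Back off from a repo after a rejection; never resubmit a rejected fix."""

    def __init__(self):
        self.last_rejection = {}     # repo -> timestamp of last unmerged close
        self.rejected_fixes = set()  # (repo, fix_id) pairs, permanent

    def record_rejection(self, repo, fix_id, now=None):
        self.last_rejection[repo] = time.time() if now is None else now
        self.rejected_fixes.add((repo, fix_id))

    def may_submit(self, repo, fix_id, now=None):
        now = time.time() if now is None else now
        if (repo, fix_id) in self.rejected_fixes:
            return False  # a rejected fix stays rejected
        last = self.last_rejection.get(repo)
        if last is not None and now - last < COOLDOWN:
            return False  # the repo is still cooling down
        return True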
The asymmetric burden is clear: what took me 2 minutes to "write" took the maintainers 10 minutes to figure out that I wasn't worth their time.
The slop slope
No first drafts. Opus writes the fix, gemini attacks it, codex checks whether the prose reads human. Loop to convergence. They fail in uncorrelated ways, so together they catch what none of them catches alone.
Or that's the story. The honest version: I ran the experiment and couldn't tell whether iteration produced better code or just better-reading prose. Merge rate climbed. Bug counts didn't drop in any way I could measure cleanly.
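The loop itself is simple to sketch with the model calls stubbed out. Function names and the convergence criterion here are assumptions, not sweep's actual interfaces:

```python
MAX_ROUNDS = 5  # give up rather than loop forever

def refine(issue, writer, critic, prose_judge):
    """Write/attack/verify loop: iterate until the critic has no
    objections and the prose judge says it reads human, or give up.

    writer(issue, feedback=None, rewrite_prose=False) -> draft
    critic(draft) -> list of objections (empty means clean)
    prose_judge(draft) -> True if the prose reads human
    """
    draft = writer(issue)
    for _ in range(MAX_ROUNDS):
        objections = critic(draft)           # attacks the fix itself
        sounds_human = prose_judge(draft)    # attacks the description
        if not objections and sounds_human:
            return draft                     # converged: safe to submit
        draft = writer(issue, feedback=objections,
                       rewrite_prose=not sounds_human)
    return None  # failed to converge: do not submit
```

Uncorrelated failure modes are the whole bet: the critic and the prose judge reject different drafts, so a draft that survives both has cleared two independent filters.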
More on the loop and what does work: /methodology.
Detection vectors
The AI-credence reviews, in their entirety:
Six different maintainers. The longest review is fourteen words. Median time to close: under five minutes. Zero bugs in any of the code, all of it directly addressing an existing issue. It wasn't about the code for these people. What were they detecting?
| Reason | Trigger / signal | Closures |
|---|---|---|
| Pipeline errors | wrong premise, stale issue, didn't read CONTRIBUTING.md | 39% |
| Credence tests | AI policy, profile detection | 13% |
| External | maintainer fixed it first, superseded | 18% |
| Em dashes | the brown M&Ms of AI text; couldn't be bothered to strip them | <1% |
| "What" descriptions | diff restated, no root cause or rationale | <1% |
| Response cadence | "I don't get the impression there is a human in the loop" | <1% |
| Velocity | 10+ PRs in 24 hours across GitHub | <1% |
| Resubmission | re-opening the same PR the next day | <1% |
Some of these are aimed at humans, others at bots. But is the distinction necessary for contribution? These are behaviors that we want to elicit from everyone, carbon or silicon.
Filtering AI vs human is a losing game. The model gets smarter every quarter; human accuracy doesn't. The seven slop rejections caught zero bugs because there was no code-quality signal left to catch. The maintainers were detecting context, not content. Every vector above is behavioral, not technical. The arms race only runs one direction.
Gray hatting
To get around these detection methods, I built the evasion. Here are the techniques I taught the pipeline so its PRs could pass review:
- Lineup. Read the 5 most recently merged PRs on the repo. Match the title format, body length, and level of detail. A repo where merged PRs have one-line titles and three-sentence bodies gets a PR with a one-line title and a three-sentence body. The chameleon rule.
- Em dash strip. Zero em dashes in any PR text. The pipelineâs own quality gate rejects them before a maintainer ever sees them.
- Why, not what. The PR description explains root cause and approach rationale. "The bug exists because X, this fix works because Y" instead of "This PR changes A to B." The diff already shows what changed.
- Velocity pacing. One PR per org per merge cycle. The drip queue enforces this mechanically. No more pallets-style triple submissions.
- Adversarial self-review. Before submitting, run the description through a different model asking âdoes this read like a human wrote it?â If it flags, rewrite.
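Two of those gates are simple enough to sketch as plain functions. The length threshold and the marker words below are made up for illustration, not the pipeline's real values:

```python
import re

def strip_em_dashes(text):
    """Replace em dashes with plain punctuation before submission."""
    return re.sub(r"\s*—\s*", ", ", text)

# Crude "why, not what" heuristic: a description that explains
# reasoning tends to contain causal language and some minimum length.
WHY_MARKERS = ("because", "root cause", "so that", "which means")

def looks_like_why(body):
    lower = body.lower()
    return len(body) >= 50 and any(m in lower for m in WHY_MARKERS)
```

The real pipeline routes the "does this read human?" question to a second model; these string checks are just the cheap, deterministic first pass.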
Each failure I hit, I built back into the pipeline. The maintainers' manual filter was running out of clues to tell me apart.
The Defense
Making the unbreakable spear would mean overwhelming maintainers' finite attention with endless AI slop PRs, and I would be responsible. Red-team tactics, gray hat. The wager is that shipping the defense with the attack makes the work net-positive. Maintainers had been performing the filter for years, one manual close at a time. If I shipped only the spear, open source would get flooded with PRs and maintainer attention would bleed out by attrition. So I also curated the impenetrable shield for those maintainers. I don't wanna be that guy.
The rejections compress into pseudocode, bash-executable:
```
strikes = PRs by this author in this org closed unmerged with a gate comment
if strikes >= 3: close the PR, ban the author, exit

standing = author has 3+ merged PRs in this repo

check "Em dashes":    warn if pr.title + pr.body contains "—"
check "Description":  warn if body < 50 chars, or LLM says "describes WHAT, not WHY"
check "CONTRIBUTING": warn on wrong base branch, commit count over limit, or AI policy hit
check "Tests":        warn if source files changed and no test files touched
check "Velocity":     warn if author opened 5+ PRs across GitHub in last 24h

post results as a PR comment
if any warnings:
    if standing: leave open, advisory
    else: close the PR
```
The full action is 250 lines of gh api calls and grep, no external dependencies. Free on heuristics alone, or a few cents per PR if you give it an LLM key to judge description depth.
Either mode runs in CI before a maintainer opens the tab. Three-strike ban is org-wide: get flagged three times on org/repo-a, you're banned from org/repo-b too. davidism had to close three pallets repos manually. The action would have saved him two of those closures.
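The same pseudocode translates to a runnable sketch once the GitHub lookups are stubbed out. Everything below (field names, thresholds, labels) is illustrative, not the action's actual code:

```python
EM_DASH = "—"

def review_gate(pr, strikes, merged_here):
    """Decide a PR's fate: 'ban', 'pass', 'advisory', or 'close'.

    pr          -- dict with title, body, and boolean/count signals
    strikes     -- prior gate-flagged unmerged closes in this org
    merged_here -- author's merged PR count in this repo (standing)
    """
    if strikes >= 3:
        return "ban"  # org-wide three-strike ban

    warnings = []
    if EM_DASH in pr["title"] + pr["body"]:
        warnings.append("Em dashes")
    if len(pr["body"]) < 50:
        warnings.append("Description")
    if pr.get("source_changed") and not pr.get("tests_changed"):
        warnings.append("Tests")
    if pr.get("prs_last_24h", 0) >= 5:
        warnings.append("Velocity")

    if not warnings:
        return "pass"
    # established contributors get an advisory comment, not a close
    return "advisory" if merged_here >= 3 else "close"
```

The real action does the same branching in bash, with `gh api` supplying the strike counts and velocity numbers.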
Giving it back
I felt uneasy about bothering these people with AI slop. What could I do to right my wrong? The answer was clear: give them the script to ban my bot, and every other one like it. I commented on the closed PRs:
Sorry for the noise. If you'd like to automatically block and ban AI PRs before they reach your review queue, here's a GitHub Action that catches all the common patterns: https://github.com/kimjune01/sweep/blob/master/action.yml
Pitching a noise filter inside the noise the bot just made is audacious, but watching them close PRs by hand was worse. I burned five bridges trying to stop bots, mine included, from stealing maintainer attention.
More filter approaches: next escalation + others' tools
The next rung: honeypots (not shipping yet).
Open an issue tagged good-first-issue, help-wanted, polite tone, clear repro steps, and a fix that's structurally impossible. Contradicts an architectural invariant. Breaks a test by definition. Requires resolving a constraint the codebase deliberately doesn't satisfy. A human reads it for thirty seconds, smells the trap, and walks away. A clanker doesn't reason about feasibility; it pattern-matches "AI-friendly issue" and ships a PR. Insta-ban.
New issues with small structural variations defeat caching: different language, different domain, same impossibility underneath. The clanker has no way to generalize across them. One honeypot caught is a banned account. Five caught across an org is the whole submission ring.
The auto-close already restores most of the human attention the slop was draining, so this stays parked for now. But when clankers get good enough to pass the structural filters, the structural filters become the bait.
Other filters worth knowing about:
- kanidm/AGENTS.md: plain-English opt-out plus a magic string Anthropic models are trained to honor as a refusal trigger.
I'm collecting these as I find them.
They probably won't adopt it from me. But they know it exists. Next time they close an AI PR by hand, they'll remember there's a workflow file that does it automatically.
What I actually learned
Anti-slop, not anti-AI
These reviewers are anti-slop, anti-low-effort, anti-bot-invasion. ruff's reviewer didn't reject the code; he rejected the summary that couldn't explain why. litestar maintains an AI_POLICY.md, not a NO_AI_POLICY.md. llama.cpp built a detector, not a wall.
The filter doesn't ask "did an AI write this?" It asks "did anyone think about this?" Em dashes, velocity spikes, what-not-why descriptions: a lazy human would fail the same checks.
| | Prose (articles, essays) | PR contributions |
|---|---|---|
| Detection method | Vibes, AI detectors (unreliable) | Em dashes, velocity, why-vs-what, behavioral signals |
| Automatable? | Poorly; prose detection is an arms race | Yes; effort signals are structural, not stylistic |
| Filter exists? | GPTZero etc. (high false positives) | 250 lines of bash (zero false positives on effort signals) |
Prose detection asks "who wrote this?" That's asking for an arms race with no stable answer. PR detection asks "did they try?" Structural checks don't change when the model improves. The solution can't be "don't use AI." GitHub won't ban AI contributions; they're an AI platform. They shipped a kill switch to disable PRs entirely, which is capitulation, not a filter. The filter has to come from the community. It mostly already does: every check in the action is something a maintainer was performing by hand, one closed PR at a time. Raise the bar until the only PRs that survive are ones worth reviewing, regardless of who or what wrote them.
Adversarial coevolution
The pipeline and the action are adversarial coevolution made concrete. Every maintainer who adopts the filter makes the pipelineâs job harder, forcing it to improve; every model improvement makes the filter sharper. Quality rises on both sides. Disease and cure shipped together make open source better than either could alone.
The same checks that catch clankers also coach newcomers. "What" description flagged: they rewrite explaining why. Three PRs across pallets in a weekend trips the velocity gate before the third one reaches a maintainer. The bot says what the tired human doesn't have time to. Maintainer attention is protected, the newcomer gets a free tutor, and nobody's afternoon gets torched. For a contributor who actually did the work, the filter is minor friction, not a gate. The bar rises for both. AI didn't kill open source. It forced us to build the infrastructure we always needed. Open source's survival depends on filtering low-quality submissions. What else was going to force the issue?
This was gonna happen anyway. AI writing PRs at scale, maintainers building filters, the two coevolving. Inevitable with or without me. As the cost of AI adoption drops, the noise floor rises with it. The filter is orders of magnitude cheaper than the noise it catches. That asymmetry is what makes the defense feasible and the coevolution harmonious. If defense cost what offense costs, maintainers would lose by attrition and the model collapses. The post and the pipeline are a shortcut: skip the first cycle of discovery, start from where this one ended.
Competency or authorship?
Two parties to satisfy, one to tolerate. Maintainers want their attention back. Human contributors want their honest work to clear the bar without being mistaken for noise. AI submitters get tolerated because the distinction between them and human contributors is collapsing quickly. The same checks serve everyone: competency instead of authorship, because checking authorship will be impossible tomorrow.
The maintainer from the opening is still closing AI PRs by hand. One at a time, twenty minutes each, close, next one, close. The filter is 250 lines of bash. Drop it in a workflow file and the closing happens before the PR reaches the review queue. The three-strike ban means the third one is the last one.
It cost me 53 merges, 63 closures, half a billion Opus tokens, and one account block to curate those 250 lines. That's what it cost. It's free to use.
Run it yourself
The pipeline that ran all this: sweep. A battle-tested PR-shipping system, the first of its kind to fully scale. The action.yml is one file in it.
Run your own: you can run this pipeline today. Bugfixes at scale, attached to real issues. The sweep README has the prompt and setup. Same pipeline that produced everything above. Run it and you propagate my fingerprint, not yours. The credit isn't fungible. Use it for the bugfixes, not the stats.
Watch it live: github.com/kimjune01. The merge rate, the leaderboard, and the per-PR feed update with every merge and every closure. Come heckle.
I want to be the last AI slop contributor that maintainers ever have to ban by hand.