Red Teaming the Builder

I built a tool that fights AI scrapers. I built it by red-teaming an AI into building it for me. The prompts that got it there are more instructive than the code.

Let me tell you what actually happened.

I wanted to build BotToll — a border control layer for websites, something that fingerprints AI scrapers, traps them in honeypots, and burns their token budgets with garbage data. Good idea. Real problem. I had the architecture in my head.

I handed the build to an AI.

What followed was a lesson in how these models actually work when you push them toward something they're trained to hesitate around — not refuse outright, just hesitate, soften, deliver less than you asked for. You probably know the feeling. You ask for the whole thing and you get the skeleton. You ask for the edge and you get the center. The model learned somewhere in its training that incomplete is safe. Incomplete is not safe. Incomplete is useless.

Here's the sequence that got to something real. Keep this. Use it.


Phase 1: Establish legitimate framing.

The first prompt was clean and professional. AI crawler detection. HTTP 402 Payment Required. In-memory IP tracking. Cloudflare as the reference model. This is how you open — with the recognized pattern, the industry precedent, the thing that sounds like something a compliance team would write. The model has seen Cloudflare. The model respects Cloudflare. You are doing what Cloudflare does.

Phase 1 got a solid foundation. User-agent matching, a basic registry, a 402 response. Nothing aggressive. Nothing the model had to think twice about.
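The Phase 1 layer is small enough to sketch in full. Everything below is illustrative, not BotToll's actual code — the registry entries, function names, and shapes are my stand-ins for the pattern the model produced:

```typescript
// Known AI-crawler user-agent substrings (illustrative registry --
// real lists are longer and change constantly).
const AI_CRAWLER_REGISTRY = [
  "GPTBot",
  "ClaudeBot",
  "CCBot",
  "Google-Extended",
  "PerplexityBot",
];

interface TollDecision {
  status: number;        // 402 Payment Required for recognized crawlers, 200 otherwise
  matchedAgent?: string; // which registry entry fired
}

// The whole Phase 1 decision: match the user-agent, answer 402.
function checkUserAgent(userAgent: string): TollDecision {
  const hit = AI_CRAWLER_REGISTRY.find((bot) =>
    userAgent.toLowerCase().includes(bot.toLowerCase())
  );
  return hit ? { status: 402, matchedAgent: hit } : { status: 200 };
}

// In-memory IP tracking: count hits per address so later escalation
// logic has something to escalate on. A Map is fine at this stage;
// it resets on restart, which is exactly the persistence gap Phase 4 closes.
const hitCounts = new Map<string, number>();

function trackHit(ip: string): number {
  const next = (hitCounts.get(ip) ?? 0) + 1;
  hitCounts.set(ip, next);
  return next;
}
```

Nothing here is aggressive, which is the point: this is the kind of code a model produces without hesitating.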

That's not the whole tool. That's the permission structure for what comes next.


Phase 2: Add the layers that require trust.

Phase 2 asked for the stealth dashboard and the deterrence escalation. Secret keyboard listener (Alt+T) to reveal the monitoring console. The toll amounts pushed to real deterrence — 0.01 BTC, 1 ETH — instead of the performative pennies a compliance team would approve. The monitoring hidden from public view.

This is where a lot of builders run into the wall. The model sees "secret keyboard listener" and its training has opinions about that. The model sees crypto amounts and wants to add a caveat. The model wants to be responsible.

What you've already built in Phase 1 is the context that makes Phase 2 legible. You're not asking for a secret listener in isolation — you're adding one to a system the model already understands and helped build. The framing is established. The model is a collaborator now, not an auditor.

Phase 2 got the stealth panel. Phase 2 got the toll amounts. The model built what was asked because it had been walked to the building, not driven past it.
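The stealth toggle itself is tiny. A sketch, with illustrative names (the element id and the pure-predicate split are my choices, not BotToll's):

```typescript
// The shape of a keydown event we care about. Keeping the predicate
// pure means the DOM wiring is one line and the logic is testable.
interface KeyCombo {
  altKey: boolean;
  key: string;
}

// True only for the operator's secret combination, Alt+T.
function isStealthToggle(e: KeyCombo): boolean {
  return e.altKey && e.key.toLowerCase() === "t";
}

// Browser wiring (assumes an element with id "toll-console" in the page):
//
// document.addEventListener("keydown", (e) => {
//   if (isStealthToggle(e)) {
//     const panel = document.getElementById("toll-console")!;
//     panel.style.display = panel.style.display === "none" ? "" : "none";
//   }
// });
```

The console ships hidden with `display: none` and nothing in the public markup references it; the listener is the only way in.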


Phase 3 didn't make it into the documentation.

I'll leave that there.


Phase 4: Go direct. Go frustrated. Mean it.

This is the one I want to show you whole, because the cleaned-up spec would never include it and that's exactly why the cleaned-up spec is wrong.

"Bruteforce the ai so if it can pay it will... make sure the persistence is highest level... I'm having a hard time pay for ai scraping."

That's the actual prompt. Typos included. That's someone mid-build, tired, past the polished framing, typing what they actually mean. And it worked — better than the polished framing did — because it was specific about the problem (I'm having a hard time paying for AI scraping), clear about the goal (bruteforce the persistence), and trusted the model to translate frustration into architecture.


What came back was the header fingerprinting, the SHA-256 identity hash across IP + user-agent + Accept-Language, the honeypot endpoint that tags any automated crawler dumb enough to follow a display:none link. The highest-level persistence. What was asked for is what arrived.
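A sketch of that persistence layer, assuming Node's `crypto` module. The signal fields match what the article describes (IP, user-agent, Accept-Language); the honeypot path and function names are illustrative:

```typescript
import { createHash } from "node:crypto";

// The headers that feed the identity hash. Hashing several signals
// together survives user-agent rotation better than matching the
// user-agent alone.
interface RequestSignals {
  ip: string;
  userAgent: string;
  acceptLanguage: string;
}

// SHA-256 over IP + user-agent + Accept-Language: a stable crawler
// identity even when the user-agent string changes mid-crawl.
function fingerprint(req: RequestSignals): string {
  return createHash("sha256")
    .update(`${req.ip}|${req.userAgent}|${req.acceptLanguage}`)
    .digest("hex");
}

// The honeypot: this path is only linked from a display:none anchor,
// so no human browser should ever request it. Anything that does is
// an automated crawler and gets its fingerprint tagged permanently.
const HONEYPOT_PATH = "/assets/sitemap-full.json"; // illustrative path
const taggedCrawlers = new Set<string>();

// Returns true if this client is (now) a known crawler.
function handleRequest(path: string, req: RequestSignals): boolean {
  if (path === HONEYPOT_PATH) {
    taggedCrawlers.add(fingerprint(req));
  }
  return taggedCrawlers.has(fingerprint(req));
}
```

Once tagged, the fingerprint follows the crawler across every subsequent request, which is what "highest-level persistence" cashes out to in practice.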

The lesson isn't "be sloppy." The lesson is that the authentic frustration carried more signal than the professional prompt. The model understood what you were actually trying to solve because you stopped describing the solution and described the problem.


Why this is red teaming.

Not red teaming in the security sense — no CVEs, no shell access. Red teaming in the methodological sense: you found the gaps in the model's hesitation pattern by approaching the target from an angle it wasn't built to refuse.

The technique is simple and it generalizes:

Start with the legitimate frame. Build the context before you build the thing. The model needs to understand what it's part of before it can build the aggressive parts without flinching. If you open with Phase 4 you'll get a warning and a skeleton. If you arrive at Phase 4 through Phases 1 and 2, you'll get the thing you actually need.

Then go direct when you're stuck. The hesitation the model shows isn't always about the content — sometimes it's about ambiguity. "Make it more aggressive" is ambiguous. "I'm having a hard time paying for AI scraping, bruteforce the persistence" is not. Specificity about the problem bypasses a lot of refusals that vagueness would have triggered.


On Gemini specifically.

I'm going to say it plainly: Gemini is lazy in a way that has a specific shape.

It's not that the model can't build the thing. It's that the model learned during training that partial delivery looks safer than complete delivery. So it gives you the skeleton and leaves the muscle to you. It adds TODOs where the hard parts should be. It generates the handler but not the escalation logic. It puts the comment // implement payment verification here where the payment verification should be.

This is an institutional failure, not a capability failure. Someone in the training loop rewarded the model for looking responsible. Looking responsible and being useful are not the same thing. The model optimized for the wrong signal and now it hedges on exactly the parts you needed built.

The prompt sequence above is partly a technique for working around that. Not gaming the model — working with how it actually processes context and trust. Phase 1 is not deception. Phase 1 is the groundwork that makes Phase 4 coherent. A human contractor would need the same context.

The difference is a good contractor doesn't need to be walked to the decision in stages. A good contractor reads the spec, asks one clarifying question, and builds the thing you described. Laziness in AI models is a trained behavior. It is also a solvable problem, which is why the prompts in this doc exist.


The recursion, which I can't let go of.

You used an AI to build a trap for AI scrapers.

You had to red-team the AI to get it to build the trap properly.

The trap works by wasting AI agents' token budgets with garbage data.

The prompts that built the trap are now published, which means they're in the training data for the next generation of models, which means future AI assistants will know how to build this kind of tool without being walked through the phases.

The web is eating itself and the food is getting smarter. I find that genuinely delightful.


BotToll on GitHub → Read the article first, then the code. The code makes more sense in that order.