Run an LLM-based code reviewer on a code change 10 times and it flags a SQL injection vulnerability 7 times. The other 3 runs come back clean. Same code, same vulnerability, different result.

LLMs are probabilistic, but security requirements are binary. “Usually catches security issues” is a bug, not a feature.

When LLMs do catch something, the finding is often good because they pick up on subtle, contextual patterns that would be hard to write rules for from scratch. The trick is using them differently: not for the analysis itself, but for discovering what to look for, then turning those discoveries into deterministic rules.

Say the LLM notices developers building SQL queries by concatenating user input. Encode that as a rule: glob patterns to select the relevant files, regex or AST analysis to match string concatenation in database query contexts. Now the pattern gets caught 10 out of 10 times.
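A minimal sketch of what codifying that pattern could look like, using Python's standard `ast` module. The function name `find_sql_concat` and the set of query entry points are illustrative assumptions, not from any particular tool:

```python
# Sketch: a deterministic AST rule for "string concatenation in a query context".
# `find_sql_concat` and QUERY_CALLS are illustrative names, not a real tool's API.
import ast

QUERY_CALLS = {"execute", "executemany"}  # assumed DB-API entry points

def find_sql_concat(source: str) -> list[int]:
    """Return line numbers where a query call receives a concatenated string."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in QUERY_CALLS):
            for arg in node.args:
                # Flag `"..." + x` concatenation or an f-string in the query arg;
                # parameterized queries pass plain constants and are ignored.
                if isinstance(arg, (ast.BinOp, ast.JoinedStr)):
                    findings.append(node.lineno)
    return findings

snippet = '''
cursor.execute("SELECT * FROM users WHERE name = '" + name + "'")
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
'''
print(find_sql_concat(snippet))  # → [2]: only the concatenated query is flagged
```

Unlike the LLM pass, this check returns the same answer on every run, which is the whole point of the conversion.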

You keep the LLM running to find new things, but once you understand a pattern well enough, you codify it and stop relying on the LLM for that one.

There’s a cost angle too. LLM analysis is slow and expensive, while rule execution is fast and cheap, so converting common patterns to rules means scanning massive codebases in seconds instead of hours.

The architecture that falls out of this is layered: an LLM discovery layer hunting for new issues, a rule execution layer catching known patterns every time, and a feedback loop turning discoveries into rules. Building it takes real work because LLMs pick up on context that’s hard to encode, and you need infrastructure for rule lifecycles. But developers get consistent feedback, and security teams get results they can trust.
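The shape of that feedback loop can be sketched in a few lines. Everything here is a simplified stand-in: the `Rule`/`Analyzer` names and the regex-based matcher are assumptions for illustration, and a real system would use AST matchers and a human confirmation step before promoting a discovery:

```python
# Sketch of the layered architecture: a deterministic rule layer plus a
# promotion step that turns confirmed LLM discoveries into rules.
# Rule, Analyzer, and the regex pattern are illustrative, not a real tool's API.
import re
from dataclasses import dataclass, field

@dataclass
class Rule:
    name: str
    pattern: re.Pattern  # regex for brevity; AST matchers slot in the same way

@dataclass
class Analyzer:
    rules: list[Rule] = field(default_factory=list)

    def run_rules(self, code: str) -> list[str]:
        # Rule execution layer: same input, same findings, every run.
        return [r.name for r in self.rules if r.pattern.search(code)]

    def promote(self, name: str, pattern: str) -> None:
        # Feedback loop: once a discovery is understood, codify it
        # and stop relying on the LLM for that pattern.
        self.rules.append(Rule(name, re.compile(pattern)))

analyzer = Analyzer()
# Suppose the LLM discovery layer surfaced concatenated SQL and a human
# confirmed it; the pattern is promoted into the deterministic layer:
analyzer.promote("sql-concat", r'execute\([^)]*"\s*\+')
print(analyzer.run_rules('cursor.execute("SELECT ... " + user_input)'))
# → ['sql-concat']
```

The "real work" in the paragraph above lives mostly in the promotion step: deciding when a pattern is understood well enough to codify, and retiring rules when they go stale.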

This is where code analysis ends up. Not GPT-N finding more bugs on its own, but systems that use LLMs to figure out what the bugs look like, then never miss them again.