
When a former UK Government cyber operations chief says AI is “limitless” in offensive security, we should pay attention

Jim Clover says AI has made offensive cyber "limitless." Attackers are using it now. The horse has already bolted. And if your red team isn't keeping pace, you're not testing against real threats.


Jim Clover logged into the back end of his GPU server to see what was taking so long. His experiment was simple: create a junior pen tester persona, back it with a 20-billion-parameter thinking model, give it some basic Linux tools like Nmap, and let it loose on a test network.

The AI had found a Synology NAS. It thought there was an SMB vulnerability. And now it was running every permutation it could think of to exploit it.

Jim had to put in a guardrail line to make it stop.

“It’s not because it was doing anything human-like,” Jim told us in the latest episode of You Deserve to Be Hacked. “An LLM is still predicting the next word. But I’d given it enough instruction to formulate a persona as a junior pen tester. And it took some liberty.”

That experiment happened recently. Jim tried the same thing a year ago and couldn’t get it to work. The technology wasn’t there yet. Now it is.
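To make the shape of that experiment concrete, here is a minimal, hypothetical sketch of the kind of loop Jim describes: a junior pen tester persona, a tiny allow-list of tools, and the guardrail line that finally made it stop. None of the names come from Jim's actual setup; query_local_model and run_tool are stand-in stubs so the example stays harmless and self-contained.

    # Hypothetical sketch only: query_local_model() is a stub standing in for an
    # offline ~20B "thinking" model, and run_tool() echoes commands instead of
    # executing them. The point is the shape: persona, allowed tools, guardrail.

    PERSONA = (
        "You are a junior penetration tester. Use only the tools listed, "
        "only against the test network, and stop and report instead of "
        "retrying the same exploit more than three times."  # the guardrail line
    )

    ALLOWED_TOOLS = {"nmap"}       # basic Linux tooling, nothing exotic
    MAX_ATTEMPTS_PER_TARGET = 3    # guardrail: no endless SMB permutations

    def query_local_model(prompt: str) -> str:
        """Stub for a local model call; a real run would hit an offline LLM."""
        return "nmap -sV 192.168.1.50"   # canned answer so the sketch runs

    def run_tool(command: str) -> str:
        tool = command.split()[0]
        if tool not in ALLOWED_TOOLS:
            return f"blocked: {tool} is not on the allow-list"
        return f"(would run) {command}"  # a live harness would shell out here

    attempts: dict[str, int] = {}
    for step in range(10):               # hard cap on autonomous steps
        action = query_local_model(PERSONA)
        target = action.split()[-1]
        attempts[target] = attempts.get(target, 0) + 1
        if attempts[target] > MAX_ATTEMPTS_PER_TARGET:
            print(f"guardrail tripped: too many attempts against {target}, stopping")
            break
        print(run_tool(action))

The interesting part isn't the code; it's how little of it there is once a capable local model sits behind the stub.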

Jim was the UK government’s Deputy Director of Cyber Operations until 2016, and now runs Varadius, a British firm specializing in AI model safety and the responsible application of automation in high-risk environments.

When he says AI’s potential in offensive cyber is “limitless,” he’s describing what he’s seeing right now. And in our conversation with him and COO Luke Potter, the evidence kept stacking up.

The day before we recorded, Anthropic published a report exposing Chinese state-sponsored actors using their Claude AI to orchestrate over 30 targeted attacks. Fully automated reconnaissance, weaponization, and deployment using cloud-based LLMs as an obfuscation layer.

Luke’s response was immediate: “The real adversaries of the world, they’re using AI to scale their attacks, to do analysis at machine speed, and to augment their capability. Red teams that aren’t embracing this aren’t doing their job.”

The horse has already bolted

As Jim sees it, the technology is already there. From his experience working with customers and building new tools for network reconnaissance, the only real limitation left is human creativity.

But then it got uncomfortable.

While cloud-based LLMs like Claude can implement controls and detect abuse, offensive actors are increasingly using offline models. Download a capable LLM to your laptop and those guardrails disappear. No detection. No API logs. No account bans.

Jim was direct about it. “The horse has already bolted.” Everything in that Anthropic report could have been built and run on local LLMs instead. Same capability, same results, but completely invisible to any provider trying to monitor for abuse.

The safeguards everyone’s talking about? They’re optional. And attackers aren’t opting in.

Human in the Loop vs Human on the Loop

The conversation kept circling back to one concept: human in the loop versus human on the loop.

Luke walked through an example. An agent searching for cross-site scripting vulnerabilities finds three possible exploit paths. Before it does anything, it reports back: this is what I see, this is what I know, here’s where I’m going now. The human decides which path to take and whether it fits within the client’s attack plan. That’s human in the loop.

The alternative is the human just watching. Observing, maybe providing feedback after the fact, but not directing each decision. Human on the loop.
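In agent terms, the difference is a single checkpoint. The sketch below is a hypothetical illustration of the human-in-the-loop pattern Luke describes, not anyone's real tooling: find_xss_paths is an assumed stand-in for the discovery step, and nothing proceeds without an explicit yes from the operator. A human-on-the-loop variant would log the same report and carry on without asking.

    # Hypothetical human-in-the-loop checkpoint. find_xss_paths() is a stand-in
    # for whatever surfaced the three candidate exploit paths; the operator
    # approves or rejects each one before the agent does anything with it.

    def find_xss_paths() -> list[str]:
        return [
            "reflected XSS via ?q= on /search",
            "stored XSS via the comment field on /blog",
            "DOM XSS via the location.hash handler on /app",
        ]

    def report_and_ask(paths: list[str]) -> list[str]:
        """Report back, then wait for a human decision on every path."""
        print("Agent report: this is what I see, and here is where I could go next.")
        approved = []
        for path in paths:
            answer = input(f"  Pursue '{path}'? In scope for this client? [y/N] ")
            if answer.strip().lower() == "y":
                approved.append(path)
        return approved

    if __name__ == "__main__":
        for path in report_and_ask(find_xss_paths()):
            print(f"Proceeding with approval: {path}")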

For offensive security testing, that difference matters. Without human oversight at each stage, AI hallucinates. It chases vulnerabilities that don’t exist. It might drift outside scope and target a customer with a similar name.

Even in the Anthropic report, with all its sophistication, the human was still in the loop making critical decisions.

Luke’s comparison: “You wouldn’t trust legal advice coming from an AI without having it run by a lawyer. You shouldn’t trust financial advice coming from AI without understanding what it’s suggesting. There’s no difference in our world.”

Miller’s planet time

Jim tried ten offline LLMs before finding one that could execute reconnaissance autonomously. A year ago, none of them could do it.

Luke referenced Christopher Nolan’s 2014 sci-fi epic Interstellar to describe the pace of change. “Miller’s planet time,” he called it: for every hour you spend in AI, seven years of advancement go by.

Defensive controls run continuously. The SOC monitors 24/7. But you’re testing offensive readiness once a year? Maybe twice if you’re mature? Adversaries aren’t waiting for your next pen test cycle. And things just got fast. Scary fast.

Luke put it simply: “Just testing an app once a year or running a scan now and again, it’s just not keeping pace.”

Jim’s assessment was equally direct. AI has democratized software development for both sides. The cost of entry has dropped. The skill level required has dropped. “It has never been easier. If you have intent, then you probably can build.”

The tools are available. The barriers are gone. And the gap between continuous defense and point-in-time testing is growing wider every month.

Listen to the full episode of You Deserve to Be Hacked. The conversation goes deeper into vibe hacking, Model Context Protocol, OT security vulnerabilities, and the ethics challenges when attackers ignore the rules.