AI guardrails will always fail. NIST just proved it mathematically.

Resources
News
AI guardrails will always fail. NIST just proved it mathematically.

“There is no finite set of guardrails that is universally robust against adversarial prompts.”

That is not a risk assessment. It is not a vendor claim or a consultant’s forecast. It is the finding of Apostol Vassilev, a senior scientist at the National Institute of Standards and Technology (NIST), published in peer-reviewed research in IEEE Security & Privacy in May 2026. It is a mathematical proof. And it has significant consequences for every organization that has signed off on an AI governance policy and considered the work done.

The dream that ended in 1931

To understand what Vassilev proved, you need to understand what Gödel destroyed.

In the early twentieth century, the most ambitious mathematicians alive were trying to build a theory of everything. Not in physics. In mathematics itself. The goal was a complete, consistent system: a finite set of axioms from which every true mathematical statement could eventually be proved. Get the foundation right and the entire edifice of mathematics would follow. It was the discipline’s cleanest dream.

Kurt Gödel ended it in 1931. His incompleteness theorems showed that any formal system powerful enough to describe basic arithmetic will always contain true statements that cannot be proved within the system itself. The system is, in a precise technical sense, incomplete. You cannot close it. Adding more axioms doesn’t solve the problem. As Vassilev puts it: “You can add more statements to address the contradictions you encounter, but you’re back to where you started. It happens again.”

The dream of a finished, complete mathematics was not deferred. It was proved impossible.

Why this is now your problem

The guardrails governing an AI system are a formal rule set. A finite set of constraints designed to define what the system will and will not do. Vassilev’s proof applies Gödel’s logic directly to that structure.

“You can never make a claim that you are robust against all adversarial prompt attacks. There will always be some prompt that can potentially evade and defeat any defensive infrastructure that you have built around your AI system.”

This is not a gap in your implementation. It is not a failure of your security team. It is a property of the structure itself. For any fixed set of rules, a prompt that defeats them exists. The only question is whether an attacker finds it before you do.

What this means if you work in compliance or procurement

For security teams, the “no finish line” framing is familiar territory. The harder implication runs into governance, compliance, and the decisions made in boardrooms and procurement committees.

Every AI risk framework built around a fixed control set is structurally incomplete. Not because the authors were careless. Not because the threat landscape evolved after sign-off. Because a finite rule set applied to an infinite problem space cannot, even in principle, cover it. Incompleteness is not a risk. It is the only possible outcome.

What that means practically: an AI policy that is written, approved, and filed does not become outdated. It is outdated at the moment of completion. A vendor assessment that evaluates controls at a point in time does not tell you whether those controls hold tomorrow. A compliance framework that treats AI governance as a deliverable rather than a process is not describing a manageable risk. It is describing an unexamined one.

The audit that ends is not the audit that works.

If your board is asking whether your AI controls are sufficient, the structurally honest answer is: not permanently, and not by design. The question worth asking is whether your program is built to keep working the problem after the document is signed.

Why AI changes the economics of the attack

Traditional software vulnerabilities have historically been expensive to find. Zero-day exploits in deterministic systems required significant resources to locate and weaponize. Nation-state actors dominated that space because the barrier was high.

AI systems accept human language as input. That single fact changes the economics of exploitation entirely.

Language is not a narrow, parseable input channel. It is the full expressive complexity of human communication: ambiguous, contextual, infinitely varied. The number of ways to embed harmful intent in a prompt has no upper bound. As Vassilev observes, the complexity and richness of language makes compliance-checking built on a finite rule set infinitely ambiguous. The barrier to finding an exploit is not technical skill. It is creativity and time. Both are abundant.

The attack surface is not just large. It is unbounded. Vassilev means that precisely: not growing, not difficult to map, but without a theoretical limit.

What the proof requires

Vassilev does not present this as a reason to abandon AI security. He presents it as a specification for what the response has to look like.

A problem with no complete static solution requires a continuous dynamic response. His framework has three elements. Red teams working constantly to find new adversarial prompts before attackers do. Continuous updates that harden guardrails against newly discovered attack vectors. And operational resilience built around when, not if, an exploit occurs.

At CovertSwarm, this is not a framework we adopted in response to Vassilev’s proof. It is how we have operated from the start. Continuous red teaming across people, processes, and technology. That means we don’t only probe your technical controls. We test the procedures your teams follow when something goes wrong, the governance sign-offs that create false confidence, and the human decisions that sit between your policy documents and your actual security posture. Persistent coverage that treats the attack surface as what it is: a moving, expanding, unbounded problem that requires permanent attention. Resilience built on the assumption that the gap exists. The only open question is who finds it first.

The organizations that stay ahead are not the ones that deployed the most tools at launch. They are the ones that never stopped looking for what their tools cannot see.

The boulder and the hill

Vassilev titled his paper “Robust AI Security and Alignment: A Sisyphean Endeavor?” The myth is exact. Sisyphus pushes the boulder to the top of the hill. It rolls back. He pushes it again. Not because he failed, but because the nature of the task is that it does not end.

That is not a counsel of despair. It is the most precise description available of what serious AI security actually requires. The goal, as Vassilev frames it, is to reach a state where the cost of finding new exploits exceeds attackers’ resources. Not security as a finished condition. Security as an economic equilibrium, maintained by continuous effort.

Gödel did not prove that security is impossible. He proved that completeness is. The right response to that finding is not a better policy document, a more thorough vendor assessment, or a more detailed risk register.

It is a commitment to never stop testing.

The boulder does not stay at the top of the hill. The only question worth asking of your program is who you have assigned to keep pushing it.

——————————————————————

Apostol Vassilev, “Robust AI Security and Alignment: A Sisyphean Endeavor?” IEEE Security & Privacy, May 2026. DOI: 10.1109/MSEC.2026.3678214. NIST summary: nist.gov

DORA is not GDPR. Stop treating it like it is.

Most firms are treating DORA like GDPR: get a consultant, document the framework, move on. That worked for data privacy. It won’t work for a regulation…

Combining regulation with real-world security assurance: DORA and NIS2

Whether you’re a local financial startup or a multinational food distributor, understanding how DORA and NIS2 may affect your organization is vital. With implementation dates just…

Low-angle view of institutional buildings at dusk, symbolising structure and resilience in a regulated environment

Part 1: CBEST Series – Beyond the Checklist

Explore how threat-led penetration testing helps financial institutions go beyond traditional checks to strengthen resilience and meet regulatory expectations like CBEST, STAR-FS and DORA.

Looking up at a glass skyscraper framed between concrete overpasses, symbolising layered perspectives and security vantage points in CBEST threat intelligence

Part 2: CBEST Series – Operational Resilience

CBEST threat-led testing proves whether your organization can withstand real-world attacks, uncovering hidden weaknesses and driving true operational resilience.

aerial view of a city symbolising complexity and continuous threat readiness

Part 3: CBEST Series – The Future of Threat-Led Penetration Testing

Regulated testing like CBEST is pivotal, but as threats shift, organizations must adopt more strategic, agile threat-led penetration testing. Discover what’s next.