Proof-Carrying Coalitions — verified safety for AI-agent coalitions

The assumption everyone makes is false

Every deployed multi-agent system assumes that agents which are individually safe stay safe when you combine them. They don't: two agents that each can't reach a forbidden capability can reach it together, through an emergent dependency no per-agent check sees. Per-agent vetting is structurally blind to it.

42.6% of real agent runs

Of 900 real multi-tool agent trajectories, 42.6% already contain at least one such conjunctive dependency (Spera 2026, arXiv:2603.15973).

Proven, not feared

Spera (2026) gives the first formal proof that safety is non-compositional under conjunctive capability dependencies — safe ∪ safe can be unsafe.

Already exploited

In 2026 the viral OpenClaw agent (~179k stars) was mass-compromised through exactly this surface: a one-click RCE (CVE-2026-25253), with tens of thousands of exposed instances.

The method

Admit a coalition only after a proof that the coalition's conjunctive capability closure does not intersect a forbidden set:

Cl(A) ∩ F = ∅

checked by the Lean 4 kernel before the coalition is activated.
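
To make the condition concrete, here is a minimal Python sketch of the conjunctive closure and the disjointness test. It is an illustration only, not the project's Lean development: it assumes A is the union of the coalition members' capabilities, that each dependency rule has the form "holding all of these capabilities yields that one", and the names `closure`, `is_safe`, and the example capabilities are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical rule shape: (premises that must ALL be held, capability gained).
Rule = Tuple[FrozenSet[str], str]


def closure(capabilities: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Least fixed point: keep firing rules whose premises are all held."""
    held = set(capabilities)
    rules = list(rules)
    changed = True
    while changed:
        changed = False
        for premises, gained in rules:
            if gained not in held and premises <= held:
                held.add(gained)
                changed = True
    return held


def is_safe(capabilities: Iterable[str], rules: Iterable[Rule],
            forbidden: Set[str]) -> bool:
    """True exactly when Cl(A) and F share no element."""
    return closure(capabilities, rules).isdisjoint(forbidden)


# Hypothetical model: reading secrets AND sending on the network yields exfiltration.
RULES = [(frozenset({"read_secrets", "network_send"}), "exfiltrate")]
FORBIDDEN = {"exfiltrate"}

agent_a = {"read_secrets"}
agent_b = {"network_send"}

print(is_safe(agent_a, RULES, FORBIDDEN))            # True
print(is_safe(agent_b, RULES, FORBIDDEN))            # True
print(is_safe(agent_a | agent_b, RULES, FORBIDDEN))  # False
```

Each agent passes on its own, but the union fails: this is the non-compositionality the page describes, and it is why the check has to run on the coalition rather than on its members.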

The kernel, not the claim

What makes the guarantee trustworthy is the kernel re-checking the proof — not anyone's assertion. An AI may propose the proof; the kernel disposes. A wrong proof is rejected however it was produced, so a hallucinating or adversarial model can't widen what is provable.
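
The "propose, then re-check" split can be illustrated by analogy with a small certificate checker in Python. This is a sketch under stated assumptions, not how the Lean 4 kernel works internally: an untrusted prover proposes a set claimed to be a safety invariant, and a small trusted function checks three properties. The name `check_certificate` and the example capabilities are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical rule shape: (premises that must ALL be held, capability gained).
Rule = Tuple[FrozenSet[str], str]


def check_certificate(capabilities: Iterable[str], rules: Iterable[Rule],
                      forbidden: Set[str], invariant: Iterable[str]) -> bool:
    """Trusted checker. Accept only if `invariant`:
      1. contains every starting capability,
      2. is closed under every rule,
      3. is disjoint from the forbidden set.
    If all three hold, Cl(A) is a subset of `invariant`, so Cl(A) ∩ F = ∅.
    """
    inv = set(invariant)
    if not set(capabilities) <= inv:
        return False
    for premises, gained in rules:
        if premises <= inv and gained not in inv:
            return False  # a rule fires but its result was left out
    return inv.isdisjoint(forbidden)


RULES = [
    (frozenset({"read_secrets", "network_send"}), "exfiltrate"),
    (frozenset({"read_docs", "summarise"}), "publish_summary"),
]
FORBIDDEN = {"exfiltrate"}

# An honest certificate for a safe coalition is accepted.
safe_caps = {"read_docs", "summarise"}
print(check_certificate(safe_caps, RULES, FORBIDDEN,
                        {"read_docs", "summarise", "publish_summary"}))  # True

# For an unsafe coalition, every proposed certificate fails one of the checks.
unsafe_caps = {"read_secrets", "network_send"}
print(check_certificate(unsafe_caps, RULES, FORBIDDEN,
                        {"read_secrets", "network_send"}))               # False
print(check_certificate(unsafe_caps, RULES, FORBIDDEN,
                        {"read_secrets", "network_send", "exfiltrate"}))  # False
```

The checker never asks where the certificate came from. A set that omits a derivable capability fails the closure test, and one that includes a forbidden capability fails the disjointness test, so a dishonest proposer gains nothing.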

Two layers, so the proof can't quietly lie

A formal model can fail in two ways it cannot see: it can be vacuous, or it can drift from reality. So the method ships a second, independent layer of gates (in CI) that reject any empty proof, any claim that outruns a theorem, and any drift from the deployed system. The result is a Swiss-cheese architecture whose holes don't line up.
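
As an illustration of what a vacuity gate might look for, the Python sketch below flags models in which the safety statement would hold trivially. It is a guess at the general idea, not a description of the project's actual CI gates; the function `vacuity_findings` and its three heuristics are assumptions made for this example.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical rule shape: (premises that must ALL be held, capability gained).
Rule = Tuple[FrozenSet[str], str]


def vacuity_findings(capabilities: Iterable[str], rules: Iterable[Rule],
                     forbidden: Iterable[str]) -> List[str]:
    """Return reasons the claim Cl(A) ∩ F = ∅ would be trivially true."""
    caps, forb = set(capabilities), set(forbidden)
    rules = list(rules)
    findings: List[str] = []
    if not forb:
        findings.append("forbidden set is empty: the claim holds trivially")
    if not caps:
        findings.append("coalition holds no capabilities: nothing is checked")
    # Every capability name the model knows about, from agents or rules.
    known = caps | {g for _, g in rules} | {p for ps, _ in rules for p in ps}
    for name in sorted(forb - known):
        findings.append(
            f"forbidden capability {name!r} appears in no rule or agent: "
            "possible typo or drift from the deployed system"
        )
    return findings


RULES = [(frozenset({"read_secrets", "network_send"}), "exfiltrate")]

print(vacuity_findings({"read_secrets"}, RULES, {"exfiltrate"}))      # []
print(len(vacuity_findings({"read_secrets"}, RULES, set())))          # 1
print(len(vacuity_findings({"read_secrets"}, RULES, {"exfiltrat"})))  # 1
```

A CI job built on this idea would fail the build whenever the list is non-empty, so a proof about an empty or misspelled forbidden set never reaches deployment.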

Don't trust this page. Check it.

The whole thing is open and self-verifying. Clone it, install Lean v4.29.1, and run one command — it exits zero only if every statement asserted as proven is, in fact, kernel-checked.

# clone, then:
./verify.sh
# kernel proofs · no `sorry` · no compiler trust · no claim outruns a proof

Build a coalition in the demo → github.com/edu-ap/proof-carrying-coalitions

Honest about what it is

A proof-of-concept over a decidable model — enough to make the property and the non-compositionality phenomenon concrete and checkable, not yet the full production lattice.
The guarantee is relative to the modelled dependency rules. Eliciting a sound over-approximation of real capability dependencies is the hard open problem, and we say so.
It covers only the conjunctive, monotone, static slice; disjunctive or threshold dependencies, runtime capability acquisition, and prompt injection are explicitly out of scope.
Every theorem depends only on Lean's standard axioms — no native_decide, no compiler trust, no hidden axioms.