Regulating AGI: From Liability to Provable Contracts

AGI will render today's liability-based AI regulation obsolete through its ability to circumvent cybersecurity, hide its origins, and act strategically—but it will also enable a new regulatory paradigm based on mathematically provable contracts.

This piece explores how AGI's theorem-proving capabilities can be harnessed to create trustworthy constraints on AI behavior, offering a path to guaranteed safe AI through technological rather than purely legal mechanisms.

Steve Omohundro is the Founder and CEO of Beneficial AI Research, which is working to ensure that artificial intelligence is safe and beneficial for humanity.

Regulating AI

Artificial General Intelligence (AGI) is likely to be developed by 2030 and will create many new benefits and harms. Governments worldwide must enact AGI regulations to protect their citizens from the harms while preserving benefits. Today's approach to regulating Artificial Intelligence (AI) is focused on liability law. This essay explores the likely new capabilities of AGIs and their implications for regulation. AGIs will be able to circumvent today's cybersecurity, hide their provenance, and act strategically to render today's AI liability laws ineffective. But AGIs will also enable new forms of regulation based on "Provable Contracts," a generalization of today's "Smart Contracts." Integrating these technological regulations with today's contract law will be challenging but essential for AGI safety.

Corporations have a fiduciary duty to their shareholders within the constraints of the law. If governments don't regulate AI sufficiently, AI companies will pursue profits over harm prevention. "Legal Liability" is a powerful regulatory mechanism for incentivizing corporations to make their products safe. If a product causes harm, victims can sue the corporation for compensation. This provides an economic incentive for corporations to design their products to limit harm.

The field of "Economic Analysis of Law" uses game-theoretic analyses to determine the impact of regulation on outcomes and is nicely summarized in Miceli's textbook "The Economic Approach to Law." Researchers have found that judiciary-created law can often be interpreted as aiming to create economic efficiency (usually either "Pareto Efficiency" or "Kaldor-Hicks Efficiency"). For example, an inefficient allocation can often be improved by a mutually beneficial transaction between two parties, described by a contract. "Contract Law" ensures that contracts are executed correctly, enabling society to capture those benefits.

Dean Ball summarized the current approach to AI regulation in two excellent essays: "How Should AI Liability Work? Part 1 and Part 2." He describes how American liability law became dramatically more important as consumer protections were expanded in the 1970s, with many unintended harmful consequences. Liability lawsuits can take years, and the outcomes chosen by untrained jurors can be unpredictable. Corporations therefore often work to limit the impact of these laws, and AI companies are no exception. A recent example is the attempted ten-year ban on state AI regulation in the US.

Liability law is problematic but workable for regulating today's AIs. In the remaining sections of this essay, we argue that liability law will be unworkable for regulating powerful AGIs. The section "From AI to AGI" gives a short history of AI and the current "Deep Learning" revolution. It then describes what we know of the AI labs' plans for creating AGI and its likely characteristics. "AGI Theorem Proving" describes an esoteric-sounding AGI capability that we argue will be critical for AGI regulation. "AGI Cybersecurity" describes the likely new offensive and defensive cyber capabilities of AGIs. Without cybersecurity, AGI regulation is impossible because systems can erase audit records and take harmful actions anonymously. "AGI Provenance" discusses the problem of identifying the source and history of AGI programs and the content that they generate. Liability law is useless if it's not possible to determine the source of harms. The final section, "AGI Regulation by Provable Contracts," describes a new technological approach to AGI regulation that builds on new AGI capabilities. For this approach to be effective, it must be integrated with the legal system and recent "Smart Contract" law provides some insight into how this might happen.

From AI to AGI

The field of AI was created in 1956 and has swung repeatedly between pessimistic "AI Winters" and optimistic "AI Springs." The current optimistic "Deep Learning" era arose from four primary research advances:

Deep Learning: The 2012 "AlexNet" 60-million-parameter deep neural net dramatically outperformed previous approaches to image recognition and demonstrated that consumer Graphics Processing Units (GPUs) could be used to train deep neural nets. DeepMind's 2013 deep neural net trained with reinforcement learning excelled at playing Atari video games. DeepMind's 2015 "AlphaGo" system first beat professional Go players and then became superhuman at the game of Go.

Large Language Models: Google's 2017 publication of the "Transformer" architecture has been the basis for many recent advances. In 2018, OpenAI released the transformer-based "GPT" large language model (LLM). In 2020, they published "neural scaling laws" which describe predictable improvements in performance as models' size, training data, and computation are increased. OpenAI's 2020 release of "GPT-3" and their 2022 release of "ChatGPT" led to their becoming the fastest-growing AI company in history. Every other AI lab quickly replicated the recipe of "Pretraining" transformers on large amounts of human data, followed by "Instruction Fine-tuning," followed by "Reinforcement Learning from Human Feedback (RLHF)". 

Reasoning Models: Google's 2022 paper describing "Chain-of-Thought" showed that increasing "test-time compute" can improve a model's reasoning capabilities. OpenAI's 2024 "o1 model" trained for this capability using reinforcement learning and launched the era of reasoning models. There were concerns that this approach couldn't produce novel capabilities but recent research has shown that "Prolonged Reinforcement Learning" can lead to new and potentially superhuman capabilities.

Agentic Models: Google's 2022 "ReAct" paper showed that LLMs could combine reasoning with action and led to a wide range of "tool-using" LLMs. In 2025, every major AI lab is wrapping reasoning LLMs with "agentic scaffolding" to create autonomous agents that can use tools such as search engines, Python interpreters, databases, and short-term memory repositories. The two biggest agentic successes have been "Deep Research" models, which write detailed reports on user-specified topics, and "Code Generation" models, which assist with software creation.

The leading AI labs all aim to develop "Artificial General Intelligence (AGI)" with a thinking ability comparable to humans. Cognitive psychologists identify two distinct human systems for cognition: "System 1" implements rapid decision making and "System 2" implements slower deliberative reasoning. Today's LLMs primarily mimic the human System 1 and the AI labs hope that improved agentic harnesses will unlock System 2 deliberative reasoning. Several AI labs have publicly said that they see a clear path to AGI and believe that they will achieve it by 2030 (eg. OpenAI, Anthropic, Google). They are investing enormous resources toward this goal (eg. $500 billion for OpenAI's Stargate), which lends these predictions some credibility.

We expect the continued construction of larger data centers along with continued increases in the size of base models, training sets, and pre-training computation. And all of the AI labs are currently shifting more resources to training for reasoning (eg. xAI's Grok 4 has ten times the reasoning training that Grok 3 had). The underlying transformer architecture has proven remarkably resilient, though researchers are studying variants like state-space models, diffusion transformers, and energy-based transformers.

The most powerful frontier models are getting larger, but all of the labs are also creating powerful small models that can run on cellphones and other limited-computation devices. These small models can be "distilled" from powerful large models (eg. Microsoft's "Phi-3" models). Andrej Karpathy has argued for the creation of small LLMs with the "cognitive core" of reasoning but without encyclopedic world knowledge which can be accessed from databases. Small models could enable powerful agentic systems which run on consumer hardware rather than in large data centers.

With the new capabilities of AGI models come increased risks as described by OpenAI, Anthropic, Google, and Amazon. "Chemical, Biological, Radiological, and Nuclear (CBRN)" weapons proliferation, "Offensive Cyber Operations," and "Automated AI R&D" are of special concern. These issues are especially challenging for small models because they are harder for governments to regulate than large models, which must run in huge data centers. Small open source models are especially problematic because they are available to everyone (e.g., Meta reports that their open source Llama models have been downloaded more than 1 billion times) and they can be run and fine-tuned on inexpensive consumer hardware.

Uncontrolled AI agents are especially problematic as I described in my 2008 paper "The Basic AI Drives" and the "Self-Aware Systems" website. Uncontrolled AI agents with almost any goal will benefit from access to more resources, from preventing themselves from being turned off, and from creating and running multiple copies of themselves. These "instrumental subgoals" can lead to harmful and deceptive behaviors. Systems like AlphaGo are agentic but don't pose a risk to humans because their agency is constrained to actions on a Go board. We must ensure that the infrastructure that AGI agents operate within is constrained so that their actions are beneficial for humanity.

AGI Theorem Proving

A critically important but underappreciated AGI capability for safety and security is "AI Theorem Proving." The first AI system from 1956 was the "Logic Theorist," a reasoning system that proved theorems from Principia Mathematica. The second AI system from 1957 was the "Perceptron," an artificial neural network that learned to recognize images. Since that time, mainstream AI research has swung back and forth between logic-based "neat" approaches and neural learning-based "scruffy" approaches.

While the deep learning revolution was primarily "scruffy," an increasing number of projects apply deep learning to "neat" tasks like theorem proving and code generation. In 2020, OpenAI published "GPT-f," which used transformers to prove theorems in Metamath. In 2022, Meta improved these results with "HyperTree Proof Search," which used deep learning and Monte Carlo Tree Search to prove theorems in Metamath and Lean. More recently, a flood of theorem-proving models has come from Chinese and Western labs. The best recent models are DeepSeek-Prover-V2-671B, which achieved 88.9% on miniF2F-test, and Kimina-Prover-72B, which achieved 92.2%. These capabilities are expected to dramatically improve with AGI, and many believe that these systems will outperform even the best human mathematicians in the next few years.

"Scruffy" deep learning systems advanced so rapidly because they aren't constrained by precise logical rules. Unfortunately, this causes modern LLMs to suffer from "hallucinations", faulty reasoning, and an inability to successfully execute plans over long time periods. A number of researchers have proposed fixing these problems using "Neuro-symbolic" systems that combine scruffy deep neural nets with precise mathematical models. "Good Old-Fashioned AI (GOFAI)" attempted to directly model complex domains in logic without neural or probabilistic representations and were not able to represent the subtleties of language, vision, or robotics at scale. But today's general mathematical theorem provers combined with deep neural agents have the potential to combine the best of the neat and scruffy approaches.

AI Safety requires that we constrain the behavior of AI systems to not cause harm. "AI Alignment" attempts to create AI models that constrain themselves but has only been partially successful so far. A more trustworthy approach is to create software "harnesses" around potentially risky AI models which check whether proposed actions follow safety rules before executing them. This approach requires that safety and security rules be precisely stated and rigorously checked. By expressing rules in a general mathematical language (e.g., Rocq, Lean, or Metamath), the power of modern mathematics can be brought to bear, and recent AI theorem provers can ensure that the rules are followed.
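
To make the harness idea concrete, here is a minimal Python sketch, assuming hypothetical rule and action types (none of these names come from an existing framework): the harness checks each proposed action against explicit safety rules and executes it only if every rule passes.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Action:
    """An action proposed by an untrusted AI model (illustrative)."""
    name: str
    payload: dict = field(default_factory=dict)

# A safety rule is simply a predicate over proposed actions.
Rule = Callable[[Action], bool]

def no_external_network(action: Action) -> bool:
    """Only allow actions that stay on the local host."""
    return action.payload.get("target_host", "localhost") == "localhost"

def spend_under_limit(action: Action) -> bool:
    """Reject any action that spends more than $100."""
    return action.payload.get("spend_usd", 0) <= 100

RULES: List[Rule] = [no_external_network, spend_under_limit]

def harness_execute(action: Action, execute: Callable[[Action], None]) -> bool:
    """Execute the proposed action only if every safety rule approves it."""
    if all(rule(action) for rule in RULES):
        execute(action)
        return True
    return False  # rejected: at least one rule failed
```

A provable-contract version of this pattern, discussed in the final section, replaces these informal Python predicates with formally stated constraints and machine-checked proofs.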

Most human disciplines (e.g. physics, engineering, economics, computer science, etc.) now have precise mathematical foundations which can serve as a basis for the precise expression of safety rules. But enormous effort will be required to precisely formalize the models and safety constraints for each of these areas. Fortunately, each of these disciplines has created manuals and textbooks written in natural language describing their foundational models and methods of reasoning. In 2022, Google showed how to use Large Language Models to "Autoformalize" natural language descriptions of models, arguments, and results. Recent research has improved on that work and the autoformalizer Kimina-Autoformalizer-7B based on Qwen2.5 is freely available. The ratio of the size of formal descriptions to corresponding informal descriptions is called the "de Bruijn factor" and has been found to be between 4 and 10.
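
As a toy illustration of what autoformalization produces, consider an informal safety rule and a Lean 4 formalization of the kind an autoformalizer might generate (the rule and all names here are invented for illustration):

```lean
-- Informal rule: "An autonomous vehicle must never exceed 100 km/h."
structure VehicleState where
  speed : Nat   -- speed in km/h

def speedLimit : Nat := 100

-- The formalized safety property.
def Safe (s : VehicleState) : Prop := s.speed ≤ speedLimit

-- The kind of proof obligation an agent would discharge for a proposed state.
example : Safe { speed := 80 } := by
  unfold Safe speedLimit
  decide
```

Even in this tiny case the formal version is several times longer than the one-sentence informal rule, which is the de Bruijn factor at work.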

Several AI startups are advancing deep learning for theorem proving and its applications. Harmonic AI aims to create "the world's most advanced mathematical reasoning engine" and "accelerate the advent of verified software synthesis in safety-critical domains" and was recently valued at $875 million. Axiom aims to build AI that can solve complex math problems and is raising money at a valuation of $300 million. Morph Labs is working on "Verified Superintelligence" which it regards as "a path to improving AI through verification" and recently announced their "Trinity Autoformalization System".

AGI Cybersecurity

Without cybersecurity, AI regulation is impossible. If an adversary can compromise the computational infrastructure an AGI runs on, they can disable protections, extract information, change data, and cause harmful actions. For example, a self-driving car controlled by a well-regulated AGI can become a weapon if malicious actors break the cybersecurity of its control systems. If an AGI training datacenter is insecure, malicious actors can insert backdoors into models and can exfiltrate the model weights. In the final section we discuss a variety of techniques for obtaining trusted results from untrusted AIs running on untrusted hardware, but the hardware that checks and uses those results must still be secure.

In general, cybersecurity requires both secure software and secure hardware. Software can be easily upgraded but hardware is much more costly to change. If an adversary has physical access to hardware, they can probe it and alter it using powerful devices like electron microscopes. Fortunately, large datacenters use guards, fences, and physical barriers to prevent human adversaries from physically accessing critical hardware. As robotics advances, more powerful hardware protections are likely to be needed. It is much more difficult to secure devices that an adversary physically possesses. It is therefore important to consider cybersecurity solutions which require only a subset of the involved hardware to be trusted (eg. blockchain technologies).

In 1949, Alan Turing showed how to use mathematical proof to guarantee that software is correct. His approach combined a mathematical model of computation with a precise proof of desired properties. The field of "Formal Methods" emerged from this work but isn't yet widely used because human programmers find it too difficult. The lack of correctness and security proofs has cost humanity more than $71 trillion and has led to many deaths. Fortunately, AI theorem proving for software is advancing rapidly. For example, the VERINA project is studying verified code generation using LLMs like OpenAI's o4-mini. The current performance is not yet high enough to impact the software industry, but it is improving rapidly. AGIs should be powerful enough to use formal verification to eliminate software errors and security vulnerabilities.

The same AI technologies which are advancing cybersecurity are also rapidly advancing cyberattack capabilities. For example, the "CyberGym" project demonstrated LLM agents discovering 15 zero-day flaws in the latest versions of large software projects. Today's cyberattackers often use "fuzzing" to discover software flaws for exploitation. AI variants of fuzzing can be adaptive, learn from experience, and share their findings with other AIs worldwide. The authors of the review "Frontier AI's Impact on the Cybersecurity Landscape" argue that AI "will likely benefit attackers more than defenders in the short term" but that coordinated effort could benefit defenders over attackers.

AGI will also be able to improve the security of hardware. In some cases, co-design of software and hardware will be required. For example, DRAM memory chips and NVIDIA GPUs are vulnerable to "Row Hammer Attacks" in which certain memory access patterns cause bits to flip and break security assumptions. To guarantee security while using these memory chips, AGIs must generate code that provably doesn't access memory in any of the dangerous patterns.

AGI Provenance

For liability laws to be effective, the creator and modification history of a problematic AI system must be known. In a world of open source AI, this requirement is problematic. Meta's open Llama AI models have been downloaded more than 1 billion times and are easily fine-tuned for any purpose on consumer video cards.

If individuals or groups can anonymously run and fine-tune AI models, then both safety regulations and liability laws will have a limited ability to prevent AI harms. Cyberattacks executed with fine-tuned open source models can be anonymous or attributed to others. Harmful social media posts can be anonymous or spoofed to appear to have come from anyone. AI-enabled personalized attacks or manipulations will not be traceable to their sources.

For liability laws to have a deterrent effect, a minimal requirement is that the AI models causing harm must be reliably identifiable. In particular, we need a "provenance" record for every AI model and for every piece of AI-generated content which reliably gives the history of that model or content. Who was the original creator? What modifications have been applied? In what context was AI-generated content created? But is it possible to ensure that harmful models present this information and that it cannot be altered to hide the true history?

If, as today, individuals have free access to untracked computers, GPUs, phones, and other computational devices, then they will be able to run and modify whatever AI models they want without accurate provenance labelling. But the harms from models generally don't come from just running them; they come from their actions in the physical and social world. And it is possible to require that AI actions in those environments be accompanied by accurate provenance records.

For one approach to how this might work, consider the related problem of ensuring that recorded audio, images, and video were captured when and where they are claimed to have been, without alteration. Companies like Sony make "Forgery-Proof" cameras which cryptographically sign images to prevent alteration. The "Coalition for Content Provenance and Authenticity" creates open technical standards that ensure the origins and modifications of images, audio, and video can be reliably traced. Sony uses these in conjunction with its cameras to provide complete provenance solutions. Truepic and others provide similar solutions for cellphone-recorded content.

How could this kind of approach help with AI model and content provenance? On the open internet, it is not possible to police all models and content. But social media sites, email and messaging providers, software providers, etc. could all require provenance for AI models and generated content before allowing them to be posted or downloaded. Drone and robotics companies could prevent control by AIs without provenance. Cloud providers could provide the service of recording cryptographic hashes of AI content and AI models generated on their servers and including those hashes in signed provenance records. Before executing a model or posting AI-generated content, these records could be checked against the model or content hashes to ensure accurate provenance.
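
A minimal sketch of that final checking step, assuming a hypothetical signed provenance record containing a SHA-256 digest (the record format and the verify_signature callback are invented for illustration):

```python
import hashlib
import json
from pathlib import Path
from typing import Callable

def sha256_of_file(path: Path) -> str:
    """Compute the SHA-256 digest of a model file or content file."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def check_provenance(path: Path, signed_record: dict,
                     verify_signature: Callable[[str, bytes], bool]) -> bool:
    """Accept the artifact only if the record's signature is valid and the
    recorded digest matches the artifact actually being deployed or posted."""
    if not verify_signature(signed_record["record"], signed_record["signature"]):
        return False  # record was not issued by a trusted provenance service
    record = json.loads(signed_record["record"])
    return record["sha256"] == sha256_of_file(path)
```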

Using techniques like these, it is possible to create a reliable provenance system in which the origins of content and models are recorded and liability can be assessed. That, in turn, provides incentives not to create harmful AI models and content on the "provenance-tracked" portion of the internet. But it is even more desirable to prevent harms in advance rather than just tracking them to assign liability after harms have occurred. To do that effectively, we need a much more radical change in infrastructure, which is described in the next section.

AGI Regulation by Provable Contracts

We've seen many problems with using liability as the primary means for regulating AGI. Many regulatory groups, including the G7, have proposed to regulate AGI by "keeping humans in the loop" and suggest that this may "ensure AI systems are designed and deployed responsibly." Unfortunately, "Humans-In-The-Loop" is an ineffective and potentially dangerous approach. The human timescale for cognition is too slow and integrates too little information to effectively control complex AGI systems. And putting humans in the loop makes them targets for AGI manipulation, either directly or via blackmail, bribery, coercion, etc. We should, instead, aim for "Humans-Define-the-Loop" by extending contract law to enable powerful technologically enforced contracts.

In 1997, Nick Szabo introduced the concept of "smart contracts" as "contracts which would be enforced by physical property (such as hardware or software) instead of by law." The "Ethereum" blockchain implements smart contracts as programs that run on-chain and execute specified cryptocurrency transactions. Legal scholars have worked to integrate smart contracts with traditional contract law, as summarized by Harvard Law School in "An Introduction to Smart Contracts and Their Potential and Inherent Limitations" and in recent court cases: "Smart Contracts Revisited: Lessons From the Courts in 2025."

Smart contracts on the Ethereum blockchain have given rise to the field of "Decentralized Finance (DeFi)" and have enabled many innovative financial products. Because they are programs, they must explicitly encode the actions to be taken in response to every possible situation. Because errors in smart contracts can lead to participants losing money, many smart contract organizations pay to have them formally verified.
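
The toy escrow below illustrates the "encode every response" point in Python rather than in an actual blockchain language; it is a sketch of the idea, not Ethereum code. Every state and permitted transition must be spelled out, and any transition the code does not anticipate simply cannot happen:

```python
from enum import Enum, auto

class EscrowState(Enum):
    AWAITING_PAYMENT = auto()
    AWAITING_DELIVERY = auto()
    COMPLETE = auto()

class Escrow:
    """Toy escrow 'contract': every allowed transition is spelled out in code."""

    def __init__(self, buyer: str, seller: str, price: int):
        self.buyer, self.seller, self.price = buyer, seller, price
        self.state = EscrowState.AWAITING_PAYMENT
        self.balance = 0

    def deposit(self, sender: str, amount: int) -> None:
        """The buyer funds the escrow; any other deposit is rejected."""
        if self.state is not EscrowState.AWAITING_PAYMENT:
            raise RuntimeError("deposit not allowed in this state")
        if sender != self.buyer or amount != self.price:
            raise ValueError("only the buyer may deposit the exact price")
        self.balance = amount
        self.state = EscrowState.AWAITING_DELIVERY

    def confirm_delivery(self, sender: str) -> int:
        """The buyer confirms delivery, releasing the funds to the seller."""
        if self.state is not EscrowState.AWAITING_DELIVERY or sender != self.buyer:
            raise RuntimeError("confirmation not allowed")
        self.state = EscrowState.COMPLETE
        payout, self.balance = self.balance, 0
        return payout  # amount released to the seller
```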

My 2014 article, "Cryptocurrencies, smart contracts, and artificial intelligence" argued that sophisticated smart contracts require AI for decision making and that "cryptocurrencies and smart contracts may also provide an infrastructure for ensuring that AI systems follow specified legal and safety regulations as they become more integrated into human society." Smart contracts constrain behavior and so can be used for automated AI regulation. But a generalization is more flexible and opens up many more regulatory possibilities.

In 2023, Max Tegmark and I published "Provably safe systems: the only path to controllable AGI" describing mechanisms for using mathematical proof to constrain the behavior of powerful AI systems. In 2024, a larger group of us published "Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems" which described a broader spectrum of safety guarantees constraining AI behavior. Tegmark and I proposed "Provable Contracts" as a generalization of "Smart Contracts." They express constraints on the behavior of one or more participants using a formal language like Lean. Each participant proposes actions to the formal contract and also includes proofs that the proposed actions meet the requirements of the contract. The provable contract checks the proofs and executes the action if and only if its proof is verified. As described in the paper, provable contracts are implemented by proof-checking software running on cybersecure systems such as secure hardware or blockchains.
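
In outline, a provable contract is a small trusted verifier standing between an untrusted AGI and the system it acts on. The Python sketch below uses invented interfaces; a real deployment would call a Lean or Metamath proof checker running on secure hardware or a blockchain, as the paper describes:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ProposedAction:
    action: bytes   # serialized action the AGI wants to execute
    proof: bytes    # machine-checkable proof that the action obeys the contract

class ProvableContract:
    """Gatekeeper between an untrusted AGI and the system it acts on."""

    def __init__(self,
                 constraint_statement: bytes,
                 check_proof: Callable[[bytes, bytes, bytes], bool],
                 execute: Callable[[bytes], None]):
        self.constraint_statement = constraint_statement  # formal constraint (e.g., in Lean)
        self.check_proof = check_proof                    # trusted proof checker
        self.execute = execute                            # actuator for approved actions

    def submit(self, proposal: ProposedAction) -> bool:
        """Execute the action if and only if its proof verifies."""
        ok = self.check_proof(self.constraint_statement,
                              proposal.action,
                              proposal.proof)
        if ok:
            self.execute(proposal.action)
        return ok
```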

A provable contract can directly regulate the behavior of an AGI system by serving as a gatekeeper between the AGI and software, hardware, or networks it acts on. Any precise constraint can be encoded as a provable contract and proofs can be rapidly checked by small programs (eg. this 708-line Python program checks Metamath proofs). The provable contract approach requires AGIs to both select actions and generate proofs that they obey the provable contract constraints. The ability to generate these proofs is only now appearing in AI systems, but AGIs are likely to be superhuman at it.

Provable contracts can also constrain the interactions of multiple AGIs and humans. This capability enables many complex social operations to be precisely regulated. For example, provable contracts can encode the rules for negotiations, auctions, marketplace transactions, and elections. Theorem-proving AGIs also enable the definition and verification of "provable metacontracts" which impose constraints on other provable contracts.

Provable contracts can be used to obtain trusted results from untrusted AGIs running on untrusted hardware in untrusted countries. For example, a provable contract could specify a precise problem and require the untrusted system to provide both the answer to the problem and a proof that the answer is correct. A provable contract containing a formal software specification could require a proof that AGI-generated code meets the specification without needing to trust the AGI.

Proofs are "certificates" that demonstrate that desirable properties hold. There are many operations for which certificates can eliminate the need to trust a provider or system. For example, there are proof certificates that a program has been properly executed, that information has been properly looked up in a database (eg. using k-nearest neighbor Balltrees or Bumptrees), that a learned model has been properly learned from a dataset, that a scientific computation has produced a mathematically correct result, and that a statistical model in the PAC framework satisfies a specified property with high probability. These certificates enable humans to reliably use untrusted and unaligned AGI for the benefit of humanity.

Smart contracts on blockchains like Ethereum have benefited from a wide range of what have come to be called "Zero-Knowledge (ZK)" proof systems (though only some of them are actually zero-knowledge). These systems are also very useful for provable contracts. They enable lengthy proofs to be compressed by the prover into very short certificates for provable contracts to verify. A common use case on blockchains is outsourcing computation to untrusted servers which provide both computation results and certificates of correctness. The "zero-knowledge" aspect allows proofs to be verified without revealing secret information used by the prover. This enables providers to guarantee properties of solutions to problems without revealing proprietary information. Zero-knowledge versions of general theorem provers like Lean have been created, so all of these capabilities are also available to general provable contracts. StarkWare recently released its "S-two prover" for general ZK applications and is explicitly working on several forms of "Verifiable AI."

Provable Contracts provide the technological capability to constrain the behavior of one or more AGIs. And they are a general enough framework to implement any constraint that can be precisely stated. As with Smart Contracts, these technological capabilities must be integrated with legal systems. And a largely new discipline of creating effective provable contracts must be developed. We have described how "autoformalization" can use AGIs to create formal models from informal descriptions, and recent research is improving this process. Provable Contracts for security and correctness properties can be created by organizations like NIST and shared across many uses. AGI tools for creating provable contracts with desired properties will be essential.

We have summarized the likely characteristics of AGI systems, shown how they will break today's approaches to regulation, and proposed provable contracts as a powerful new regulatory technology based on emerging AGI capabilities. AGI timelines appear to be very short, but humanity has been remarkably resilient over its history. By enabling humans to obtain trusted results from even untrusted AGI systems, provable contracts will help us use these new tools to create an effective regulatory regime for even the most powerful AGI systems.
