Aumann's Agreement Theorem (1976) establishes that two Bayesian rational agents with common priors and common knowledge of each other's posterior beliefs cannot agree to disagree. Their posteriors must coincide. This paper applies Aumann's framework to AI agents built on large language models (LLMs), a domain in which the theorem's conditions appear, at first glance, to be unusually well satisfied. LLMs trained on overlapping data are often assumed to share something like common priors, and in multi-agent protocols their outputs are shared between participants. Yet LLM-based agents routinely produce divergent outputs on identical inputs, and multi-agent systems built from them are increasingly deployed in debate, deliberation, and consensus protocols that implicitly treat this divergence as epistemically meaningful. We argue that Aumann's theorem fails to apply to these agents not because the prior or rationality conditions are violated in the familiar ways they are violated for humans, but for a more fundamental reason: LLMs do not possess beliefs in the sense the theorem requires. Their outputs are samples from conditional probability distributions over token sequences, not reports of posterior probabilities conditioned on private information. We formalize the distinction between genuine disagreement, which carries epistemic content because it signals the existence of unshared evidence, and what we term pseudo-disagreement, which has the surface form of disagreement but arises from stochastic variation in generation processes that lack epistemic states. We show formally that pseudo-disagreement does not satisfy the informational conditions that make genuine disagreement epistemically valuable, and we trace the implications for multi-agent debate protocols, consensus methods, LLM-as-judge paradigms, and the broader practice of treating AI outputs as bearing on questions of truth.
Our analysis applies specifically to autoregressive language models and the multi-agent systems built from them; AI systems with fundamentally different architectures, such as those maintaining explicit world models or calibrated Bayesian uncertainty estimates, may require separate treatment.