Tom Everitt: Publications

More details

53

Reasoning about causality in games
with Lewis Hammond, James Fox, Ryan Carey, Alessandro Abate, and Michael Wooldridge

Artificial Intelligence 320 (C): 103919. 2023.

Causal reasoning and game-theoretic reasoning are fundamental topics in artificial intelligence, among many other disciplines: this paper is concerned with their intersection. Despite their importance, a formal framework that supports both these forms of reasoning has, until now, been lacking. We offer a solution in the form of (structural) causal games, which can be seen as extending Pearl's causal hierarchy to the game-theoretic domain, or as extending Koller and Milch's multi-agent influence …Read more
Causal reasoning and game-theoretic reasoning are fundamental topics in artificial intelligence, among many other disciplines: this paper is concerned with their intersection. Despite their importance, a formal framework that supports both these forms of reasoning has, until now, been lacking. We offer a solution in the form of (structural) causal games, which can be seen as extending Pearl's causal hierarchy to the game-theoretic domain, or as extending Koller and Milch's multi-agent influence diagrams to the causal domain. We then consider three key questions: i) How can the (causal) dependencies in games - either between variables, or between strategies - be modelled in a uniform, principled manner? ii) How may causal queries be computed in causal games, and what assumptions does this require? iii) How do causal games compare to existing formalisms? To address question i), we introduce mechanised games, which encode dependencies between agents' decision rules and the distributions governing the game. In response to question ii), we present definitions of predictions, interventions, and counterfactuals, and discuss the assumptions required for each. Regarding question iii), we describe correspondences between causal games and other formalisms, and explain how causal games can be used to answer queries that other causal or game-theoretic models do not support. Finally, we highlight possible applications of causal games, aided by an extensive open-source Python library.

Science, Logic, and Mathematics
39

Discovering agents
with Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, and Matt MacDermott

Artificial Intelligence 322 (C): 103963. 2023.

Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial -- often the causal model is just assumed by the modeler without much justification -- and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents -- roughly that agents are systems that would adapt their policy if their actions influenced the world in a different way. From this we derive t…Read more
Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial -- often the causal model is just assumed by the modeler without much justification -- and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents -- roughly that agents are systems that would adapt their policy if their actions influenced the world in a different way. From this we derive the first causal discovery algorithm for discovering agents from empirical data, and give algorithms for translating between causal models and game-theoretic influence diagrams. We demonstrate our approach by resolving some previous confusions caused by incorrect causal modelling of agents.

Computer Science
28

Classification by decomposition: a novel approach to classification of symmetric $$2\times 2$$ games
with Mikael Böörs, Tobias Wängberg, and Marcus Hutter

Theory and Decision 93 (3): 463-508. 2022.

In this paper, we provide a detailed review of previous classifications of 2 × 2 games and suggest a mathematically simple way to classify the symmetric 2 × 2 games based on a decomposition of the payoff matrix into a cooperative and a zero-sum part. We argue that differences in the interaction between the parts is what makes games interesting in different ways. Our claim is supported by evolutionary computer experiments and findings in previous literature. In addition, we provide a method for u…Read more
In this paper, we provide a detailed review of previous classifications of 2 × 2 games and suggest a mathematically simple way to classify the symmetric 2 × 2 games based on a decomposition of the payoff matrix into a cooperative and a zero-sum part. We argue that differences in the interaction between the parts is what makes games interesting in different ways. Our claim is supported by evolutionary computer experiments and findings in previous literature. In addition, we provide a method for using a stereographic projection to create a compact 2-d representation of the game space.
18

Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective
with Marcus Hutter, Ramana Kumar, and Victoria Krakovna

Synthese 198 (Suppl 27): 6435-6467. 2021.

Can humans get arbitrarily capable reinforcement learning agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question impacts how far RL can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we study when an RL agent has an instrumental goal to tamper with its reward process, and describe design principl…Read more
Can humans get arbitrarily capable reinforcement learning agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question impacts how far RL can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we study when an RL agent has an instrumental goal to tamper with its reward process, and describe design principles that prevent instrumental goals for two different types of reward tampering. Combined, the design principles can prevent reward tampering from being an instrumental goal. The analysis benefits from causal influence diagrams to provide intuitive yet precise formalizations.

Philosophy of Action, Misc Machine Learning

Tom Everitt

Reasoning about causality in games
with Lewis Hammond, James Fox, Ryan Carey, Alessandro Abate, and Michael Wooldridge

Artificial Intelligence 320 (C): 103919. 2023.

Discovering agents
with Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, and Matt MacDermott

Artificial Intelligence 322 (C): 103963. 2023.

Classification by decomposition: a novel approach to classification of symmetric $$2\times 2$$ games
with Mikael Böörs, Tobias Wängberg, and Marcus Hutter

Theory and Decision 93 (3): 463-508. 2022.

Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective
with Marcus Hutter, Ramana Kumar, and Victoria Krakovna

Synthese 198 (Suppl 27): 6435-6467. 2021.

Tom Everitt

Reasoning about causality in games with Lewis Hammond, James Fox, Ryan Carey, Alessandro Abate, and Michael Wooldridge Artificial Intelligence 320 (C): 103919. 2023.

Discovering agents with Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, and Matt MacDermott Artificial Intelligence 322 (C): 103963. 2023.

Classification by decomposition: a novel approach to classification of symmetric $$2\times 2$$ games with Mikael Böörs, Tobias Wängberg, and Marcus Hutter Theory and Decision 93 (3): 463-508. 2022.

Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective with Marcus Hutter, Ramana Kumar, and Victoria Krakovna Synthese 198 (Suppl 27): 6435-6467. 2021.

Reasoning about causality in games
with Lewis Hammond, James Fox, Ryan Carey, Alessandro Abate, and Michael Wooldridge

Artificial Intelligence 320 (C): 103919. 2023.

Discovering agents
with Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, and Matt MacDermott

Artificial Intelligence 322 (C): 103963. 2023.

Classification by decomposition: a novel approach to classification of symmetric $$2\times 2$$ games
with Mikael Böörs, Tobias Wängberg, and Marcus Hutter

Theory and Decision 93 (3): 463-508. 2022.

Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective
with Marcus Hutter, Ramana Kumar, and Victoria Krakovna

Synthese 198 (Suppl 27): 6435-6467. 2021.