•  139
    Reasoning about causality in games
    with Lewis Hammond, James Fox, Ryan Carey, Alessandro Abate, and Michael Wooldridge
    Artificial Intelligence 320 (C): 103919. 2023.
    Causal reasoning and game-theoretic reasoning are fundamental topics in artificial intelligence, among many other disciplines: this paper is concerned with their intersection. Despite their importance, a formal framework that supports both these forms of reasoning has, until now, been lacking. We offer a solution in the form of (structural) causal games, which can be seen as extending Pearl's causal hierarchy to the game-theoretic domain, or as extending Koller and Milch's multi-agent influence …
  •  93
    Discovering agents
    with Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, and Matt MacDermott
    Artificial Intelligence 322 (C): 103963. 2023.
    Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial -- often the causal model is just assumed by the modeler without much justification -- and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents -- roughly that agents are systems that would adapt their policy if their actions influenced the world in a different way. From this we derive t…
  •  24
    Can humans get arbitrarily capable reinforcement learning agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question impacts how far RL can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we study when an RL agent has an instrumental goal to tamper with its reward process, and describe design principl…
  •  118
    Classification by decomposition: a novel approach to classification of symmetric 2 × 2 games
    with Mikael Böörs, Tobias Wängberg, and Marcus Hutter
    Theory and Decision 93 (3): 463-508. 2022.
    In this paper, we provide a detailed review of previous classifications of 2 × 2 games and suggest a mathematically simple way to classify the symmetric 2 × 2 games based on a decomposition of the payoff matrix into a cooperative and a zero-sum part. We argue that differences in the interaction between the parts are what make games interesting in different ways. Our claim is supported by evolutionary computer experiments and findings in previous literature. In addition, we provide a method for u…