Ramana Kumar: Publications

Discovering agents
with Zachary Kenton, Sebastian Farquhar, Jonathan Richens, Matt MacDermott, and Tom Everitt

Artificial Intelligence 322 (C): 103963. 2023.

Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial -- often the causal model is just assumed by the modeler without much justification -- and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents -- roughly that agents are systems that would adapt their policy if their actions influenced the world in a different way. From this we derive the first causal discovery algorithm for discovering agents from empirical data, and give algorithms for translating between causal models and game-theoretic influence diagrams. We demonstrate our approach by resolving some previous confusions caused by incorrect causal modelling of agents.

Computer Science

Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective
with Tom Everitt, Marcus Hutter, and Victoria Krakovna

Synthese 198 (Suppl 27): 6435-6467. 2021.

Can humans get arbitrarily capable reinforcement learning agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question impacts how far RL can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we study when an RL agent has an instrumental goal to tamper with its reward process, and describe design principles that prevent instrumental goals for two different types of reward tampering. Combined, the design principles can prevent reward tampering from being an instrumental goal. The analysis benefits from causal influence diagrams to provide intuitive yet precise formalizations.

Philosophy of Action, Misc Machine Learning
