•
    Towards Shutdownable Agents via Stochastic Choice
    with Elliott Thornley, Christos Ziakas, Leyton Ho, and Louis Thomson
    Technical A.I. Safety Conference. 2025.
    The POST-Agents Proposal (PAP) is an idea for ensuring that advanced artificial agents never resist shutdown. A key part of the PAP is using a novel ‘Discounted Reward for Same-Length Trajectories (DReST)’ reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be ‘USEFUL’), and (2) choose stochastically between different trajectory-lengths (be ‘NEUTRAL’ about trajectory-lengths). In this paper, we propose evaluation metrics for USEFULNESS and NEUTRA…