  •  71
    Learning and Sustaining Shared Normative Systems via Bayesian Rule Induction in Markov Games
    with Zhi-Xuan Tan
    AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems 2024 1510-1520. 2024.
    A universal feature of human societies is the adoption of systems of rules and norms in the service of cooperative ends. How can we build learning agents that do the same, so that they may flexibly cooperate with the human institutions they are embedded in? We hypothesize that agents can achieve this by assuming there exists a shared set of norms that most others comply with while pursuing their individual desires, even if they do not know the exact content of those norms. By assuming shared nor…
  •  107
    Realist and Pluralist Conceptions of Intelligence and Their Implications on AI Research
    Proceedings of the AAAI Conference on Artificial Intelligence 40 (44): 37791-37801. 2026.
    In this paper, we argue that current AI (alignment) research operates on a spectrum between two different underlying conceptions of intelligence: Intelligence Realism, which holds that intelligence represents a single, universal capacity measurable across all systems, and Intelligence Pluralism, which views intelligence as diverse, context-dependent capacities that cannot be reduced to a single universal measure. Through an analysis of current debates in AI research, we demonstrate how the conce…
  •  81
    The Stories We Govern By: AI, Risk, and the Power of Imaginaries
    Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 6 (2): 1939-1950. 2025.
    This paper examines how competing sociotechnical imaginaries of artificial intelligence (AI) risk shape governance decisions and regulatory constraints. Drawing on concepts from science and technology studies, we analyse three dominant narrative groups: existential risk proponents, who emphasise catastrophic AGI scenarios; accelerationists, who portray AI as a transformative force to be unleashed; and critical AI scholars, who foreground present-day harms rooted in systemic inequality. Through a…
  •  129
    Every explanation faces a trade-off between informativeness and compression (Kinney and Lombrozo, 2022). On the one hand, we want as much detailed and correct information as possible (informativeness); on the other hand, we want to ensure that a human can process and comprehend the explanation (compression). Current methods in eXplainable AI (XAI) try to satisfy this trade-off statically, outputting one fixed, non-adjustable explanation that sits somewhere on the spectrum between inform…
  •  967
    Mechanistic interpretability (MI) aims to explain how neural networks work by uncovering their underlying mechanisms. As the field grows in influence, it is increasingly important to examine not just models themselves, but the assumptions, concepts and explanatory strategies implicit in MI research. We argue that mechanistic interpretability needs philosophy as an ongoing partner in clarifying its concepts, refining its methods, and navigating the epistemic and ethical complexities of interpreti…