Ninell Oldenburg (University of Copenhagen): Publications

University of Copenhagen

Doctoral student

Areas of Specialization

Impact of Artificial Intelligence

The Nature of Artificial Intelligence

Standpoint Epistemology

Cognitive Sciences, Misc

Areas of Interest

Artificial Intelligence Safety

Impact of Artificial Intelligence

Standpoint Epistemology

The Nature of Artificial Intelligence

Cognitive Sciences, Misc

Learning and Sustaining Shared Normative Systems via Bayesian Rule Induction in Markov Games
with Zhi-Xuan Tan

AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 1510-1520. 2024.

A universal feature of human societies is the adoption of systems of rules and norms in the service of cooperative ends. How can we build learning agents that do the same, so that they may flexibly cooperate with the human institutions they are embedded in? We hypothesize that agents can achieve this by assuming there exists a shared set of norms that most others comply with while pursuing their individual desires, even if they do not know the exact content of those norms. By assuming shared norms, a newly introduced agent can infer the norms of an existing population from observations of compliance and violation. Furthermore, groups of agents can converge to a shared set of norms, even if they initially diverge in their beliefs about what the norms are. This in turn enables the stability of the normative system: since agents can bootstrap common knowledge of the norms, this leads the norms to be widely adhered to, enabling new entrants to rapidly learn those norms. We formalize this framework in the context of Markov games and demonstrate its operation in a multi-agent environment via approximately Bayesian rule induction of obligative and prohibitive norms. Using our approach, agents are able to rapidly learn and sustain a variety of cooperative institutions, including resource management norms and compensation for pro-social labor, promoting collective welfare while still allowing agents to act in their own interests.

Computer Science; Systems Science; Cognitive Sciences, Misc

Realist and Pluralist Conceptions of Intelligence and Their Implications on AI Research
with Ruchira Dhar and Anders Søgaard

Proceedings of the AAAI Conference on Artificial Intelligence 40 (44): 37791-37801. 2026.

In this paper, we argue that current AI (alignment) research operates on a spectrum between two different underlying conceptions of intelligence: Intelligence Realism, which holds that intelligence represents a single, universal capacity measurable across all systems, and Intelligence Pluralism, which views intelligence as diverse, context-dependent capacities that cannot be reduced to a single universal measure. Through an analysis of current debates in AI research, we demonstrate how the conceptions remain largely implicit yet fundamentally shape how empirical evidence gets interpreted across a wide range of areas. More significantly, the underlying views generate fundamentally different research strands across three areas. Methodologically, they produce different approaches to model selection, benchmark design, and experimental validation. Interpretively, they lead to contradictory readings of scaling laws and system limitations. Regarding AI risk, they generate categorically different assessments of risk and alignment approaches: the ones viewing superintelligence as the biggest risk and searching for unified alignment solutions, the others seeing different threats in many different domains and searching for context-specific solutions. We argue that making explicit these underlying assumptions can contribute to a clearer understanding of the disagreements in this research space and, potentially, a more context-sensitive approach to alignment research.

Epistemology, Miscellaneous; Philosophy, General Works; Cognitive Sciences, Misc

The Stories We Govern By: AI, Risk, and the Power of Imaginaries
with Gleb Papyshev

Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 6 (2): 1939-1950. 2025.

This paper examines how competing sociotechnical imaginaries of artificial intelligence (AI) risk shape governance decisions and regulatory constraints. Drawing on concepts from science and technology studies, we analyse three dominant narrative groups: existential risk proponents, who emphasise catastrophic AGI scenarios; accelerationists, who portray AI as a transformative force to be unleashed; and critical AI scholars, who foreground present-day harms rooted in systemic inequality. Through an analysis of representative manifesto-style texts, we explore how these imaginaries differ across four dimensions: normative visions of the future, diagnoses of the present social order, views on science and technology, and perceived human agency in managing AI risks. Our findings reveal how these narratives embed distinct assumptions about risk and have the potential to progress into policy-making processes by narrowing the space for alternative governance approaches. We argue against speculative dogmatism and for moving beyond deterministic imaginaries toward regulatory strategies that are grounded in pragmatism.

Political Science; Business Ethics and Public Policy; Social Sciences, Misc; Sociology

Navigating the informativeness-compression trade-off in XAI
with Anders Søgaard

AI and Ethics 5 (5): 4925-4942. 2025.

Every explanation faces a trade-off between informativeness and compression (Kinney and Lombrozo, 2022). On the one hand, we want to aim for as much detailed and correct information as possible, informativeness, on the other hand, we want to ensure that a human can process and comprehend the explanation, compression. Current methods in eXplainable AI (XAI) try to satisfy this trade-off statically, outputting one fixed, non-adjustable explanation that sits somewhere on the spectrum between informativeness and compression. However, some current XAI methods fail to meet the expectations of users and developers such that several failures have been reported in the literature which often come with user-specific knowledge gaps and good-enough understanding. In this work, we propose Dynamic XAI to navigate the trade-off interactively. We argue how this simple idea can help overcome the trade-off by eliminating gaps in user-specific understanding and preventing misunderstandings. We conclude by situating our approach within the broader ethical considerations around XAI.

Philosophy, Misc; Other Academic Areas; Science, Logic, and Mathematics

Mechanistic Interpretability Needs Philosophy
with Iwan Williams, Ruchira Dhar, Joshua Hatherley, Constanza Fierro, Sandrine R. Schiller, Filippos Stamatiou, and Anders Søgaard

Mechanistic interpretability (MI) aims to explain how neural networks work by uncovering their underlying mechanisms. As the field grows in influence, it is increasingly important to examine not just models themselves, but the assumptions, concepts and explanatory strategies implicit in MI research. We argue that mechanistic interpretability needs philosophy as an ongoing partner in clarifying its concepts, refining its methods, and navigating the epistemic and ethical complexities of interpreting AI systems. There is significant unrealised potential for progress in MI to be gained through deeper engagement with philosophers and philosophical frameworks. Taking three open problems from the MI literature as examples, this paper illustrates the value philosophy can add to MI research, and outlines a path toward deeper interdisciplinary dialogue.

Natural Language Processing; Artificial Intelligence Safety; Interpretability in Artificial Intelligence; Explainability in Artificial Intelligence; Large Language Models; Representation in Connectionism; Artificial Intelligence Methodology
