•  967
    Mechanistic interpretability (MI) aims to explain how neural networks work by uncovering their underlying mechanisms. As the field grows in influence, it is increasingly important to examine not just models themselves, but the assumptions, concepts and explanatory strategies implicit in MI research. We argue that mechanistic interpretability needs philosophy as an ongoing partner in clarifying its concepts, refining its methods, and navigating the epistemic and ethical complexities of interpreti…
  •  54
    Epistemic Drift in Mind-Model Systems
    with Nina Rajcic and Ava Elizabeth Scott
    Minds and Machines 36 (1): 18. 2026.
    LLMs have been shown to induce representations akin to real-world information structures. According to the standard account, this is because LLMs model the human mind, and minds model the world. Thus, LLMs and minds share properties that ultimately converge upon similar structures. We contribute to this discourse in two ways: a) We argue for an alternative account: LLMs model minds, and minds model worlds, but minds also model LLMs, which in turn extend the capacities of minds. In sum, the world…
  •  107
    Realist and Pluralist Conceptions of Intelligence and Their Implications on AI Research
    Proceedings of the AAAI Conference on Artificial Intelligence 40 (44): 37791-37801. 2026.
    In this paper, we argue that current AI (alignment) research operates on a spectrum between two different underlying conceptions of intelligence: Intelligence Realism, which holds that intelligence represents a single, universal capacity measurable across all systems, and Intelligence Pluralism, which views intelligence as diverse, context-dependent capacities that cannot be reduced to a single universal measure. Through an analysis of current debates in AI research, we demonstrate how the conce…
  •  129
    Every explanation faces a trade-off between informativeness and compression (Kinney and Lombrozo, 2022). On the one hand, we want as much detailed and correct information as possible (informativeness); on the other hand, we want to ensure that a human can process and comprehend the explanation (compression). Current methods in eXplainable AI (XAI) try to satisfy this trade-off statically, outputting one fixed, non-adjustable explanation that sits somewhere on the spectrum between inform…
  •  176
    Postmortem avatars (PMAs), AI systems that simulate a deceased person by being fine-tuned on data they generated or that was generated about them, have attracted growing scholarly attention, yet their potential role in clinical settings remains largely unexplored. This paper examines the ethics of deploying PMAs as therapeutic tools in grief therapy. Drawing on the dual-process model of grief, the theory of continuing bonds, and the philosophical framework of fictionalism, we propose two poten…
  •  223
    Defining Knowledge: Bridging Epistemology and Large Language Models
    with Constanza Fierro, Ruchira Dhar, Filippos Stamatiou, and Nicolas Garneau
    Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024. 2024.
    Knowledge claims are abundant in the literature on large language models (LLMs); but can we say that GPT-4 truly "knows" the Earth is round? To address this question, we review standard definitions of knowledge in epistemology and we formalize interpretations applicable to LLMs. In doing so, we identify inconsistencies and gaps in how current NLP research conceptualizes knowledge with respect to epistemological frameworks. Additionally, we conduct a survey of 100 professional philosophers and co…
  •  179
    Federation opacity and the promise of federated learning in healthcare
    with Joshua Hatherley, Angela Ballantyne, and Ruben Pauwels
    American Journal of Bioethics. forthcoming.
    Federated learning (FL) is a machine learning (ML) approach that allows multiple devices or institutions to collaboratively train an ML model without sharing their local data with a third party. It has recently received significant attention as a promising way to overcome longstanding ethical obstacles to training medical ML models with patient health data. This paper examines the promise of FL in healthcare from an ethical perspective. It argues that medical FL generates a new variety of opacit…
  •  17
    Mele’s Digital Zygote: Developer Responsibility for Neural Networks
    Science and Engineering Ethics 31 (6): 40. 2025.
    Should developers be held responsible for the predictions of their neural networks, and if not, does that introduce a responsibility gap? The claim that neural networks introduce a responsibility gap has seen significant pushback, with philosophers arguing that the gap can be bridged, or did not exist in the first place. We show how the responsibility gap turns on whether we can distinguish between foreseeable and unforeseeable neural network predictions. Empirical facts about neural networks tel…
  •  17
    Externalist XAI?
    Theoria 91 (2). 2024.
    Developers of artificial intelligence (AI) often cannot explain the inferences their neural networks make, at least not in ways that satisfy user needs. XAI (explainable artificial intelligence) aims to develop techniques for providing such explanations. XAI researchers have adopted techniques that, to philosophers, seem representationalist–internalist, leading some philosophers to call for more externalist alternatives. But is explaining AI models through causally related external factors feasibl…
  •  860
    Federated learning (FL) is a machine learning approach that allows multiple devices or institutions to collaboratively train a model without sharing their local data with a third party. FL is considered a promising way to address patient privacy concerns in medical artificial intelligence. The ethical risks of medical FL systems themselves, however, have thus far been underexamined. This paper aims to address this gap. We argue that medical FL presents a new variety of opacity -- federation opa…
  •  117
    Externalist XAI?
    Theoria 91 (2). 2025.
    Developers of artificial intelligence (AI) often cannot explain the inferences their neural networks make, at least not in ways that satisfy user needs. XAI (explainable artificial intelligence) aims to develop techniques for providing such explanations. XAI researchers have adopted techniques that, to philosophers, seem representationalist–internalist, leading some philosophers to call for more externalist alternatives. But is explaining AI models through causally related external factors feasibl…
  •  45
    Is Unsupervised Clustering Somehow Truer?
    Minds and Machines 34 (4). 2024.
    Scientists increasingly approach the world through machine learning techniques, but philosophers of science often question their epistemic status. Some philosophers have argued that the use of unsupervised clustering algorithms is more justified than the use of supervised classification, because supervised classification is more biased, and because (parametric) simplicity plays a different and more interesting role in unsupervised clustering. I call these arguments the No-Bias Argument and the S…
  •  128
    On the Opacity of Deep Neural Networks
    Canadian Journal of Philosophy 1-16. 2023.
    Deep neural networks are said to be opaque, impeding the development of safe and trustworthy artificial intelligence, but where this opacity stems from is less clear. What are the sufficient properties for neural network opacity? Here, I discuss five common properties of deep neural networks and two different kinds of opacity. Which of these properties are sufficient for what type of opacity? I show how each kind of opacity stems from only one of these five properties, and then discuss to what e…
  •  91
    Identity Theory and Falsifiability
    Acta Analytica 39 (4): 737-748. 2024.
    I identify a class of arguments against multiple realization (MR): BookofSand arguments. The arguments are in their general form successful under reasonably uncontroversial assumptions, but this, on the other hand, turns the table on identity theory: If arguments from MR can always be refuted by BookofSand arguments, is identity theory falsifiable? In the absence of operational demarcation criteria, it is not. I suggest a parameterized formal demarcation principle for brain state/process types a…
  •  133
    On Hedden's proof that machine learning fairness metrics are flawed
    Inquiry: An Interdisciplinary Journal of Philosophy 68 (4): 1198-1217. 2025.
    Fairness is about the just distribution of society's resources, and in ML, the main resource being distributed is model performance, e.g. the translation quality produced by machine translation...
  •  139
    Most, if not all, philosophers agree that computers cannot learn what words refer to from raw text alone. While many attacked Searle’s Chinese Room thought experiment, no one seemed to question this most basic assumption. For how can computers learn something that is not in the data? Emily Bender and Alexander Koller (2020) recently presented a related thought experiment, the so-called Octopus thought experiment, which replaces the rule-based interlocutor of Searle’s thought experiment with a …
  •  113
    Understanding models understanding language
    Synthese 200 (6): 1-16. 2022.
    Landgrebe and Smith (2021, pp. 2061–2081) present an unflattering diagnosis of recent advances in what they call language-centric artificial intelligence, perhaps more widely known as natural language processing: the models that are currently employed do not have sufficient expressivity, will not generalize, and are fundamentally unable to induce linguistic semantics, they say. The diagnosis is mainly derived from an analysis of the widely used Transformer architecture. Here I address a number of mis…
  •  106
    Polyadic dynamic logics for HPSG parsing
    with Martin Lange
    Journal of Logic, Language and Information 18 (2): 159-198. 2009.
    Head-driven phrase structure grammar (HPSG) is one of the most prominent theories employed in deep parsing of natural language. Many linguistic theories are arguably best formalized in extensions of modal or dynamic logic (Keller, Feature logics, infinitary descriptions and grammar, 1993; Kracht, Linguistics Philos 18:401–458, 1995; Moss and Tiede, In: Blackburn, van Benthem, and Wolter (eds.) Handbook of modal logic, 2006), and HPSG seems to be no exception. Adequate extensions of dynamic logi…
  •  39
    Compound constructions: A reply to Bundgaard et al.
    Semiotica 2008 (169): 163-169. 2008.