•  967
    Mechanistic interpretability (MI) aims to explain how neural networks work by uncovering their underlying mechanisms. As the field grows in influence, it is increasingly important to examine not just models themselves, but the assumptions, concepts and explanatory strategies implicit in MI research. We argue that mechanistic interpretability needs philosophy as an ongoing partner in clarifying its concepts, refining its methods, and navigating the epistemic and ethical complexities of interpreti…
  •  107
    Realist and Pluralist Conceptions of Intelligence and Their Implications on AI Research
    Proceedings of the AAAI Conference on Artificial Intelligence 40 (44): 37791-37801. 2026.
    In this paper, we argue that current AI (alignment) research operates on a spectrum between two different underlying conceptions of intelligence: Intelligence Realism, which holds that intelligence represents a single, universal capacity measurable across all systems, and Intelligence Pluralism, which views intelligence as diverse, context-dependent capacities that cannot be reduced to a single universal measure. Through an analysis of current debates in AI research, we demonstrate how the conce…
  •  223
    Defining Knowledge: Bridging Epistemology and Large Language Models
    Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024.
    Knowledge claims are abundant in the literature on large language models (LLMs); but can we say that GPT-4 truly "knows" the Earth is round? To address this question, we review standard definitions of knowledge in epistemology and we formalize interpretations applicable to LLMs. In doing so, we identify inconsistencies and gaps in how current NLP research conceptualizes knowledge with respect to epistemological frameworks. Additionally, we conduct a survey of 100 professional philosophers and co…