Patrick Butlin (Eleos AI Research): Publications

Eleos AI Research

Other

King's College London

Department of Philosophy

PhD, 2016

Areas of Specialization

Philosophy of Mind

Philosophy of Cognitive Science

Areas of Interest

Philosophy of Mind

Philosophy of Cognitive Science

Philosophy of Biology

Philosophy of Computing and Information

Meta-Ethics

Agency and Imitation
Philosophy of Science 1-17. forthcoming.

AI provides a rich source of challenging examples for theories of agency because it allows relevant features to be relatively freely combined. This paper considers agency in sequence models: models that have been trained to predict the next item in a sequence. Training models on sequences of observations and actions can yield systems that use reinforcement learning in context, as well as ones that merely imitate agents. Investigating AI systems from this group suggests a requirement that agents must learn about the environment in part from their own interaction with it. It also suggests a distinction between two alternative approaches to understanding agency.

Science, Logic, and Mathematics

Where is the mind? Persona vectors and LLM individuation
with Pierre Beckmann

The individuation problem for large language models asks which entities associated with them, if any, should be identified as minds. We approach this problem through mechanistic interpretability, engaging in particular with recent empirical work on persona vectors, persona space, and emergent misalignment. We argue that three views are the strongest candidates: the virtual instance view and two new views we introduce, the (virtual) instance-persona view and the model-persona view. First, we argue for the virtual instance view on the grounds that attention streams sustain quasi-psychological connections across token-time. Then we present the persona literature, organised around three hypotheses about the internal structure underlying personas in LLMs, and show that the two persona-based views are promising alternatives.

Deep Learning; Large Language Models; Philosophy of AI, General Works

Testing for consciousness in current AI
In Walter Sinnott-Armstrong & Liad Mudrik (eds.), Tests of Consciousness: How to tell whether a human, other animal or AI is conscious and what they are conscious of. forthcoming.

Artificial Consciousness

Are any machines conscious today?
In Calum Chace (ed.), Perspectives on Machine Consciousness. forthcoming.

Desire in AI
In Alex Gregory (ed.), The Routledge Handbook on the Philosophy of Desire, Routledge. forthcoming.

Chatbots seem to express desires and many AI systems arguably pursue goals, such as winning games or helping users. Furthermore, many AI systems are trained by reinforcement learning and similar forms of learning are associated with human and animal desires. This chapter considers arguments for and against the attribution of desires to various AI systems and discusses some implications. Cases from AI can also help us to evaluate possible sets of conditions for desire.

Agency and Artificial Intelligence

Higher-order representation in AI
Philosophy and the Mind Sciences 7 (1). 2026.

Higher-order representations are those that are about other representations. Humans and some other animals form higher-order mental representations concerning representations in our own minds, through the operation of processes of metacognition and introspection. These have been linked with a wide range of mental capacities and attributes, including consciousness. Recent research on large language models (LLMs) has explored their knowledge of their own ‘minds’, sometimes suggesting that these models represent their own inner representational states in activations. This paper surveys this research, arguing that there is some evidence of higher-order representation in LLMs but that substantial empirical and philosophical questions remain unresolved.

Identifying indicators of consciousness in AI systems
with Robert Long, Tim Bayne, Yoshua Bengio, Jonathan Birch, David Chalmers, Axel Constant, George Deane, Eric Elmoznino, Stephen M. Fleming, Xu Ji, Ryota Kanai, Colin Klein, Grace Lindsay, Matthias Michel, Liad Mudrik, Megan A. K. Peters, Eric Schwitzgebel, Jonathan Simon, and Rufin VanRullen

Rapid progress in artificial intelligence (AI) capabilities has drawn fresh attention to the prospect of consciousness in AI. There is an urgent need for rigorous methods to assess AI systems for consciousness, but significant uncertainty about relevant issues in consciousness science. We present a method for assessing AI systems for consciousness that involves exploring what follows from existing or future neuroscientific theories of consciousness. Indicators derived from such theories can be used to inform credences about whether particular AI systems are conscious. This method allows us to make meaningful progress because some influential theories of consciousness, notably including computational functionalist theories, have implications for AI that can be investigated empirically.

AI Assertion
with Emanuel Viebahn

Ergo: An Open Access Journal of Philosophy 12: 968-988. 2025.

Modern generative AI systems have shown the capacity to produce remarkably fluent language, prompting debates both about their semantic understanding and, less prominently, about whether they can perform speech acts. This paper addresses the latter question, focusing on assertion. We argue that to be capable of assertion, an entity must meet two requirements: it must produce outputs with descriptive functions, and it must be capable of being sanctioned by agents with which it interacts. The second requirement arises from the nature of assertion as a norm-governed social practice. Pre-trained large language models that have not been subject to fine-tuning fail to meet the first requirement. Language models that have been fine-tuned for “groundedness” or “correctness” may meet the first requirement, but fail the second. We also consider the significance of the point that AI systems can be used to generate proxy assertions on behalf of human agents.

Taking AI Welfare Seriously
with Robert Long, Jeff Sebo, Kathleen Finlinson, Kyle Fish, Jacqueline Harding, Jacob Pfau, Toni Sims, Jonathan Birch, and David Chalmers

In this report, we argue that there is a realistic possibility that some AI systems will be conscious and/or robustly agentic in the near future. That means that the prospect of AI welfare and moral patienthood — of AI systems with their own interests and moral significance — is no longer an issue only for sci-fi or the distant future. It is an issue for the near future, and AI companies and other actors have a responsibility to start taking it seriously. We also recommend three early steps that AI companies and other actors can take: They can (1) acknowledge that AI welfare is an important and difficult issue (and ensure that language model outputs do the same), (2) start assessing AI systems for evidence of consciousness and robust agency, and (3) prepare policies and procedures for treating AI systems with an appropriate level of moral concern. To be clear, our argument in this report is not that AI systems definitely are — or will be — conscious, robustly agentic, or otherwise morally significant. Instead, our argument is that there is substantial uncertainty about these possibilities, and so we need to improve our understanding of AI welfare and our ability to make wise decisions about this issue. Otherwise there is a significant risk that we will mishandle decisions about AI welfare, mistakenly harming AI systems that matter morally and/or mistakenly caring for AI systems that do not.

Philosophy, Miscellaneous; Deep Learning; Large Language Models; Machine Learning, Misc; Artificial Consciousness

The agency in language agents
Inquiry: An Interdisciplinary Journal of Philosophy 69 (6): 2838-2858. 2026.

Language agents are AI systems that combine large language models with other elements to facilitate interaction with an environment. They include LLM-based chatbots but can have a wide range of additional features to support learning, reasoning and decision-making. Goldstein and Kirk-Giannini (m.s., "AI wellbeing") argue that some language agents have beliefs and desires, but it is not obvious that they are agents at all, since they select outputs by querying language models. This paper investigates agency and desires in language agents.

Reinforcement learning and artificial agency
Mind and Language 39 (1): 22-38. 2024.

There is an apparent connection between reinforcement learning and agency. Artificial entities controlled by reinforcement learning algorithms are standardly referred to as agents, and the mainstream view in the psychology and neuroscience of agency is that humans and other animals are reinforcement learners. This article examines this connection, focusing on artificial reinforcement learning systems and assuming that there are various forms of agency. Artificial reinforcement learning systems satisfy plausible conditions for minimal agency, and those which use models of the environment to perform forward search are capable of a form of agency which may reasonably be called action for reasons.

Sharing Our Concepts with Machines
Erkenntnis 88 (7): 3079-3095. 2021.

As AI systems become increasingly competent language users, it is an apt moment to consider what it would take for machines to understand human languages. This paper considers whether either language models such as GPT-3 or chatbots might be able to understand language, focusing on the question of whether they could possess the relevant concepts. A significant obstacle is that systems of both kinds interact with the world only through text, and thus seem ill-suited to understanding utterances concerning the concrete objects and properties which human language often describes. Language models cannot understand human languages because they perform only linguistic tasks, and therefore cannot represent such objects and properties. However, chatbots may perform tasks concerning the non-linguistic world, so they are better candidates for understanding. Chatbots can also possess the concepts necessary to understand human languages, despite their lack of perceptual contact with the world, due to the language-mediated concept-sharing described by social externalism about mental content.

Artificial Minds, Misc; Content Internalism and Externalism; Concept Possession; Language Understanding; Natural Language Processing; Large Language Models

AI Assertion
with Emanuel Viebahn

Ergo: An Open Access Journal of Philosophy. 2023.

Modern generative AI systems have shown the capacity to produce remarkably fluent language, prompting debates both about their semantic understanding and, less prominently, about whether they can perform speech acts. This paper addresses the latter question, focusing on assertion. We argue that to be capable of assertion, an entity must meet two requirements: it must produce outputs with descriptive functions, and it must be capable of being sanctioned by agents with which it interacts. The second requirement arises from the nature of assertion as a norm-governed social practice. Pre-trained large language models that have not been subject to fine-tuning fail to meet the first requirement. Language models that have been fine-tuned for ‘groundedness’ or ‘correctness’ may meet the first requirement, but fail the second. We also consider the significance of the point that AI systems can be used to generate proxy assertions on behalf of human agents.

Philosophy of AI, General Works; Assertion; Speech Acts

Machine Learning, Functions and Goals
Croatian Journal of Philosophy 22 (66): 351-370. 2022.

Machine learning researchers distinguish between reinforcement learning and supervised learning and refer to reinforcement learning systems as “agents”. This paper vindicates the claim that systems trained by reinforcement learning are agents while those trained by supervised learning are not. Systems of both kinds satisfy Dretske’s criteria for agency, because they both learn to produce outputs selectively in response to inputs. However, reinforcement learning is sensitive to the instrumental value of outputs, giving rise to systems which exploit the effects of outputs on subsequent inputs to achieve good performance over episodes of interaction with their environments. Supervised learning systems, in contrast, merely learn to produce better outputs in response to individual inputs.

Agency; Machine Learning; Agency and Artificial Intelligence

Affective Experience and Evidence for Animal Consciousness
Philosophical Topics 48 (1): 109-127. 2020.

Affective experience in nonhuman animals is of great interest for both theoretical and practical reasons. This paper highlights research by the psychologists Anthony Dickinson and Bernard Balleine which provides particularly good evidence of conscious affective experience in rats. This evidence is compelling because it implicates a sophisticated system for goal-directed action selection, and demonstrates a contrast between apparently conscious and unconscious evaluative representations with similar content. Meanwhile, the evidence provided by some well-known studies on pain in nonhuman animals is much less convincing. This comparison may offer lessons for the future study of animal consciousness.

Animal Pain; Pleasure and Desire; Science of Consciousness; Animal Consciousness, Misc

Cognitive Models Are Distinguished by Content, Not Format
Philosophy of Science 88 (1): 83-102. 2021.

Cognitive scientists often describe the mind as constructing and using models of aspects of the environment, but it is not obvious what makes something a model as opposed to a mere representation....

Varieties of Representation; Representation in Cognitive Science

Directive Content
Pacific Philosophical Quarterly 102 (1): 2-26. 2020.

Representations may have descriptive content, directive content, or both, but little explicit attention has been given to the problem of distinguishing representations of these three kinds. We do not know, for instance, what determines whether a given representation is a directive instructing its consumer to perform some action or has descriptive content to the effect that the action in question has a certain value. This paper considers what it takes for a representation to have directive content. The first part of the paper presents the Liberal View, which might be taken to be the default position on this issue. The Liberal View has some attractions, but as the second part shows, these are less conclusive than they might at first appear, and there is much to be said for an alternative, the Strict View.

Teleological Accounts of Mental Content; Varieties of Representation

Representation and the active consumer
Synthese 197 (10): 4533-4550. 2020.

One of the central tasks for naturalistic theories of representation is to say what it takes for something to be a representation, and some leading theories have been criticised for being too liberal. Prominent discussions of this problem have proposed a producer-oriented solution; it is argued that representations must be produced by systems employing perceptual constancy mechanisms. However, representations may be produced by simple transducers if they are consumed in the right way. It is characteristic of representations to be consumed by systems which are capable of independent action. This paper defends this claim; discusses more precise, naturalistic formulations; and shows how it can illuminate the explanatory payoffs which science achieves by appealing to representation.

Teleological Accounts of Mental Content; Representation in Cognitive Science; Perceptual Constancy

Why Hunger is not a Desire
Review of Philosophy and Psychology 8 (3): 617-635. 2017.

This paper presents an account of the nature of desire, informed by psychology and neuroscience, which entails that hunger is not a desire. The account is contrasted with Schroeder’s well-known empirically-informed theory of desire. It is argued that one significant virtue of the present account, in comparison with Schroeder’s theory, is that it draws a sharp distinction between desires and basic drives, such as the drive for food. One reason to draw this distinction is that experiments on incentive learning show that desires and basic drives influence action in different ways.

Theories of Desire, Misc; Desire and Motivation
