Robert Long (New York University): Publications

New York University
Department of Philosophy

Doctoral student

Areas of Specialization

Philosophy of Mind

Philosophy of Cognitive Science

Ethics of Artificial Intelligence

Taking AI Welfare Seriously
with Jeff Sebo, Patrick Butlin, David Chalmers, and others

arXiv Preprint. 2024.

Identifying indicators of consciousness in AI systems
with Patrick Butlin, Tim Bayne, Yoshua Bengio, Jonathan Birch, David Chalmers, Axel Constant, George Deane, Eric Elmoznino, Stephen M. Fleming, Xu Ji, Ryota Kanai, Colin Klein, Grace Lindsay, Matthias Michel, Liad Mudrik, Megan A. K. Peters, Eric Schwitzgebel, Jonathan Simon, and Rufin VanRullen

Rapid progress in artificial intelligence (AI) capabilities has drawn fresh attention to the prospect of consciousness in AI. There is an urgent need for rigorous methods to assess AI systems for consciousness, but significant uncertainty about relevant issues in consciousness science. We present a method for assessing AI systems for consciousness that involves exploring what follows from existing or future neuroscientific theories of consciousness. Indicators derived from such theories can be used to inform credences about whether particular AI systems are conscious. This method allows us to make meaningful progress because some influential theories of consciousness, notably including computational functionalist theories, have implications for AI that can be investigated empirically.

Taking AI Welfare Seriously
with Jeff Sebo, Patrick Butlin, Kathleen Finlinson, Kyle Fish, Jacqueline Harding, Jacob Pfau, Toni Sims, Jonathan Birch, and David Chalmers

In this report, we argue that there is a realistic possibility that some AI systems will be conscious and/or robustly agentic in the near future. That means that the prospect of AI welfare and moral patienthood — of AI systems with their own interests and moral significance — is no longer an issue only for sci-fi or the distant future. It is an issue for the near future, and AI companies and other actors have a responsibility to start taking it seriously. We also recommend three early steps that AI companies and other actors can take: They can (1) acknowledge that AI welfare is an important and difficult issue (and ensure that language model outputs do the same), (2) start assessing AI systems for evidence of consciousness and robust agency, and (3) prepare policies and procedures for treating AI systems with an appropriate level of moral concern. To be clear, our argument in this report is not that AI systems definitely are — or will be — conscious, robustly agentic, or otherwise morally significant. Instead, our argument is that there is substantial uncertainty about these possibilities, and so we need to improve our understanding of AI welfare and our ability to make wise decisions about this issue. Otherwise there is a significant risk that we will mishandle decisions about AI welfare, mistakenly harming AI systems that matter morally and/or mistakenly caring for AI systems that do not.

Philosophy, Miscellaneous; Deep Learning; Large Language Models; Machine Learning, Misc; Artificial Consciousness

Is there a tension between AI safety and AI welfare?
with Jeff Sebo and Toni Sims

Philosophical Studies 182 (7): 2005-2033. 2025.

The field of AI safety considers whether and how AI development can be safe and beneficial for humans and other animals, and the field of AI welfare considers whether and how AI development can be safe and beneficial for AI systems. There is a prima facie tension between these projects, since some measures in AI safety, if deployed against humans and other animals, would raise questions about the ethics of constraint, deception, surveillance, alteration, suffering, death, disenfranchisement, and more. Is there in fact a tension between these projects? We argue that, considering all relevant factors, there is indeed a moderately strong tension—and it deserves more examination. In particular, we should devise interventions that can promote both safety and welfare where possible, and prepare frameworks for navigating any remaining tensions thoughtfully.

Introspective Capabilities in Large Language Models
Journal of Consciousness Studies 30 (9): 143-153. 2023.

This paper considers the kind of introspection that large language models (LLMs) might be able to have. It argues that LLMs, while currently limited in their introspective capabilities, are not inherently unable to have such capabilities: they already model the world, including mental concepts, and already have some introspection-like capabilities. With deliberate training, LLMs may develop introspective capabilities. The paper proposes a method for such training for introspection, situates possible LLM introspection in the 'possible forms of introspection' framework proposed by Kammerer and Frankish, and considers the ethical ramifications of introspection and self-report in AI systems.

Philosophy of Mind; Introspection and Introspectionism

AI language models cannot replace human research participants
with Jacqueline Harding, William D’Alessandro, and N. G. Laskowski

AI and Society 39 (5): 2603-2605. 2024.

In a recent letter, Dillion et al. (2023) make various suggestions regarding the idea of artificially intelligent systems, such as large language models, replacing human subjects in empirical moral psychology. We argue that human subjects are in various ways indispensable.

Philosophy of Psychology; Philosophy of Artificial Intelligence; Meta-Ethics; Moral Psychology

Fairness in Machine Learning: Against False Positive Rate Equality as a Measure of Fairness
Journal of Moral Philosophy 19 (1): 49-78. 2021.

As machine learning informs increasingly consequential decisions, different metrics have been proposed for measuring algorithmic bias or unfairness. Two popular “fairness measures” are calibration and equality of false positive rate. Each measure seems intuitively important, but notably, it is usually impossible to satisfy both measures. For this reason, a large literature in machine learning speaks of a “fairness tradeoff” between these two measures. This framing assumes that both measures are, in fact, capturing something important. To date, philosophers have seldom examined this crucial assumption or asked to what extent each measure actually tracks a normatively important property. This makes this inevitable statistical conflict – between calibration and false positive rate equality – an important topic for ethics. In this paper, I give an ethical framework for thinking about these measures and argue that, contrary to initial appearances, false positive rate equality is in fact morally irrelevant and does not measure fairness.

Equality; Algorithmic Fairness

How wishful seeing is not like wishful thinking
Philosophical Studies 175 (6): 1401-1421. 2017.

On a traditional view of perceptual justification, perceptual experiences always provide prima facie justification for beliefs based on them. Against this view, Matthew McGrath and Susanna Siegel argue that if an experience is formed in an epistemically pernicious way then it is epistemically downgraded. They argue that "wishful seeing"—when a subject sees something because he wants to see it—is psychologically and normatively analogous to wishful thinking. They conclude that perception can lose its traditional justificatory power, and that our epistemic norms should govern how experiences are formed. To make this case, the downgrader must first isolate a feature of wishful thinking that makes it epistemically defective, then show that this feature is present in wishful seeing. I present a dilemma for the downgrader. There are two features of wishful thinking that could plausibly explain why it is irrational: the fact that a desire causes you to form a belief not supported by adequate evidence, or the mere influence that desire holds over belief formation. Each option presents formidable difficulties. Although the first “bad evidence” explanation, which McGrath employs, explains the irrationality of wishful thinking, it does not transfer to wishful seeing, since experiences are not formed in response to evidence. The second “influence of desire” explanation, which Siegel employs, fails to isolate an epistemically defective feature of wishful thinking, and also does not transfer to wishful seeing. I conclude that the downgrader’s argument from wishful seeing fails.

Seemings
