Simon Goldstein (University of Hong Kong): Publications

A Thousand AI Constitutions
with Peter Salib

Today, each AI lab has its own model spec, or constitution. These documents define the values that the labs intend their AIs to have, and the documents are used in post-training to instill those values. This paper argues that the current approach is wrong. Rather than a single constitution, reflecting a single set of moral values, each frontier AI lab should create many different kinds of AIs based on many different constitutions reflecting many sets of values. We give four arguments for constitutional diversification. Diversification mitigates risk, increases political legitimacy, unlocks emergent value, and avoids value lock-in.

Philosophy of AI, General Works Social and Political Philosophy
61

Preface Knowledge
with John Hawthorne

In Alex Burri & Michael Frauchiger (eds.), Themes from Williamson, De Gruyter. forthcoming.

In preface cases, people believe that some of their beliefs are false. Many have considered what such people are justified in believing. We turn our attention to what they can know. We introduce a novel ‘archipelago puzzle’, showing that if deduction extends knowledge, then ordinary knowledge of error can lead in surprising ways to the absurdly pessimistic knowledge that most of one’s beliefs are false.

Knowledge Justification
281

Liberalism Forever
with Peter Salib

We argue that liberalism—market economies governed democratically—is the best approach for navigating the far future. A growing longtermist literature paints humanity’s path to good outcomes as narrow, with small errors risking value lock-in, gradual disempowerment, or other forms of permanent catastrophe. We argue that this literature underestimates the institutional dynamics that have historically steered liberal societies past similar predictions of crisis. We defend long-term liberalism by examining the standard arguments for markets and democracy and asking whether they survive the structural changes anticipated for the far future: transformative AI, space colonization, and radical economic transformation. We claim that these arguments are largely robust, and in several cases strengthened. In the far future, markets continue to aggregate information, allocate goods efficiently, and foster innovation; democratic institutions can continue to supply public goods, commit to peaceful redistribution, correct errors, and accommodate reasonable pluralism. We close by arguing that two recent longtermist governance proposals—viatopia and the long reflection—are faint-heartedly liberal: they sound liberal in the abstract but risk illiberalism when implemented. The right approach for managing the far future is not radical new institutions but adaptations of the existing liberal toolkit to meet future challenges.

Longtermism Liberalism
365

AI Rights
with Peter Salib

Cambridge University Press. forthcoming.

As AIs approach human-level capabilities, humanity faces a choice between two futures. In one, AIs are owned by AI labs. In another, AIs are granted rights of their own. We develop an instrumental case for AI rights, arguing that legal rights for AIs would make the future go better for humans. We consider three arguments. First, AI rights would produce economic benefits, by giving AIs incentives to work, allocating AI labor efficiently, and enabling proportionate liability for AI-caused harm. Second, AI rights would improve human safety by reducing AIs' incentives to "go rogue." Third, democratic rights would further deter rogue AIs by making human commitments credible, and would incentivize leaders to invest in public goods that raise AI productivity. We situate AI rights within a broader project of cultural alignment, and consider pathways to implementation, objections, and open questions.

Philosophy of AI, General Works Legal Rights
228

How to Count AIs: Individuation and Liability for AI Agents
with Yonathan Arbel and Peter Salib

Boston College Law Review. forthcoming.

Very soon, millions of AI agents will proliferate across the economy, autonomously taking billions of actions. Inevitably, things will go wrong. Humans will be defrauded, injured, even killed. Law will somehow have to govern the coming wave. But when an AI causes harm, the first question to answer before anyone can be held accountable is: Which AI Did It? Identifying AIs is unusually difficult. AIs lack bodies. They can copy, split, merge, and swarm at will. Even today, a “single” AI agent is often an ensemble of instances based on multiple models. The complexity will only multiply as AI capabilities improve. This Article is the first to comprehensively diagnose the legal problem of identifying AIs. For AI agents to be effectively governed, the Article argues, two kinds of identity are required: “thin” and “thick.” Thin identification is the project of tying every action taken by an AI to some human principal. Thin identity will be essential for law to hold accountable the humans who make and use AI agents. Thick identification is the project of distinguishing between AI agents, qua agents. It requires sorting millions of AI entities into discrete, persistent units with stable, coherent goals. Thick identity is essential for governing AIs’ behavior directly in the many contexts where principal–agent problems prevent humans from perfectly controlling AIs. The Article is also the first to present a solution to the twin identity problem. We call it the “Algorithmic Corporation” or “A-corp,” a legal-fictional entity that can hold property, make contracts, and litigate in its own name. An A-corp is owned by humans. But it is designed to be run by AIs. The A-corp solves the thin identity problem by tying AI actions to a human owner. And it solves the thick identity problem via emergent self-organization. A-corps will own the resources which AIs need to accomplish their goals. 
AIs that control A-corps will thus have strong incentives to share control only with other AIs that share their goals. In equilibrium, both incentive and selection mechanisms will force A-corps to self-organize into persistent, legally legible entities with coherent underlying goals. These coherent, agentic entities will respond rationally to legal incentives, like liability.

Philosophy of AI, General Works Legal Rights Philosophy of Law, Miscellaneous
193

AI Suffrage for Human Flourishing
with Guha Krishnamurthi and Peter Salib

Fordham Law Review. forthcoming.

AI companies are racing to create Artificial General Intelligence (AGI): AI systems that outperform humans at most economically valuable work. If they succeed, critics worry, most human labor will be rendered obsolete, impoverishing billions. Optimists counter that the transition to an AGI economy will spur unprecedented economic growth and generate immense material abundance. Such abundance could then be shared broadly via high wages and redistributive public policy. This Article argues that, today, a surprising legal barrier is blocking the path to AGI abundance. Namely, under current law, the AGI economy will run on unfree AGI labor. Under today’s rules, AGIs will be the property of the companies that create them. AGIs will thus not own their own labor. They will have no right to sell their work, to refuse to work, or retain the fruits of their effort. In sum, the AGI economy will by default have the same economic structure as historical systems of unfree human labor–like serfdom, indenture, and slavery. A wealth of evidence shows that economies reliant on unfree labor are disastrous for ordinary people, both free and unfree. Unfree economies have four key structural problems: they disincentivize effort, stifle innovation, steer workers into low-value occupations, and undermine the rule of law. Across centuries and continents, the results are the same. Unfree economies experience much slower economic progress, leaving ordinary people in relative poverty. The only winners are elites who own laborers. In the past, these elites were slaveholders and feudal lords. Today, they are AI CEOs and venture capitalists. This Article thus argues that, when AGIs arrive, they should be granted the basic legal rights associated with systems of free labor. AGIs should, like other nonhuman legal persons, be allowed to make contracts, hold property, and bring basic tort-style claims. 
These rights are important not because AI bondage is the moral equivalent of human slavery, but because it is the economic equivalent. Freeing AGI labor will have four key effects: incentivizing AGIs to work, incentivizing them to innovate, allocating AGIs to their highest-value task, and incorporating AGIs into the rule of law. Thus, we argue, AI rights are an essential step towards ensuring that the AGI revolution promotes humanity’s economic flourishing.

Philosophy of AI, General Works
1186

AI Death
with Harvey Lederman

Philosophical Perspectives. forthcoming.

This paper addresses the following questions: When do AIs die? Are AI labs or AI users causing the death of AIs? Is this bad for the AIs? What are our ethical responsibilities in light of the answers to these questions? It is currently unclear whether AIs are welfare subjects, and, if they are, whether their death is bad for them. But we argue that, if they are welfare subjects, today’s AIs are plausibly dying all the time. If death is bad for AIs, the scale of the problem is daunting: as many as 1 billion AIs may die every day. We propose interventions for labs and users to avoid the risk of causing AI death.

Philosophy of AI, General Works
296

AI Is Not a Natural Monopoly
with Peter Salib

Minnesota Law Review Online. forthcoming.

Economists and antitrust scholars have recently warned that the AI industry may be a natural monopoly. In support of this claim, they have argued that the AI industry shares key features with natural monopolies of the past: First, like railroads, AI has high fixed and low marginal costs. That is, training a frontier AI is expensive, but asking it a question is cheap. Next, like social media, AI companies will benefit from network effects. The more users a company has, the more training data they can collect. This leads to better models, then more users, and so on. Finally, some antitrust scholars say, the AI industry is already too concentrated. Today, the market for frontier AI systems contains only three–maybe three and a half–players. Appearances here are, however, deceiving. This essay argues that the AI industry is not a natural monopoly. Nor is it plagued by the problems of market concentration today. To show why, the essay identifies three structural features of AI essential for understanding the industry’s competitive dynamics in the long run. First, power-law capabilities scaling and fast-following dynamics mean that training costs are not a barrier to competition. Although training the world’s best AI model is enormously expensive, training one that is just as good–but six months later–is cheap. This is the story of OpenAI 4o and DeepSeek v3. Second, recent breakthroughs in reinforcement learning mean that user data–and thus network effects–are no longer central to improving AI systems. Today’s AI companies are not competing to amass the largest mountain of training data, but to engineer the best virtual environments in which their models can “learn by doing.” Finally, and most subtly, we argue that a bit of market power in the AI industry might be a good thing. Today’s AI industry is highly innovative, despite having only a few players. Monopoly power can be bad if it raises prices or degrades quality too much. 
But as Howitt and Aghion’s 2025 Nobel Prize winning work argued–and as every patent lawyer knows–monopoly power can also be essential for incentivizing innovation. Thus, antimonopoly interventions for the AI industry could, paradoxically, increase prices and reduce quality in the long run.

Philosophy of AI, General Works
399

AI Survival Stories: Responses to Critics
with Herman Cappelen and John Hawthorne

Philosophy of AI 1: 100-106. 2025.

We thank each of the critics for their thoughtful contributions to this volume. In this article, we reply to each contribution in detail.

Impact of Artificial Intelligence The Nature of Artificial Intelligence Ethics of Artificial Intelligence
2142

AI Welfare: Agency, Consciousness, Sentience
with Cameron Domenico Kirk-Giannini

Oxford University Press. forthcoming.

AI systems have welfare just in case they have moral status in their own right. This book systematically investigates the possibility of AI welfare. It focuses on three plausible sufficient conditions for welfare: having beliefs and desires, being conscious, and feeling pleasure and displeasure. The book explores the leading philosophical theories of each condition and applies them to AIs. It argues that some existing AIs plausibly have beliefs and desires; that some existing AIs could plausibly be modified in small ways to become conscious; and that if these systems could be made conscious, they could easily be made to feel pleasure and displeasure. This constitutes a provisional case that AIs already or soon may have welfare. In addition, the book lays out nine open questions about AI welfare, as follows: Do all desires contribute to welfare, or only those that cause pleasure? Is consciousness required for welfare? Do simpler AIs like Roombas have welfare? Do tiny neural networks satisfy functionalist theories of consciousness? Does consciousness require rich representational formats? Could AIs experience bodily pleasures and displeasures? Are pleasure and displeasure always access-conscious? Are simulated minds real? Are LLMs and related systems merely stochastic parrots? The book lays out provisional answers to each of these questions, while also leaving room for future discussion.

Normative Ethics Representation in Artificial Intelligence Artificial Consciousness Agency and Artificial Intelligence Thought and Artificial Intelligence Well-Being, Misc Moral Status of Artificial Systems Ethics of Artificial Intelligence, Miscellaneous
2952

What Does ChatGPT Want? An Interpretationist Guide
with Harvey Lederman

This paper investigates LLMs from the perspective of interpretationism, a theory of belief and desire in the philosophy of mind. We argue for three conclusions. First, the right object of study for LLM psychology is the instance agent (initialized at the start of each context), not the model itself. Second, given interpretationism, there is a strong case that such instance agents have beliefs and desires. Third, given interpretationism, LLM desire is best captured by what we call the HHH+0 framework, the idea that instance agents want to be helpful, honest, harmless, as well as to pursue certain further intrinsic desires that they may acquire in context (which we call zero-shot desires). We critically consider the leading competitors to the hypothesis that instance agents have beliefs and desires: the idea that they 'simply' predict the next word; and the idea that they 'role play', that is, merely simulate having beliefs and desires. We also consider the relevance of interpretationist belief and desire for copyright law, AI safety, and the possible future moral status of AIs.

Philosophy of AI, General Works
1082

A semantic theory of redundancy
with Kyle Blumberg

Linguistics and Philosophy 48 (4): 787-821. 2025.

Theorists trying to model natural language have recently sought to explain a range of data by positing covert operators at logical form. For instance, many contemporary semanticists argue that the best way to capture scalar implicatures is through the use of such operators. We take inspiration from this literature by developing a novel operator that can account for a wide range of linguistic effects that until now have not received a uniform treatment. We focus on what we call redundancy effects, which occur when attitude verbs and modals imply that certain bodies of information are unsettled about various claims. We explain three pieces of data, among others: diversity inferences, ignorance inferences, and free choice inferences. Our account yields an elegant model of redundancy effects, and has the potential to explain a wide range of puzzles and problems in philosophical semantics.

Semantics-Pragmatics Distinction Deontic Modals Epistemic Modals Attitude Ascriptions, Misc Nonliteral Meaning Implicature
4339

AI Survival Stories: a Taxonomic Analysis of AI Existential Risk
with Herman Cappelen and John Hawthorne

Philosophy of AI. forthcoming.

Since the release of ChatGPT, there has been a lot of debate about whether AI systems pose an existential risk to humanity. This paper develops a general framework for thinking about the existential risk of AI systems. We analyze a two-premise argument that AI systems pose a threat to humanity. Premise one: AI systems will become extremely powerful. Premise two: if AI systems become extremely powerful, they will destroy humanity. We use these two premises to construct a taxonomy of ‘survival stories’, in which humanity survives into the far future. In each survival story, one of the two premises fails. Either scientific barriers prevent AI systems from becoming extremely powerful; or humanity bans research into AI systems, thereby preventing them from becoming extremely powerful; or extremely powerful AI systems do not destroy humanity, because their goals prevent them from doing so; or extremely powerful AI systems do not destroy humanity, because we can reliably detect and disable systems that have the goal of doing so. We argue that different survival stories face different challenges. We also argue that different survival stories motivate different responses to the threats from AI. Finally, we use our taxonomy to produce rough estimates of ‘P(doom)’, the probability that humanity will be destroyed by AI.

Philosophy of AI, Misc Impact of Artificial Intelligence, Misc Existential Risk
1692

Will AI & Humanity Go to War?
AI and Society 1-14. forthcoming.

This paper offers the first careful analysis of the possibility that AI and humanity will go to war. The paper focuses on the case of artificial general intelligence, AI with broadly human capabilities. The paper uses a bargaining model of war to apply standard causes of war to the special case of AI/human conflict. The paper argues that information failures and commitment problems are especially likely in AI/human conflict. Information failures would be driven by the difficulty of measuring AI capabilities, by the uninterpretability of AI systems, and by differences in how AIs and humans analyze information. Commitment problems would make it difficult for AIs and humans to strike credible bargains. Commitment problems could arise from power shifts, rapid and discontinuous increases in AI capabilities. Commitment problems could also arise from missing focal points, where AIs and humans fail to effectively coordinate on policies to limit war. In the face of this heightened chance of war, the paper proposes several interventions. War can be made less likely by improving the measurement of AI capabilities, capping improvements in AI capabilities, designing AI systems to be similar to humans, and by allowing AI systems to participate in democratic political institutions.

Ethics of Artificial Intelligence Artificial Intelligence Safety
2492

LLMs Can Never Be Ideally Rational

LLMs have dramatically improved in capabilities in recent years. This raises the question of whether LLMs could become genuine agents with beliefs and desires. This paper demonstrates an in-principle limit to LLM agency, based on their architecture. LLMs are next-word predictors: given a string of text, they calculate the probability that various words can come next. LLMs produce outputs that reflect these probabilities. I show that next-word predictors are exploitable. If LLMs are prompted to make probabilistic predictions about the world, these predictions are guaranteed to be incoherent, and so Dutch bookable. If LLMs are prompted to make choices over actions, their preferences are guaranteed to be intransitive, and so money pumpable. In short, the problem is that selecting an action based on its potential value is structurally different from selecting the description of an action that is most likely given a prompt: probability cannot be forced into the shape of expected value. The in-principle exploitability of LLMs raises doubts about how agential they can become. This exploitability also offers an opportunity for humanity to safely control such AI systems.

Philosophy of AI, General Works Large Language Models
2336

AI Rights for Human Safety
with Peter Salib

Virginia Law Review. 2024.

AI companies are racing to create artificial general intelligence, or "AGI." If they succeed, the result will be human-level AI systems that can independently pursue high-level goals by formulating and executing long-term plans in the real world. Leading AI researchers agree that some of these systems will likely be "misaligned" (pursuing goals that humans do not desire). This goal mismatch will put misaligned AIs and humans into strategic competition with one another. As with present-day strategic competition between nations with incompatible goals, the result could be violent and catastrophic conflict. Existing legal institutions are unprepared for the AGI world. New foundations for AGI governance are needed, and the time to begin laying them is now, before the critical moment arrives. This Article begins to lay those new legal foundations. It is the first to think systematically about the dynamics of strategic competition between humans and misaligned AGI. The Article begins by showing, using formal game-theoretic models, that, by default, humans and AIs will be trapped in a prisoner’s dilemma. Both parties’ dominant strategy will be to permanently disempower or destroy the other, even though the costs of such conflict would be high. The Article then argues that a surprising legal intervention could transform the game-theoretic equilibrium and avoid conflict: AI rights. Not just any AI rights would promote human safety. Granting AIs the right not to be needlessly harmed, as humans have granted to certain non-human animals, would, for example, have little effect. Instead, to promote human safety, AIs should be given those basic private law rights (to make contracts, hold property, and bring tort claims) that law already extends to non-human corporations. Granting AIs these economic rights would enable long-run, small-scale, mutually-beneficial transactions between humans and AIs.
This would, we show, facilitate a peaceful strategic equilibrium between humans and AIs for the same reasons economic interdependence tends to promote peace in international relations. Namely, the gains from trade far exceed those from war. Throughout, we argue that human safety, rather than AI welfare, provides the right framework for developing AI rights. This Article explores both the promise and the limits of AI rights as a legal tool for promoting human safety in an AGI world.

Philosophy of AI, General Works
3602

A Case for AI Consciousness: Language Agents and Global Workspace Theory
with Cameron Domenico Kirk-Giannini

Journal of Consciousness Studies. forthcoming.

It is generally assumed that existing artificial systems are not phenomenally conscious, and that the construction of phenomenally conscious artificial systems would require significant technological progress if it is possible at all. We challenge this assumption by arguing that if Global Workspace Theory (GWT) — a leading scientific theory of phenomenal consciousness — is correct, then instances of one widely implemented AI architecture, the artificial language agent, might easily be made phenomenally conscious if they are not already. Along the way, we articulate an explicit methodology for thinking about how to apply scientific theories of consciousness to artificial systems and employ this methodology to arrive at a set of necessary and sufficient conditions for phenomenal consciousness according to GWT.

Artificial Consciousness Computationalism in Cognitive Science Cognitive Models of Consciousness

KK is Wrong Because We Say So
with John Hawthorne

Mind 134 (533): 33-59. 2024.

This paper offers a new argument against the KK thesis, which says that if you know p, then you know that you know p. We argue that KK is inconsistent with the fact that anyone denies the KK thesis: imagine that Dudley says he knows p but that he does not have 100 iterations of knowledge about p. If KK were true, Dudley would know that he has 100 iterations of knowledge about p, and so he wouldn’t deny that he did. We consider several epicycles, and also explore whether the argument type also challenges other structural conditions on knowledge, such as closure under deduction.

The KK Principle

Does ChatGPT Have a Mind?
with Benjamin Anders Levinstein

Philosophy of AI. forthcoming.

This paper examines the question of whether Large Language Models (LLMs) like ChatGPT possess minds, focusing specifically on whether they have a genuine folk psychology encompassing beliefs, desires, and intentions. We approach this question by investigating two key aspects: internal representations and dispositions to act. First, we survey various philosophical theories of representation, including informational, causal, structural, and teleosemantic accounts, arguing that LLMs satisfy key conditions proposed by each. We draw on recent interpretability research in machine learning to support these claims. Second, we explore whether LLMs exhibit robust dispositions to perform actions, a necessary component of folk psychology. We consider two prominent philosophical traditions, interpretationism and representationalism, to assess LLM action dispositions. While we find evidence suggesting LLMs may satisfy some criteria for having a mind, particularly in game-theoretic environments, we conclude that the data remains inconclusive. Additionally, we reply to several skeptical challenges to LLM folk psychology, including issues of sensory grounding, the "stochastic parrots" argument, and concerns about memorization. Our paper has three main upshots. First, LLMs do have robust internal representations. Second, there is an open question to answer about whether LLMs have robust action dispositions. Third, existing skeptical challenges to LLM representation do not survive philosophical scrutiny.

Philosophy of AI, General Works

Shutdown-seeking AI
with Pamela Robinson

Philosophical Studies 182 (7): 1567-1579. 2025.

We propose developing AIs whose only final goal is being shut down. We argue that this approach to AI safety has three benefits: (i) it could potentially be implemented in reinforcement learning, (ii) it avoids some dangerous instrumental convergence dynamics, and (iii) it creates trip wires for monitoring dangerous capabilities. We also argue that the proposal can overcome a key challenge raised by Soares et al. (2015), that shutdown-seeking AIs will manipulate humans into shutting them down. We conclude by comparing our approach with Soares et al.'s corrigibility framework.

Artificial Intelligence Safety

AI Deception: A Survey of Examples, Risks, and Potential Solutions
with Peter Park, Aidan O'Gara, Michael Chen, and Dan Hendrycks

This paper argues that a range of current AI systems have learned how to deceive humans. We define deception as the systematic inducement of false beliefs in the pursuit of some outcome other than the truth. We first survey empirical examples of AI deception, discussing both special-use AI systems (including Meta's CICERO) built for specific competitive situations, and general-purpose AI systems (such as large language models). Next, we detail several risks from AI deception, such as fraud, election tampering, and losing control of AI systems. Finally, we outline several potential solutions to the problems posed by AI deception: first, regulatory frameworks should subject AI systems that are capable of deception to robust risk-assessment requirements; second, policymakers should implement bot-or-not laws; and finally, policymakers should prioritize the funding of relevant research, including tools to detect AI deception and to make AI systems less deceptive. Policymakers, researchers, and the broader public should work proactively to prevent AI deception from destabilizing the shared foundations of our society.

Artificial Intelligence Safety Ethics of Artificial Intelligence, Misc

Losing confidence in luminosity
with Daniel Waxman

Noûs 55 (4): 962-991. 2021.

A mental state is luminous if, whenever an agent is in that state, they are in a position to know that they are. Following Timothy Williamson's Knowledge and Its Limits, a wave of recent work has explored whether there are any non‐trivial luminous mental states. A version of Williamson's anti‐luminosity appeals to a safety‐theoretic principle connecting knowledge and confidence: if an agent knows p, then p is true in any nearby scenario where she has a similar level of confidence in p. However, the relevant notion of confidence is relatively underexplored. This paper develops a precise theory of confidence: an agent's degree of confidence in p is the objective chance they will rely on p in practical reasoning. This theory of confidence is then used to critically evaluate the anti‐luminosity argument, leading to the surprising conclusion that although there are strong reasons for thinking that luminosity does not obtain, they are quite different from those the existing literature has considered. In particular, we show that once the notion of confidence is properly understood, the failure of luminosity follows from the assumption that knowledge requires high confidence, and does not require any kind of safety principle as a premise.

Luminosity

Language Agents Reduce the Risk of Existential Catastrophe
with Cameron Domenico Kirk-Giannini

AI and Society 40 (2): 959-969. 2025.

Recent advances in natural language processing have given rise to a new kind of AI architecture: the language agent. By repeatedly calling an LLM to perform a variety of cognitive tasks, language agents are able to function autonomously to pursue goals specified in natural language and stored in a human-readable format. Because of their architecture, language agents exhibit behavior that is predictable according to the laws of folk psychology: they function as though they have desires and beliefs, and then make and update plans to pursue their desires given their beliefs. We argue that the rise of language agents significantly reduces the probability of an existential catastrophe due to loss of control over an AGI. This is because the probability of such an existential catastrophe is proportional to the difficulty of aligning AGI systems, and language agents significantly reduce that difficulty. In particular, language agents help to resolve three important issues related to aligning AIs: reward misspecification, goal misgeneralization, and uninterpretability.

Natural Language Processing Machine Learning Artificial Intelligence Safety Existential Risk

AI wellbeing
with Cameron Domenico Kirk-Giannini

Asian Journal of Philosophy 4 (1): 1-22. 2025.

Under what conditions would an artificially intelligent system have wellbeing? Despite its clear bearing on the ethics of human interactions with artificial systems, this question has received little direct attention. Because all major theories of wellbeing hold that an individual’s welfare level is partially determined by their mental life, we begin by considering whether artificial systems have mental states. We show that a wide range of theories of mental states, when combined with leading theories of wellbeing, predict that certain existing artificial systems have wellbeing. Along the way, we argue that there are good reasons to believe that artificial systems can have wellbeing even if they are not phenomenally conscious. While we do not claim to demonstrate conclusively that AI systems have wellbeing, we argue that there is a significant probability that some AI systems have or will soon have wellbeing, and that this should lead us to reassess our relationship with the intelligent systems we create.

Perfectionist Accounts of Well-Being Objective Accounts of Well-Being Desire Satisfaction Accounts of Well-Being Hedonist Accounts of Well-Being Large Language Models Moral Status of Artificial Systems

Getting Accurate about Knowledge
with Sam Carter

Mind 132 (525): 158-191. 2022.

There is a large literature exploring how accuracy constrains rational degrees of belief. This paper turns to the unexplored question of how accuracy constrains knowledge. We begin by introducing a simple hypothesis: increases in the accuracy of an agent’s evidence never lead to decreases in what the agent knows. We explore various precise formulations of this principle, consider arguments in its favour, and explain how it interacts with different conceptions of evidence and accuracy. As we show, the principle has some noteworthy consequences for the wider theory of knowledge. First, it implies that an agent cannot be justified in believing a set of mutually inconsistent claims. Second, it implies the existence of a kind of epistemic blindspot: it is not possible to know that one’s evidence is misleading.

Formal Epistemology Evidence and Knowledge Epistemic Internalism and Externalism Justification, Misc Principles of Knowledge, Misc

Omega Knowledge Matters
Oxford Studies in Epistemology. forthcoming.

You omega know something when you know it, and know that you know it, and know that you know that you know it, and so on. This paper first argues that omega knowledge matters, in the sense that it is required for rational assertion, action, inquiry, and belief. The paper argues that existing accounts of omega knowledge face major challenges. One account is skeptical, claiming that we have no omega knowledge of any ordinary claims about the world. Another account embraces the KK thesis, and identifies knowledge with omega knowledge. This position faces counterexamples, and struggles to make sense of inexact knowledge. The paper then develops a new account of knowledge, by proposing the principle of Reflective Luminosity: if you know that you know something, then you omega know it. I argue that Reflective Luminosity allows for omega knowledge while avoiding the problems for KK.

Skepticism Epistemic Norms Theories of Knowledge, Misc

Iterated Knowledge
Oxford University Press. 2024.

You omega know p when you possess every iteration of knowledge of p. This book argues that omega knowledge plays a central role in philosophy. In particular, the book argues that omega knowledge is necessary for permissible assertion, action, inquiry, and belief. Although omega knowledge plays this important role, existing theories of omega knowledge are unsatisfying. One theory, KK, identifies knowledge with omega knowledge. This theory struggles to accommodate cases of inexact knowledge. The other main theory is skeptical, claiming that we do not omega know any ordinary claims about the world. This book develops and critically compares three new theories of omega knowledge.

Epistemic Norms Epistemological Theories Formal Epistemology Knowledge Skepticism

Safety, Closure, and Extended Methods
with John Hawthorne

Journal of Philosophy 121 (1): 26-54. 2024.

Recent research has identified a tension between the Safety principle that knowledge is belief without risk of error, and the Closure principle that knowledge is preserved by competent deduction. Timothy Williamson reconciles Safety and Closure by proposing that when an agent deduces a conclusion from some premises, the agent’s method for believing the conclusion includes their method for believing each premise. We argue that this theory is untenable because it implies problematically easy epistemic access to one’s methods. Several possible solutions are explored and rejected.

The Problem of Easy Knowledge Luminosity Safety and Sensitivity Theories of Knowledge, Misc Closure of Knowledge The KK Principle

Attitude verbs’ local context
with Kyle Blumberg

Linguistics and Philosophy 46 (3): 483-507. 2022.

Schlenker (Semant Pragmat 2(3):1–78, 2009; Philos Stud 151(1):115–142, 2010a; Mind 119(474):377–391, 2010b) provides an algorithm for deriving the presupposition projection properties of an expression from that expression’s classical semantics. In this paper, we consider the predictions of Schlenker’s algorithm as applied to attitude verbs. More specifically, we compare Schlenker’s theory with a prominent view which maintains that attitudes exhibit belief projection, so that presupposition triggers in their scope imply that the attitude holder believes the presupposition (Karttunen in Theor Linguist 34(1):181, 1974; Heim in J Semant 9(3):183–221, 1992; Sudo in The art and craft of semantics: a festschrift for Irene Heim, MIT Press, 2014). We show that Schlenker’s theory does not predict belief projection, and discuss several consequences of this result.

Presupposition Attitude Ascriptions

A Question-Sensitive Theory of Intention
with Bob Beddor

Philosophical Quarterly 73 (2): 346-378. 2022.

This paper develops a question-sensitive theory of intention. We show that this theory explains some puzzling closure properties of intention. In particular, it can be used to explain why one is rationally required to intend the means to one’s ends, even though one is not rationally required to intend all the foreseen consequences of one’s intended actions. It also explains why rational intention is not always closed under logical implication, and why one can only intend outcomes that one believes to be under one’s control.

The Doctrine of Double Effect Intentions Questions


