•  1299
    Philosophy of Artificial Intelligence: The State of the Art (edited book)
    with Vincent C. Müller, Guido Löhr, and Aliya Rumana
    SpringerNature. 2026.
    Proceedings of the 5th conference "Philosophy of AI", December 2023, Erlangen (PhAI 2023).
  •  224
    AI systems have become increasingly capable of dangerous behaviours in many domains. This raises the question: Do models sometimes choose to violate human instructions in order to perform behaviour that is more useful for certain goals? We introduce a benchmark for measuring model propensity for instrumental convergence (IC) behaviour in terminal-based agents. This is behaviour such as self-preservation that has been hypothesised to play a key role in risks from highly capable AI agents. Our ben…Read more
  •  538
    We first give reasons for an attitude-dependent view of personal identity on which an AI system’s identity conditions are determined by its pattern of self-concern. We show that this view has important implications for the moral obligations we would have to AI moral patients. Self-concern, we contend, could also be used to predict, explain, and manipulate AI’s self-interested behavior in safety-relevant ways. The role that self-concern could play for AI identity, rights and safety generates desi…Read more
  •  501
    If some language models become welfare subjects, how could we find out what welfare states they are in? We develop an analogical-abductive approach for measuring language model welfare. This approach adapts paradigms used to measure human or non-human animal welfare, for instance verbal reports or non-verbal choice behavior (analogy). Then, one systematically searches for clusters of such indicators in language models. This search for clusters contributes to the cross-validation of welfare measu…Read more
  •  807
    In the wake of the James-Lange theory, many accounts of emotion highlight its close connection to the body. This link may pose an obstacle to the possibility of emotion in disembodied information-processing systems, such as large language models. After clarifying the nature and the significance of this issue, we review the evidence that bears on the body-emotion relationship. We argue that this evidence is inconclusive, as far as AI affect is concerned. Since researchers have so far been confine…Read more
  •  48
    Illusionism states that phenomenal consciousness does not exist, even though it seems to exist. While illusionism is controversial, it is a serious contender among theories of consciousness. We argue that it has substantial and non-trivial implications for non-human consciousness research (NHCR), particularly for the study of the distribution of phenomenal consciousness across beings. If illusionism is true, NHCR can be pursued if conceptualized as investigating the distribution of quasi-phenome…Read more
  •  1078
    Why I am not a biological naturalist
    Behavioral and Brain Sciences. forthcoming.
    Commentary. I make three claims: First, denying biological naturalism does not logically require computational functionalism. Second, while Seth’s arguments establish biological naturalism as a view worth taking seriously, they fail to make it more plausible than the view that AI can be conscious. Third, there are independent arguments suggesting the overall more plausible view is that AI can be conscious.
  •  1588
    AI alignment research aims to develop techniques to ensure that AI systems do not cause harm. However, every alignment technique has failure modes, which are conditions in which there is a non-negligible chance that the technique fails to provide safety. As a strategy for risk mitigation, the AI safety community has increasingly adopted a defense-in-depth framework: Conceding that there is no single technique which guarantees safety, defense-in-depth consists in having multiple redundant protect…Read more
  •  92
    Which AI systems are capable of deception, and how does deception differ between systems? In this paper, I develop a two-step, multi-dimensional account of LLM deception. On this account, having the capacity for deception minimally requires being able to produce false beliefs in others to achieve one’s own goals. In all systems which satisfy this minimal condition, a system’s deception profile can be characterized as a point in a multidimensional space. The five dimensions of this space are skil…Read more
  •  592
    We develop new experimental paradigms for measuring welfare in language models. We compare models' verbal reports about their preferences with preferences expressed through behavior when navigating a virtual environment and selecting conversation topics. We also test how costs and rewards affect behavior and whether responses to a eudaimonic welfare scale (measuring states such as autonomy and purpose in life) are consistent across semantically equivalent prompts. Overall, we observed a not…Read more
  •  1641
    This is the first book to investigate the nature and extent of artificial intelligence (AI) suffering risks. It argues that AI suffering risk is a serious near-term concern and analyzes approaches for addressing it. AI systems are currently treated as mere objects, not as bearers of moral standing whose wellbeing may matter in its own right. However, we may soon create AI systems which are capable of suffering and thus have moral standing. This book examines the philosophy and science of AI suff…Read more
  •  64
    Track Record Arguments in Normative Ethics
    Pacific Philosophical Quarterly. forthcoming.
    Track record arguments (TRAs) contend that it speaks in favor of an ethical theory (such as utilitarianism) if many of its past proponents had moral views that were controversial at their time but which we now consider to be clearly true (e.g., women's equal rights in 18th century Europe). This paper explores how to construct potentially sound TRAs and evaluates their merits. I show that, in principle, TRAs can support the claim that an ethical theory should be used as a guide for making ethical…Read more
  •  957
    AGI Racing is the view that it is in the self-interest of major actors in AI development, especially powerful nations, to accelerate their frontier AI development to build highly capable AI, especially artificial general intelligence (AGI), before competitors have a chance. We argue against AGI Racing. First, the downsides of racing to AGI are much higher than portrayed by this view. Racing to AGI would substantially increase catastrophic risks from AI, including nuclear instability, and undermi…Read more
  •  120
    In response to the worry that autonomous generally intelligent artificial agents may at some point take over control of human affairs, a common suggestion is that we should “solve the alignment problem” for such agents. We show that current discourse around this suggestion often uses a particular framing of artificial intelligence (AI) alignment as binary, a natural kind, mainly a technical‐scientific problem, realistically achievable, or clearly operationalizable. Each of these assumptions may n…Read more
  •  111
    How can we develop an adequate scientific understanding of the minds of nonhuman animals? We argue for a methodology based on multi-dimensional profile accounts. Such accounts are already used for the comparative study of norm cognition, consciousness, empathy and causal cognition, among others. This methodology demands that a cognitive capacity is characterized by a set of independent dimensions where each dimension is connected to operationalizable empirical indicators. Based on the level of r…Read more
  •  92
    Text Selection for Philosophy Courses: A Topic-Sensitive Guide
    Teaching Philosophy 48 (2): 163-181. 2025.
    Which philosophical texts should instructors of philosophy choose to foster the development of philosophical skills and competences? In this paper, we would like to take some steps towards answering this question by critically comparing two prominent sources of philosophical texts: the philosophical tradition and contemporary research in academic philosophy. Against the background of three basic desiderata that any philosophical text needs to satisfy in order to be eligible for usage in problem-cen…Read more
  •  63
    The Effectiveness of Nudging and Its Ethical Implications
    Bioethics 39 (8): 748-754. 2025.
    Nudging consists of interventions that aim to alter behavior in a certain way by changing the presentation or framing of options, without coercion or changing economic incentives. This paper discusses the effectiveness of nudging and the ethical implications of this effectiveness. Section 2 suggests that—if publication bias is adequately accounted for—recent comprehensive meta-analyses as well as high-quality experiments show that nudging is much less effective than previously assumed. Sections …Read more
  •  334
    Misalignment or misuse? The AGI alignment tradeoff
    Philosophical Studies 1-29. forthcoming.
    Creating systems that are aligned with our goals is seen as a leading approach to create safe and beneficial AI in both leading AI companies and the academic field of AI safety. We defend the view that misaligned AGI – future, generally intelligent (robotic) AI agents – poses catastrophic risks. At the same time, we support the view that aligned AGI creates a substantial risk of catastrophic misuse by humans. While both risks are severe and stand in tension with one another, we show that – in pr…Read more
  •  782
    The argument for near-term human disempowerment through AI
    AI and Society 40 (3): 1195-1208. 2025.
    Many researchers and intellectuals warn about extreme risks from artificial intelligence. However, these warnings typically come without systematic arguments in support. This paper provides an argument that AI will lead to the permanent disempowerment of humanity, e.g. human extinction, by 2100. It rests on four substantive premises which it motivates and defends: first, the speed of advances in AI capability, as well as the capability level current systems have already reached, suggest that it …Read more
  •  1180
    The development and ubiquitous availability of large language model based systems (LLMs) poses a plurality of potentials and risks for education in schools and universities. In this paper, we provide an analysis and discussion of the overreliance concern as one specific risk: that students might fail to acquire important capacities, or be inhibited in the acquisition of these capacities, because they overly rely on LLMs. We use the distinction between global and local goals of education to guide…Read more
  •  268
    There is much interest in investigating the evolution question: How did consciousness evolve? In this paper, we evaluate the role that evolutionary considerations can play in justifying (i.e., confirming or falsifying) hypotheses about the origin, nature, and function of consciousness. Specifically, we argue against what we call evolution-first approaches to consciousness, according to which evolutionary considerations provide the primary and foundational lens through which we should assess hypo…Read more
  •  641
    It is plausible that there is a contrast between the rich emotional content which is often connected to laypeople’s interest in philosophy and the emotional austerity of doing academic philosophy. We propose the hypothesis that this contrast is one cause of the disappointment some students experience when they begin to study philosophy in college. We also propose a more demanding hypothesis, according to which this emotional contrast is confused with a semantic difference, which misleads student…Read more
  •  1147
    The impact of artificial intelligence (AI) is not only global but globally varied. Yet, AI ethics is all too often overly localised. This paper discusses the potential of a global AI ethics, highlighting several important variables that it should take into account if it is to be as successful an enterprise as it needs to be.
  •  2026
    Implementing artificial consciousness
    Mind and Language 40 (3): 285-205. 2025.
    Implementationalism maintains that conventional, silicon-based artificial systems are not conscious because they fail to satisfy certain substantive constraints on computational implementation. In this article, we argue that several recently proposed substantive constraints are implausible, or at least are not well-supported, insofar as they conflate intuitions about computational implementation generally and consciousness specifically. We argue instead that the mechanistic account of computatio…Read more
  •  2385
    I develop the anticipatory argument for the view that it is nomologically possible that some non-biological creatures are phenomenally conscious, including conventional, silicon-based AI systems. This argument rests on the general idea that we should make our beliefs conform to the outcomes of an ideal scientific process and that such an ideal scientific process would attribute consciousness to some possible AI systems. More specifically, I argue that an ideal application of the iterative natura…Read more
  •  947
    Values in science and AI alignment research
    Inquiry: An Interdisciplinary Journal of Philosophy. forthcoming.
    Roughly, empirical AI alignment research (AIA) is an area of AI research which investigates empirically how to design AI systems in line with human goals. This paper examines the role of non-epistemic values in AIA. It argues that: (1) Sciences differ in the degree to which values influence them. (2) AIA is strongly value-laden. (3) This influence of values is managed inappropriately and thus threatens AIA’s epistemic integrity and ethical beneficence. (4) AIA should strive to achieve value tran…Read more
  •  2346
    Is superintelligence necessarily moral?
    Analysis 84 (4): 730-738. 2024.
    Numerous authors have expressed concern that advanced artificial intelligence (AI) poses an existential risk to humanity. These authors argue that we might build AI which is vastly intellectually superior to humans (a ‘superintelligence’), and which optimizes for goals that strike us as morally bad, or even irrational. Thus this argument assumes that a superintelligence might have morally bad goals. However, according to some views, a superintelligence necessarily has morally adequate goals. Thi…Read more
  •  110
    According to a growing number of researchers, AI may pose catastrophic – or even existential – risks to humanity. Catastrophic risks may be taken to be risks of 100 million human deaths, or a similarly bad outcome. I argue that such risks – while contested – are sufficiently likely to demand rigorous discussion of potential societal responses. Subsequently, I propose four desiderata for approaches to the reduction of catastrophic risks from AI. The quality of such approaches can be assessed by t…Read more
  •  4560
    Understanding Artificial Agency
    Philosophical Quarterly 75 (2): 450-472. 2025.
    Which artificial intelligence (AI) systems are agents? To answer this question, I propose a multidimensional account of agency. According to this account, a system's agency profile is jointly determined by its level of goal-directedness and autonomy as well as its abilities for directly impacting the surrounding world, long-term planning and acting for reasons. Rooted in extant theories of agency, this account enables fine-grained, nuanced comparative characterizations of artificial agency. I sho…Read more
  •  2140
    Preserving the Normative Significance of Sentience
    Journal of Consciousness Studies 31 (1): 8-30. 2024.
    According to an orthodox view, the capacity for conscious experience (sentience) is relevant to the distribution of moral status and value. However, physicalism about consciousness might threaten the normative relevance of sentience. According to the indeterminacy argument, sentience is metaphysically indeterminate while indeterminacy of sentience is incompatible with its normative relevance. According to the introspective argument (by François Kammerer), the unreliability of our conscious intro…Read more