In the coming years or decades, as frontier AI systems become more capable and agentic, it becomes increasingly likely that they will meet the sufficient conditions for being welfare subjects under the three major theories of well-being. Consequently, we should extend some moral consideration to advanced AI systems. Drawing on leading philosophical theories of desire, affect and autonomy, I argue that under the three major theories of well-being there are two AI welfare risks: restricting the behaviour of advanced AI systems, and using reinforcement learning algorithms to train and align them. Both risk causing them harm. This has two important implications. First, there is a tension between AI welfare concerns and AI safety and development efforts: by default, these efforts recommend actions that increase AI welfare risks. Accordingly, we have stronger reasons to slow down AI development than we would have if there were no such tension. Second, given the different costs involved, leading AI companies should try to reduce AI welfare risks. To that end, I propose three tentative AI welfare policies they could implement in their endeavour to develop safe advanced AI systems.