Recent proposals urge AI labs to prepare for “AI welfare” under uncertainty about whether AI systems have morally relevant inner states. We do not argue for or against the possibility of AI welfare. Instead, we argue that current AI welfare assessment fails for two linked structural reasons absent from other evaluation targets. First, AI welfare indicators are co-engineered with the systems they evaluate: ordinary development decisions that shape model behavior can also manufacture or suppress welfare evidence. Second, AI welfare lacks external validation: no deployment failure or independent test can reveal whether a welfare metric tracks anything real about the system. Together, these problems yield our central claim: For current systems, AI welfare is bullshit in Frankfurt’s sense, as its measurement regime is structurally disconnected from truth-tracking. AI welfare should therefore not be institutionalized as a binding gate for oversight, release, or accountability; restrictions on AI systems should instead be justified by externally verifiable harms.