Jen-Tse Huang (Johns Hopkins University): Publications

Johns Hopkins University
Computer Science

Post-doctoral fellow

Baltimore, Maryland, United States of America

AI Welfare is Bullshit
with Yunze Xiao, Gordon Dai, Shahan Ali Memon, Maarten Sap, and Mona Diab

International Conference on Machine Learning, forthcoming.

Recent proposals urge AI labs to prepare for “AI welfare” under uncertainty about whether AI systems have morally relevant inner states. We do not argue for or against the possibility of AI welfare. Instead, we argue that current AI welfare assessment fails for two linked structural reasons absent from other evaluation targets. First, AI welfare indicators are co-engineered with the systems they evaluate: ordinary development decisions that shape model behavior can also manufacture or suppress welfare evidence. Second, AI welfare lacks external validation: no deployment failure or independent test can reveal whether a welfare metric tracks anything real about the system. Together, these problems yield our central claim: For current systems, AI welfare is bullshit in Frankfurt’s sense, as its measurement regime is structurally disconnected from truth-tracking. AI welfare should therefore not be institutionalized as a binding gate for oversight, release, or accountability; restrictions on AI systems should instead be justified by externally verifiable harms.

The Nature of Artificial Intelligence

Jen-Tse Huang
