•
    Beyond Preferences in AI Alignment
    with Tan Zhi-Xuan, Matija Franklin, and Hal Ashton
    Philosophical Studies 182 (7): 1813-1863. 2025.
    The dominant practice of AI alignment assumes (1) that preferences are an adequate representation of human values, (2) that human rationality can be understood in terms of maximizing the satisfaction of preferences, and (3) that AI systems should be aligned with the preferences of one or more humans to ensure that they behave safely and in accordance with our values. Whether implicitly followed or explicitly endorsed, these commitments constitute what we term a preferentist approach to AI alignm…