This paper explores what I call the reverse alignment problem (RAP). The alignment problem in artificial intelligence (AI) is the challenge of ensuring that superintelligent AI systems harmonize with and promote wider social, personal, and environmental values. Standard formulations of the alignment problem depict the issue largely as a technical or engineering problem, one that concerns the right specification of the goals and objectives to be pursued so that machines act consistently with the intentions of the designer. This paper builds on this literature by exploring a bidirectional interaction between human values and the machines we use to effectuate our intentions. I argue that alignment can work in both directions: we can align machines with the complex tapestry of human values (notoriously difficult), or we can reduce and simplify human values, preferences, and goals so that they are easier to satisfy, a process that is, or so I argue, already underway. Thus, the RAP identifies the way in which we are already paving the way for forms of alignment that may not be desirable and may, indeed, be misaligned with the social values they were initially intended to serve. To explore this interaction in depth, I look toward the notion of value capture, as formulated by C.T. Nguyen, as the foundational mechanism through which reverse alignment operates. Value capture occurs when rich, multidimensional human values are reduced to simplified proxies that optimization systems can measure and maximize. These simplified values are then fed back into society and often tacitly adopted, leading to further alignment with machine-readable interpretations of human behavior. As Kate Crawford has observed of emotion recognition in machine learning, “Techniques have been developed to reduce the messiness of feelings, interior states, preferences, and identifications into something quantitative, detectable, and trackable” (Crawford 2021).
Building on this insight, I argue that the RAP is the process in which the human dimension of the alignment problem is surreptitiously modified, modulated, and manipulated to align more easily with machines: alignment achieved in the reverse direction—not by making machines more like humans and, therefore, sensitive to contextual features of the value landscape, but by making humans more ‘machine-like’.