Eve Wang (Case Western Reserve University): Publications

Case Western Reserve University
Department of Philosophy

Undergraduate

Cleveland, Ohio, United States of America

Post-Training as Distributional Intervention: The Mismatch Between Stated Goals and Actual Operations

Post-training is often described as the process by which a base language model is made more truthful, aligned, grounded, or reliable. Yet its engineering practice tells a more complicated story: RL pipelines are constrained against reward hacking and drift; fine-tuned models are compared against baselines; retrieval and reasoning systems still depend on verification. These practices suggest a gap between the language used to describe post-training and the operations by which post-training changes deployed systems. This paper argues that the gap is not accidental. Post-training is better understood as distributional intervention. Supervised fine-tuning, reinforcement learning from human feedback, retrieval-augmented generation, direct preference optimization, and parameter-efficient adaptation differ in implementation; each changes the conditions under which outputs are sampled, selected, routed, or constrained. Verification enters through procedures that relate outputs to reference states. The paper first examines engineering clues within post-training practice, then names the mechanism they point toward. It then identifies four costs of the mismatch: loss of error diagnosability, misalignment between trust and reliability, compound failure under stacked interventions, and resource expenditure on a misframed target. It concludes by relocating the research question from installing truth to characterizing distributions and building verification systems around their outputs.

Large Language Models; Impact of Artificial Intelligence

Li ji Zheng zhu hui jiao = (edited book)
Zhonghua shu ju. 2020.

Chinese Philosophy
