Post-training is often described as the process by which a base language model is made more truthful, aligned, grounded, or reliable. Yet its engineering practice tells a more complicated story: RL pipelines are constrained against reward hacking and drift; fine-tuned models are compared against baselines; retrieval and reasoning systems still depend on verification. These practices suggest a gap between the language used to describe post-training and the operations by which post-training changes deployed systems.
This paper argues that the gap is not accidental. Post-training is better understood as distributional intervention. Supervised fine-tuning, reinforcement learning from human feedback, retrieval-augmented generation, direct preference optimization, and parameter-efficient adaptation differ in implementation, but each changes the conditions under which outputs are sampled, selected, routed, or constrained. Verification enters through procedures that relate outputs to reference states.
The paper first examines engineering clues within post-training practice, then names the mechanism they point toward. It then identifies four costs of the mismatch: loss of error diagnosability, misalignment between trust and reliability, compound failure under stacked interventions, and resource expenditure on a misframed target. It concludes by relocating the research question from installing truth to characterizing distributions and building verification systems around their outputs.