    Post-training is often described as the process by which a base language model is made more truthful, aligned, grounded, or reliable. Yet its engineering practice tells a more complicated story: RL pipelines are constrained against reward hacking and drift; fine-tuned models are compared against baselines; retrieval and reasoning systems still depend on verification. These practices suggest a gap between the language used to describe post-training and the operations by which post-training change…