•  64
    Post-training is often described as the process by which a base language model is made more truthful, aligned, grounded, or reliable. Yet its engineering practice tells a more complicated story: RL pipelines are constrained against reward hacking and drift; fine-tuned models are compared against baselines; retrieval and reasoning systems still depend on verification. These practices suggest a gap between the language used to describe post-training and the operations by which post-training change…
  •  26
    Li ji Zheng zhu hui jiao = (edited book)
    Zhonghua shu ju. 2020.