    Transforming large collections of scientific publications to XML
    with M. Kohlhase, D. Ginev, and B. R. Miller
    …collecting statistics about missing bindings and macros, and other errors. This guides debugging and development efforts, leading to iterative improvements in both the tools and the quality of the converted corpus. The build system thus serves as both a production conversion engine and a software test harness. We have now processed the complete arχiv collection through 2006, consisting of more than 400,000 documents (a complete run is a processor-year-size undertaking), continuously improving our succes…