Cambridge, Massachusetts, United States of America
Areas of Interest
17th/18th Century Philosophy
  •  2
    As a young college student studying philosophy, Klein filled a notebook with short quotes from the world's greatest thinkers, hoping to find some guidance on how to live the best life he could. As he revisits the wisdom he relished in his youth, each extract is annotated with Klein's inimitable charm and insights. He tackles life's biggest questions -- and leaves us chuckling and enlightened.
  •  27
    Analyzing the Rate at Which Languages Lose the Influence of a Common Ancestor
    with Anna N. Rafferty and Thomas L. Griffiths
    Cognitive Science 38 (7): 1406-1431. 2014.
    Analyzing the rate at which languages change can clarify whether similarities across languages are solely the result of cognitive biases or might be partially due to descent from a common ancestor. To demonstrate this approach, we use a simple model of language evolution to mathematically determine how long it should take for the distribution over languages to lose the influence of a common ancestor and converge to a form that is determined by constraints on language learning. We show that model…Read more
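As a toy illustration of the convergence at issue (not the paper's actual model), a two-state Markov chain of language change forgets its ancestral starting distribution at a geometric rate set by the chain's second eigenvalue; two populations descending from different common ancestors become indistinguishable:

```python
# Toy sketch (invented transition matrix, not the authors' model): two
# lineages with different ancestral languages converge to the same
# distribution, losing the influence of the common ancestor geometrically.

def step(dist, T):
    """One generation of change: row vector dist times transition matrix T."""
    n = len(dist)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]

def tv_distance(p, q):
    """Total-variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical learning biases over two "language types";
# second eigenvalue is 0.9 - 0.4 = 0.5, so differences halve each generation.
T = [[0.9, 0.1],
     [0.4, 0.6]]

ancestor_a = [1.0, 0.0]  # one lineage starts entirely at type 0
ancestor_b = [0.0, 1.0]  # the other starts entirely at type 1

for generation in range(5):
    print(generation, tv_distance(ancestor_a, ancestor_b))
    ancestor_a = step(ancestor_a, T)
    ancestor_b = step(ancestor_b, T)
```

The printed distances halve each generation, which is the kind of rate the paper determines analytically for its model.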
  •  12
    Asymmetric Interpretations
    Journal des Economistes Et des Etudes Humaines 12 (1). 2002.
    Knowledge consists of the triad: information, interpretation, and judgement. Much of modern political economy has miscarried by proceeding as though knowledge were merely “information”; that is, as though interpretation were symmetric and final. Economic prosperity depends greatly on new knowledge or “discovery” of profit opportunities that translate into social betterment. These discoveries are often a transcending of the working interpretation, not merely the acquisition of new information. The…Read more
  • Formal Ontology is a discipline whose business is to develop formal theories about general aspects of reality such as identity, dependence, parthood, truth-making, causality, etc. A foundational ontology is a specific consistent set of these ontological theories that support activities such as domain analysis, conceptual clarification, and meaning negotiation. A (well-founded) core ontology specifies, under a foundational ontology, the central concepts and relations of a given domain. Foundation…Read more
  •  26
    Hume and Smith on utility, agreeableness, propriety, and moral approval
    with Erik W. Matson and Colin Doran
    History of European Ideas 45 (5): 675-704. 2019.
    OVERVIEW: We ambitiously reexamine Smith’s moral theory in relation to Hume’s. We regard Smith's developments as glorious and important. We also see them as quite fully agreeable to Hume, as enhancement, not departure. But Smith represents matters otherwise! Why would Smith overstate disagreement with his best friend? One aspect of Smith’s enhancement, an aspect he makes very conspicuous, is that between moral approval and beneficialness there is another phase, namely, the moral judge's sense of pr…Read more
  •  7
    Table at Dimitri's Taverna : on seeking a philosophy of old age -- Old Greek's olive trees : on Epicurus's philosophy of fulfillment -- Deserted terrace : on time and worry beads -- Tasso's rain-spattered photographs : on solitary reflection -- Sirocco of youth's beauty : on existential authenticity -- Tintinnabulation of sheep bells : on mellowing to metaphysics -- Iphigenia's guest : on stoicism and old old age -- Burning boat in Kamini Harbor : on the timeliness of spirituality -- Returning h…Read more
  •  20
    This paper presents empirical studies and closely corresponding theoretical models of the performance of a chart parser exhaustively parsing the Penn Treebank with the Treebank’s own CFG grammar. We show how performance is dramatically affected by rule representation and tree transformations, but little by top-down vs. bottom-up strategies. We discuss grammatical saturation, including analysis of the strongly connected components of the phrasal nonterminals in the Treebank, and model how, as sen…Read more
  •  20
    This paper presents a novel approach to the unsupervised learning of syntactic analyses of natural language text. Most previous work has focused on maximizing likelihood according to generative PCFG models. In contrast, we employ a simpler probabilistic model over trees based directly on constituent identity and linear context, and use an EM-like iterative procedure to induce structure. This method produces much higher quality analyses, giving the best published results on the ATIS dataset.
  •  40
    While O(n³) methods for parsing probabilistic context-free grammars (PCFGs) are well known, a tabular parsing framework for arbitrary PCFGs which allows for bottom-up, top-down, and other parsing strategies has not yet been provided. This paper presents such an algorithm, and shows its correctness and advantages over prior work. The paper finishes by bringing out the connections between the algorithm and work on hypergraphs, which permits us to extend the presented Viterbi (best parse) algorith…Read more
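The tabular framework in the paper is far more general, but a minimal bottom-up (CKY-style) Viterbi chart parser over an invented toy PCFG in Chomsky normal form illustrates the basic O(n³) chart computation that such strategies organize:

```python
from collections import defaultdict

# Minimal bottom-up Viterbi chart parsing sketch for a toy PCFG in
# Chomsky normal form. The grammar and sentence are invented for
# illustration; this is not the paper's framework.
binary = {                      # (B, C) -> [(A, P(A -> B C))]
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
}
lexical = {                     # word -> [(A, P(A -> word))]
    "she": [("NP", 0.4)],
    "fish": [("NP", 0.6)],
    "eats": [("V", 1.0)],
}

def viterbi_parse(words):
    n = len(words)
    chart = defaultdict(float)  # (i, j, symbol) -> best inside probability
    for i, w in enumerate(words):
        for sym, p in lexical.get(w, []):
            chart[(i, i + 1, sym)] = max(chart[(i, i + 1, sym)], p)
    for span in range(2, n + 1):             # build larger spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):        # split point
                for (b, c), parents in binary.items():
                    pb, pc = chart[(i, k, b)], chart[(k, j, c)]
                    if pb and pc:
                        for a, p in parents:
                            cand = p * pb * pc
                            chart[(i, j, a)] = max(chart[(i, j, a)], cand)
    return chart[(0, n, "S")]

print(viterbi_parse("she eats fish".split()))  # best-parse probability
```

The three nested span/split loops give the familiar cubic bound in sentence length.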
  •  20
    Parsing and Hypergraphs
    with Christopher D. Manning
    While symbolic parsers can be viewed as deduction systems, this view is less natural for probabilistic parsers. We present a view of parsing as directed hypergraph analysis which naturally covers both symbolic and probabilistic parsing. We illustrate the approach by showing how a dynamic extension of Dijkstra’s algorithm can be used to construct a probabilistic chart parser with an O(n³) time bound for arbitrary PCFGs, while preserving as much of the flexibility of symbolic chart parsers as allow…Read more
  •  18
    Distributional Phrase Structure Induction
    with Christopher D. Manning
    Unsupervised grammar induction systems commonly judge potential constituents on the basis of their effects on the likelihood of the data. Linguistic justifications of constituency, on the other hand, rely on notions such as substitutability and varying external contexts. We describe two systems for distributional grammar induction which operate on such principles, using part-of-speech tags as the contextual features. The advantages and disadvantages of these systems are examined, including precis…Read more
  •  19
    We present a novel generative model for natural language tree structures in which semantic (lexical dependency) and syntactic (PCFG) structures are scored with separate models. This factorization provides conceptual simplicity, straightforward opportunities for separately improving the component models, and a level of performance comparable to similar, non-factored models. Most importantly, unlike other modern parsing models, the factored model admits an extremely effective A* parsing algorithm,…Read more
  •  36
    Accurate Unlexicalized Parsing
    with Christopher D. Manning
    We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar. Indeed, its performance of 86.36% (LP/LR F1) is better than that of early lexicalized PCFG models, and surprisingly close to the current state-of-the-art. This result has potential uses beyond establishing a strong lower bound on the maximum possible accu…Read more
  •  15
    This paper separates conditional parameter estimation, which consistently raises test set accuracy on statistical NLP tasks, from conditional model structures, such as the conditional Markov model used for maximum-entropy tagging, which tend to lower accuracy. Error analysis on part-of-speech tagging shows that the actual tagging errors made by the conditionally structured model derive not only from label bias, but also from other ways in which the independence assumptions of the conditional…Read more
  •  24
    We discuss two named-entity recognition models which use characters and character n-grams either exclusively or as an important part of their data representation. The first model is a character-level HMM with minimal context information, and the second model is a maximum-entropy conditional Markov model with substantially richer context features. Our best model achieves an overall F1 of 86.07% on the English test data (92.31% on the development data). This number represents a 25% error reduction …Read more
  •  30
    We present a generative distributional model for the unsupervised induction of natural language syntax which explicitly models constituent yields and contexts. Parameter search with EM produces higher quality analyses than previously exhibited by unsupervised systems, giving the best published unsupervised parsing results on the ATIS corpus. Experiments on Penn treebank sentences of comparable length show an even higher F1 of 71% on nontrivial brackets. We compare distributionally induced and ac…Read more
  •  12
    We present Power Extrapolation, which accelerates the Power Method by subtracting off the error along several nonprincipal eigenvectors from the current iterate, making use of known nonprincipal eigenvalues of the Web hyperlink matrix. Empirically, we show that using Power Extrapolation speeds up PageRank computation by 30% on a Web graph of 80 million nodes in realistic scenarios over the standard power method, in a way that is simple to understand and implement.
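The core cancellation trick can be sketched on a toy 2-node chain (not the 80-million-node Web graph): when the second eigenvalue lam2 is known, combining two successive Power Method iterates eliminates the error component that decays as lam2**k:

```python
# Toy sketch of the extrapolation idea, under the simplifying assumption
# of a single known nonprincipal eigenvalue. Matrix and sizes are invented.

A = [[0.95, 0.05],   # column-stochastic matrix with eigenvalues 1 and 0.9
     [0.05, 0.95]]

def multiply(A, x):
    """Matrix-vector product: one Power Method step."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def power_extrapolate(x_prev, x_curr, lam2):
    """Cancel the lam2**k error term: (x_k - lam2 * x_{k-1}) / (1 - lam2)."""
    return [(c - lam2 * p) / (1.0 - lam2) for p, c in zip(x_prev, x_curr)]

x0 = [1.0, 0.0]          # start far from the stationary vector [0.5, 0.5]
x1 = multiply(A, x0)
x2 = multiply(A, x1)
estimate = power_extrapolate(x1, x2, lam2=0.9)
print(estimate)          # close to [0.5, 0.5] after only two multiplies
```

With one nonprincipal eigenvector the extrapolation is exact; the paper's method generalizes the same subtraction to several known eigenvalues of the hyperlink matrix.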
  •  14
    Combining Heterogeneous Classifiers for Word-Sense Disambiguation
    with Christopher D. Manning and Kristina Toutanova
    This paper discusses ensembles of simple but heterogeneous classifiers for word-sense disambiguation, examining the Stanford-CS224N system entered in the SENSEVAL-2 English lexical sample task. First-order classifiers are combined by a second-order classifier, which variously uses majority voting, weighted voting, or a maximum entropy model. While individual first-order classifiers perform comparably to middle-scoring teams’ systems, the combination achieves high performance. We discuss trade-offs an…Read more
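The second-order combination schemes named above (majority and weighted voting) can be sketched in a few lines; the classifier outputs and weights below are invented, not the SENSEVAL-2 system's:

```python
from collections import Counter

# Toy second-order combiner over first-order classifier outputs.
# The senses, votes, and weights are hypothetical illustrations.

def combine(votes, weights=None):
    """Majority vote (uniform weights) or weighted vote over sense labels."""
    if weights is None:
        weights = [1.0] * len(votes)
    tally = Counter()
    for sense, w in zip(votes, weights):
        tally[sense] += w
    return tally.most_common(1)[0][0]

first_order = ["bank/river", "bank/finance", "bank/finance"]
print(combine(first_order))                           # plain majority voting
print(combine(first_order, weights=[5.0, 1.0, 1.0]))  # weighted voting
```

A maximum entropy second-order classifier, as in the paper, would instead learn how much to trust each first-order classifier from held-out data.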
  •  13
    We take a model-based view of agglomerative clustering. First, we show formally that the common heuristic agglomerative clustering algorithms – Ward’s method, single-link, complete-link, and a variant of group-average – are each equivalent to a hierarchical model-based method. This interpretation gives a theoretical explanation of the empirical behavior of these algorithms, as well as a principled approach to resolving practical issues, such as number of clusters or the choice of method. Second, we show how a model-based viewpoint…Read more
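For concreteness, here is a minimal single-link agglomerative clustering sketch on invented 1-D data; it shows only the heuristic algorithm being reinterpreted, not the model-based equivalence itself:

```python
# Single-link agglomerative clustering: repeatedly merge the two clusters
# whose closest members are nearest, until k clusters remain.
# Data and k are invented for illustration.

def single_link(points, k):
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest cluster pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters

print(single_link([0.0, 0.2, 5.0, 5.1, 9.0], k=3))
```

Swapping the `min` in the pairwise distance for `max` gives complete-link; the paper shows each such linkage rule corresponds to a hierarchical model-based method.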
  •  17
    A* PCFG parsing can dramatically reduce the time required to find the exact Viterbi parse by conservatively estimating outside Viterbi probabilities. We discuss various estimates and give efficient algorithms for computing them. On Penn treebank sentences, our most detailed estimate reduces the total number of edges processed to less than 3% of that required by exhaustive parsing, and even a simpler estimate which can be pre-computed in under a minute still reduces the work by a factor of 5. The…Read more
  •  16
    We present an improved method for clustering in the presence of very limited supervisory information, given as pairwise instance constraints. By allowing instance-level constraints to have space-level inductive implications, we are able to successfully incorporate constraints for a wide range of data set types. Our method greatly improves on the previously studied constrained k-means algorithm, generally requiring less than half as many constraints to achieve a given accuracy on a range of real-wo…Read more
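A sketch of the constrained k-means baseline the paper improves on, in which assignments must respect must-link and cannot-link pairs; the data, the greedy violation check, and the fixed initialization are all invented for illustration:

```python
# Constrained k-means sketch (1-D points): each point goes to its nearest
# centroid that does not violate a must-link or cannot-link constraint
# against points assigned earlier in the pass.

def violates(idx, cluster, labels, must_link, cannot_link):
    for a, b in must_link:
        other = b if a == idx else a if b == idx else None
        if other is not None and labels[other] is not None and labels[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == idx else a if b == idx else None
        if other is not None and labels[other] == cluster:
            return True
    return False

def constrained_kmeans(points, centroids, must_link, cannot_link, iters=10):
    labels = [None] * len(points)
    for _ in range(iters):
        labels = [None] * len(points)
        for idx, p in enumerate(points):
            # try centroids from nearest to farthest, skipping violations;
            # a point stays None if every centroid violates a constraint
            order = sorted(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for c in order:
                if not violates(idx, c, labels, must_link, cannot_link):
                    labels[idx] = c
                    break
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels

points = [0.0, 0.1, 5.0, 5.1]
labels = constrained_kmeans(points, centroids=[0.0, 5.0],
                            must_link=[(0, 1), (2, 3)], cannot_link=[(0, 2)])
print(labels)
```

The paper's contribution goes beyond this instance-level check: constraints also warp the space, so that points near a must-linked pair are pulled together too.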