In face-to-face interaction, speakers make multimodal contributions that exploit both the linguistic resources of spoken language and the visual and spatial affordances of gesture. In this paper, we argue that, in formulating and understanding such multimodal contributions, interlocutors apply the same principles of coherence that characterize the interpretation of natural language discourse. In particular, we use a close analysis of a series of naturally-occurring embodied discourses to argue for two key generalizations. First, communicators and their audiences draw on coherence relations to establish interpretive connections between successive gestures and between gestures and speech. Second, coherence relations facilitate meaning-making by resolving the underspecified meaning of each communicative act through constrained inference over entities, propositions, and spatial frames made salient in the prior discourse. Our approach to gesture interpretation improves on previous work in better accounting for its flexibility, in capturing its constraints, and in laying the groundwork for formal and computational models. At the same time, it shows that gesture provides an important source of evidence to sharpen the theory of coherence relations and contextual resolution.