The reliability and representativeness of the stimuli used in psychological experiments play a critical role in the generalizability of their findings. To evaluate the potential impact of reliability and representativeness in psycholinguistics and the cognitive sciences more broadly, we conducted a case study using the domain of lexical ambiguity as a foil. We examined how often studies agreed on the ambiguity types assigned to a word (i.e., homonymy, polysemy, and monosemy), and how well the words represented the populations underlying each ambiguity type. These analyses involved 3597 unique words (14792 tokens) from 240 studies. We observed that (1) there is substantial, albeit imperfect, agreement in the assignment of words to ambiguity types; (2) coverage of the underlying populations is relatively poor and biased, with substantial re-use of some stimuli across studies; (3) some clusters of studies engage in substantial stimulus re-use, which, although beneficial in some respects, may impact generalizability; and (4) in a series of pseudo-experiments, these issues of reliability and representativeness could conceivably alter the patterns of effects reported for lexical decision, a popular experimental task. Taken together, our findings raise questions about reliability and generalizability that could affect prior theoretical claims. We discuss our findings with respect to specific considerations related to lexical ambiguity, such as the challenge of ambiguity type labeling, as well as broader considerations relevant to the cognitive sciences, such as the theoretical basis for generalizing and how to optimize the trade-off between replication and generalization. We close by offering targeted directions to improve research practices.