Large language models (LLMs) exhibit impressive performance across a range of apparently cognitive tasks. Mentalists hold that this performance is best explained by the fact that LLMs have mental states, while anti-mentalists hold that it should be explained in some other way. In this note we address representationalist folk mentalism, which holds (a) that possessing a folk mental state like belief or desire is a matter of having an internal representation with appropriate content and (b) that LLMs have folk psychological states of this sort (or at least robust precursors to such states). Although representationalist folk mentalism appears attractive, we argue that neither probing nor intervention studies have uncovered representations of the relevant sort in state-of-the-art LLMs. However, while it would be premature to accept representationalist folk mentalism, our argument provides a roadmap for mechanistic interpretability research going forward.