This study investigates how various large language models (LLMs) generate responses to moral reasoning dilemmas. Specifically, it examines LLM-generated responses on two instruments. The first is the Defining Issues Test (DIT-2), which measures abstract moral reasoning schemas. The second is the Intermediate Concepts Measure, Educational Leaders version (ICM), which assesses domain-specific professional moral reasoning. On the DIT-2, Claude gave the strongest priority to post-conventional moral reasoning, followed by Gemini Advanced and then Gemini. On the ICM Educational Leaders version, Gemini Advanced had the highest total ICM score, followed by Claude 3.5 Sonnet and then Gemini. The findings indicate that some LLMs can generate responses consistent with sophisticated moral reasoning patterns, producing scores comparable to or exceeding those of graduate-level human participants. However, this study made no direct comparisons with human participants. The study provides a methodological framework to guide larger-scale research into AI-generated and human moral reasoning patterns.