Biases and misunderstandings stemming from pre-training in Generative Pre-Trained Transformers are more likely for users of underrepresented English varieties, since the training dataset favors dominant Englishes (e.g., American English). We investigate potential bias in GPT-4 when it interacts with Trinidadian English Creole (TEC), a non-hegemonic English variety that partially overlaps with standardized English (SE) but retains distinctive characteristics. (1) Comparable responses: we asked GPT-4 18 questions in TEC and SE and compared the content and detail of the responses. (2) Accurate translation: we assessed how accurate and authentic 29 TEC and 34 SE translations were. (3) Language knowledge and attitudes: we asked what language the prompts were written in, categorized the responses, and examined any language attitudes they exhibited. Content and detail in the responses to the TEC and SE prompts were comparable. The model was proficient at translating TEC pronouns and many grammatical categories but weaker at processing spelling and vocabulary items. It also produced several inauthentic features: only 39% of the TEC sentences it generated were fully grammatical. While GPT-4 identified SE perfectly, it identified TEC correctly only 21% of the time, sometimes classifying it as English with “errors” that it then “corrected”. GPT-4’s scope of use is therefore limited for users of non-hegemonic Englishes, and it is problematic that some of its analyses perpetuate bias against underrepresented Englishes. More research on lesser-documented Englishes is necessary, and we anticipate that this problem also affects dialects of other languages. We intend to partner with Trinidadian stakeholders to train GPT-4 in the future.