Mitigating bias in LLM‑based scoring of English language learners
Mitigating bias in LLM-based scoring for English language learners (ELLs) requires a structured approach to ensure fairness and accuracy. Below is a summary of key strategies, challenges, and outcomes drawn from recent research.

Different LLMs employ different bias mitigation methods. For example, GPT-4 uses data augmentation to diversify training samples, while BERT relies on bias-aware training to adjust scoring for linguistic diversity. Advanced frameworks such as BRIDGE (LLM-based data augmentation) and AutoSCORE (a multi-agent scoring system) show promise in reducing subgroup bias. See the Techniques for Mitigating Bias in LLM-Based Scoring section for a comparison of these frameworks and details on their implementation.
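Before any mitigation can be evaluated, subgroup bias has to be measured. A minimal sketch of one common check is below: compare mean model-assigned scores across learner subgroups and report the largest gap. The subgroup labels and scores here are purely illustrative, and `subgroup_score_gap` is a hypothetical helper, not part of any of the frameworks named above.

```python
from statistics import mean

def subgroup_score_gap(records):
    """Largest gap between subgroup mean scores, as a simple bias check.

    records: list of (subgroup_label, score) pairs, e.g. scores an LLM
    assigned to essays grouped by the writer's first language.
    Returns max subgroup mean minus min subgroup mean (0.0 = no gap).
    """
    by_group = {}
    for group, score in records:
        by_group.setdefault(group, []).append(score)
    means = [mean(scores) for scores in by_group.values()]
    return max(means) - min(means)

# Illustrative scores for two hypothetical ELL subgroups
scores = [
    ("L1_spanish", 3.8), ("L1_spanish", 4.0),
    ("L1_mandarin", 3.2), ("L1_mandarin", 3.4),
]
print(subgroup_score_gap(scores))
```

A large gap on comparable work is a signal to apply the mitigation methods above (data augmentation, bias-aware training) and re-measure; a single aggregate number like this is only a first-pass screen, not a full fairness audit.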