Vol. 8 No. 12 (2024): JABADP-8-12
Articles

Hallucination in Low-Resource Languages: Amplified Risks and Mitigation Strategies for Multilingual LLMs

Mostafa Abdelrahman
Fayoum University, Department of Computer Science, El-Gamaa, Fayoum, Egypt.

Published 2024-12-10

How to Cite

Abdelrahman, M. (2024). Hallucination in Low-Resource Languages: Amplified Risks and Mitigation Strategies for Multilingual LLMs. Journal of Applied Big Data Analytics, Decision-Making, and Predictive Modelling Systems, 8(12), 17-24. https://polarpublications.com/index.php/JABADP/article/view/2024-12-10

Abstract

Hallucinations in low-resource languages present challenges for multilingual language models in domains where factual accuracy and linguistic nuance are paramount. Large Language Models (LLMs) rely on extensive corpora for training, yet many dialects and underrepresented languages lack substantial textual resources. This scarcity can amplify hallucination, where models generate statements devoid of factual grounding, spreading misinformation and undermining user trust. Recent advances in neural architectures provide partial solutions through cross-lingual transfer and specialized fine-tuning, but limitations persist when data exhibits inconsistent spellings, code-switching, and limited availability of authoritative references. The resulting outputs may include fabricated entities, incorrect translations, or contextually inconsistent facts. These hallucinations pose risks in settings such as healthcare, legal documentation, and governmental communication. Empirical findings indicate that increasing the volume of training data and leveraging cross-lingual transfer techniques can mitigate certain errors, though no single strategy fully eradicates hallucination. The surge in real-world deployments of LLMs amplifies ethical concerns over content authenticity, fairness in resource allocation, and long-term user reliance on automated systems. Future research directions highlight the importance of balanced corpora, robust evaluation metrics, and lexicon-based validation strategies to enhance reliability in low-resource contexts. Methods for systematic error analysis, domain adaptation, and targeted oversight offer promising steps toward higher-fidelity multilingual generation.
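One of the mitigation strategies the abstract mentions, lexicon-based validation, can be illustrated with a minimal sketch: compare candidate entity mentions in generated text against an authoritative lexicon and flag anything unverified. Everything here is an illustrative assumption, not the paper's method; the capitalization heuristic stands in for a proper named-entity recognizer, and the toy lexicon stands in for curated reference data.

```python
import re

# Toy "authoritative" lexicon of verified entity names. In practice this
# would be built from curated, trusted references for the target language.
AUTHORITATIVE_LEXICON = {"Fayoum University", "Cairo", "Nile"}

def flag_unverified_entities(text, lexicon):
    """Return candidate entity spans in `text` that are absent from `lexicon`.

    Candidates are runs of capitalized words -- a deliberately naive
    heuristic used only for illustration; a real validator would use an
    NER model tuned for the low-resource language in question.
    """
    candidates = re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*", text)
    return [c for c in candidates if c not in lexicon]

# Example: a (hypothetical) model output containing one fabricated entity.
output = "Researchers at Fayoum University cited the Blue Atlas Institute."
print(flag_unverified_entities(output, AUTHORITATIVE_LEXICON))
```

Spans the lexicon cannot confirm (here, the fabricated "Blue Atlas Institute") are surfaced for review rather than silently accepted; sentence-initial words like "Researchers" show the heuristic's false positives, which is why a real pipeline would substitute proper NER.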