Natural Language Processing of a Low Resource Language (Igbo, an African Language)

Project Information

Project Status: Complete
Project Region: CAREERS
Submitted By: Stanley Nwoji
Project Email: SNwoji@harrisburgu.edu
Anchor Institution: CR-Penn State

Mentors: Iheb Abdellatif
Students: Atajan Abdyyev

Project Description

Although only 20 languages fall into the high-resource category, most natural language processing (NLP) advances have been achieved in those 20 languages, leaving out thousands of low-resource languages spoken by millions of people around the world. This is not only a technological problem; it is also a question of equity. One cause of the gap is the lack of corpora and other linguistic resources for low-resource languages. This study seeks to help fill that gap by creating a corpus of Igbo, an African language, and analyzing it with machine learning and deep learning NLP techniques. The outcome of this project could support applications in the Igbo language such as text categorization, information extraction, summarization, dialogue systems, and machine translation. We have started building the Igbo_News corpus with Sketch Engine.
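
As a rough illustration of a first analysis step on such a corpus, the sketch below normalizes Igbo text and counts word frequencies using only the Python standard library. It is not the project's actual pipeline and does not use Sketch Engine; the function names tokenize and word_frequencies and the sample sentence are assumptions made for this example. Unicode normalization is included because Igbo letters such as ị, ọ, and ụ can be encoded either as single code points or as a base letter plus a combining mark, and the two forms would otherwise be counted as different words.

```python
import re
import unicodedata
from collections import Counter

# Word characters plus combining diacritics (U+0300 to U+036F), so that tone
# marks with no precomposed form stay attached to their base letter.
# Assumption for this sketch: hyphens and apostrophes split tokens.
TOKEN_PATTERN = re.compile(r"[\w\u0300-\u036f]+")


def tokenize(text):
    """Lowercase, NFC-normalize, and split text into word tokens (hypothetical helper)."""
    normalized = unicodedata.normalize("NFC", text.lower())
    return TOKEN_PATTERN.findall(normalized)


def word_frequencies(text):
    """Count how often each token occurs in the text (hypothetical helper)."""
    return Counter(tokenize(text))


if __name__ == "__main__":
    # Illustrative sample sentence, not taken from the Igbo_News corpus.
    sample = "Ndị Igbo na-asụ Igbo"
    for word, count in word_frequencies(sample).most_common():
        print(word, count)
```

Splitting on hyphens is a simplification: Igbo orthography uses hyphens in verb forms such as "na-asụ", and a real tokenizer for the corpus may need to keep those together.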

Additional Resources

Launch Presentation:

NLP Of Low Resource Language.pdf (419.76 KB)

Wrap Presentation: 6