Report: Masterclass “Crowd-sourcing Web Corpora of Nigerian Languages”
First edition (2019) of the Annual ARCIS NLP Summer School
Taking advantage of the presence in Ibadan of the NaijaSynCor Natural Language Processing (NLP) specialists who had attended the June 2019 Naija Symposium, the Africa Regional Centre for Information Science (ARCIS), University of Ibadan, Nigeria, organised the first edition of its Annual ARCIS NLP Summer School with IFRA on 1-3 July 2019.
The cohort of participants for this first edition of the ARCIS NLP masterclass
Dr Slavomír Čeplö (Austrian Academy of Sciences, Vienna) gave two lectures on the state of the art in NLP for European and non-European languages. Prof. Mutawakilu Adisa Tiamiyu (ARCIS) gave a lecture on NLP tools and the development of African languages.
Dr Kim Gerdes (Sorbonne-Nouvelle, Paris) demonstrated the latest version of his NLP tool, Gromoteur, and taught participants how to use it. Gromoteur is a tool that gives linguists easy access to textual corpora: it can retrieve pages from the Web or from local files, process and analyse them, and output the results. It can browse, sort and filter the data, and apply simple tools such as lemmatizers, taggers and word segmenters for different languages. Gromoteur also includes Nexico, a simplified version of Lexico3, which can compute the specific terms of any selection of pages as well as textual co-occurrences, using a fast implementation of the cumulative hypergeometric distribution. Results can be exported as tables and images.
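To illustrate how a specificity score can rest on the cumulative hypergeometric distribution, here is a minimal Python sketch. It is not Gromoteur's or Nexico's code: it assumes a standard specificity setup in which a corpus of T tokens contains a selection of t tokens, and a word with total frequency F occurs f times inside that selection. The functions `hypergeom_tail` and `specificity` are hypothetical names, and the tail probability is computed exactly with big integers rather than with the fast approximation the tool is described as using.

```python
import math


def hypergeom_tail(k: int, total: int, successes: int, draws: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(total, successes, draws).

    total     -- corpus size in tokens (T)
    successes -- total frequency of the word in the corpus (F)
    draws     -- size of the selected sub-corpus in tokens (t)
    Computed exactly with integer binomial coefficients (Python 3.8+).
    """
    lo = max(0, draws - (total - successes))  # smallest possible count in the selection
    hi = min(draws, successes)                # largest possible count in the selection
    tail = sum(
        math.comb(successes, i) * math.comb(total - successes, draws - i)
        for i in range(max(k, lo), hi + 1)
    )
    return tail / math.comb(total, draws)


def specificity(f: int, F: int, t: int, T: int) -> float:
    """Hypothetical positive-specificity score: -log10 P(X >= f).

    Larger values mean the word is more over-represented in the selection
    than random sampling would predict; an impossible count gives infinity.
    """
    p = hypergeom_tail(f, T, F, t)
    return -math.log10(p) if p > 0 else math.inf


if __name__ == "__main__":
    # Hypothetical figures: a 1,000-token corpus, a 100-token selection,
    # and a word occurring 10 times overall, 5 of them inside the selection
    # (about 1 occurrence would be expected by chance).
    print(specificity(f=5, F=10, t=100, T=1000))
```

Working on exact integers keeps the sketch short and avoids rounding problems for small corpora; for large corpora, a tool aiming for speed would more likely use a log-space or library implementation of the same distribution.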
The twelve registered students had been selected on the basis of a research project involving the crowd-sourcing of Nigerian-language data from the Internet. The submitted projects dealt with Nigerian English, Nigerian Pidgin and other languages such as Yoruba and Igbo, and the students tested them during extensive hands-on sessions. Unregistered students from the University of Ibadan were also allowed to attend the Summer School and benefit from the lectures.
Dr Čeplö and Dr Gerdes were assisted by Prof. Sylvain Kahane (Paris-Nanterre) and Marine Courtin (Sorbonne-Nouvelle).
The supervising team for the NLP masterclass