NaijaSynCor: A Corpus-based Macro-Syntactic Study of Naija (Nigerian Pidgin)

NaijaSynCor (A Corpus-based Macro-Syntactic Study of Naija, aka Nigerian Pidgin) takes an exhaustive, in-depth look at the structure of Naija (Nigerian Pidgin) as spoken in Nigeria today. Spoken by educated Nigerians, Naija has been shown to develop in Lagos as a discrete language, separate from Nigerian English. This study sets out to assess whether this holds true for the rest of Nigeria, where Naija is spoken by over 75 million people. It examines diachronic, diatopic, diaphasic, diastratic, and genre variation.

The project is a collaborative effort of two leading Nigerian experts on Naija (F. Egbokhare & C. Ofulue) and two research units that have proved their expertise in corpus annotation in previous programmes: Llacan, on lesser-described languages, and Modyco, on the interaction of prosody and syntax in French and the development of large treebanks. The macrosyntactic framework developed in the ANR Rhapsodie project (Lacheret, Pietrandrea & Tchobanov 2014) has proved particularly efficient in dealing with the specificities of oral corpora, e.g. piles stacking, disfluencies, repetitions, discourse markers, overlaps, co-enunciation, false starts, self-repairs and truncations. This method is data-driven, inductive (the relevant units are identified through annotation) and modular.

The tools developed by the research team in these previous corpus study programmes are robust and mature enough for the project to focus on the linguistic problem posed by Naija: in its geographical and functional expansion, does Naija maintain its status as a discrete language, separate from Nigerian English, or does it undergo decreolization? While answering this question, the research programme aims to overcome two remaining technological challenges: (i) the automatic identification of illocutionary units using intonation data as a parameter; (ii) the building of a parser that integrates intonation data as a parameter.

Through the creation of a deeply annotated 500,000-word corpus, the project documents the emergence of Naija as a language at the national level, challenging existing theories of the development of creoles and languages in contact. Capitalizing on the latest developments in corpus annotation, this innovative approach to the dynamics of contact and change in the areas of human behaviour and sociology of language will powerfully impact the methodology and technology of research on emerging languages.

Starting: February 1st, 2017
Duration: 42 months

Principal Investigator (PI): Bernard CARON, Senior Research Fellow

Partner #1: LLACAN, UMR 8135, Langages, Langues et Cultures d’Afrique Noire (Inalco – CNRS)
Partner #2: MODYCO, UMR 7114, Modèles, Dynamiques, Corpus (Université Paris-Ouest Nanterre La Défense – CNRS)

