Irish verbs: mapping past, present and future



Theodorus Fransen

Posted: 12 July, 2018


We continue this month’s theme, Languages, with a piece in which Theodorus Fransen introduces us to the basics of historical Irish linguistics and Natural Language Processing. Theodorus is a PhD student in the School of Linguistic, Speech and Communication Sciences, Trinity College Dublin and is currently an Early Career Research Fellow in the Trinity Long Room Hub Arts & Humanities Research Institute.

Ireland can boast one of the earliest vernaculars in Western Europe, with its manuscript tradition going back to the fifth century A.D. Old Irish (c. 700–900 A.D.) is the earliest stage of the Irish language for which we have an extensive body of prose. Mastering the intricacies of Old Irish grammar, especially the verbal system, often turns out to be a nightmare for the uninitiated university student and the initiated classicist alike. Moreover, Old Irish and the Irish spoken today look significantly different from each other, especially when we compare the way that verb forms are constructed. Who would have guessed that Modern Irish lig ‘to let’ and teilg ‘to cast’ both go back to forms with the Old Irish root léic ‘to let’?

Our linguistic experiences have become increasingly digital in the last two decades. Google is our newly venerated and omnipresent portal to the unknown, and ‘Speak, and you will find’ is the motto of modern society. Some of us may even believe in a ‘ghost in the machine’. Whatever may be the case, neither the machine nor the ghost understands medieval Irish. And that is a huge pity, given the enormous number of texts in Irish-language manuscripts waiting to be transcribed, edited and translated (e.g. Medical Texts of Ireland 1350–1600). Moreover, for historical linguists like me, the lack of digital support is a huge impediment to the study of the evolution of the Irish language.

My research area can be located at the fascinating intersection of historical Irish linguistics and Natural Language Processing (NLP), the latter of which is concerned with ‘giving computers the ability to process human language’. NLP for historical languages is particularly challenging since old texts often show huge variation in style, grammar and orthography. As for Irish, we can’t fully rely on existing historical dictionaries and other resources, as these are either incomplete or concentrate on a certain time frame. Early Modern Irish (c. 1200–1650) is the most poorly covered period, resulting in a ‘lexicographical gap’ between Old and Modern Irish.

In order to bridge this lexicographical gap, we need computational linguistic resources for both the medieval and modern stages of the Irish language. For the latter, more precisely the period 1600–2000, state-of-the-art computational methods are currently being employed. For Old Irish, although well documented, such methods are lacking (but progress is being made on this front). That shouldn’t come as a surprise: trying to explain Old Irish to a machine (or the ghost, depending on your belief system) is easier said than done.

Old Irish is notorious for (what appears on the surface to be) unpredictable verb stem formation. The stem (or base) of a word is a fundamental building block in word formation, often equivalent to a dictionary entry. Here we encounter the first problem. Due to diverging accentual patterns, the majority of Old Irish verbs have two stems, if not more, and these are often remarkably dissimilar; compare, for example, do-léic- and teilc- ‘to let go’ or ‘to cast’, which are mere stem variants (‘two sides of the same coin’).
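To make the idea of stem variants concrete, here is a minimal sketch in Python of how one verb with several stems might be stored so that any of its stems leads back to the same entry. The class name VerbLexeme, the helper functions and the choice of headword are hypothetical and purely illustrative; only the two stems and their gloss come from the example above, and this is not the encoding used in the project itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbLexeme:
    """A hypothetical dictionary entry grouping the stem variants of one verb."""
    headword: str
    gloss: str
    stems: tuple[str, ...]


def build_stem_index(lexemes):
    """Map every stem variant to the lexeme it belongs to."""
    index = {}
    for lexeme in lexemes:
        for stem in lexeme.stems:
            index[stem] = lexeme
    return index


# Illustrative data: the single example given in the text.
LEXEMES = [
    VerbLexeme(
        headword="do-léic-",  # assumption: first variant used as headword
        gloss="to let go, to cast",
        stems=("do-léic-", "teilc-"),
    ),
]

STEM_INDEX = build_stem_index(LEXEMES)


def same_verb(stem_a, stem_b):
    """Return True if both stems are known variants of the same lexeme."""
    a = STEM_INDEX.get(stem_a)
    b = STEM_INDEX.get(stem_b)
    return a is not None and a is b
```

With this layout, same_verb("do-léic-", "teilc-") is true even though the two strings share little on the surface, which is exactly the kind of link a purely string-based comparison would miss.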

The relationship between Old Irish and Modern Irish stems may be equally opaque; while Old Irish léic reasonably transparently maps to the Modern Irish stem lig ‘to let’, it also survives in the modern stem teilg ‘to cast’ (the latter being the modern equivalent of the Old Irish stems do-léic- / teilc- mentioned above). Stems are therefore of key importance in the context of automatically generating and analysing the wealth of inflected verb forms, as well as successfully connecting cognates (different words with a common origin) along the chronological spectrum. So we need to think carefully about how to encode them computationally.
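As a rough illustration of what connecting stems along the chronological spectrum could look like, the sketch below stores the correspondences from the léic example as simple pairs and derives lookups in both directions. All names (CORRESPONDENCES, ROOT_OF, modern_cognates) are hypothetical, the data is limited to the forms mentioned above, and a real resource would need far richer information than a flat table.

```python
from collections import defaultdict

# (Old Irish stem, Modern Irish stem) pairs, taken from the example in the text.
CORRESPONDENCES = [
    ("léic", "lig"),
    ("do-léic-", "teilg"),
    ("teilc-", "teilg"),
]

# Assumed table: the Old Irish root underlying each Old Irish stem.
ROOT_OF = {"léic": "léic", "do-léic-": "léic", "teilc-": "léic"}


def build_maps(pairs):
    """Build old-to-modern and modern-to-old lookup tables from stem pairs."""
    old_to_modern = defaultdict(set)
    modern_to_old = defaultdict(set)
    for old, modern in pairs:
        old_to_modern[old].add(modern)
        modern_to_old[modern].add(old)
    return old_to_modern, modern_to_old


OLD_TO_MODERN, MODERN_TO_OLD = build_maps(CORRESPONDENCES)


def modern_cognates(modern_stem):
    """Return other modern stems that share an Old Irish root with modern_stem."""
    roots = {ROOT_OF[old] for old in MODERN_TO_OLD.get(modern_stem, ())}
    related = set()
    for old, root in ROOT_OF.items():
        if root in roots:
            related |= OLD_TO_MODERN.get(old, set())
    related.discard(modern_stem)
    return sorted(related)
```

Looking up MODERN_TO_OLD["teilg"] gives both Old Irish stem variants, and modern_cognates("lig") finds teilg by way of the shared root, although the two modern stems look unrelated.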

Figure: historical stem changes (Theodorus Fransen)

Cracking the code of Old Irish verb stem formation is central to my thesis, and it is a challenging endeavour. I have therefore decided to focus on a restricted set of Old Irish verbs, and to work towards a prototype digital resource that facilitates old-to-modern (and vice versa) mappings based on that restricted set. Hopefully I will be able to expand my project in the future, incorporating not only verbs but also nouns, adjectives, etc. A further research prospect is to deal with some of the currently poorly resourced intermediary historical stages of the language, thus slowly closing the lexicographical gap between Old and Modern Irish.

Disclaimer: The opinions expressed in our guest blogs are the author’s own, and do not reflect the opinions of the Irish Research Council or any employee thereof.