On the Usefulness and Problems of a Lemmatized Edition: A Case Study from Roman de Horn, King Horn and Horn Childe and Maiden Rimnild
This paper describes the methods used to create a lemmatized edition of all the witnesses of Roman de Horn, King Horn and Horn Childe and Maiden Rimnild. These are three closely related versions of the medieval story of Horn: the first is in Insular French, while the second and third are in Middle English. The lemmatized edition is part of a doctoral project on the qualitative and quantitative analysis of their style, and its final goal is to provide a searchable database for stylistic investigation. The paper first introduces the three versions and then explains the reasons for a digital approach. It goes on to discuss the problems posed by lemmatization and the solutions adopted. The resulting protocol balances the need for a qualitative, controlled lemmatization against the practical need for a partially automated pipeline. A program was written in Python, using the Natural Language Toolkit (NLTK) module, to edit and lemmatize each witness. A private MySQL database containing all the lemmas from the Middle English Dictionary and the Anglo-Norman Dictionary was developed to speed up lemmatization. In addition, a set of corpora of Insular French and Middle English texts was compiled to serve as a control group for the stylistic analysis. Creating this control group required a distinct, automated lemmatization protocol, which produced a specific kind of digital output. The paper ends with a brief example of a stylistic investigation that uses the corpus of lemmatized texts.
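The partially automated, dictionary-assisted lemmatization summarized above might be sketched as follows. This is only an illustration under stated assumptions, not the project's actual program: the standard-library `sqlite3` module stands in for the private MySQL database, a regular expression stands in for NLTK tokenization, and the table layout, function names and sample rows are all hypothetical placeholders rather than entries taken from the Middle English Dictionary or the Anglo-Norman Dictionary.

```python
import re
import sqlite3

# Hypothetical lookup rows: (attested form, lemma, source dictionary).
# Illustrative placeholders only, not real MED or AND entries.
SAMPLE_ROWS = [
    ("kyng", "king", "MED"),
    ("king", "king", "MED"),
    ("horn", "horn", "MED"),
    ("rei", "rei", "AND"),
    ("wise", "wise_adj", "MED"),
    ("wise", "wise_n", "MED"),
]


def build_lookup(rows):
    """Create an in-memory form-to-lemma table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE forms (form TEXT, lemma TEXT, source TEXT)")
    conn.executemany("INSERT INTO forms VALUES (?, ?, ?)", rows)
    return conn


def tokenize(line):
    """Lowercase the line and split it into word tokens."""
    return re.findall(r"\w+", line.lower())


def lemmatize(line, conn):
    """Return (token, lemma, candidates) for every token in the line.

    The lemma is filled in automatically only when the lookup yields exactly
    one candidate; unknown or ambiguous forms get None so that they can be
    resolved by hand.
    """
    result = []
    for token in tokenize(line):
        rows = conn.execute(
            "SELECT DISTINCT lemma FROM forms WHERE form = ? ORDER BY lemma",
            (token,),
        ).fetchall()
        candidates = [row[0] for row in rows]
        lemma = candidates[0] if len(candidates) == 1 else None
        result.append((token, lemma, candidates))
    return result


if __name__ == "__main__":
    lookup = build_lookup(SAMPLE_ROWS)
    for token, lemma, candidates in lemmatize("Kyng Horn was wise", lookup):
        status = lemma if lemma is not None else "REVIEW"
        print(token, status, candidates)
```

In this sketch, forms with a single match are lemmatized automatically, while unknown or ambiguous forms are left for manual review. That split is one possible way to reconcile a controlled lemmatization with a partially automated pipeline.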
