Conference paper, 2023

Towards Multilingual Interlinear Morphological Glossing

Shu Okabe
François Yvon

Abstract

Interlinear Morphological Glosses are annotations produced in the context of language documentation. Their goal is to identify morphs occurring in an L1 sentence and to make their function and meaning explicit, with the further support of an associated translation in L2. We study here the task of automatic glossing, aiming to provide linguists with adequate tools to facilitate this process. Our formalisation of glossing uses a latent variable Conditional Random Field (CRF), which labels the L1 morphs while simultaneously aligning them to L2 words. In experiments with several under-resourced languages, we show that this approach is both effective and data-efficient and mitigates the problem of annotating unknown morphs. We also discuss various design choices regarding the alignment process and the selection of features. Finally, we demonstrate that it can benefit from multilingual (pre-)training, achieving results that outperform very strong baselines.
Main file

2023.findings-emnlp.396.pdf (354.39 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-04357157, version 1 (21-12-2023)

Cite

Shu Okabe, François Yvon. Towards Multilingual Interlinear Morphological Glossing. 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Dec 2023, Singapore, Singapore. pp.5958-5971, ⟨10.18653/v1/2023.findings-emnlp.396⟩. ⟨hal-04357157⟩