Annotation guidelines of UD and SUD treebanks for spoken corpora - Université Paris Nanterre
Book sections. Year: 2021

Abstract

This paper presents practical and theoretical guidelines for developing treebanks of spoken languages in the Universal Dependencies (UD) and Surface-Syntactic Universal Dependencies (SUD) annotation schemes. We discuss text-sound alignment, segmentation into "sentences", the use of "punctuation", paradigmatic lists, disfluencies, and paratactic constructions. This proposal is based on the development of UD and SUD treebanks for spoken French, Naija, and Beja.
Main file: Kahane_Annotation_Guidelines_2021.pdf (974.84 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03839772, version 1 (04-11-2022)

Identifiers

  • HAL Id: hal-03839772, version 1

Cite

Sylvain Kahane, Bernard Caron, Emmett Strickland, Kim Gerdes. Annotation guidelines of UD and SUD treebanks for spoken corpora. In Daniel Dakota, Kilian Evang, Sandra Kübler (eds.), Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021), Association for Computational Linguistics, pp. 35-47, 2021. ⟨hal-03839772⟩