Annotation guidelines of UD and SUD treebanks for spoken corpora - Université Paris Nanterre
Book chapter. Year: 2021

Annotation guidelines of UD and SUD treebanks for spoken corpora

Abstract

This paper presents practical and theoretical guidelines for the development of treebanks for spoken languages in the UD and SUD annotation schemes. We discuss text-sound alignment, segmentation into "sentences", use of "punctuation", paradigmatic lists, disfluencies, and paratactic constructions. This proposal is based on the development of (Surface-Syntactic) Universal Dependencies treebanks for spoken French, Naija, and Beja.
Main file
Kahane_Annotation_Guidelines_2021.pdf (974.84 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03839772, version 1 (04-11-2022)


Identifiers

  • HAL Id: hal-03839772, version 1

Cite

Sylvain Kahane, Bernard Caron, Emmett Strickland, Kim Gerdes. Annotation guidelines of UD and SUD treebanks for spoken corpora. Daniel Dakota, Kilian Evang, Sandra Kübler. Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021), Association for Computational Linguistics, pp. 35-47, 2021. ⟨hal-03839772⟩

