Annotation guidelines of UD and SUD treebanks for spoken corpora - Université Paris Nanterre
Book section. Year: 2021

Annotation guidelines of UD and SUD treebanks for spoken corpora

Abstract

This paper presents practical and theoretical guidelines for the development of treebanks for spoken languages in the UD and SUD annotation schemes. We discuss text-sound alignment, segmentation into "sentences", use of "punctuation", paradigmatic lists, disfluencies, and paratactic constructions. This proposal is based on the development of (Surface-Syntactic) Universal Dependencies treebanks for spoken French, Naija, and Beja.
Main file
Kahane_Annotation_Guidelines_2021.pdf (974.84 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03839772, version 1 (04-11-2022)

Licence

Attribution

Identifiers

  • HAL Id: hal-03839772, version 1

Cite

Sylvain Kahane, Bernard Caron, Emmett Strickland, Kim Gerdes. Annotation guidelines of UD and SUD treebanks for spoken corpora: a proposal. Daniel Dakota, Kilian Evang, Sandra Kübler. Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021), Association for Computational Linguistics, pp. 35-47, 2021. ⟨hal-03839772⟩