Segmentation in macrosyntactic units across different interaction types. A quantitative study - Université Paris Nanterre
Conference paper. Year: 2018

Segmentation in macrosyntactic units across different interaction types. A quantitative study

Segmentation en unités macrosyntaxiques dans différents types d'interaction. Une étude quantitative

Abstract

This paper is part of the French-German project SegCor (Segmentation of Oral Corpora, ANR-15-FRAL-0004), which focuses on the segmentation of oral corpora. The project's general aim is to develop a segmentation method for oral corpora that is suited to the analysis of interactional data at different levels and usable by various research communities. The French and German datasets each consist of ten ten-minute excerpts, which represent the overall diversity of the data in terms of situation types. The following recorded interactions have been studied: radio talks, meal preparations, reading activities with a child, service encounters, telephone calls, table talks, social meetings, school lessons and panel discussions. In this paper, we address the relationship between these interaction types and segmentation into maximal units. More particularly, we focus on the composition of these units in the French corpus. Several models have been proposed in previous research and discussed within the SegCor project: part-of-speech tagging and chunking via automatic annotation (Eshkol-Taravella et al. 2014); syntactic annotation relying on a dependency parser (Kahane et al. 2017); macrosyntactic segmentation into illocutionary units (Benzitoun et al. 2010; Lacheret et al. 2014); the annotation of prosodic prominences and disfluencies leading to segmentation into intonational periods (Lacheret et al. 2014); and the annotation of Turn-Constructional Units (TCUs), i.e. the minimal, emergent and negotiable units through which participants build turns of talk in interaction (Sacks et al. 1974; Ochs et al. 1996; Traverso 2016). Here we focus on segmentation into broad units, which is grounded in the macrosyntactic model (Blanche-Benveniste et al. 1990; Blanche-Benveniste 2010a, 2010b; Lacheret et al. 2014).
We rely on the following maximal macrosyntactic units: simple units, composed of one nucleus, which is defined as a minimal macrosyntactic component corresponding to an autonomous utterance, according to Blanche-Benveniste et al. (1990: 114); complex units, composed of more than one nucleus (including pre-nuclei, post-nuclei and in-nuclei, i.e. sequences beyond government); and abandoned units, i.e. syntactically unfinished units. The segmentation was carried out on tokenized transcripts with the EXMARaLDA Partitur Editor. Our main aim is to assess the relevance of the number of tokens per maximal unit in our representative corpora. We therefore propose a quantitative study of the token count per maximal unit in each situation type. For example, a preliminary investigation has shown a higher rate of abandoned units in conflictual interactions (e.g. panel discussions and radio talks), owing to the specific features of their turn-taking. Conversely, in expert talk, i.e. a lecture delivered by a single speaker, abandoned units are very rare because the talk is planned. Relying on the composition of maximal segmentation units, our contribution discusses evidence from corpus segmentation and aims to investigate variation across different interaction types. Our approach is compatible with previous research in corpus linguistics, for example Biber's multi-dimensional analyses of written and oral genres (Biber 1988) and conversational text types (Biber 2004) in English, which are based on a variety of linguistic features. This contribution offers complementary dimensions for a classification of interaction types from a quantitative perspective. We will then explore the other segmentation levels annotated in the SegCor project (syntax, prosody and interaction) to study whether unit characterization depends on the type of interaction and whether similar trends can be observed. Statistical analyses and graphing are performed using the R software platform.
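The quantitative measure described in the abstract, token count per maximal unit in each situation type, can be illustrated with a minimal sketch. It is written in Python for illustration only; the abstract states that the authors' own analyses were done in R, and their scripts are not shown. The record layout `(interaction_type, unit_type, tokens)`, the function name `summarize`, and the toy utterances are all assumptions and do not come from the SegCor data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records, NOT SegCor data: each maximal unit is stored as
# (interaction type, unit type, list of tokens).
UNITS = [
    ("panel discussion", "simple", ["oui"]),
    ("panel discussion", "abandoned", ["mais", "je"]),
    ("panel discussion", "complex", ["donc", "si", "vous", "voulez", "on", "continue"]),
    ("expert talk", "simple", ["je", "vous", "remercie", "beaucoup"]),
    ("expert talk", "complex", ["comme", "on", "l'", "a", "vu", "la", "méthode", "fonctionne"]),
]


def summarize(units):
    """Return, per interaction type, the number of maximal units, the mean
    token count per unit, and the share of abandoned units."""
    grouped = defaultdict(list)
    for interaction, unit_type, tokens in units:
        grouped[interaction].append((unit_type, len(tokens)))

    summary = {}
    for interaction, rows in grouped.items():
        lengths = [n_tokens for _, n_tokens in rows]
        n_abandoned = sum(1 for unit_type, _ in rows if unit_type == "abandoned")
        summary[interaction] = {
            "units": len(rows),
            "mean_tokens": mean(lengths),
            "abandoned_rate": n_abandoned / len(rows),
        }
    return summary


if __name__ == "__main__":
    for interaction, stats in summarize(UNITS).items():
        print(interaction, stats)
```

With real annotations, the same grouping could be extended to report the distribution of token counts separately for simple, complex and abandoned units.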
File not deposited

Dates and versions

hal-01927595 , version 1 (20-11-2018)

Identifiers

  • HAL Id : hal-01927595 , version 1

Cite

Biagio Ursi, Carole Etienne, Iris Eshkol-Taravella, Nathalie Rossi-Gensane, Luisa Acosta Córdoba, et al. Segmentation in macrosyntactic units across different interaction types. A quantitative study. 50 years of corpus linguistics on oral corpora. Its contribution to the study of variation, Nov 2018, Orléans, France. ⟨hal-01927595⟩