TIB: A Dataset for Abstractive Summarization of Long Multimodal Videoconference Records
Conference paper, 2023

Abstract

Large language models and multimodal language-vision models give impressive results on currently available summarization benchmarks, but are not designed to handle long multimodal documents. Most summarization datasets are composed of either mono-modal documents or short multimodal documents. In order to develop models designed for understanding and summarizing real-world videoconference records, which are typically around 1 hour long, we propose a dataset of 9,103 videoconference records extracted from the German National Library of Science and Technology (TIB) archive, along with their abstracts. Additionally, we process the content using automatic tools in order to provide the transcripts and key frames. Finally, we present experiments for abstractive summarization, to serve as a baseline for future research on multimodal approaches.
Main file: tib_dataset_preprint_230728.pdf (2.68 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04168911, version 1 (28-07-2023)

Cite

Théo Gigant, Frédéric Dufaux, Camille Guinaudeau, Marc Decombas. TIB: A Dataset for Abstractive Summarization of Long Multimodal Videoconference Records. 20th International Conference on Content-based Multimedia Indexing (CBMI 2023), ACM, Sep 2023, Orléans, France. ⟨10.1145/3617233.3617238⟩. ⟨hal-04168911⟩