A visual approach for text analysis using multiword topics - Université Paris Nanterre
Conference paper. Year: 2017

A visual approach for text analysis using multiword topics

Abstract

Topics in a text corpus carry features and information; visualizing these topics can improve a user's understanding of the corpus. Topics can be broadly divided into two categories: those whose meaning can be described in one word and those whose meaning is expressed through a combination of words. The latter type can be described as multiword expressions. However, multiword topics require systematic analysis to yield accurate topic results. Therefore, we propose a visual system that accurately extracts topic results with multiple word combinations. For this study, we use the text of 957 speeches from 43 U.S. presidents (from George Washington to Barack Obama) as corpus data. Our visual system is divided into two parts. In the first part, the system refines the database by topic, including multiword topics; through data processing, we systematically analyze the accurate extraction of multiword topics. In the second part, users can confirm the details of this result with a word cloud and simultaneously verify the result against the raw corpus. These two parts are synchronized, and the desired value of N in the N-gram model, the topics, and the presidents examined can be altered. In this case study of U.S. presidential speech data, we verify the effectiveness and usability of our system.
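To illustrate the adjustable N in the N-gram model mentioned in the abstract, the following is a minimal Python sketch of counting word N-grams across a set of documents. It is not the authors' implementation: the function name `ngram_counts`, the lowercase regex tokenization, and the two sample sentences are all assumptions made for illustration only.

```python
import re
from collections import Counter


def ngram_counts(texts, n):
    """Count word n-grams across an iterable of documents.

    Hypothetical helper: tokenization is a simple lowercase regex,
    which is an assumption and not the method used in the paper.
    """
    counts = Counter()
    for text in texts:
        # Keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        # Slide a window of width n over the token list.
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


# Made-up sample sentences, not taken from the speech corpus.
speeches = [
    "The United States will defend the United States Constitution.",
    "We the people of the United States.",
]

bigrams = ngram_counts(speeches, 2)
trigrams = ngram_counts(speeches, 3)
```

Changing the second argument switches between single-word topics (`n=1`) and multiword candidates (`n=2`, `n=3`, and so on); `bigrams.most_common(k)` would then give the k most frequent candidates to feed a display such as a word cloud.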
Main file
A visual approach for text analysis using multiword topics.pdf (861.91 KB)
Origin: Files produced by the author(s)

Dates and versions

halshs-01590990, version 1 (20-09-2017)

Identifiers

Cite

Seongmin Mun, Guillaume Desagulier, Kyungwon Lee. A visual approach for text analysis using multiword topics. EuroVis 2017, Eurographics, Jun 2017, Barcelona, Spain. ⟨10.2312/eurp.20171168⟩. ⟨halshs-01590990⟩
138 views
109 downloads
