Ellipsis Errors in Student Writing: A Language Resource for an Automatic Detection and Correction Tool - Université Paris Nanterre
Conference paper. Year: 2019

Laura Noreskal

Abstract

Our study is part of écri+, a larger social science research project with an educational purpose that focuses on student writing. The aim of this research is to set up an assessment, course and certification system to improve the written expression and comprehension of French students. Our contribution to this project is to develop automatic tools for the detection and correction of errors, including errors in Ellipsis constructions. At this early stage, the issue lies in how to build our corpus so that we can choose the most appropriate Natural Language Processing (NLP) method for detecting and correcting faulty Ellipsis in this specific kind of writing. While previous research on Ellipsis detection and resolution (among others Nielsen, 2005; Bos & Spenader, 2011; Gandón-Chapela, 2017) has been applied to annotated English data from the British National Corpus (BNC) and the Wall Street Journal (WSJ), none of it has addressed a French dataset. Moreover, no existing corpus currently corresponds to what we are dealing with, namely errors in student writing. Our first task is therefore to build a language resource on Ellipsis errors, not only to test and identify an adequate NLP treatment but also to find out which types of Ellipsis are subject to errors. We have so far collected 164 errors in Ellipsis constructions from different kinds of student writing (exams, homework, internship reports…). At the end of this first step, we aim to have a corpus of about 250-300 faulty Ellipsis constructions, which we consider to be a linguistically representative size, enabling us to test symbolic and machine learning methods such as Support Vector Machines (SVM) or artificial neural networks. Our goal is thus to present the key steps in the constitution of our corpus by explaining the methodology adopted and some initial analyses that we have carried out.

Domains

Linguistics
File not deposited

Dates and versions

hal-04064234, version 1 (11-04-2023)

Identifiers

  • HAL Id: hal-04064234, version 1

Cite

Laura Noreskal. Ellipsis Errors in Student Writing: A Language Resource for an Automatic Detection and Correction Tool. CLARIN Annual Conference 2019, Sep 2019, Leipzig, Germany. ⟨hal-04064234⟩
