The goal of the project is a seq2seq approach to grammatical error correction: the input is a source sentence (src) containing grammatical errors, and the output is a target sentence (tgt) that is a corrected version of the input. For each sentence s, the model receives the input tokens x_i of that sentence and should predict the output token(s) y_i, where y_i is a correction if x_i needs one (e.g. a corrected spelling, an added preposition, etc.). In the seq2seq approach, the encoder reads every x_i and summarizes s in its final hidden state, and the decoder then generates the output tokens y_i from that representation. This task can therefore be described as a "many-to-many" sequence modeling problem (Karpathy 2015).
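To make the many-to-many framing concrete, here is a minimal plain-Python sketch of how one src/tgt pair could be turned into encoder input, decoder input and decoder target. The whitespace tokenizer, the lowercasing and the `<sos>`/`<eos>` markers are assumptions in the style of common PyTorch seq2seq tutorials, not necessarily this repository's preprocessing; `make_example` is a hypothetical helper.

```python
from typing import List, Tuple

SOS, EOS = "<sos>", "<eos>"  # assumed special tokens marking sentence start/end


def tokenize(sentence: str) -> List[str]:
    """Assumed preprocessing: lowercase and split on whitespace."""
    return sentence.lower().split()


def make_example(src: str, tgt: str) -> Tuple[List[str], List[str], List[str]]:
    """Build (encoder input, decoder input, decoder target) for one sentence pair.

    The encoder reads all x_i; the decoder is fed the target shifted right by
    one position and is trained to predict the next target token y_i.
    """
    x = [SOS] + tokenize(src) + [EOS]
    y = [SOS] + tokenize(tgt) + [EOS]
    decoder_input = y[:-1]   # <sos> y_1 ... y_n
    decoder_target = y[1:]   # y_1 ... y_n <eos>
    return x, decoder_input, decoder_target


if __name__ == "__main__":
    x, dec_in, dec_out = make_example("He go to school .", "He goes to school .")
    print(x)        # ['<sos>', 'he', 'go', 'to', 'school', '.', '<eos>']
    print(dec_in)   # ['<sos>', 'he', 'goes', 'to', 'school', '.']
    print(dec_out)  # ['he', 'goes', 'to', 'school', '.', '<eos>']
```

Because the decoder is not tied to the encoder's length, the target may be longer or shorter than the source, for example when a missing preposition has to be inserted.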
This repository contains the necessary preprocessing scripts as well as an extensive Jupyter notebook for implementing a seq2seq model on Google Colab. It also provides an updated script to calculate the Generalized Language Evaluation Understanding (GLEU) score for evaluating the model's test predictions. The GLEU script is taken and adapted from: https://github.com/cnap/gec-ranking/tree/master?tab=readme-ov-file
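For intuition about what the GLEU script measures, here is a simplified, single-reference, sentence-level sketch modeled on the n-gram statistics of the gec-ranking code (assumed; check the script for the exact details): hypothesis n-grams that match the reference are rewarded, n-grams copied from the source that the reference changed are penalized, and a brevity penalty applies. It is not a replacement for compute_gleu3.py, which aggregates statistics over the whole corpus and supports multiple references; `sentence_gleu` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_gleu(hyp: str, ref: str, src: str, max_n: int = 4) -> float:
    """Simplified single-reference GLEU (sketch, not the official script).

    For each order n the numerator is
        |hyp & ref| - |hyp & (src n-grams absent from ref)|   (clipped at 0)
    and the denominator is the number of hypothesis n-grams.
    """
    h, r, s = hyp.split(), ref.split(), src.split()
    c, r_len = len(h), len(r)
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h_ng, r_ng, s_ng = ngrams(h, n), ngrams(r, n), ngrams(s, n)
        # Source n-grams that the reference removed or changed.
        s_only = Counter({g: k for g, k in s_ng.items() if g not in r_ng})
        matches = sum((h_ng & r_ng).values()) - sum((h_ng & s_only).values())
        matches = max(matches, 0)
        total = max(c + 1 - n, 0)
        if matches == 0 or total == 0:
            return 0.0  # unsmoothed: any zero statistic gives a score of 0
        log_prec += math.log(matches / total) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    brevity = min(0.0, 1.0 - r_len / c)
    return math.exp(brevity + log_prec)


if __name__ == "__main__":
    src = "he go to school every day"
    ref = "he goes to school every day"
    print(sentence_gleu(ref, ref, src))  # 1.0
    print(sentence_gleu(src, ref, src))  # 0.0
```

The second call illustrates why GLEU suits error correction: a system that simply copies the erroneous source is penalized for keeping the n-grams the reference corrected.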
Scripts were developed with Python 3.11. The following packages need to be installed: matplotlib, numpy, scipy (scipy.stats), torch (torch.nn, torch.optim), torchtext (torchtext.vocab) and tqdm.
The data come from the Cambridge English Write & Improve (W&I) corpus, created for the Building Educational Applications (BEA) 2019 Shared Task: Grammatical Error Correction. For details on the corpus, consider the following publication:
Helen Yannakoudakis, Øistein E. Andersen, Ardeshir Geranpayeh, Ted Briscoe and Diane Nicholls. 2018. Developing an automated writing placement system for ESL learners. Applied Measurement in Education, 31:3, pages 251-267.
The model used for the task is a PyTorch implementation of an LSTM encoder-decoder by Ben Trevett.
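The decoder of such a model is trained with teacher forcing, whose ratio is one of the hyperparameters tuned in this project. In Ben Trevett's seq2seq tutorials, at each step the decoder receives the ground-truth previous token with probability equal to the ratio and its own previous prediction otherwise. The framework-free sketch below isolates only that decision loop; `decode_with_teacher_forcing` and the stateless `step` callable are hypothetical stand-ins for the PyTorch decoder (which also carries hidden and cell states), not code from this repository.

```python
import random
from typing import Callable, List, Optional


def decode_with_teacher_forcing(
    target: List[str],
    step: Callable[[str], str],
    teacher_forcing_ratio: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Run a decoder over one target sequence and return its predictions.

    target[0] is the start token (<sos>). At every step the decoder gets the
    previous token: with probability teacher_forcing_ratio the ground-truth
    token, otherwise its own previous prediction.
    """
    rng = rng or random.Random()
    predictions: List[str] = []
    current = target[0]  # always start from <sos>
    for t in range(1, len(target)):
        predicted = step(current)  # stand-in for one LSTM decoder step + argmax
        predictions.append(predicted)
        teacher_force = rng.random() < teacher_forcing_ratio
        current = target[t] if teacher_force else predicted
    return predictions


if __name__ == "__main__":
    tgt = ["<sos>", "he", "goes", "<eos>"]

    def mark(tok: str) -> str:
        # Toy "decoder": marks whatever token it is fed.
        return tok + "*"

    print(decode_with_teacher_forcing(tgt, mark, 1.0))  # ['<sos>*', 'he*', 'goes*']
    print(decode_with_teacher_forcing(tgt, mark, 0.0))  # ['<sos>*', '<sos>**', '<sos>***']
```

With a ratio of 1.0 the decoder always sees the gold history; with 0.0 it conditions on its own (possibly wrong) outputs, which is closer to inference but harder to train.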
The inference results from the model are saved to test/[model_name]/, where references.txt contains the correct (gold) solutions, sources.txt contains the sentences from the writers, and targets.txt contains the solutions predicted by the model.
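Assuming the three files are line-aligned (one sentence per line, in the same order in every file), they can be read side by side to inspect individual predictions. A minimal sketch; `load_test_results` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import List, Tuple


def load_test_results(model_dir: str) -> List[Tuple[str, str, str]]:
    """Return (source, reference, prediction) triples from test/<model_name>/.

    Assumes one sentence per line in sources.txt, references.txt and
    targets.txt, with line k of each file describing the same test item.
    """
    root = Path(model_dir)
    files = ("sources.txt", "references.txt", "targets.txt")
    columns = [
        (root / name).read_text(encoding="utf-8").splitlines() for name in files
    ]
    if len({len(col) for col in columns}) != 1:
        raise ValueError(f"line counts differ: {[len(c) for c in columns]}")
    return list(zip(*columns))
```

For example, `load_test_results("test/model_name")[0]` would give the first source sentence together with its gold correction and the model's prediction.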
To calculate GLEU for the given results, please run:
$ python3 compute_gleu3.py -r test/model_name/references.txt -s test/model_name/sources.txt -o test/model_name/targets.txt
This repository contains the results of the best model as of 05.07.2024, stored in test/.
Overall, the LSTM performs inadequately on the task with the provided data. Adjusting hyperparameters, including the number of hidden units, dropout, teacher-forcing ratio, embedding dimensions for the vocabulary and batch size, resulted in only minor improvements in the outputs. Data augmentation was also attempted: the training set was doubled by adding each target sentence as an additional source paired with itself, so that the model also sees sources that are already correct. Again, this led to some improvements.
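The augmentation step described above can be sketched as follows, assuming the training data is held as a list of (source, target) string pairs; `augment_with_identity_pairs` is a hypothetical name, not the repository's function.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source, target)


def augment_with_identity_pairs(pairs: List[Pair]) -> List[Pair]:
    """Double the training set by adding (target, target) pairs.

    Each corrected target sentence is reused as an additional source whose
    correct output is itself, so the model also sees already-correct input.
    """
    identity_pairs = [(tgt, tgt) for _, tgt in pairs]
    return pairs + identity_pairs


if __name__ == "__main__":
    train = [("He go home .", "He goes home ."), ("I like it .", "I like it .")]
    for pair in augment_with_identity_pairs(train):
        print(pair)
```

A plausible reading of why this helps, offered as interpretation rather than a measured result: the identity pairs teach the model to copy tokens that need no correction, which is the most frequent case in error correction.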
Training
After training, the model exhibits high variance, as evidenced by the gap between the training and validation loss.
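One way to quantify this discrepancy is to track the per-epoch gap between validation and training loss and to keep the checkpoint with the lowest validation loss, which is how Ben Trevett's tutorials select the saved model. The sketch works on plain lists of per-epoch losses; the numbers in the demo are made up for illustration and are not results from this repository, and `summarize_losses` is a hypothetical helper.

```python
import math
from typing import List, Tuple


def summarize_losses(train: List[float], valid: List[float]) -> Tuple[int, List[float]]:
    """Return the best epoch (lowest validation loss, 0-based) and the per-epoch gap.

    A validation loss that rises while the training loss keeps falling, i.e. a
    growing gap, is the usual sign of high variance (overfitting).
    """
    if len(train) != len(valid) or not train:
        raise ValueError("need one training and one validation loss per epoch")
    gaps = [v - t for t, v in zip(train, valid)]
    best_epoch = min(range(len(valid)), key=valid.__getitem__)
    return best_epoch, gaps


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from this project.
    train_loss = [5.1, 4.2, 3.4, 2.7, 2.1]
    valid_loss = [5.3, 4.9, 4.8, 5.0, 5.4]
    best, gaps = summarize_losses(train_loss, valid_loss)
    print(best)                          # 2
    print([round(g, 1) for g in gaps])   # [0.2, 0.7, 1.4, 2.3, 3.3]
    print(round(math.exp(valid_loss[best]), 1))  # validation perplexity: 121.5
```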
GLEU
The GLEU score of the model predictions for the test items provided in this repository is 0.036604.
Example
For a given incorrect source sentence, the model makes the following target prediction (the reference is the gold correction):
Source: There have been a number of cases in the last ten years of the top few boxers having tragic losses throughout their ranks .
Reference: There have been a number of cases in the last ten years of the top few boxers having tragic losses among their ranks .
Prediction: there have been a number of changes in the last ten years in the of of and and a a famous man .
Improvements
To reduce the high variance, a few measures can be applied: (a) more training data, (b) better regularization, (c) reducing the complexity of the model.
Interpretation
The model fares poorly on the task because many of the grammatical errors are very sparse and may appear only in the validation and test sets.
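The sparsity argument could be checked by measuring how many corrections in the validation or test data never occur as corrections in the training data. The sketch approximates a "correction" as a target token absent from the aligned source sentence, which is a rough, hypothetical proxy (proper edit extraction would align source and target, e.g. via M2 annotations); `unseen_correction_rate` is a hypothetical helper. A high value would be consistent with the interpretation above.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (source, target)


def inserted_tokens(src: str, tgt: str) -> Set[str]:
    """Rough proxy for corrections: target tokens that do not occur in the source."""
    return set(tgt.split()) - set(src.split())


def unseen_correction_rate(train: Iterable[Pair], evaluation: Iterable[Pair]) -> float:
    """Fraction of evaluation corrections never seen as corrections in training."""
    seen: Set[str] = set()
    for src, tgt in train:
        seen |= inserted_tokens(src, tgt)
    corrections: List[str] = []
    for src, tgt in evaluation:
        corrections.extend(sorted(inserted_tokens(src, tgt)))
    if not corrections:
        return 0.0
    unseen = sum(1 for tok in corrections if tok not in seen)
    return unseen / len(corrections)
```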
I want to thank Meng Li for the idea of this project.