A fun little NLP project where I train different classifiers to determine whether a given text conforms to the Eastern Armenian or Western Armenian dialect. For training/testing data, random articles for both dialects were collected from Wikipedia using the Wikipedia API. Character-level

    git clone https://github.com/takavor/Armenian-Dialect-Detector.git
    cd Armenian-Dialect-Detector
    pip install -r requirements.txt

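
For anyone curious what collecting random articles through the Wikipedia API can look like, here is a minimal standard-library sketch. It is not the code from this repository. The helper names, the `User-Agent` string, and the choice of the `hy` and `hyw` Wikipedia editions for Eastern and Western Armenian are my assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: "hy" is the Eastern Armenian edition, "hyw" the Western Armenian one.
WIKI_CODES = {"eastern": "hy", "western": "hyw"}


def api_url(dialect: str, **params: str) -> str:
    """Build a MediaWiki API URL for the Wikipedia edition of the given dialect."""
    base = f"https://{WIKI_CODES[dialect]}.wikipedia.org/w/api.php"
    query = {"format": "json", **params}
    return f"{base}?{urllib.parse.urlencode(query)}"


def random_titles_url(dialect: str, limit: int = 10) -> str:
    """URL asking for `limit` random pages in namespace 0 (regular articles)."""
    return api_url(dialect, action="query", list="random",
                   rnnamespace="0", rnlimit=str(limit))


def extract_url(dialect: str, title: str) -> str:
    """URL asking for the plain-text extract of a single article."""
    return api_url(dialect, action="query", prop="extracts",
                   explaintext="1", titles=title)


def fetch_json(url: str) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": "dialect-demo/0.1"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def parse_random_titles(payload: dict) -> list[str]:
    """Pull the article titles out of a `list=random` response."""
    return [page["title"] for page in payload["query"]["random"]]
```

A typical use would be `parse_random_titles(fetch_json(random_titles_url("western", 20)))`, followed by one `extract_url` request per title to get the article text.
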

Model | Accuracy | F1 | Precision | Recall |
---|---|---|---|---|
Logistic Regression | 0.979592 | 0.984615 | 0.969697 | 1.00000 |
Naive Bayes | 0.918367 | 0.935484 | 0.939394 | 0.96875 |
Decision Tree | 0.938776 | 0.953846 | 0.939394 | 0.96875 |
SVM | 0.979592 | 0.984615 | 0.969697 | 1.00000 |
MLP | 0.979592 | 0.984615 | 0.969697 | 1.00000 |

Model | Accuracy | F1 | Precision | Recall |
---|---|---|---|---|
Logistic Regression | 0.979592 | 0.984615 | 0.969697 | 1.00000 |
Naive Bayes | 0.979592 | 0.984615 | 0.969697 | 1.00000 |
Decision Tree | 0.938776 | 0.953846 | 0.939394 | 0.96875 |
SVM | 0.979592 | 0.984615 | 0.969697 | 1.00000 |
MLP | 0.979592 | 0.984615 | 0.969697 | 1.00000 |

Model | Accuracy | F1 | Precision | Recall |
---|---|---|---|---|
Logistic Regression | 0.959184 | 0.969697 | 0.941176 | 1.00000 |
Naive Bayes | 0.979592 | 0.984615 | 0.969697 | 1.00000 |
Decision Tree | 0.959184 | 0.968750 | 0.968750 | 0.96875 |
SVM | 0.959184 | 0.969697 | 0.941176 | 1.00000 |
MLP | 0.979592 | 0.984615 | 0.969697 | 1.00000 |
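
To make the four columns concrete, here is a small sketch of how accuracy, precision, recall, and F1 follow from confusion-matrix counts. The counts are hypothetical: they are not taken from the repository, and I only picked them because they reproduce the 0.979592 / 0.984615 / 0.969697 / 1.00000 rows above. Which dialect counts as the positive class is also an assumption left open here.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical counts: 49 test articles, 32 of them positive,
# every positive found, and one negative misclassified as positive.
metrics = classification_metrics(tp=32, fp=1, fn=0, tn=16)

for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

With these counts, a single misclassified article out of 49 already accounts for the gap between 1.0 and 0.979592, so differences between rows that score this closely amount to one or two articles.
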
In my opinion, character-level
It would be interesting to explore this problem using Eastern Armenian sentences written in pre-reform (classical) orthography, the same orthography Western Armenian uses, in the hope of capturing grammatical differences rather than spelling differences.