Text Classification Sample Project

Overview

This project demonstrates a simple binary text classification task on a generated sample dataset of very short recipes and job descriptions. It includes one script to generate the sample data and another to train a binary classification model on that data.

Scripts

Data Generation (gen_test_data.py): This script contains functions to generate sample data, creating two types of short texts: recipes and job descriptions. These texts are then combined into a pandas DataFrame with corresponding labels.
Model Training and Prediction (split_train_test_sample.py): This script trains a binary classification model using the Hugging Face Transformers library. It handles splitting the data, tokenizing it, training the model, evaluating its performance, and predicting classes for new texts.
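The data generation and splitting steps described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the contents of gen_test_data.py: the word lists, the `make_samples` and `split_rows` helpers, the label encoding (0 for recipes, 1 for job descriptions), and the 80/20 split are all assumptions made for the example.

```python
import random

# Hypothetical vocabulary; the real script may build its texts differently.
INGREDIENTS = ["flour", "eggs", "butter", "sugar", "garlic", "rice"]
ROLES = ["data analyst", "nurse", "welder", "accountant"]
SKILLS = ["Python", "patient care", "budgeting", "project planning"]


def make_samples(n_per_class, seed=0):
    """Build a shuffled list of {"text", "label"} rows, n_per_class of each type."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_per_class):
        # Label 0: a very short recipe (assumed encoding).
        first, second = rng.sample(INGREDIENTS, 2)
        rows.append({"text": f"Mix {first} with {second} and bake.", "label": 0})
        # Label 1: a very short job description (assumed encoding).
        role, skill = rng.choice(ROLES), rng.choice(SKILLS)
        rows.append({"text": f"Seeking a {role} with experience in {skill}.", "label": 1})
    rng.shuffle(rows)
    return rows


def split_rows(rows, test_fraction=0.2):
    """Hold out the first test_fraction of the (already shuffled) rows as a test set."""
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


rows = make_samples(50)
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))
```

With pandas installed, `pandas.DataFrame(rows)` turns the same rows into a DataFrame with `text` and `label` columns, which matches the shape the README describes.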

How to Use

Data Generation:
- Run gen_test_data.py to create a DataFrame of sample texts and their labels, which is shuffled and ready for model training.
Model Training:
- Configure the model checkpoint and training arguments in split_train_test_sample.py.
- Run the script to train the model. It splits the data, tokenizes the texts, and trains the classification model.
- It also evaluates the model on the test set and outputs a confusion matrix and accuracy score.
- Predictions and softmax probabilities are saved to a CSV file.
Prediction:
- Use the predict_text function to predict the class of new text samples with the trained model.
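To make the evaluation outputs above concrete, here is a small standard-library sketch of how softmax probabilities, predicted classes, a confusion matrix, and accuracy relate to a model's raw logits. It is not taken from split_train_test_sample.py, which presumably relies on library helpers for these steps; the `softmax` and `evaluate` functions, the example logits, and the CSV column names are hypothetical.

```python
import csv
import io
import math


def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def evaluate(logit_rows, true_labels):
    """Return probabilities, predictions, a 2x2 confusion matrix, and accuracy."""
    probs = [softmax(row) for row in logit_rows]
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    # matrix[true][predicted]: rows are true classes, columns are predictions.
    matrix = [[0, 0], [0, 0]]
    for true, pred in zip(true_labels, preds):
        matrix[true][pred] += 1
    accuracy = sum(t == p for t, p in zip(true_labels, preds)) / len(true_labels)
    return probs, preds, matrix, accuracy


# Made-up logits for four test texts (one row per text, one column per class).
logits = [[2.0, -1.0], [0.5, 1.5], [-0.3, 0.9], [1.2, 0.1]]
labels = [0, 1, 0, 0]
probs, preds, matrix, accuracy = evaluate(logits, labels)

# Write predictions and probabilities as CSV (to memory here; a file works the same way).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text_id", "true_label", "predicted_label", "prob_0", "prob_1"])
for i, (label, pred, p) in enumerate(zip(labels, preds, probs)):
    writer.writerow([i, label, pred, f"{p[0]:.4f}", f"{p[1]:.4f}"])

print(matrix, accuracy)
```

In this toy example the third text is misclassified, so the confusion matrix has one off-diagonal entry and accuracy is 3 out of 4.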

Dependencies

Install the required packages from the requirements.txt file:

pip install -r requirements.txt

Repository: abigailhaddad/bert_classification