Twitter Online Learning

The goal of this project is to classify the sentiment of Twitter tweets using online learning.

Online learning is the process of updating the model incrementally as new data arrives in a continuous stream, instead of training it once on a fixed dataset.
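A minimal sketch of the incremental-update idea, assuming binary labels (1 = positive, 0 = negative), a hashed bag-of-words feature space and a hand-written logistic regression trained with stochastic gradient descent. `featurize`, `OnlineLogisticRegression` and `tweet_stream` are hypothetical names, not this project's modules. In practice a library estimator with a `partial_fit` method, such as scikit-learn's `SGDClassifier`, plays the same role.

```python
import math
import re
import zlib
from typing import Dict, Iterable, Tuple

N_FEATURES = 2 ** 18  # size of the hashed feature space (hypothetical choice)


def featurize(text: str) -> Dict[int, float]:
    """Turn a tweet into sparse token counts using the hashing trick."""
    counts: Dict[int, float] = {}
    for token in re.findall(r"[a-z']+", text.lower()):
        index = zlib.crc32(token.encode("utf-8")) % N_FEATURES
        counts[index] = counts.get(index, 0.0) + 1.0
    return counts


class OnlineLogisticRegression:
    """Binary logistic regression updated one example at a time with SGD."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.learning_rate = learning_rate
        self.weights: Dict[int, float] = {}
        self.bias = 0.0

    def predict_proba(self, features: Dict[int, float]) -> float:
        z = self.bias + sum(self.weights.get(i, 0.0) * v for i, v in features.items())
        z = max(-35.0, min(35.0, z))  # clamp so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, text: str, label: int) -> None:
        """Take one SGD step on the log-loss for a single (text, label) pair."""
        features = featurize(text)
        error = self.predict_proba(features) - label  # d(log-loss)/dz
        self.bias -= self.learning_rate * error
        for i, v in features.items():
            self.weights[i] = self.weights.get(i, 0.0) - self.learning_rate * error * v

    def predict(self, text: str) -> int:
        return int(self.predict_proba(featurize(text)) >= 0.5)


def tweet_stream() -> Iterable[Tuple[str, int]]:
    """Stand-in for a live feed: yields toy (text, label) pairs, 1 = positive."""
    yield from [("love this", 1), ("love it", 1), ("hate this", 0), ("hate it", 0)]


model = OnlineLogisticRegression()
for _ in range(20):  # replay the toy stream several times
    for text, label in tweet_stream():
        model.partial_fit(text, label)  # update as each record arrives
```

Because each `partial_fit` call touches only the features of a single tweet, the model never needs the whole dataset in memory, which is what makes this style of training suitable for a stream.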

The dataset contains Twitter tweet records (1.6M unique records).

Coverage and Module Insights

Data processing - extraction of useful features from the raw data
ETL - cleaning and filtering of the data: removal of stop words, punctuation, URLs, repeating phrases and stray encodings
Visualizations - data distributions and word clouds
Microservice implementation - each service can be reused as a module in an external environment
NLP - tokenization, stemming and lemmatization
Model comparator - compares the statistics of multiple models and saves the best-performing one
Model selector - continuously checks for the best-performing model and promotes it to production
Clock function - garbage-collects obsolete models and data files according to business rules
Model run history - records every previous best run of each model
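The comparator, selector and run-history modules listed above could interact roughly as in this sketch. It assumes each run is summarized by a single score where higher is better; `ModelComparator`, `RunRecord` and their methods are hypothetical names, and persisting the winning model to disk (for example with `pickle`) is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class RunRecord:
    model_name: str
    score: float  # comparison metric, e.g. validation accuracy (assumption)
    recorded_at: datetime


@dataclass
class ModelComparator:
    best: Dict[str, RunRecord] = field(default_factory=dict)
    history: Dict[str, List[RunRecord]] = field(default_factory=dict)

    def record_run(self, model_name: str, score: float) -> bool:
        """Compare a new run with the model's previous best; keep it if it wins."""
        previous = self.best.get(model_name)
        if previous is not None and score <= previous.score:
            return False
        run = RunRecord(model_name, score, datetime.now(timezone.utc))
        self.best[model_name] = run
        # Run history keeps every run that became a new best for this model.
        self.history.setdefault(model_name, []).append(run)
        return True

    def select_production_model(self) -> Optional[RunRecord]:
        """Model selector: return the top-scoring best run across all models."""
        if not self.best:
            return None
        return max(self.best.values(), key=lambda run: run.score)


# Placeholder scores for illustration only, not measured results.
comparator = ModelComparator()
comparator.record_run("model_a", 0.70)
comparator.record_run("model_b", 0.80)
improved = comparator.record_run("model_b", 0.75)  # False: does not beat 0.80
production = comparator.select_production_model()
print(production.model_name)  # model_b
```

Keeping `best` separate from `history` lets the selector look at one record per model, while the history still shows how each model's best score evolved over time.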

Tech Stack

TBU

Data Cleaning, Filtering & Manipulation - regular expressions, pandas DataFrames and NumPy arrays
Data Visualization - Plotly, Seaborn, Matplotlib, WordCloud
Data Storage - local file system
Webapp - TBA
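The cleaning steps listed under ETL can be sketched with Python's standard `re` module. This is a minimal sketch, not the project's code: `STOP_WORDS`, the regex constants and `clean_tweet` are hypothetical, the stop-word list is deliberately tiny, and "repeating phrases" is interpreted here as runs of repeated characters, which may differ from the project's definition.

```python
import re
from typing import List

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "i", "is", "it", "of", "the", "to"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
REPEATED_CHAR_RE = re.compile(r"(\w)\1{2,}")  # three or more of the same character


def clean_tweet(text: str) -> List[str]:
    """Lower-case, strip URLs, encodings and punctuation, then drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = text.encode("ascii", "ignore").decode("ascii")  # drop emoji and other non-ASCII characters
    text = text.replace("'", "")                            # "don't" -> "dont"
    text = NON_LETTER_RE.sub(" ", text)                     # punctuation and digits
    text = REPEATED_CHAR_RE.sub(r"\1\1", text)              # "loooove" -> "loove"
    return [token for token in text.split() if token not in STOP_WORDS]


print(clean_tweet("I loooove this!!! https://t.co/abc"))  # ['loove', 'this']
```

Dropping non-ASCII characters is a blunt way to handle encodings: it also removes legitimate accented letters. Over a pandas DataFrame, the function could be applied column-wise with `df["text"].map(clean_tweet)` (the column name is an assumption).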

Running Instructions

Download the project and run the following command in a terminal opened in the project folder to install the requirements:

pip install -r requirements.txt

Task at Hand

Implement logging at the module level
Add exception handling for data transformation and the model selector
Refactor model training and run history into parameterized modules
Implement the clock function that removes obsolete models and data
Create and load an environment variable file
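The clock function, combined with module-level logging, exception handling and an environment-variable setting, might look like this sketch. It assumes obsolescence is judged by file age (modification time) and that the current best model is protected explicitly; the real business rules are not described here. `collect_obsolete_files`, `run_clock` and the `MODEL_RETENTION_SECONDS` variable are hypothetical, and loading the environment file itself (for example with python-dotenv's `load_dotenv`) is not shown.

```python
import logging
import os
import tempfile
import time
from pathlib import Path
from typing import Iterable, List, Optional

logger = logging.getLogger(__name__)  # one logger per module

# Retention period read from the environment; the variable name is hypothetical.
RETENTION_SECONDS = float(os.environ.get("MODEL_RETENTION_SECONDS", "86400"))


def collect_obsolete_files(directory: Path, max_age_seconds: float,
                           keep: Iterable[Path] = (),
                           now: Optional[float] = None) -> List[Path]:
    """Delete regular files older than max_age_seconds, except those in `keep`."""
    now = time.time() if now is None else now
    protected = {path.resolve() for path in keep}
    removed: List[Path] = []
    for path in list(directory.iterdir()):  # snapshot before deleting entries
        if not path.is_file() or path.resolve() in protected:
            continue
        age = now - path.stat().st_mtime
        if age <= max_age_seconds:
            continue
        try:
            path.unlink()
        except OSError:
            # Log and keep going so one locked file does not stop the sweep.
            logger.exception("Could not remove %s", path)
            continue
        logger.info("Removed obsolete file %s (age %.0f s)", path, age)
        removed.append(path)
    return removed


def run_clock(directory: Path, interval_seconds: float, keep: Iterable[Path] = ()) -> None:
    """Sweep forever, once every interval_seconds (e.g. in a background worker)."""
    keep = list(keep)  # allow the paths to be reused on every sweep
    while True:
        collect_obsolete_files(directory, RETENTION_SECONDS, keep)
        time.sleep(interval_seconds)


# Demo in a throwaway directory: one stale file, one fresh file, one protected best model.
logging.basicConfig(level=logging.INFO)
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    for name in ("old_model.pkl", "fresh_model.pkl", "best_model.pkl"):
        (model_dir / name).write_bytes(b"")
    two_days_ago = time.time() - 2 * 86400
    for name in ("old_model.pkl", "best_model.pkl"):
        os.utime(model_dir / name, (two_days_ago, two_days_ago))
    removed_names = sorted(p.name for p in collect_obsolete_files(
        model_dir, max_age_seconds=86400, keep=[model_dir / "best_model.pkl"]))
    remaining = sorted(p.name for p in model_dir.iterdir())
```

Accepting `now` as a parameter keeps the sweep easy to test; in production, `run_clock` could equally be replaced by an external scheduler such as cron calling `collect_obsolete_files` directly.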

Illustration from Data

Data Stats

Positive Word Cloud

Negative Word Cloud

MODEL EVALUATION

Bernoulli NB Model

Linear Model

Logistic Regression Model

Multinomial NB Model

XGBoost Model
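Each of the models listed above could be scored on the same held-out tweets with standard binary-classification metrics. The following is a plain-Python sketch; `binary_metrics` is a hypothetical helper, and scikit-learn's `sklearn.metrics` module provides equivalent functions such as `accuracy_score` and `f1_score`.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for one model's predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration (1 = positive, 0 = negative), not project results.
metrics = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(metrics["accuracy"])  # 0.6
```

Any one of these scores could serve as the single comparison metric when deciding which model is the best performer.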

Repository: Shubhammalik/tweet_tagging_model

Folders and files

Latest commit

History

Repository files navigation

Twitter Online Learning

Coverage and Modules insights

Tech Stack

Running Instructions

Task at Hand

Illustration from Data

MODEL EVALUATION

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages