The objective of this project is to perform sentiment analysis on movie reviews using a Long Short-Term Memory (LSTM)-based model. The model classifies reviews as positive or negative based on their text content.
To get started, you'll need to install the necessary libraries.
pip install tensorflow nltk tensorflow-datasets
The model is an LSTM trained on the IMDb movie reviews dataset. It labels each review as positive or negative from the review text alone.
The dataset used in this project is the IMDb movie reviews dataset, which consists of movie reviews and their corresponding sentiment labels. It is available through TensorFlow Datasets (tensorflow-datasets) and can be loaded with tfds.load().
The dataset is split into training and testing sets:
- Training Set: Contains a collection of movie reviews for training the model.
- Testing Set: Used to evaluate the performance of the model.
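As a rough sketch of the loading step, the function below wraps tfds.load() for the "imdb_reviews" dataset from the TensorFlow Datasets catalog. The function name load_imdb and the two constants are hypothetical names chosen for illustration, and the project's own scripts may load the data differently.

```python
DATASET_NAME = "imdb_reviews"   # name of the dataset in the TFDS catalog
SPLITS = ["train", "test"]      # the two labeled splits described above


def load_imdb():
    """Return (train, test) tf.data.Dataset objects yielding (text, label) pairs."""
    # Imported lazily: tensorflow_datasets is a heavy dependency.
    import tensorflow_datasets as tfds

    train_ds, test_ds = tfds.load(
        DATASET_NAME,
        split=SPLITS,
        as_supervised=True,  # yield (text, label) tuples instead of feature dicts
    )
    return train_ds, test_ds
```

Calling load_imdb() downloads the dataset on first use and reads it from the local cache afterwards.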
- Data Loading: Load the IMDb dataset using TensorFlow Datasets (tensorflow_datasets).
- Data Preprocessing: Tokenize the text data, remove stopwords, and pad sequences to ensure uniform input length for the LSTM model.
- Model Building: Build a sequential model with an embedding layer, an LSTM layer, and dense output layers.
- Model Training: Train the model using the training dataset.
- Evaluation: Evaluate the model on the test dataset to check its performance.
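The preprocessing step in the list above (tokenize, remove stopwords, pad) can be illustrated in plain Python. This is a minimal sketch, not the project's code: the short STOPWORDS set stands in for NLTK's much longer English list, and the helper names, the reserved ids, and max_len=5 are all assumptions made for the example.

```python
import re

# Tiny illustrative stopword list (assumption); NLTK's English list is far longer.
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "of", "and", "to"}
PAD, UNK = 0, 1  # reserved ids: padding and out-of-vocabulary words


def tokenize(text):
    """Lowercase, split into word tokens, and drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def build_vocab(token_lists):
    """Map each distinct word to an integer id, starting after the reserved ids."""
    vocab = {}
    for tokens in token_lists:
        for word in tokens:
            vocab.setdefault(word, len(vocab) + 2)
    return vocab


def encode_and_pad(tokens, vocab, max_len):
    """Convert tokens to ids, then truncate or right-pad to exactly max_len."""
    ids = [vocab.get(w, UNK) for w in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))


reviews = ["This movie was a delight", "The plot is dull and the acting is worse"]
tokenized = [tokenize(r) for r in reviews]
vocab = build_vocab(tokenized)
padded = [encode_and_pad(t, vocab, max_len=5) for t in tokenized]
print(padded)  # [[2, 3, 0, 0, 0], [4, 5, 6, 7, 0]]
```

Every review ends up as a list of exactly max_len integers, which is the uniform shape the embedding layer expects. The sketch pads on the right; padding on the left is an equally valid choice as long as training and inference agree.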
The architecture of the LSTM model is as follows:
- Embedding Layer: This layer converts words into dense vectors of fixed size.
- LSTM Layer: A Long Short-Term Memory (LSTM) layer is used to process the sequences and capture long-term dependencies.
- Dense Layer: The output layer with a sigmoid activation function is used for binary classification (positive or negative sentiment).
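The three layers described above could be assembled in Keras roughly as follows. This is a sketch under assumptions: the vocabulary size, sequence length, embedding width, LSTM size, optimizer, and loss are illustrative values and are not taken from the project's training script.

```python
import tensorflow as tf

# Illustrative hyperparameters (assumptions, not taken from the project).
VOCAB_SIZE = 10_000   # number of distinct token ids, including padding
MAX_LEN = 200         # padded review length
EMBED_DIM = 64        # size of each word vector
LSTM_UNITS = 64       # size of the LSTM hidden state

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),  # token ids -> dense vectors
    tf.keras.layers.LSTM(LSTM_UNITS),                  # sequence -> final hidden state
    tf.keras.layers.Dense(1, activation="sigmoid"),    # probability of "positive"
])

# Binary cross-entropy pairs naturally with a single sigmoid output.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

The single sigmoid unit emits one number between 0 and 1 per review, which is read as the probability that the review is positive.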
The model will output the classification of movie reviews as either positive or negative based on the sentiment expressed in the text. Accuracy is used to evaluate the performance of the model.
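To make the accuracy metric concrete, the sketch below turns sigmoid outputs into class labels and scores them against the true labels. The 0.5 threshold is the conventional cut-off and is assumed here, and the probabilities and labels are made-up values for illustration only.

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of reviews whose thresholded prediction matches the true label."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)


probs = [0.91, 0.12, 0.55, 0.40]  # made-up model outputs
labels = [1, 0, 0, 1]             # 1 = positive, 0 = negative
print(accuracy(probs, labels))    # 0.5
```

Here the first two reviews are classified correctly and the last two are not, so the accuracy is 2 out of 4.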
- Clone the repository:
    git clone https://github.com/your-username/sentiment-analysis-lstm.git
    cd sentiment-analysis-lstm
- Install dependencies:
    pip install -r requirements.txt
- Run the training script:
    python train.py
  This will train the LSTM model on the IMDb dataset.
- Evaluate the model on the test set:
    python evaluate.py
  This will give the performance metrics such as accuracy on the test set.
This project is licensed under the MIT License - see the LICENSE file for details.