The main objective of this project was to predict customer sentiment from drug reviews and to identify ambiguous reviews, in order to better serve drug manufacturers and new customers. The dataset came from the Kaggle 2018 University Club Hackathon and consisted of customer-provided drug ratings paired with review text. To balance vocabulary size against model accuracy, we used a custom stop-word list and varied several parameters in the vectorization process. We compared supervised machine learning classifiers, including SVM and Naive Bayes models, on the sentiment prediction task. Based on precision, recall, F-score, accuracy, and extreme misclassification errors, we concluded that LinearSVC classifiers performed better than Naive Bayes models at predicting sentiment on this dataset. We hypothesized that the ambiguity of a review is directly proportional to the number of conjunctions it contains. Therefore, to identify ambiguous reviews, we combined the misclassification errors of LinearSVC with a high conjunction count.
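The ambiguity heuristic described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the conjunction list, the `min_conjunctions` threshold, and the function names `count_conjunctions` and `flag_ambiguous` are all assumptions made for the example. It also assumes the predicted labels come from an already trained classifier such as LinearSVC.

```python
import re

# Assumed conjunction list; the project does not state which words it counted.
CONJUNCTIONS = {"but", "although", "though", "however", "yet",
                "while", "whereas", "and", "or"}


def count_conjunctions(review):
    """Count how many tokens in the review appear in CONJUNCTIONS."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return sum(1 for token in tokens if token in CONJUNCTIONS)


def flag_ambiguous(reviews, true_labels, predicted_labels, min_conjunctions=3):
    """Return (review, conjunction_count) pairs for reviews that were both
    misclassified and contain at least min_conjunctions conjunctions.

    min_conjunctions is a hypothetical threshold chosen for illustration.
    """
    flagged = []
    for text, y_true, y_pred in zip(reviews, true_labels, predicted_labels):
        n = count_conjunctions(text)
        if y_true != y_pred and n >= min_conjunctions:
            flagged.append((text, n))
    return flagged
```

A review is flagged only when both conditions hold, so a correctly classified review with many conjunctions, or a misclassified review with few, is left out. In practice the threshold would need to be tuned against the review length distribution.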
pratt-datar/Sentiment-ambiguity
Predicting patient drug review sentiment and identifying ambiguous reviews