In this project, we analyze a Kaggle dataset (www.kaggle.com/mlg-ulb/creditcardfraud/data) containing 492 frauds out of 284,807 transactions, made by European credit card holders in September 2013. Our objective is to fit machine learning models that predict fraud accurately despite the highly unbalanced classes in this dataset. Because 28 of the variables are the result of a principal component analysis (PCA) transformation and no information about their original meaning is given, we drop the variables that have similar distributions. The next step is to address the class imbalance: we use the synthetic minority over-sampling technique (SMOTE) to resample the dataset so that the numbers of fraudulent and normal transactions are equal. The last step is to compare the machine learning methods; we found that XGBoost returned the highest AUC score.
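To make the resampling step concrete, here is a minimal pure-Python sketch of the idea behind SMOTE: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, its parameters, and the toy data are assumptions made for illustration. They are not taken from this repository, and this is not the implementation the project itself uses.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration only; it is a simplified version
    of the SMOTE idea, not a library API.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (made up): six "normal" rows and two "fraud" rows, two features each.
normal = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
fraud = [[2.0, 2.0], [3.0, 4.0]]

# Generate enough synthetic fraud rows to match the number of normal rows.
new_fraud = smote_sketch(fraud, n_synthetic=len(normal) - len(fraud))
balanced_fraud = fraud + new_fraud
print(len(normal), len(balanced_fraud))
```

For real work, the `imbalanced-learn` package provides a `SMOTE` class with a `fit_resample` method, which is likely a better choice than a hand-written version like this one.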
ireneliu521/Credit-Card-Fraud_J2D_Project_Python
About
Apply seven common machine learning algorithms to detect fraud while dealing with an imbalanced dataset.