Preprocessing and Analyzing NYC Yellow taxi passenger count for 2022 and 2023 using Isolation forest

smilee3998/nyc_taxi_analysis

Steps

  1. Download the queried datasets as CSV files from 2022dataset and 2023dataset into the data/ folder.
  2. Install the required packages: pip install -r requirements.txt
  3. Run all cells in preprocess.ipynb. The notebook first groups the number of passengers per hour and cleans the data to remove problematic records in the original dataset. It then takes the maximum number of passengers per hour in each day to further reduce the data size. Finally, it saves the processed data as CSV files.
  4. Run python anomoly_detection.py
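
The preprocessing described in step 3 can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. The column names `tpep_pickup_datetime` and `passenger_count` are assumed from the usual Yellow Taxi trip record layout, the cleaning rules are guesses at what "problematic data" means, and the function names are hypothetical.

```python
import pandas as pd


def hourly_passenger_counts(trips: pd.DataFrame, year: int) -> pd.Series:
    """Sum passengers per pickup hour after basic cleaning (assumed rules)."""
    df = trips.copy()
    # Unparseable timestamps become NaT so they can be dropped below.
    df["tpep_pickup_datetime"] = pd.to_datetime(
        df["tpep_pickup_datetime"], errors="coerce"
    )
    df = df.dropna(subset=["tpep_pickup_datetime", "passenger_count"])
    # Assumed cleaning: keep only pickups inside the requested year
    # and trips with at least one passenger.
    in_year = df["tpep_pickup_datetime"].dt.year == year
    has_passengers = df["passenger_count"] > 0
    df = df[in_year & has_passengers]
    # Group into hourly bins; hours without trips get a sum of 0.
    return (
        df.set_index("tpep_pickup_datetime")["passenger_count"]
        .resample("h")
        .sum()
    )


def daily_peak(hourly: pd.Series) -> pd.Series:
    """Keep only the busiest hour's passenger total for each day."""
    return hourly.resample("D").max()
```

With a trips table loaded through `pd.read_csv`, calling `daily_peak(hourly_passenger_counts(trips, 2022))` yields one value per day, which can then be written out with `Series.to_csv`.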

Results

[Figure: result]
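
The contents of anomoly_detection.py are not reproduced here. As a rough illustration of the technique named in the title, the sketch below applies scikit-learn's `IsolationForest` to a series of daily peak passenger counts. The function name `flag_anomalies`, the `contamination` value, and the output column names are assumptions chosen for the example, not values taken from the repository.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest


def flag_anomalies(
    daily_peak: pd.Series, contamination: float = 0.05, seed: int = 0
) -> pd.DataFrame:
    """Score each day with an Isolation Forest and mark likely anomalies."""
    # scikit-learn expects a 2-D feature matrix: one row per day, one column.
    features = daily_peak.to_numpy(dtype=float).reshape(-1, 1)
    model = IsolationForest(contamination=contamination, random_state=seed)
    labels = model.fit_predict(features)  # -1 marks an anomaly, 1 a normal day
    scores = model.decision_function(features)  # lower means more anomalous
    return pd.DataFrame(
        {
            "peak_passengers": daily_peak.to_numpy(),
            "score": scores,
            "is_anomaly": labels == -1,
        },
        index=daily_peak.index,
    )
```

The `contamination` parameter sets the expected share of anomalous days, so it directly controls how many days are flagged; it is worth tuning against days known to be unusual, such as holidays or severe weather.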
