Points Regression - Top 100 NHL Players 2023/24 Season

This is a simple example built with the NHL API, Scikit-Learn, and Python. Using the top 100 NHL players by points (as of Jan 6, 2023), I created 3 regression models with Scikit-Learn (Linear, Random Forest, and Gradient Boosting) to predict player point totals.

The models are trained on data from the past 5 seasons, excluding the current 2023/2024 season. This allows us to:

  • test the models using the current season stats (as of Jan 6, 2023)
  • extrapolate current season stats to 82 games and predict end of the season point totals

To visualize the data I created a graph and datatable using Plotly.

Visualizer

Live Example

See it live and try the interactive visualizer at: https://bloodlinealpha.com/nhl/points-prediction/

Run Locally

1.) Create and Activate the Virtual Environment

  • Open a terminal and navigate to the points directory:
    cd .\examples\points\
  • Create the virtual environment
    python -m venv env or python3 -m venv env
  • Activate the virtual environment
    .\env\Scripts\activate
  • You should see (env) at the start of your terminal prompt

2.) Install the Packages

  • Navigate back to the root directory
    cd ..
    cd ..
  • Install the pip packages from the requirements.txt
    pip install -r requirements.txt

3.) Run the visualizer

  • Ensure you are in the points folder (examples\points), then run:
    python visualizer.py or python3 visualizer.py
  • Open your browser and navigate to: http://127.0.0.1:8050/

Steps Taken

1.) I created the Top-100-NHL-20232024-Jan-6-2023.json using the below link...

2.) Created and ran init.py which:

3.) Created and ran model.py, which creates a:

  • Default model - default settings
  • Tuned model - tuned parameters using GridSearchCV
    • GridSearchCV tries every combination of hyperparameter values in a given grid, scores each with cross-validation, and keeps the best-scoring combination.
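
As a rough illustration of the tuning step, the sketch below runs GridSearchCV over a Random Forest regressor. The data is synthetic and the parameter grid, fold count, and scoring metric are assumptions chosen for the example; the values actually used in model.py are not listed in this README.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the player stats (assumption: the real features
# come from Top-100-NHL-5-year-Stats.csv).
rng = np.random.default_rng(0)
X = rng.random((120, 4))
y = X @ np.array([30.0, 20.0, 10.0, 5.0]) + rng.normal(0, 1.0, 120)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical grid: every combination of these values is evaluated.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,  # 3-fold cross-validation on the training set
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

best_params = search.best_params_
tuned_model = search.best_estimator_  # refit on the whole training set
test_r2 = tuned_model.score(X_test, y_test)  # R^2 on the held-out test set
print(best_params, round(test_r2, 3))
```

The same pattern applies to the other regressors by swapping the estimator and the grid.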

For each model I:

  • Imported the Top-100-NHL-5-year-Stats.csv
  • Dropped columns that are not needed and those that cause data leakage
  • One-hot encoded the positionCode column
  • Split the data into train and test sets
  • Normalized the data based on max and min values
  • Created 3 prediction models (default and tuned for each):
    • 1.) Linear Regression Model & Linear Regression Model - Tuned
    • 2.) Random Forest Regressor Model & Random Forest Regressor Model - Tuned
    • 3.) Gradient Boosting Regressor Model & Gradient Boosting Regressor Model - Tuned
  • Evaluated each model on the test set, printing the metrics and saving them to the console and console_tuned folders
  • Saved the prediction model and normalization values for each model to a .joblib file in the respective output folder (allows it to be used in the future without retraining)
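
The encoding and normalization steps above can be sketched in plain Python as follows. This is a minimal illustration, not the code in model.py: the column names goals and shots, the list of positionCode values, and the helper names are all assumptions. The point it shows is that the min and max are recorded from the training data once and then reused unchanged on new rows, which is why the normalization values are saved next to the model.

```python
POSITIONS = ["C", "D", "L", "R"]  # assumed positionCode values


def one_hot_position(row):
    """Replace positionCode with one 0/1 column per position."""
    encoded = {k: v for k, v in row.items() if k != "positionCode"}
    for pos in POSITIONS:
        encoded[f"positionCode_{pos}"] = int(row["positionCode"] == pos)
    return encoded


def fit_min_max(rows, columns):
    """Record (min, max) of each column from the training rows."""
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in columns}


def apply_min_max(row, bounds):
    """Scale the listed columns to [0, 1] using the stored training bounds."""
    scaled = dict(row)
    for col, (low, high) in bounds.items():
        span = high - low
        scaled[col] = (row[col] - low) / span if span else 0.0
    return scaled


# Hypothetical training rows with made-up numbers.
train_rows = [
    one_hot_position(r)
    for r in [
        {"positionCode": "C", "goals": 10, "shots": 100},
        {"positionCode": "D", "goals": 30, "shots": 200},
        {"positionCode": "L", "goals": 50, "shots": 300},
    ]
]
bounds = fit_min_max(train_rows, ["goals", "shots"])

# A new player is scaled with the training bounds, not with its own values.
new_player = one_hot_position({"positionCode": "R", "goals": 20, "shots": 250})
scaled_player = apply_min_max(new_player, bounds)
print(scaled_player)
```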

4.) Created and ran test.py which:

  • Takes in the Top-100-NHL-20232024-Jan-6-2023.csv
  • One-hot encodes the positionCode
  • Loops through each row (Player's stats)
  • Applies the same data manipulation as the model training (normalization, encoding)
  • Combines all players' stats and saves player_stats.xlsx to output
  • Predicts the points for each player, combines all players and saves results_current.xlsx to output and results_current_tuned.xlsx to output_tuned
  • Extrapolates the necessary data to a full 82 game season
  • Combines all players and saves player_stats_extrapolated.xlsx to output
  • Predicts the points for each player, combines all players and saves results_extrapolated.xlsx to output and results_extrapolated_tuned.xlsx to output_tuned
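
The extrapolation step can be pictured as scaling counting stats by 82 divided by games played. The sketch below assumes a simple linear projection and uses hypothetical column names (gamesPlayed, goals, assists, shootingPct); which columns test.py actually extrapolates is not stated here. Rate stats such as a shooting percentage are left untouched because they do not grow with games played.

```python
SEASON_GAMES = 82


def extrapolate(stats, counting_columns, season_games=SEASON_GAMES):
    """Linearly project counting stats to a full season."""
    games = stats["gamesPlayed"]
    if games <= 0:
        raise ValueError("gamesPlayed must be positive to extrapolate")
    factor = season_games / games
    projected = dict(stats)
    for col in counting_columns:
        projected[col] = stats[col] * factor
    projected["gamesPlayed"] = season_games
    return projected


# Hypothetical mid-season line: 41 games played, so every count doubles.
player = {"gamesPlayed": 41, "goals": 20, "assists": 30, "shootingPct": 0.15}
full_season = extrapolate(player, ["goals", "assists"])
print(full_season)
```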

5.) Created and ran visualizer.py which:

  • Uses Plotly to create a simple graph and data table
  • Loads the results data directly from the Excel files in output and output_tuned
  • Steps:
    • 1.) Select a player from the drop-down list
    • 2.) Select a model (default or tuned)
    • 3.) Toggle the lines from the legend
    • 4.) Click directly on the graph to view predictions for each line
    • 5.) Reset the legend if necessary
    • 6.) Scroll down to see the tabular data for each line on the graph

NHL API

1.) API Calls