Board Game User Rating Predictor

  • Contributors: Marian Agyby, Ashwin Babu, Vikram Grewal, Eric Tsai
  • Project repo: https://github.com/UBC-MDS/boardgame_rating_predictor

About

In this project, we aim to answer the following question: given certain characteristics of a new board game, how would users rate it? Answering this question can help board game creators understand which characteristics enhance user enjoyment, so they can reduce R&D time and focus development on games that are more likely to be popular.

To answer this question, we use a large data set of user ratings and reviews for thousands of board games, created by BoardGameGeek and made available by tidytuesday, which can be found here. The data comes as two data sets: one contains the user ratings, and the other contains information about the board games, including names and descriptions as well as characteristics such as playing time, minimum age, and number of players. We merged the two data sets and built multiple regression models that predict the average user rating from these features.
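As a rough illustration of the merge step, the sketch below joins two small stand-in tables with pandas. The column names (`id`, `average`, `playingtime`, `minage`) and the inner join are assumptions made for this example; the project's own scripts may use different keys or join logic.

```python
import pandas as pd

# Toy stand-ins for the ratings and details files.
# Column names are assumptions for illustration only.
ratings = pd.DataFrame({"id": [1, 2, 3], "average": [7.1, 6.4, 8.0]})
details = pd.DataFrame(
    {"id": [1, 2, 4], "playingtime": [45, 90, 30], "minage": [8, 12, 10]}
)

# An inner join keeps only the games that appear in both tables.
merged = ratings.merge(details, on="id", how="inner")
print(merged)
```

An inner join is used here so that every row has both a rating and a set of features; a left join would instead keep rated games that have no details and leave their features missing.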

Analysis

First, we split the data into a 50% training set and a 50% test set (the training set is limited to 50% because of the time it takes to train the model), then performed exploratory data analysis on the training set to assess which features are most appropriate for training the model. We plotted the distribution of the target variable, the average rating, as a histogram to assess whether the data is imbalanced or skewed. We also plotted histograms of the numeric features to show their most common values.
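A 50/50 split like the one described above can be sketched with scikit-learn's `train_test_split`. The toy data frame and the `random_state` value are assumptions for this example, not values taken from the project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data standing in for the merged board game table (hypothetical columns).
games = pd.DataFrame(
    {"playingtime": range(10), "average": [x / 2 for x in range(10)]}
)

# test_size=0.5 puts half of the rows in each set;
# a fixed random_state makes the split reproducible.
train_df, test_df = train_test_split(games, test_size=0.5, random_state=123)
print(len(train_df), len(test_df))
```

Exploratory analysis would then be run on `train_df` only, so that nothing learned from the test rows influences feature selection.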

Since the target is continuous and the features include a mixture of categorical and continuous variables, we tested several regression models, assessed their performance, and selected the best-performing one as the final model. We also used a randomized search with cross-validation to optimize the models' hyperparameters. Once the final model was selected and fitted to the entire training set, we used it to predict average user ratings on the test set, measured its performance, and reported the results in a table.
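The randomized search described above might look roughly like the following sketch. It uses `Ridge` and synthetic data purely as stand-ins; the models, hyperparameter ranges, and scoring metric actually used in the project are not stated in this README, so every specific value below is an assumption.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic regression data standing in for the board game features and ratings.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=123
)

# Sample candidate alpha values from a log-uniform range and
# score each candidate with 3-fold cross-validation on the training set.
search = RandomizedSearchCV(
    Ridge(),
    param_distributions={"alpha": loguniform(1e-3, 1e3)},
    n_iter=10,
    cv=3,
    random_state=123,
)
search.fit(X_train, y_train)  # by default, refits the best candidate on all training data

# The default score for regressors is R^2; evaluate once on the held-out test set.
test_r2 = search.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

Because the search only ever sees the training set, the test score remains an honest estimate of how the selected model performs on unseen games.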

The exploratory data analysis report can be found here.

Report

The final report of the project can be found here.

Usage

Note: replicating this analysis takes about 30 minutes.

Using Docker

To replicate the analysis, install Docker, clone this repository, then run the following commands in a Unix shell (terminal) from the root directory of this project:

  1. Pull the docker image from DockerHub:

     docker pull axb2860/boardgame_rating_predictor
    
  2. Run the docker image:

    a. If you are using Windows:

     docker run --rm -v "/$(pwd)://home" axb2860/boardgame_rating_predictor make -C //home all
    

    b. If you are using a Mac with an M1 or M2 chip:

     docker run --rm -v "$PWD":/home axb2860/boardgame_rating_predictor make -C /home all
    

    Note: you can add --platform linux/amd64 as an optional flag to the command above.

  3. To delete the files and figures created from the analysis and return the repository to a clean state, run the following:

    a. If you are using Windows:

     docker run --rm -v "/$(pwd)://home" axb2860/boardgame_rating_predictor make -C //home clean
    

    b. If you are using a Mac with an M1 or M2 chip:

     docker run --rm -v "$PWD":/home axb2860/boardgame_rating_predictor make -C /home clean
    

    Note: you can add --platform linux/amd64 as an optional flag to the command above.

Without Using Docker

To replicate this analysis without Docker, follow these instructions and run the corresponding commands in a Unix shell (terminal).

  1. Download the environment file envboard.yaml, which lists the dependencies.

  2. Create and activate the conda environment:

     conda env create -f envboard.yaml
     conda activate envboard

  3. Clone the repository:

     git clone https://github.com/UBC-MDS/boardgame_rating_predictor.git

  4. Move to the root directory of the project:

     cd boardgame_rating_predictor

  5. To replicate the analysis in its entirety, run:

     make all

  6. To delete the files and figures created by the analysis and return the repository to a clean state, run:

     make clean

  7. To download only the raw data files, run:

     python src/download_data.py --url="https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-25/ratings.csv" --out_file="data/raw/ratings.csv"

     python src/download_data.py --url="https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-25/details.csv" --out_file="data/raw/details.csv"

  8. For a more in-depth look at the exploratory data analysis, see link or run the file in any IDE.

  9. To check the model performance comparison, click here.
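For readers curious about what the raw-data download step in the list above involves, here is a minimal standard-library sketch of a URL-to-file download. It is not the repository's `src/download_data.py`; the function name `download_file` is hypothetical, and the real script may use different libraries and argument handling.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_file(url: str, out_file: str) -> Path:
    """Download `url` to `out_file`, creating parent directories as needed.

    Hypothetical helper for illustration; not the project's actual script.
    """
    out_path = Path(out_file)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # urlretrieve copies the response body to the given local path.
    urlretrieve(url, str(out_path))
    return out_path
```

Under these assumptions, calling `download_file(url, "data/raw/ratings.csv")` with the ratings URL from the commands above would save the file to the same location the `--out_file` option names.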

Dependencies

  • Python 3.10.6 and Python packages:

    • docopt-ng==0.8.1
    • requests==2.27.1
    • numpy==1.23.4
    • pandas==1.4.4
    • altair==4.2.0
    • altair_saver
    • scikit-learn==1.1.3
    • ipykernel
    • matplotlib>=3.2.2
    • requests>=2.24.0
    • graphviz
    • python-graphviz
    • eli5
    • shap
    • jinja2
    • selenium<4.3.0
    • imbalanced-learn
    • pip
    • lightgbm
    • vl_convert
  • R 4.2.1 and R packages:

    • tidyverse==1.3.2
    • knitr==1.40
    • kableExtra==1.3.4

Makefile dependency diagram

Note: click on the image below to view an enlarged version.

License

All Board Game User Rating Predictor materials are licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License and the MIT License.

References

BoardGameGeek, LLC. 2022. "Board Games". Retrieved November 16, 2022 from github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-01-25.