Machine Learning Model to Quantify the Accordance of Work Output and Objectives in a Complex Business Environment
This project was carried out during the last four months of my Bachelor's Degree in Physics at the University of Zaragoza, Spain, as the final requirement to earn the degree. At this university, the Bachelor's Thesis is limited to 25 pages plus appendices. Mine was graded Sobresaliente (Outstanding). The thesis was originally written and presented in Spanish in July 2021; here I provide a complete English translation along with the code.
The project, Machine Learning Model to Quantify the Accordance of Work Output and Objectives in a Complex Business Environment, is an end-to-end data science project carried out with Python and QGIS. I learned to tune and deploy some basic machine learning models after a careful analysis and transformation of real raw data provided by a company. It was an instructive experience that laid the groundwork for my further progress in data science and machine learning. It gave me an in-depth understanding of the theoretical foundations and let me face the main challenges that often arise in real projects.
I present a complete analysis of data on urban waste collection management provided by a real company. The objective is to create a goodness metric that determines the quality of the work orders executed by waste collection vehicles.
First, I select and transform the data and study the variables of greatest interest that can be constructed from it. Next, I design machine learning models that make predictions using classification and regression algorithms based on decision trees and neural networks. For each model, I evaluate its performance with the relevant statistical techniques and present the results that matter most for understanding the dataset, as well as the implications of the predictions.
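As an illustration only, the following minimal sketch shows the kind of tree-based classification described above, reduced to a single-split decision tree (a stump) written in plain Python and scored on a held-out set. It is not the thesis code: the data are synthetic, and the two features, the 0.8 labelling rule, and every function name are hypothetical assumptions chosen for the example.

```python
import random


def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels):
    """Most frequent binary label (ties and empty lists resolve to 1)."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(X, y):
    """Pick the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    _, j, t, left_label, right_label = best
    return {"feature": j, "threshold": t, "left": left_label, "right": right_label}


def predict(stump, row):
    """Classify one row with a fitted stump."""
    side = "left" if row[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]


def accuracy(stump, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(stump, row) == yi for row, yi in zip(X, y)) / len(y)


# Synthetic stand-in for work orders (hypothetical features, NOT the company data):
#   feature 0: fraction of planned stops completed
#   feature 1: an unrelated noise variable
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
# Hypothetical labelling rule: an order is "good" (1) if more than 80% of stops are done.
y = [int(row[0] > 0.8) for row in X]

# Hold out the last 50 rows to estimate performance on unseen data.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

stump = fit_stump(X_train, y_train)
test_accuracy = accuracy(stump, X_test, y_test)
print(stump, test_accuracy)
```

Because the labels here depend only on feature 0, the stump should select that feature with a threshold close to 0.8. Real work-order data would need deeper trees or ensembles, and a proper validation scheme, to capture interactions between variables.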
The work with the collaborating company was very fruitful, and I greatly appreciated their advice on how real data are usually collected and managed. The datasets are private and belong to the company, so I have not shared them publicly for confidentiality reasons.
In the context of the Bachelor's Degree in Physics, this project has been a very useful application of programming knowledge, advanced statistical analysis and comparison of complex models. Moreover, given the relevance of artificial intelligence and data processing in today's world, it has been a valuable hands-on implementation of knowledge that will undoubtedly be useful to me in the years to come.
The Spanish version can also be found in the university repository: https://zaguan.unizar.es/record/108881?ln=es
In March 2022, I was awarded the Santander Award for Digital Skills in Final Projects at the University of Zaragoza: https://catbs.unizar.es/articulos/premios-santander-a-competencias-digitales-en-trabajos-fin-de-estudios-de-la-universidad-de-zaragoza-2