OLS missing data feature removal + docs update
teanijarv committed Mar 3, 2024
1 parent 55486b2 commit 07ea86b
Showing 4 changed files with 54 additions and 35 deletions.
7 changes: 6 additions & 1 deletion HISTORY.rst
@@ -26,4 +26,9 @@ History
 0.2.0 (2024-03-2)
 ------------------
 
-* Overall project restructuring for optimisation
+* Overall project restructuring for optimisation
+
+0.2.1 (2024-03-3)
+------------------
+
+* Option to modify the OLS parameters used in the HLR
9 changes: 2 additions & 7 deletions HLR/regression.py
@@ -18,7 +18,7 @@
 class HierarchicalLinearRegression:
     """Class for performing hierarchical linear regression analysis."""
 
-    def __init__(self, df, ivs_dict, dv, missing_data=None, ols_params=None):
+    def __init__(self, df, ivs_dict, dv, ols_params=None):
         """Initializes the HierarchicalLinearRegression class.
         Args:
@@ -44,11 +44,6 @@ def __init__(self, df, ivs_dict, dv, missing_data=None, ols_params=None):
             raise ValueError(f"{dv} is not a column in the DataFrame.")
         self.outcome_var = dv
 
-        if missing_data is not None:
-            if missing_data not in ['none', 'drop', 'raise']:
-                raise ValueError("missing_data must be either 'none', 'drop', or 'raise'.")
-        self.missing_data = missing_data if missing_data is not None else 'none'
-
         if ols_params is not None:
             if not isinstance(ols_params, dict):
                 raise ValueError("ols_params must be a dictionary.")
@@ -65,7 +60,7 @@ def fit_models(self):
             X = self.data[predictors]
             X_const = sm.add_constant(X)
             y = self.data[self.outcome_var]
-            model = sm.OLS(y, X_const, missing=self.missing_data).fit(**self.ols_params)
+            model = sm.OLS(y, X_const).fit(**self.ols_params)
             results[level] = model
         return results
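
Note on the change above: with the `missing_data` argument gone, `sm.OLS` is called with statsmodels' default `missing='none'`, which does no NaN checking. Incomplete rows therefore presumably have to be handled before the DataFrame reaches the class (the updated usage docs in this commit call `df.dropna` for that). Below is a minimal sketch of listwise deletion restricted to the columns the models actually use. The toy data and the names `used_cols` and `clean_df` are illustrative assumptions, not part of the package.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one gap in a predictor, one gap in the outcome.
df = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0, 5.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0],
    "y": [1.5, 2.5, 3.5, np.nan, 6.5],
})

# Same shape of inputs as the class constructor in this diff expects.
ivs_dict = {1: ["x1"], 2: ["x1", "x2"]}
dv = "y"

# Collect every column used at any model level, plus the outcome.
used_cols = sorted({col for cols in ivs_dict.values() for col in cols} | {dv})

# Listwise deletion: keep only rows that are complete in those columns,
# so every hierarchical level is fitted on the same observations.
clean_df = df.dropna(subset=used_cols).reset_index(drop=True)
```

Dropping once, up front, also keeps the sample identical across levels, which is what makes the between-model comparisons meaningful.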

2 changes: 1 addition & 1 deletion README.md
@@ -150,11 +150,11 @@ This program is provided with no warranty of any kind and it is still under deve

#### To-do
Would be great if someone with more experience with packages would contribute with testing and the whole deployment process. Also, if someone would want to write documentation, that would be amazing.
- docs
- dict values within df hard to read
- ability to change OLS parameters
- add t stats for coefficients
- give option for output only some columns not all
- add regression type option (eg, for logistic regression)

#### Contributors
[Toomas Erik Anijärv](https://github.com/teanijarv)
71 changes: 45 additions & 26 deletions docs/usage.rst
@@ -2,38 +2,57 @@
 Usage
 =====
 
-See GitHub repository for the example dataset and Jupyter Notebook.
+Quick start guide to use hierarchical linear regression using HLR package.
 
-To use HLR - Hierarchical Linear Regression in a project::
+Initialising the HLR object
+---------------------------
+
-    import pandas as pd
-    from HLR import HierarchicalLinearRegression
+Let's first fetch some data and initiate the HLR object. We'll use the `penguins` dataset from `seaborn` for our example.
+
-    # Example dataframe which includes some columns which are also mentioned below
-    nba = pd.read_csv('example/NBA_train.csv')
+.. code-block:: python
-    # Define the models for hierarchical regression including predictors for each model
-    X = {1: ['PTS'],
-         2: ['PTS', 'ORB'],
-         3: ['PTS', 'ORB', 'BLK']}
+    import seaborn as sns
+    import pandas as pd
-    # Define the outcome variable
-    y = 'W'
+    # Load the example penguins dataset
+    df = sns.load_dataset('penguins')
+    df.dropna(inplace=True)
+    df = df[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']]
-    # Initiate the HLR object
-    hreg = HierarchicalLinearRegression(df, X, y)
+Initialising the HLR object and generating summary report
+---------------------------------------------------------
+
-    # Generate a summarised report as a dataframe which shows all linear regression models parameters and difference between the models
-    summary_report = hreg.summary()
-    display(summary_report)
+.. code-block:: python
-    # Run diagnostics on all the models (displayed output below only shows the first model)
-    hreg.diagnostics(verbose=True)
+    from HLR import HierarchicalLinearRegression
-    # Different plots
-    hreg.plot_studentized_residuals_vs_fitted()
-    hreg.plot_qq_residuals()
-    hreg.plot_influence()
-    hreg.plot_std_residuals()
-    hreg.plot_histogram_std_residuals()
-    hreg.plot_partial_regression()
+    # Define the independent variables for each model level
+    ivs_dict = {
+        1: ['bill_length_mm'],
+        2: ['bill_length_mm', 'bill_depth_mm'],
+        3: ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm']
+    }
+    # Define the dependent variable
+    dv = 'body_mass_g'
+    # Initialize the HierarchicalLinearRegression class
+    hlr = HierarchicalLinearRegression(df, ivs_dict, dv)
+    hlr.summary()
+Run diagnostics for testing assumptions
+---------------------------------------
+
+.. code-block:: python
+    diagnostics_dict = hlr.diagnostics(verbose=True)
+Plotting options for all model levels
+-------------------------------------
+
+.. code-block:: python
+    hlr.plot_studentized_residuals_vs_fitted()
+    hlr.plot_qq_residuals()
+    hlr.plot_influence()
+    hlr.plot_std_residuals()
+    hlr.plot_histogram_std_residuals()
+    hlr.plot_partial_regression()
