Merge pull request #132 from hkirvesl/master
add regularizer
Showing 13 changed files with 1,562 additions and 32 deletions.
{
"cells": [
{
"cell_type": "markdown",
"id": "760ffc6d",
"metadata": {},
"source": [
"# Basic Tutorial : Regularization with Giotto-deep\n", | ||
"#### Author: Henry Kirveslahti\n", | ||
"\n", | ||
"This short tutorial introduces the regularization techniques integrated to *giotto-deep*. We briefly introduce the topic and show how one can fit a LASSO-regularized model with *giotto-deep*.\n", | ||
"\n", | ||
"This notebook is organized as follows:\n", | ||
"\n", | ||
"1. Introduction\n", | ||
"2. Data generation and an unregularized regression model\n", | ||
"3. Using Tihonov-type regularizers with *giotto-deep*\n", | ||
"4. Comparing the results from regularized and unregularized models\n", | ||
"5. Concluding remarks and further reading\n" | ||
] | ||
}, | ||
{
"cell_type": "markdown",
"id": "447fc00e",
"metadata": {},
"source": [
"## 1. Introduction\n", | ||
"\n", | ||
"Regularization is a powerful tool that aims to combat overfitting the data. If we want a model that is flexible enough to fit complex data, we need many parameters. In these cases it may be that there are multiple choices for the parameters to fit the data very well, and the the problem becomes ill-defined - there is no unique solution. When this is the case, we may introduce preference to a model that is in some way simpler, and we may do this with the help of a *regularizer*. This is the subject of this notebook.\n", | ||
"\n", | ||
"On a high level, without regularization, our loss $L$ is a function of the input data $X$ and the response variable y:\n", | ||
"$$\n", | ||
"L=L(X,y).\n", | ||
"$$\n", | ||
"In *giotto-deep*, these cases are handled by the usual Trainer class.\n", | ||
"\n", | ||
"However, when we want our loss to also depend on the model $M$ itself, that is,\n", | ||
"\n", | ||
"$$\n", | ||
"L=L(X,y,M),\n", | ||
"$$\n", | ||
"we need to use the RegularizedTrainer class.\n" | ||
] | ||
}, | ||
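{
"cell_type": "markdown",
"id": "a0b1c2d3",
"metadata": {},
"source": [
"The minimal sketch below, written in plain PyTorch rather than with the *gdeep* API, illustrates how a model-dependent penalty term turns $L(X,y)$ into $L(X,y,M)$. The function name `regularized_loss` and the names prefixed with `toy_` are illustrative choices for this sketch, not *giotto-deep* identifiers."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1c2d3e4",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only: a loss that depends on the model M. Not the gdeep API.\n",
"import torch\n",
"import torch.nn as nn\n",
"\n",
"toy_model = nn.Linear(2, 1, bias=False)\n",
"mse = nn.MSELoss()\n",
"\n",
"def regularized_loss(X, y, model, lam=0.1, p=1):\n",
"    # data-fit term L(X, y) ...\n",
"    data_term = mse(model(X), y)\n",
"    # ... plus a model-dependent term lam * sum_i ||beta_i||_p^p\n",
"    penalty = sum(w.abs().pow(p).sum() for w in model.parameters())\n",
"    return data_term + lam * penalty\n",
"\n",
"X_toy = torch.randn(8, 2)\n",
"y_toy = torch.randn(8, 1)\n",
"print(regularized_loss(X_toy, y_toy, toy_model))"
]
},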
{
"cell_type": "markdown",
"id": "df5bf19a",
"metadata": {},
"source": [
"## 2. Example - Unregularized model\n",
"In this section we generate some data and fit a standard regression model with *giotto-deep*. First we import some dependencies:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "df8c44ad",
"metadata": {},
"outputs": [],
"source": [
"%reload_ext autoreload\n",
"%autoreload 2\n",
"%matplotlib inline\n",
"import numpy as np\n",
"import torch\n",
"import torch.nn as nn\n",
"from torch.optim import SGD, Adam, RMSprop\n",
"import matplotlib.pyplot as plt\n",
"from torch.utils.data import TensorDataset, DataLoader\n",
"from sklearn.model_selection import train_test_split\n",
"from gdeep.trainer import Trainer\n",
"from gdeep.trainer.regularizer import TihonovRegularizer\n",
"from gdeep.search import GiottoSummaryWriter\n",
"from gdeep.models import ModelExtractor\n",
"from gdeep.utility import DEVICE\n",
"writer = GiottoSummaryWriter()"
]
},
{
"cell_type": "markdown",
"id": "73aa84cb",
"metadata": {},
"source": [
"### Data generation\n", | ||
"Next we generate data with sample size $S$. We have two covariates, $x_1$ and $x_2$, and a response variable $y$. The response variable $y$. More precisely :\n", | ||
"\n", | ||
"$$z_{0,i} \\sim^{\\textrm{i.i.d}} N(0,1);$$\n", | ||
"$$z_{1,i} = 0.9*z_{0,i} + 0.1*\\tau_{1,i};$$\n", | ||
"$$z_{2,i} = 0.85*z_{0,i} + 0.15*\\tau_{2,i};$$\n", | ||
"$$y_i=z_{0,i}+ \\epsilon_i;$$\n", | ||
"and\n", | ||
"$$\\tau_{j,i} \\sim^{\\textrm{i.i.d}} N(0,1), j \\in \\{1,2\\};$$\n", | ||
"$$\\epsilon \\sim^{\\textrm{i.i.d}},N(0,1),$$\n", | ||
"indenpedently of $\\tau$s.\n", | ||
"\n", | ||
"In other words, we have 2 covariates that both are corrupted versions of $z_0$ that is directly related to the response. The two covariates are then highly correlated, but $z_1$ has better signal-to-noise ratio than $z_2$. If we want to predict $y$, $z_2$ doesn't have any merits over $z_1$.\n" | ||
] | ||
}, | ||
{
"cell_type": "code",
"execution_count": null,
"id": "56bc8889",
"metadata": {},
"outputs": [],
"source": [
"rng = np.random.default_rng()\n",
"S = 100  # sample size\n",
"z0 = rng.standard_normal(S)\n",
"z1 = 0.9 * z0 + 0.1 * rng.standard_normal(S)    # less noisy covariate\n",
"z2 = 0.85 * z0 + 0.15 * rng.standard_normal(S)  # noisier covariate\n",
"y = z0 + rng.standard_normal(S)                 # response depends on z0 only\n",
"X = np.stack([z1, z2], 1).astype(float)\n",
"y = y.reshape(-1, 1).astype(float)"
]
},
{
"cell_type": "markdown",
"id": "d46f23d0",
"metadata": {},
"source": [
"### Modeling\n",
"Next we define our models. Our first model is an unregularized regression on the two covariates, without an intercept. Concretely, the underlying statistical model is:\n",
"\n",
"$$\n",
"y_i = \\alpha_1 z_{1,i} + \\alpha_2 z_{2,i} + \\epsilon_i.\n",
"$$\n",
"\n",
"We train this simple model for 500 epochs:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "23a5974c",
"metadata": {},
"outputs": [],
"source": [
"train_x, val_x, train_y, val_y = train_test_split(X, y, test_size=0.1)\n",
"tensor_x_t = torch.Tensor(train_x).float()\n",
"tensor_y_t = torch.from_numpy(train_y).float()\n",
"tensor_x_v = torch.Tensor(val_x).float()\n",
"tensor_y_v = torch.from_numpy(val_y).float()\n",
"train_dataset = TensorDataset(tensor_x_t, tensor_y_t)\n",
"dl_tr = DataLoader(train_dataset, batch_size=10)\n",
"val_dataset = TensorDataset(tensor_x_v, tensor_y_v)\n",
"dl_val = DataLoader(val_dataset, batch_size=10)\n",
"\n",
"class Net(nn.Module):\n",
"    # A single linear layer without bias: linear regression without an intercept.\n",
"    def __init__(self, featdim):\n",
"        super().__init__()\n",
"        self.flatten = nn.Flatten()\n",
"        self.linear = nn.Linear(featdim, 1, bias=False)\n",
"\n",
"    def forward(self, x):\n",
"        x = self.flatten(x)\n",
"        return self.linear(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2eb2afbe",
"metadata": {},
"outputs": [],
"source": [
"network = Net(2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ec64bdb2",
"metadata": {},
"outputs": [],
"source": [
"def l2_norm(prediction, y):\n",
"    # Validation metric: Euclidean norm of the residuals.\n",
"    return torch.norm(prediction - y, p=2).to(DEVICE)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b9ea93f5",
"metadata": {},
"outputs": [],
"source": [
"loss_fn = nn.MSELoss()\n",
"pipe = Trainer(network, (dl_tr, dl_val), loss_fn, writer, l2_norm)\n",
"pipe.train(SGD, 500, False, {\"lr\": 0.1})  # 500 epochs of SGD"
]
},
{
"cell_type": "markdown",
"id": "3d4dc80e",
"metadata": {},
"source": [
"## 3. Regularized model\n",
"Next we fit a regularized model to the same dataset. Our loss function is\n",
"\n",
"$$\n",
"L(f,X,y) = \\mathrm{MSE}\\big(f(X),y\\big) + \\lambda \\sum_{i=1}^{n} \\|\\beta_i\\|_{1},\n",
"$$\n",
"\n",
"where the latter term is known as the LASSO penalty and the $\\beta_i$ are the network weights.\n",
"\n",
"This regression technique falls into the broader category of *Tihonov-type* regularizers, which take the general form\n",
"\n",
"$$\n",
"L(f,X,y) = \\mathrm{MSE}\\big(f(X),y\\big) + \\lambda \\sum_{i=1}^{n} \\|\\beta_i\\|_{p}^{p},\n",
"$$\n",
"\n",
"and which serve as the namesake for our regularizer class. More general versions, such as the elastic net, can be implemented by mimicking the functionality of the TihonovRegularizer class; a sketch of this pattern follows the fit below.\n",
"\n",
"\n",
"### 3.1 Fitting the regularized model\n",
"\n",
"To fit a regularized model, we define a regularizer and pass it to the Trainer. The inner workings of the *giotto-deep* regularizers are beyond the scope of this introductory tutorial; for now we just need to define a Tihonov regularizer and the penalty coefficient $\\lambda$. This is done as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6893b86",
"metadata": {},
"outputs": [],
"source": [
"network_reg = Net(2)  # a fresh network, so the unregularized fit above is kept for comparison\n",
"pipereg = Trainer(network_reg, (dl_tr, dl_val), loss_fn, writer, l2_norm, regularizer=TihonovRegularizer(0.2, p=1))\n",
"pipereg.train(SGD, 500, False, {\"lr\": 0.1})"
]
},
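{
"cell_type": "markdown",
"id": "c2d3e4f5",
"metadata": {},
"source": [
"As a sketch of how an elastic-net variant could look, the cell below computes an elastic-net penalty over a model's parameters in plain PyTorch. The class name `ElasticNetPenalty` and its `penalty` method are illustrative assumptions made here, not the *gdeep* regularizer interface; a drop-in *gdeep* regularizer would instead mimic the TihonovRegularizer class."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d3e4f5a6",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only: an elastic-net penalty. Not the gdeep regularizer API.\n",
"class ElasticNetPenalty:\n",
"    def __init__(self, lamda, alpha):\n",
"        self.lamda = lamda  # overall penalty strength\n",
"        self.alpha = alpha  # mix: alpha=1 is pure LASSO, alpha=0 is pure ridge\n",
"\n",
"    def penalty(self, model):\n",
"        # lamda * (alpha * sum_i ||beta_i||_1 + (1 - alpha) * sum_i ||beta_i||_2^2)\n",
"        l1 = sum(w.abs().sum() for w in model.parameters())\n",
"        l2 = sum(w.pow(2).sum() for w in model.parameters())\n",
"        return self.lamda * (self.alpha * l1 + (1 - self.alpha) * l2)\n",
"\n",
"print(ElasticNetPenalty(lamda=0.2, alpha=0.5).penalty(Net(2)))"
]
},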
{
"cell_type": "markdown",
"id": "0a93abf7",
"metadata": {},
"source": [
"## 4. Results\n",
"Let us take a look at the regression coefficients obtained from the two runs:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5fb71d6b",
"metadata": {},
"outputs": [],
"source": [
"# Coefficients of the unregularized model\n",
"ex = ModelExtractor(pipe.model, loss_fn)\n",
"[*ex.get_layers_param().values()]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "555cbd6c",
"metadata": {},
"outputs": [],
"source": [
"# Coefficients of the regularized model\n",
"ex = ModelExtractor(pipereg.model, loss_fn)\n",
"[*ex.get_layers_param().values()]"
]
},
{
"cell_type": "markdown",
"id": "0624ed1c",
"metadata": {},
"source": [
"We see that the regularized model shrinks the second regression coefficient toward zero while keeping the first one relatively close to 1."
]
},
{
"cell_type": "markdown",
"id": "0dbd68c7",
"metadata": {},
"source": [
"### 4.1 Regression coefficients as a function of the regularization penalty"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67a5b70f",
"metadata": {},
"outputs": [],
"source": [
"regpens = [0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1]\n",
"param1 = np.zeros(len(regpens))\n",
"param2 = np.zeros(len(regpens))\n",
"for i, lam in enumerate(regpens):\n",
"    net_i = Net(2)  # fresh network for each penalty value\n",
"    pipe2 = Trainer(net_i, (dl_tr, dl_val), loss_fn, writer, l2_norm, regularizer=TihonovRegularizer(lamda=lam, p=1))\n",
"    pipe2.lamda = lam\n",
"    pipe2.regularize = True\n",
"    pipe2.train(SGD, 500, False, {\"lr\": 0.02})\n",
"    ex = ModelExtractor(pipe2.model, loss_fn)\n",
"    weights = [*ex.get_layers_param().values()][0]\n",
"    param1[i] = weights[0][0]\n",
"    param2[i] = weights[0][1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e93e154e",
"metadata": {},
"outputs": [],
"source": [
"plt.plot(regpens, param1, label=r'$\\alpha_1$')\n",
"plt.plot(regpens, param2, label=r'$\\alpha_2$')\n",
"plt.legend()\n",
"plt.title('Regression coefficients as a function of the regularization penalty')\n",
"plt.xlabel(r'Regularization penalty $\\lambda$')\n",
"plt.ylabel('Value')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "91066c56",
"metadata": {},
"source": [
"We should see that the regularization drives the coefficient $\\alpha_2$ to zero, provided the model is trained long enough to find the optimum. This also corrects the coefficient $\\alpha_1$ upwards. If we set $\\lambda$ too high, we over-regularize the model, diminishing its predictive performance."
]
},
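{
"cell_type": "markdown",
"id": "e4f5a6b7",
"metadata": {},
"source": [
"As a quick sanity check of this coefficient path, we can compare against scikit-learn's Lasso on the same data. This is a sketch assuming only that scikit-learn (already used above for train_test_split) is installed; the exact values will differ because scikit-learn scales its penalty by the sample size and uses coordinate descent rather than SGD."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5a6b7c8",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import Lasso\n",
"\n",
"for lam in regpens:\n",
"    # fit_intercept=False matches our no-intercept regression model\n",
"    lasso = Lasso(alpha=lam, fit_intercept=False)\n",
"    lasso.fit(X, y.ravel())\n",
"    print(f\"lambda={lam}: coefficients {lasso.coef_}\")"
]
},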
{
"cell_type": "markdown",
"id": "ed7d9f9b",
"metadata": {},
"source": [
"## 5. Concluding remarks and further reading\n",
"In this tutorial we introduced regularized training with the Trainer class and showed how it can be used for Tihonov-type regularized regression. *Giotto-deep* provides a flexible framework for defining more complicated regularization techniques; these are discussed in the notebook *deploying_custom_regularizers*."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
}
},
"nbformat": 4,
"nbformat_minor": 5
}