Merge pull request #132 from hkirvesl/master
add regularizer
Showing 13 changed files with 1,562 additions and 32 deletions.
{
"cells": [
{
"cell_type": "markdown",
"id": "760ffc6d",
"metadata": {},
"source": [
"# Basic Tutorial : Regularization with Giotto-deep\n", | ||
"#### Author: Henry Kirveslahti\n", | ||
"\n", | ||
"This short tutorial introduces the regularization techniques integrated to *giotto-deep*. We briefly introduce the topic and show how one can fit a LASSO-regularized model with *giotto-deep*.\n", | ||
"\n", | ||
"This notebook is organized as follows:\n", | ||
"\n", | ||
"1. Introduction\n", | ||
"2. Data generation and an unregularized regression model\n", | ||
"3. Using Tihonov-type regularizers with *giotto-deep*\n", | ||
"4. Comparing the results from regularized and unregularized models\n", | ||
"5. Concluding remarks and further reading\n" | ||
] | ||
}, | ||
{
"cell_type": "markdown",
"id": "447fc00e",
"metadata": {},
"source": [
"## 1. Introduction\n", | ||
"\n", | ||
"Regularization is a powerful tool that aims to combat overfitting the data. If we want a model that is flexible enough to fit complex data, we need many parameters. In these cases it may be that there are multiple choices for the parameters to fit the data very well, and the the problem becomes ill-defined - there is no unique solution. When this is the case, we may introduce preference to a model that is in some way simpler, and we may do this with the help of a *regularizer*. This is the subject of this notebook.\n", | ||
"\n", | ||
"On a high level, without regularization, our loss $L$ is a function of the input data $X$ and the response variable y:\n", | ||
"$$\n", | ||
"L=L(X,y).\n", | ||
"$$\n", | ||
"In *giotto-deep*, these cases are handled by the usual Trainer class.\n", | ||
"\n", | ||
"However, when we want our loss to also depend on the model $M$ itself, that is,\n", | ||
"\n", | ||
"$$\n", | ||
"L=L(X,y,M),\n", | ||
"$$\n", | ||
"we need to use the RegularizedTrainer class.\n" | ||
] | ||
}, | ||
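{
"cell_type": "markdown",
"id": "a0b1c2d3",
"metadata": {},
"source": [
"The minimal sketch below, written in plain PyTorch rather than with the *gdeep* API, illustrates how a model-dependent penalty term turns $L(X,y)$ into $L(X,y,M)$. The function name `regularized_loss` and the names prefixed with `toy_` are illustrative choices for this sketch, not *giotto-deep* identifiers."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1c2d3e4",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only: a loss that depends on the model M. Not the gdeep API.\n",
"import torch\n",
"import torch.nn as nn\n",
"\n",
"toy_model = nn.Linear(2, 1, bias=False)\n",
"mse = nn.MSELoss()\n",
"\n",
"def regularized_loss(X, y, model, lam=0.1, p=1):\n",
"    # data-fit term L(X, y) ...\n",
"    data_term = mse(model(X), y)\n",
"    # ... plus a model-dependent term lam * sum_i ||beta_i||_p^p\n",
"    penalty = sum(w.abs().pow(p).sum() for w in model.parameters())\n",
"    return data_term + lam * penalty\n",
"\n",
"X_toy = torch.randn(8, 2)\n",
"y_toy = torch.randn(8, 1)\n",
"print(regularized_loss(X_toy, y_toy, toy_model))"
]
},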
{
"cell_type": "markdown",
"id": "df5bf19a",
"metadata": {},
"source": [
"## 2. Example - Unregularized model\n",
"In this section we generate some data and fit a standard regression model with *giotto-deep*. First we import some dependencies:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "df8c44ad",
"metadata": {},
"outputs": [],
"source": [
"%reload_ext autoreload\n",
"%autoreload 2\n",
"%matplotlib inline\n",
"import numpy as np\n",
"import torch\n",
"import torch.nn as nn\n",
"from torch.optim import SGD, Adam, RMSprop\n",
"import matplotlib.pyplot as plt\n",
"from torch.utils.data import TensorDataset, DataLoader\n",
"from sklearn.model_selection import train_test_split\n",
"from gdeep.trainer import Trainer\n",
"from gdeep.trainer.regularizer import TihonovRegularizer\n",
"from gdeep.search import GiottoSummaryWriter\n",
"from gdeep.models import ModelExtractor\n",
"from gdeep.utility import DEVICE\n",
"writer = GiottoSummaryWriter()"
]
},
{
"cell_type": "markdown",
"id": "73aa84cb",
"metadata": {},
"source": [
"### Data generation\n", | ||
"Next we generate data with sample size $S$. We have two covariates, $x_1$ and $x_2$, and a response variable $y$. The response variable $y$. More precisely :\n", | ||
"\n", | ||
"$$z_{0,i} \\sim^{\\textrm{i.i.d}} N(0,1);$$\n", | ||
"$$z_{1,i} = 0.9*z_{0,i} + 0.1*\\tau_{1,i};$$\n", | ||
"$$z_{2,i} = 0.85*z_{0,i} + 0.15*\\tau_{2,i};$$\n", | ||
"$$y_i=z_{0,i}+ \\epsilon_i;$$\n", | ||
"and\n", | ||
"$$\\tau_{j,i} \\sim^{\\textrm{i.i.d}} N(0,1), j \\in \\{1,2\\};$$\n", | ||
"$$\\epsilon \\sim^{\\textrm{i.i.d}},N(0,1),$$\n", | ||
"indenpedently of $\\tau$s.\n", | ||
"\n", | ||
"In other words, we have 2 covariates that both are corrupted versions of $z_0$ that is directly related to the response. The two covariates are then highly correlated, but $z_1$ has better signal-to-noise ratio than $z_2$. If we want to predict $y$, $z_2$ doesn't have any merits over $z_1$.\n" | ||
] | ||
}, | ||
{
"cell_type": "code",
"execution_count": null,
"id": "56bc8889",
"metadata": {},
"outputs": [],
"source": [
"rng = np.random.default_rng()\n",
"S = 100  # sample size\n",
"z0 = rng.standard_normal(S)\n",
"z1 = 0.9 * z0 + 0.1 * rng.standard_normal(S)    # less noisy covariate\n",
"z2 = 0.85 * z0 + 0.15 * rng.standard_normal(S)  # noisier covariate\n",
"y = z0 + rng.standard_normal(S)                 # response depends on z0 only\n",
"X = np.stack([z1, z2], 1).astype(float)\n",
"y = y.reshape(-1, 1).astype(float)"
]
},
{
"cell_type": "markdown",
"id": "d46f23d0",
"metadata": {},
"source": [
"### Modeling\n",
"Next we define our models. Our first model is an unregularized regression on the two covariates, without an intercept. Concretely, the underlying statistical model is:\n",
"\n",
"$$\n",
"y_i = \\alpha_1 z_{1,i} + \\alpha_2 z_{2,i} + \\epsilon_i.\n",
"$$\n",
"\n",
"We train this simple model for 500 epochs:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "23a5974c",
"metadata": {},
"outputs": [],
"source": [
"train_x, val_x, train_y, val_y = train_test_split(X, y, test_size=0.1)\n",
"tensor_x_t = torch.Tensor(train_x).float()\n",
"tensor_y_t = torch.from_numpy(train_y).float()\n",
"tensor_x_v = torch.Tensor(val_x).float()\n",
"tensor_y_v = torch.from_numpy(val_y).float()\n",
"train_dataset = TensorDataset(tensor_x_t, tensor_y_t)\n",
"dl_tr = DataLoader(train_dataset, batch_size=10)\n",
"val_dataset = TensorDataset(tensor_x_v, tensor_y_v)\n",
"dl_val = DataLoader(val_dataset, batch_size=10)\n",
"\n",
"class Net(nn.Module):\n",
"    # A single linear layer without bias: linear regression without an intercept.\n",
"    def __init__(self, featdim):\n",
"        super().__init__()\n",
"        self.flatten = nn.Flatten()\n",
"        self.linear = nn.Linear(featdim, 1, bias=False)\n",
"\n",
"    def forward(self, x):\n",
"        x = self.flatten(x)\n",
"        return self.linear(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2eb2afbe",
"metadata": {},
"outputs": [],
"source": [
"network = Net(2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ec64bdb2",
"metadata": {},
"outputs": [],
"source": [
"def l2_norm(prediction, y):\n",
"    # Validation metric: Euclidean norm of the residuals.\n",
"    return torch.norm(prediction - y, p=2).to(DEVICE)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b9ea93f5",
"metadata": {},
"outputs": [],
"source": [
"loss_fn = nn.MSELoss()\n",
"pipe = Trainer(network, (dl_tr, dl_val), loss_fn, writer, l2_norm)\n",
"pipe.train(SGD, 500, False, {\"lr\": 0.1})  # 500 epochs of SGD"
]
},
{
"cell_type": "markdown",
"id": "3d4dc80e",
"metadata": {},
"source": [
"## 3. Regularized model\n",
"Next we fit a regularized model to the same dataset. Our loss function is\n",
"\n",
"$$\n",
"L(f,X,y) = \\mathrm{MSE}\\big(f(X),y\\big) + \\lambda \\sum_{i=1}^{n} \\|\\beta_i\\|_{1},\n",
"$$\n",
"\n",
"where the latter term is known as the LASSO penalty and the $\\beta_i$ are the network weights.\n",
"\n",
"This regression technique falls into the broader category of *Tihonov-type* regularizers, which take the general form\n",
"\n",
"$$\n",
"L(f,X,y) = \\mathrm{MSE}\\big(f(X),y\\big) + \\lambda \\sum_{i=1}^{n} \\|\\beta_i\\|_{p}^{p},\n",
"$$\n",
"\n",
"and which serve as the namesake for our regularizer class. More general versions, such as the elastic net, can be implemented by mimicking the functionality of the TihonovRegularizer class; a sketch of this pattern follows the fit below.\n",
"\n",
"\n",
"### 3.1 Fitting the regularized model\n",
"\n",
"To fit a regularized model, we define a regularizer and pass it to the Trainer. The inner workings of the *giotto-deep* regularizers are beyond the scope of this introductory tutorial; for now we just need to define a Tihonov regularizer and the penalty coefficient $\\lambda$. This is done as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6893b86",
"metadata": {},
"outputs": [],
"source": [
"network_reg = Net(2)  # a fresh network, so the unregularized fit above is kept for comparison\n",
"pipereg = Trainer(network_reg, (dl_tr, dl_val), loss_fn, writer, l2_norm, regularizer=TihonovRegularizer(0.2, p=1))\n",
"pipereg.train(SGD, 500, False, {\"lr\": 0.1})"
]
},
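{
"cell_type": "markdown",
"id": "c2d3e4f5",
"metadata": {},
"source": [
"As a sketch of how an elastic-net variant could look, the cell below computes an elastic-net penalty over a model's parameters in plain PyTorch. The class name `ElasticNetPenalty` and its `penalty` method are illustrative assumptions made here, not the *gdeep* regularizer interface; a drop-in *gdeep* regularizer would instead mimic the TihonovRegularizer class."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d3e4f5a6",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only: an elastic-net penalty. Not the gdeep regularizer API.\n",
"class ElasticNetPenalty:\n",
"    def __init__(self, lamda, alpha):\n",
"        self.lamda = lamda  # overall penalty strength\n",
"        self.alpha = alpha  # mix: alpha=1 is pure LASSO, alpha=0 is pure ridge\n",
"\n",
"    def penalty(self, model):\n",
"        # lamda * (alpha * sum_i ||beta_i||_1 + (1 - alpha) * sum_i ||beta_i||_2^2)\n",
"        l1 = sum(w.abs().sum() for w in model.parameters())\n",
"        l2 = sum(w.pow(2).sum() for w in model.parameters())\n",
"        return self.lamda * (self.alpha * l1 + (1 - self.alpha) * l2)\n",
"\n",
"print(ElasticNetPenalty(lamda=0.2, alpha=0.5).penalty(Net(2)))"
]
},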
{
"cell_type": "markdown",
"id": "0a93abf7",
"metadata": {},
"source": [
"## 4. Results\n",
"Let us take a look at the regression coefficients obtained from the two runs:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5fb71d6b",
"metadata": {},
"outputs": [],
"source": [
"# Coefficients of the unregularized model\n",
"ex = ModelExtractor(pipe.model, loss_fn)\n",
"[*ex.get_layers_param().values()]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "555cbd6c",
"metadata": {},
"outputs": [],
"source": [
"# Coefficients of the regularized model\n",
"ex = ModelExtractor(pipereg.model, loss_fn)\n",
"[*ex.get_layers_param().values()]"
]
},
{
"cell_type": "markdown",
"id": "0624ed1c",
"metadata": {},
"source": [
"We see that the regularized model shrinks the second regression coefficient toward zero while keeping the first one relatively close to 1."
]
},
{
"cell_type": "markdown",
"id": "0dbd68c7",
"metadata": {},
"source": [
"### 4.1 Regression coefficients as a function of the regularization penalty"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67a5b70f",
"metadata": {},
"outputs": [],
"source": [
"regpens = [0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1]\n",
"param1 = np.zeros(len(regpens))\n",
"param2 = np.zeros(len(regpens))\n",
"for i, lam in enumerate(regpens):\n",
"    net_i = Net(2)  # fresh network for each penalty value\n",
"    pipe2 = Trainer(net_i, (dl_tr, dl_val), loss_fn, writer, l2_norm, regularizer=TihonovRegularizer(lamda=lam, p=1))\n",
"    pipe2.lamda = lam\n",
"    pipe2.regularize = True\n",
"    pipe2.train(SGD, 500, False, {\"lr\": 0.02})\n",
"    ex = ModelExtractor(pipe2.model, loss_fn)\n",
"    weights = [*ex.get_layers_param().values()][0]\n",
"    param1[i] = weights[0][0]\n",
"    param2[i] = weights[0][1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e93e154e",
"metadata": {},
"outputs": [],
"source": [
"plt.plot(regpens, param1, label=r'$\\alpha_1$')\n",
"plt.plot(regpens, param2, label=r'$\\alpha_2$')\n",
"plt.legend()\n",
"plt.title('Regression coefficients as a function of the regularization penalty')\n",
"plt.xlabel(r'Regularization penalty $\\lambda$')\n",
"plt.ylabel('Value')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "91066c56",
"metadata": {},
"source": [
"We should see that the regularization drives the coefficient $\\alpha_2$ to zero, provided the model is trained long enough to find the optimum. This also corrects the coefficient $\\alpha_1$ upwards. If we set $\\lambda$ too high, we over-regularize the model, diminishing its predictive performance."
]
},
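{
"cell_type": "markdown",
"id": "e4f5a6b7",
"metadata": {},
"source": [
"As a quick sanity check of this coefficient path, we can compare against scikit-learn's Lasso on the same data. This is a sketch assuming only that scikit-learn (already used above for train_test_split) is installed; the exact values will differ because scikit-learn scales its penalty by the sample size and uses coordinate descent rather than SGD."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5a6b7c8",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import Lasso\n",
"\n",
"for lam in regpens:\n",
"    # fit_intercept=False matches our no-intercept regression model\n",
"    lasso = Lasso(alpha=lam, fit_intercept=False)\n",
"    lasso.fit(X, y.ravel())\n",
"    print(f\"lambda={lam}: coefficients {lasso.coef_}\")"
]
},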
{
"cell_type": "markdown",
"id": "ed7d9f9b",
"metadata": {},
"source": [
"## 5. Concluding remarks and further reading\n",
"In this tutorial we introduced regularized training with the Trainer class and showed how it can be used for Tihonov-type regularized regression. *Giotto-deep* provides a flexible framework for defining more complicated regularization techniques; these are discussed in the notebook *deploying_custom_regularizers*."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
}
},
"nbformat": 4,
"nbformat_minor": 5
}