Hello, and welcome!
In this video, we’ll be covering model evaluation. So, let’s get started.
The goal of regression is to build a model to accurately predict an unknown case.
To this end, we have to perform regression evaluation after building the model.
In this video, we’ll introduce and discuss two types of evaluation approaches that can
be used to achieve this goal.
These approaches are: train and test on the same dataset, and train/test split.
We’ll talk about what each of these is, as well as the pros and cons of using each of these approaches.
Also, we’ll introduce some metrics for accuracy of regression models.
Let’s look at the first approach.
When considering evaluation models, we clearly want to choose the one that will give us the
most accurate results.
So, the question is, how can we calculate the accuracy of our model?
In other words, given a dataset and a model we have built, such as linear regression, how much can we trust this model to predict an unknown sample?
One of the solutions is to select a portion of our dataset for testing.
For instance, assume that we have 10 records in our dataset.
We use the entire dataset for training, and we build a model using this training set.
Now, we select a small portion of the dataset, such as row numbers 6 to 9, but without the
labels.
This set is called the test set. It does have labels, but the labels are not used for prediction; they serve only as ground truth.
The labels are called the “actual values” of the test set.
Now, we pass the feature set of the testing portion to our built model, and predict the
target values.
Finally, we compare the values predicted by our model with the actual values in the test set.
This indicates how accurate our model actually is.
There are different metrics for reporting the accuracy of the model, but most of them are based, in general, on the similarity of the predicted and actual values.
Let’s look at one of the simplest metrics to calculate the accuracy of our regression
model.
As mentioned, we just compare the actual values, y, with the predicted values, denoted ŷ (y-hat), for the testing set.
The error of the model is calculated as the average difference between the predicted and actual values across all the rows.
We can write this error as an equation.
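One common way to write it is as the mean absolute error: Error = (1/n) * Σ |y_j − ŷ_j|, where n is the number of rows in the test set. As a quick illustration, here is a minimal Python sketch using NumPy; the numbers are made up purely for this example and are not from the video.

import numpy as np

# Actual target values (y) and model predictions (y-hat) for the test rows.
# These values are invented for illustration only.
y_actual = np.array([9.5, 11.2, 10.1, 8.7])
y_pred   = np.array([10.0, 10.8, 9.6, 9.3])

# Mean absolute error: the average absolute difference between the
# predicted and actual values over all test rows.
mae = np.mean(np.abs(y_actual - y_pred))
print(mae)  # 0.5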
So, the first evaluation approach we just talked about is the simplest one: train and
test on the SAME dataset.
Essentially, the name of this approach says it all … you train the model on the entire
dataset, then you test it using a portion of the same dataset.
In a general sense, when you test with a dataset in which you know the target value for each
data point, you’re able to obtain a percentage of accurate predictions for the model.
This evaluation approach would most likely have a high “training accuracy” and a
low “out-of-sample accuracy”, since the model knows all of the testing data points
from the training.
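As a rough sketch of what “train and test on the same dataset” can look like in code, here is one possible version using scikit-learn. The tool choice, the column names, and the data are all assumptions for illustration, not something shown in the video.

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Tiny made-up dataset (10 rows), purely for illustration.
df = pd.DataFrame({
    'engine_size': [1.0, 1.4, 1.6, 2.0, 2.4, 3.0, 3.5, 4.0, 4.4, 5.0],
    'co2':         [110, 130, 140, 160, 180, 210, 230, 250, 270, 300],
})

X = df[['engine_size']]
y = df['co2']

model = LinearRegression()
model.fit(X, y)                                # train on the ENTIRE dataset

# Test on a portion of the SAME dataset, e.g. rows 6 to 9 (labels held back).
X_test, y_test = X.iloc[6:10], y.iloc[6:10]
y_hat = model.predict(X_test)

# Compare predictions with the actual values. Because the model has already
# seen these rows during training, this measures training accuracy.
print(np.mean(np.abs(y_test.values - y_hat)))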
What are training accuracy and out-of-sample accuracy?
We said that training and testing on the same dataset produces a high training accuracy,
but what exactly is "training accuracy?"
Training accuracy is the percentage of correct predictions that the model makes when it is tested on the same data it was trained on.
However, a high training accuracy isn’t necessarily a good thing.
For instance, having a high training accuracy may result in an ‘over-fit’ of the data.
This means that the model is overly trained to the dataset, which may capture noise and
produce a non-generalized model.
Out-of-sample accuracy is the percentage of correct predictions that the model makes on
data that the model has NOT been trained on.
A model that is trained and tested on the same dataset will most likely have low out-of-sample accuracy, due to the likelihood of over-fitting.
It’s important that our models have high out-of-sample accuracy, because the purpose of our model is, of course, to make correct predictions on unknown data.
So, how can we improve out-of-sample accuracy?
One way is to use another evaluation approach called "Train/Test Split."
In this approach, we select a portion of our
dataset for training, for example, rows 0 to 5.
And the rest is used for testing, for example, rows 6 to 9.
The model is built on the training set.
Then, the test feature set is passed to the model for prediction.
And finally, the predicted values for the test set are compared with the actual values
of the testing set.
This second evaluation approach is called "Train/Test Split."
Train/Test Split involves splitting the dataset into training and testing sets that are mutually exclusive, after which you train with the training set and test with the testing set.
This provides a more accurate evaluation of out-of-sample accuracy, because the testing set is NOT part of the data that has been used to train the model.
It is more realistic for real-world problems.
We know the outcome of each data point in the testing set, which makes it great to test with!
And since this data has not been used to train the model, the model has no knowledge of the
outcome of these data points.
So, in essence, it’s truly out-of-sample testing.
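Here is a minimal train/test split sketch, again assuming scikit-learn and made-up data; train_test_split and LinearRegression are real scikit-learn functions, but the dataset itself is invented for illustration.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Tiny made-up dataset, for illustration only.
X = np.array([[1.0], [1.4], [1.6], [2.0], [2.4], [3.0], [3.5], [4.0], [4.4], [5.0]])
y = np.array([110, 130, 140, 160, 180, 210, 230, 250, 270, 300])

# Hold out 20% of the rows for testing; these rows are never seen in training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)      # train only on the training set
y_hat = model.predict(X_test)    # predict on the unseen test rows

# Out-of-sample error: compare predictions with the actual test values.
print(np.mean(np.abs(y_test - y_hat)))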
However, please ensure that you train your model with the testing set afterwards, as
you don’t want to lose potentially valuable data.
The issue with train/test split is that the result is highly dependent on which rows end up in the training set and which end up in the testing set.
Train/test split still gives a better estimate of out-of-sample accuracy than training and testing on the same dataset, but because of this dependency, the estimate can vary considerably from one split to another.
Another evaluation model, called "K-Fold Cross-validation," resolves most of these issues.
How do you fix a high variation that results from a dependency?
Well, you average it.
Let me explain the basic concept of “k-fold cross-validation” to see how we can solve
this problem.
The entire dataset is represented by the points in the image at the top left.
If we have k=4 folds, then we split up this dataset as shown here.
In the first fold, for example, we use the first 25 percent of the dataset for testing,
and the rest for training.
The model is built using the training set, and is evaluated using the test set.
Then, in the next round (or in the second fold), the second 25 percent of the dataset
is used for testing and the rest for training the model.
Again, the accuracy of the model is calculated.
We continue for all folds.
Finally, the results of all 4 evaluations are averaged.
That is, the accuracies of the folds are averaged, keeping in mind that the folds are distinct: no data point is used for testing in more than one fold.
K-fold cross-validation, in its simplest form, performs multiple train/test splits using
the same dataset where each split is different.
Then, the result is averaged to produce a more consistent out-of-sample accuracy.
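As one possible way to do this in code, here is a minimal sketch using scikit-learn's cross_val_score; the estimator, scoring choice, and data are assumptions for illustration, not prescribed by the video.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

# Tiny made-up dataset, for illustration only.
X = np.array([[1.0], [1.4], [1.6], [2.0], [2.4], [3.0], [3.5], [4.0], [4.4], [5.0]])
y = np.array([110, 130, 140, 160, 180, 210, 230, 250, 270, 300])

model = LinearRegression()

# 4 folds: each fold is used once for testing while the remaining folds are
# used for training. scikit-learn reports negative mean absolute error so
# that higher scores are always better.
scores = cross_val_score(model, X, y, cv=4,
                         scoring='neg_mean_absolute_error')

print(scores)           # one score per fold
print(-scores.mean())   # average error across the 4 folds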
We wanted to show you an evaluation model that addressed some of the issues we’ve
described in the previous approaches.
However, going in-depth with the K-fold cross-validation model is out of the scope for this course.
Thanks for watching!