en_Simple Linear Regression.srt
0
00:00:00,510 --> 00:00:05,280
Hello, and welcome! In this video, we’ll be covering linear regression.
1
00:00:05,280 --> 00:00:12,080
You don’t need to know any linear algebra to understand topics in linear regression.
2
00:00:12,080 --> 00:00:17,000
This high-level introduction will give you enough background information on linear regression
3
00:00:17,000 --> 00:00:20,380
to be able to use it effectively on your own problems.
4
00:00:20,380 --> 00:00:22,900
So, let’s get started.
5
00:00:22,900 --> 00:00:28,250
Let’s take a look at this dataset. It’s related to the CO2 emission of different
6
00:00:28,250 --> 00:00:33,210
cars. It includes Engine size, Cylinders, Fuel Consumption
7
00:00:33,210 --> 00:00:40,739
and CO2 emissions for various car models. The question is: Given this dataset, can we
8
00:00:40,739 --> 00:00:46,039
predict the CO2 emission of a car, using another field, such as Engine size?
9
00:00:46,039 --> 00:00:50,289
Quite simply, yes! We can use linear regression to predict a
10
00:00:50,289 --> 00:00:56,760
continuous value, such as CO2 emission, by using other variables.
11
00:00:56,760 --> 00:01:01,989
Linear regression is the approximation of a linear model used to describe the relationship
12
00:01:01,989 --> 00:01:07,830
between two or more variables. In simple linear regression, there are two
13
00:01:07,830 --> 00:01:13,350
variables: a dependent variable and an independent variable.
14
00:01:13,350 --> 00:01:18,480
The key point in linear regression is that our dependent variable must be continuous
15
00:01:18,480 --> 00:01:23,840
and cannot be a discrete value. However, the independent variable(s) can be
16
00:01:23,840 --> 00:01:28,810
measured on either a categorical or continuous measurement scale.
17
00:01:28,810 --> 00:01:36,420
There are two types of linear regression models. They are: simple regression and multiple regression.
18
00:01:36,420 --> 00:01:41,580
Simple linear regression is when one independent variable is used to estimate
19
00:01:41,580 --> 00:01:46,370
a dependent variable. For example, predicting CO2 emission using
20
00:01:46,370 --> 00:01:51,480
the EngineSize variable. When more than one independent variable is
21
00:01:51,480 --> 00:01:55,140
present, the process is called multiple linear regression.
22
00:01:55,140 --> 00:02:01,440
For example, predicting CO2 emission using EngineSize and Cylinders of cars.
23
00:02:01,440 --> 00:02:05,090
Our focus in this video is on simple linear regression.
24
00:02:05,090 --> 00:02:13,400
Now, let’s see how linear regression works. OK, so let’s look at our dataset again.
25
00:02:13,400 --> 00:02:18,019
To understand linear regression, we can plot our variables here.
26
00:02:18,019 --> 00:02:23,900
We show Engine size as an independent variable, and Emission as the target value that we would
27
00:02:23,900 --> 00:02:28,200
like to predict. A scatterplot clearly shows the relation between
28
00:02:28,200 --> 00:02:35,939
variables where changes in one variable "explain" or possibly "cause" changes in the other variable.
29
00:02:35,939 --> 00:02:41,330
Also, it indicates that these variables are linearly related.
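(As a quick aside: a minimal Python sketch for reproducing this kind of scatterplot might look like the following. The file and column names are assumptions for illustration; the video doesn’t specify them.)

import matplotlib.pyplot as plt
import pandas as pd

# Load the cars dataset (hypothetical file and column names).
df = pd.read_csv("FuelConsumption.csv")

# Plot engine size against CO2 emission to eyeball a linear relationship.
plt.scatter(df["ENGINESIZE"], df["CO2EMISSIONS"], alpha=0.5)
plt.xlabel("Engine size")
plt.ylabel("CO2 emission")
plt.show()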
30
00:02:41,330 --> 00:02:45,440
With linear regression, you can fit a line through the data.
31
00:02:45,440 --> 00:02:50,849
For instance, as the EngineSize increases, so do the emissions.
32
00:02:50,849 --> 00:02:54,840
With linear regression, you can model the relationship of these variables.
33
00:02:54,840 --> 00:03:01,290
A good model can be used to predict the approximate emission of each car.
34
00:03:01,290 --> 00:03:07,060
How do we use this line for prediction now? Let us assume, for a moment, that the line
35
00:03:07,060 --> 00:03:11,290
is a good fit of the data. We can use it to predict the emission of an
36
00:03:11,290 --> 00:03:15,940
unknown car. For example, for a sample car, with engine
37
00:03:15,940 --> 00:03:20,640
size 2.4, you can find the emission is 214.
38
00:03:20,640 --> 00:03:26,269
Now, let’s talk about what this fitting line actually is.
39
00:03:26,269 --> 00:03:30,239
We’re going to predict the target value, y.
40
00:03:30,239 --> 00:03:38,160
In our case, using the independent variable, "Engine Size," represented by x1.
41
00:03:38,160 --> 00:03:46,189
The fit line is traditionally shown as a polynomial. In a simple regression problem (a single x),
42
00:03:46,189 --> 00:03:55,799
the form of the model would be ŷ = θ0 + θ1x1. In this equation, ŷ is the dependent variable
43
00:03:55,799 --> 00:04:05,699
or the predicted value, and x1 is the independent variable; θ0 and θ1 are the parameters of
44
00:04:05,699 --> 00:04:11,730
the line that we must adjust. θ1 is known as the "slope" or "gradient"
45
00:04:11,730 --> 00:04:17,370
of the fitting line and θ0 is known as the "intercept."
46
00:04:17,370 --> 00:04:23,540
θ0 and θ1 are also called the coefficients of the linear equation.
47
00:04:23,540 --> 00:04:31,100
You can interpret this equation as ŷ being a function of x1, or ŷ being dependent on x1.
48
00:04:31,100 --> 00:04:35,700
Now the questions are: "How would you draw
49
00:04:35,700 --> 00:04:41,500
a line through the points?" And, "How do you determine which line ‘fits
50
00:04:41,500 --> 00:04:42,660
best’?"
51
00:04:42,660 --> 00:04:46,600
Linear regression estimates the coefficients of the line.
52
00:04:46,600 --> 00:04:54,060
This means we must calculate θ0 and θ1 to find the best line to ‘fit’ the data.
53
00:04:54,060 --> 00:04:59,600
This line would best estimate the emission of the unknown data points.
54
00:04:59,600 --> 00:05:05,220
Let’s see how we can find this line, or to be more precise, how we can adjust the
55
00:05:05,220 --> 00:05:09,230
parameters to make the line the best fit for the data.
56
00:05:09,230 --> 00:05:15,340
For a moment, let’s assume we’ve already found the best fit line for our data.
57
00:05:15,340 --> 00:05:21,660
Now, let’s go through all the points and check how well they align with this line.
58
00:05:21,660 --> 00:05:30,100
Best fit, here, means that if we have, for instance, a car with engine size x1=5.4, and
59
00:05:30,100 --> 00:05:41,990
actual CO2=250, its CO2 should be predicted very close to the actual value, which is y=250,
60
00:05:41,990 --> 00:05:44,160
based on historical data.
61
00:05:44,160 --> 00:05:51,410
But if we use the fit line, or more precisely, our polynomial with known parameters,
62
00:05:51,410 --> 00:05:57,650
to predict the CO2 emission, it will return ŷ = 340.
63
00:05:57,650 --> 00:06:04,440
Now, if you compare the actual value of the emission of the car with what we predicted
64
00:06:04,440 --> 00:06:10,030
using our model, you will find out that we have a 90-unit error.
65
00:06:10,030 --> 00:06:17,290
This means our prediction line is not accurate. This error is also called the residual error.
66
00:06:17,290 --> 00:06:24,920
So, we can say the error is the distance from the data point to the fitted regression line.
67
00:06:24,920 --> 00:06:31,190
The mean of all residual errors shows how poorly the line fits the whole dataset.
68
00:06:31,190 --> 00:06:38,880
Mathematically, it can be expressed by the mean squared error equation, known as MSE.
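(The equation itself appears on screen rather than in the narration; the standard form it refers to is MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)², where n is the number of data points, yᵢ is the actual value, and ŷᵢ is the value the fit line predicts.)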
69
00:06:38,880 --> 00:06:43,980
Our objective is to find a line where the mean of all these errors is minimized.
70
00:06:43,980 --> 00:06:49,680
In other words, the mean error of the prediction using the fit line should be minimized.
71
00:06:49,680 --> 00:06:56,330
Let’s re-word it more technically. The objective of linear regression is to minimize
72
00:06:56,330 --> 00:07:04,240
this MSE equation, and to minimize it, we should find the best parameters, θ0 and θ1.
73
00:07:04,240 --> 00:07:13,640
Now, the question is: how do we find θ0 and θ1 in such a way that this error is minimized?
74
00:07:13,640 --> 00:07:19,580
How can we find such a perfect line? Or, said another way, how should we find the
75
00:07:19,580 --> 00:07:25,250
best parameters for our line? Should we move the line around randomly and
76
00:07:25,250 --> 00:07:29,650
calculate the MSE value every time, and choose the minimum one?
77
00:07:29,650 --> 00:07:34,430
Not really! Actually, we have two options here:
78
00:07:34,430 --> 00:07:40,430
Option 1 - We can use a mathematical approach. Or, Option 2 - We can use an optimization
79
00:07:40,430 --> 00:07:41,490
approach.
80
00:07:41,490 --> 00:07:49,420
Let’s see how we can easily use a mathematical formula to find θ0 and θ1.
81
00:07:49,420 --> 00:07:56,750
As mentioned before, θ0 and θ1, in simple linear regression, are the coefficients of
82
00:07:56,750 --> 00:08:01,460
the fit line. We can use a simple equation to estimate these
83
00:08:01,460 --> 00:08:04,770
coefficients. That is, given that it’s a simple linear
84
00:08:04,770 --> 00:08:12,320
regression, with only 2 parameters, and knowing that θ0 and θ1 are the intercept and slope
85
00:08:12,320 --> 00:08:17,490
of the line, we can estimate them directly from our data.
86
00:08:17,490 --> 00:08:23,590
It requires that we calculate the mean of the independent and dependent or target columns,
87
00:08:23,590 --> 00:08:28,080
from the dataset. Notice that all of the data must be available
88
00:08:28,080 --> 00:08:34,560
to traverse and calculate the parameters. It can be shown that the intercept and slope
89
00:08:34,559 --> 00:08:40,729
can be calculated using these equations. We can start off by estimating the value for θ1.
90
00:08:40,729 --> 00:08:44,510
This is how you can find the slope of a line
91
00:08:44,510 --> 00:08:50,180
based on the data. x̄ is the average value for the engine size
92
00:08:50,180 --> 00:08:55,990
in our dataset. Please consider that we have 9 rows here,
93
00:08:55,990 --> 00:09:01,420
rows 0 to 8. First, we calculate the average of x1 and the
94
00:09:01,420 --> 00:09:06,490
average of y. Then we plug them into the slope equation to
95
00:09:06,490 --> 00:09:12,890
find θ1. The xi and yi in the equation refer to the
96
00:09:12,890 --> 00:09:20,070
fact that we need to repeat these calculations across all values in our dataset, and i refers
97
00:09:20,070 --> 00:09:24,860
to the i’th value of x or y.
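(The equations are shown on screen rather than spoken; the standard least-squares estimates they refer to are θ1 = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)² and θ0 = ȳ − θ1x̄. A minimal Python sketch of this calculation, using made-up engine sizes and CO2 emissions rather than the actual course data:)

# Hypothetical engine-size and CO2-emission values for 9 cars (rows 0 to 8).
x = [2.0, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7]
y = [196, 221, 136, 255, 244, 230, 232, 255, 267]

n = len(x)
x_bar = sum(x) / n  # average engine size
y_bar = sum(y) / n  # average CO2 emission

# Least-squares estimates for the slope (theta1) and intercept (theta0).
theta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
         sum((xi - x_bar) ** 2 for xi in x)
theta0 = y_bar - theta1 * x_bar

print("theta1 (slope):", theta1)
print("theta0 (intercept):", theta0)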
98
00:09:24,860 --> 00:09:32,090
Applying all values, we find θ1=39; it is our second parameter.
99
00:09:32,090 --> 00:09:37,140
It is used to calculate the first parameter, which is the intercept of the line.
100
00:09:37,140 --> 00:09:43,640
Now, we can plug θ1 into the line equation to find θ0.
101
00:09:43,640 --> 00:09:54,210
It is easily calculated that θ0=125.74. So, these are the two parameters for the line,
102
00:09:54,210 --> 00:10:02,529
where θ0 is also called the bias coefficient and θ1 is the coefficient for the Engine Size
103
00:10:02,529 --> 00:10:06,690
column. As a side note, you really don’t need to
104
00:10:06,690 --> 00:10:11,810
remember the formula for calculating these parameters, as most of the libraries used
105
00:10:11,810 --> 00:10:18,770
for machine learning in Python, R, and Scala can easily find these parameters for you.
106
00:10:18,770 --> 00:10:22,680
But it’s always good to understand how it works.
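(For instance, here is a minimal scikit-learn sketch, again with made-up data; scikit-learn is one such Python library, though the video doesn’t name it explicitly:)

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical engine sizes (as a column vector) and CO2 emissions.
X = np.array([[2.0], [2.4], [1.5], [3.5], [3.7]])
y = np.array([196, 221, 136, 255, 267])

# Fit the simple linear regression model; the library estimates
# the intercept (theta0) and slope (theta1) for us.
model = LinearRegression()
model.fit(X, y)

print("theta0 (intercept):", model.intercept_)
print("theta1 (slope):", model.coef_[0])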
107
00:10:22,680 --> 00:10:27,020
Now, we can write down the polynomial of the line.
108
00:10:27,020 --> 00:10:32,320
So, we know how to find the best fit for our data, and its equation.
109
00:10:32,320 --> 00:10:38,150
Now the question is: "How can we use it to predict the emission of a new car based on
110
00:10:38,150 --> 00:10:40,580
its engine size?"
111
00:10:40,580 --> 00:10:45,700
Once we’ve found the parameters of the linear equation, making predictions is as simple
112
00:10:45,700 --> 00:10:50,000
as solving the equation for a specific set of inputs.
113
00:10:50,000 --> 00:10:57,750
Imagine we are predicting CO2 emission (y) from EngineSize (x) for the automobile in record
114
00:10:57,750 --> 00:11:01,970
number 9. Our linear regression model representation
115
00:11:01,970 --> 00:11:09,640
for this problem would be: ŷ = θ0 + θ1x1.
116
00:11:09,640 --> 00:11:19,600
Or if we map it to our dataset, it would be CO2Emission = θ0 + θ1 × EngineSize.
117
00:11:19,600 --> 00:11:26,210
As we saw, we can find θ0 and θ1 using the equations that we just talked about.
118
00:11:26,210 --> 00:11:31,080
Once found, we can plug them into the equation of the linear model.
119
00:11:31,080 --> 00:11:44,480
For example, let’s use θ0=125 (rounding 125.74) and θ1=39. So, we can rewrite the linear model as CO2Emission = 125 + 39 × EngineSize.
120
00:11:44,480 --> 00:11:55,310
Now, let’s plug in the 9th row of our dataset and calculate the CO2 emission for a car with
121
00:11:55,310 --> 00:12:05,770
an EngineSize of 2.4. So CO2Emission = 125 + 39 × 2.4.
122
00:12:05,770 --> 00:12:14,020
Therefore, we can predict that the CO2 emission for this specific car would be 218.6.
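(A quick check of that arithmetic in Python, using the rounded parameters from above:)

theta0, theta1 = 125, 39

def predict_co2(engine_size):
    # y_hat = theta0 + theta1 * x1
    return theta0 + theta1 * engine_size

print(predict_co2(2.4))  # ≈ 218.6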
123
00:12:14,020 --> 00:12:20,130
Let’s talk a bit about why Linear Regression is so useful.
124
00:12:20,130 --> 00:12:25,320
Quite simply, it is the most basic regression to use and understand.
125
00:12:25,320 --> 00:12:30,730
In fact, one reason why Linear Regression is so useful is that it’s fast!
126
00:12:30,730 --> 00:12:36,350
It also doesn’t require tuning of parameters. So, something like tuning the K parameter
127
00:12:36,350 --> 00:12:41,990
in K-Nearest Neighbors or the learning rate in Neural Networks isn’t something to worry
128
00:12:41,990 --> 00:12:45,860
about. Linear Regression is also easy to understand
129
00:12:45,860 --> 00:12:48,460
and highly interpretable.
130
00:12:48,460 --> 00:12:50,220
Thanks for watching this video.