-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathen_More on Hierarchical Clustering.srt
256 lines (192 loc) · 6.76 KB
/
en_More on Hierarchical Clustering.srt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
0
00:00:00,410 --> 00:00:04,970
Hello, and welcome! In this video, we’ll be covering more details
1
00:00:04,970 --> 00:00:08,750
about Hierarchical clustering. Let’s get started.
2
00:00:08,750 --> 00:00:14,059
Let’s look at Agglomerative algorithm for Hierarchical Clustering.
3
00:00:14,059 --> 00:00:18,120
Remember that Agglomerative clustering is a bottom-up approach.
4
00:00:18,120 --> 00:00:26,050
Let’s say our dataset has n data points. First, we want to create n clusters, one for
5
00:00:26,050 --> 00:00:31,500
each data point. Then each point is assigned as a cluster.
6
00:00:31,500 --> 00:00:38,890
Next, we want to compute the distance/proximity matrix, which will be an n by n table.
7
00:00:38,890 --> 00:00:43,579
After that, we want to iteratively run the following steps until the specified cluster
8
00:00:43,579 --> 00:00:48,890
number is reached, or until there is only one cluster left.
9
00:00:48,890 --> 00:00:56,660
First, MERGE the two nearest clusters. (Distances are computed already in the proximity matrix.)
10
00:00:56,660 --> 00:01:01,649
Second, UPDATE the proximity matrix with the new values.
11
00:01:01,649 --> 00:01:07,620
We stop after we’ve reached the specified number of clusters, or there is only one cluster
12
00:01:07,620 --> 00:01:14,940
remaining, with the result stored in a dendrogram. So, in the proximity matrix, we have to measure
13
00:01:14,940 --> 00:01:20,610
the distances between clusters, and also merge the clusters that are “nearest.”
14
00:01:20,610 --> 00:01:27,170
So, the key operation is the computation of the proximity between the clusters with one
15
00:01:27,170 --> 00:01:31,230
point, and also clusters with multiple data points.
16
00:01:31,230 --> 00:01:36,110
At this point, there are a number of key questions that need to be answered.
17
00:01:36,110 --> 00:01:41,530
For instance, “How do we measure the distances between these clusters and How do we define
18
00:01:41,530 --> 00:01:47,850
the ‘nearest’ among clusters?” We also can ask, “Which points do we use?”
19
00:01:47,850 --> 00:01:53,970
First, let’s see how to calculate the distance between 2 clusters with 1 point each.
20
00:01:53,970 --> 00:02:00,560
Let’s assume that we have a dataset of patients, and we want to cluster them using hierarchy
21
00:02:00,560 --> 00:02:05,220
clustering. So, our data points are patients, with a feature
22
00:02:05,220 --> 00:02:11,950
set of 3 dimensions. For example, Age, Body Mass Index (or BMI),
23
00:02:11,950 --> 00:02:16,070
and Blood Pressure. We can use different distance measurements
24
00:02:16,070 --> 00:02:21,560
to calculate the proximity matrix. For instance, Euclidean distance.
25
00:02:21,560 --> 00:02:28,930
So, if we have a dataset of n patients, we can build an n by n dissimilarly-distance
26
00:02:28,930 --> 00:02:32,770
matrix. It will give us the distance of clusters with
27
00:02:32,770 --> 00:02:37,800
1 data point. However, as mentioned, we merge clusters in
28
00:02:37,800 --> 00:02:42,570
Agglomerative clustering. Now, the question is, “How can we calculate
29
00:02:42,570 --> 00:02:48,470
the distance between clusters when there are multiple patients in each cluster?”
30
00:02:48,470 --> 00:02:53,790
We can use different criteria to find the closest clusters, and merge them.
31
00:02:53,790 --> 00:03:01,010
In general, it completely depends on the data type, dimensionality of data, and most importantly,
32
00:03:01,010 --> 00:03:06,680
the domain knowledge of the dataset. In fact, different approaches to defining
33
00:03:06,680 --> 00:03:11,790
the distance between clusters, distinguish the different algorithms.
34
00:03:11,790 --> 00:03:15,940
As you might imagine, there are multiple ways we can do this.
35
00:03:15,940 --> 00:03:21,880
The first one is called Single-Linkage Clustering. Single linkage is defined as the shortest
36
00:03:21,880 --> 00:03:27,650
distance between 2 points in each cluster, such as point “a” and “b”.
37
00:03:27,650 --> 00:03:33,480
Next up is Complete-Linkage Clustering. This time, we are finding the longest distance
38
00:03:33,480 --> 00:03:39,680
between points in each cluster, such as the distance between point “a” and “b”.
39
00:03:39,680 --> 00:03:45,710
The third type of linkage is Average Linkage Clustering, or the mean distance.
40
00:03:45,710 --> 00:03:50,660
This means we’re looking at the average distance of each point from one cluster to
41
00:03:50,660 --> 00:03:56,959
every point in another cluster. The final linkage type to be reviewed is Centroid
42
00:03:56,959 --> 00:04:01,790
Linkage Clustering. Centroid is the average of the feature sets
43
00:04:01,790 --> 00:04:06,660
of points in a cluster. This linkage takes into account the centroid
44
00:04:06,660 --> 00:04:11,310
of each cluster when determining the minimum distance.
45
00:04:11,310 --> 00:04:15,380
There are 3 main advantages to using hierarchical clustering.
46
00:04:15,380 --> 00:04:21,940
First, we do not need to specify the number of clusters required for the algorithm.
47
00:04:21,940 --> 00:04:26,340
Second, hierarchical clustering is easy to implement.
48
00:04:26,340 --> 00:04:32,400
And third, the dendrogram produced is very useful in understanding the data.
49
00:04:32,400 --> 00:04:39,150
There are some disadvantages as well. First, the algorithm can never undo any previous
50
00:04:39,150 --> 00:04:44,090
steps. So for example, the algorithm clusters 2 points,
51
00:04:44,090 --> 00:04:50,000
and later on we see that the connection was not a good one, the program cannot undo that
52
00:04:50,000 --> 00:04:54,100
step. Second, the time complexity for the clustering
53
00:04:54,100 --> 00:05:00,100
can result in very long computation times, in comparison with efficient algorithms, such
54
00:05:00,100 --> 00:05:04,720
k-Means. Finally, if we have a large dataset, it can
55
00:05:04,720 --> 00:05:09,320
become difficult to determine the correct number of clusters by the dendrogram.
56
00:05:09,320 --> 00:05:15,230
Now, let’s compare Hierarchical clustering with k-Means.
57
00:05:15,230 --> 00:05:22,090
K-Means is more efficient for large datasets. In contrast to k-Means, Hierarchical clustering
58
00:05:22,090 --> 00:05:27,070
does not require the number of clusters to be specified.
59
00:05:27,070 --> 00:05:31,830
Hierarchical clustering gives more than one partitioning depending on the resolution,
60
00:05:31,830 --> 00:05:37,110
whereas k-Means gives only one partitioning of the data.
61
00:05:37,110 --> 00:05:42,530
Hierarchical clustering always generates the same clusters, in contrast with k-Means that
62
00:05:42,530 --> 00:05:49,600
returns different clusters each time it is run due to random initialization of centroids.
63
00:05:49,600 --> 00:05:50,580
Thanks for watching!