Data sources of the 145 datasets.txt
The first 19 datasets (data 0 to data 18) come from the scikit-learn project; the remaining 126 datasets come from two clustering benchmark collections:
https://github.com/deric/clustering-benchmark
https://github.com/gagolews/clustering_benchmarks_v1
The file names of the last 126 datasets are listed below. Readers can trace the source of each dataset by looking up its file name in the two benchmarks.
data_ID, filename
19, DS-850.arff
20, compound.arff
21, complex9.arff
22, cluto-t5-8k.arff
23, complex8.arff
24, chainlink.arff
25, wingnut.arff
26, pathbased.arff
27, banana.arff
28, xclara.arff
29, disk-4000n.arff
30, simplex.arff
31, dense-disk-3000.arff
32, smile2.arff
33, cure-t0-2000n-2D.arff
34, cluto-t7-10k.arff
35, disk-5000n.arff
36, jain.arff
37, spiral.arff
38, 2dnormals.arff
39, triangle1.arff
40, disk-4600n.arff
41, sizes3.arff
42, DS-577.arff
43, atom.arff
44, long1.arff
45, dartboard1.arff
46, flame.arff
47, triangle2.arff
48, lsun.arff
49, dartboard2.arff
50, hypercube.arff
51, ds3c3sc6.arff
52, 2d-4c-no4.arff
53, st900.arff
54, spiralsquare.arff
55, gaussians1.arff
56, rings.arff
57, dense-disk-5000.arff
58, ds4c2sc8.arff
59, disk-6000n.arff
60, donut1.arff
61, disk-3000n.arff
62, blobs.arff
63, 2sp2glob.arff
64, 2d-4c-no9.arff
65, donut2.arff
66, cluto-t8-8k.arff
67, diamond9.arff
68, donut3.arff
69, twenty.arff
70, aml28.arff
71, dpb.arff
72, target.arff
73, elly-2d10c13s.arff
74, dpc.arff
75, pmf.arff
76, engytime.arff
77, spherical_5_2.arff
78, tetra.arff
79, cassini.arff
80, curves1.arff
81, shapes.arff
82, ds2c2sc13.arff
83, twodiamonds.arff
84, zelnik1.arff
85, 2d-4c.arff
86, hepta.arff
87, curves2.arff
88, elliptical_10_2.arff
89, square2.arff
90, zelnik3.arff
91, 2d-20c-no0.arff
92, 2d-10c.arff
93, square3.arff
94, donutcurves.arff
95, 3MC.arff
96, zelnik6.arff
97, spherical_4_3.arff
98, disk-4500n.arff
99, s-set2.arff
100, R15.arff
101, square4.arff
102, longsquare.arff
103, zelnik4.arff
104, zelnik5.arff
105, square5.arff
106, 3-spiral.arff
107, 2d-3c-no123.arff
108, s-set1.arff
109, spherical_6_2.arff
110, graves/dense
111, graves/line
112, graves/parabolic
113, graves/ring
114, graves/ring_outliers
115, graves/zigzag
116, graves/zigzag_noisy
117, other/chameleon_t4_8k
118, other/chameleon_t5_8k
119, other/chameleon_t7_10k
120, other/chameleon_t8_8k
121, other/hdbscan
122, other/square
123, sipu/a1
124, sipu/s3
125, sipu/s4
126, sipu/unbalance
127, wut/circles
128, wut/isolation
129, wut/mk1
130, wut/mk2
131, wut/mk3
132, wut/mk4
133, wut/olympic
134, wut/smile
135, wut/stripes
136, wut/trajectories
137, wut/twosplashes
138, wut/windows
139, wut/x1
140, wut/x2
141, wut/x3
142, wut/z1
143, wut/z2
144, wut/z3
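The listing above can be loaded into a lookup table with a short script. The following is a minimal sketch, not part of the original data release: the function names `parse_listing` and `guess_benchmark` are hypothetical, and the rule that `.arff` names belong to the first benchmark while `battery/name` entries belong to the second is an assumption based only on the naming pattern, so it should be checked against the two repositories.

```python
def parse_listing(text):
    """Parse 'data_ID, filename' lines into a {data_ID: filename} dict."""
    mapping = {}
    for line in text.strip().splitlines():
        left, _, right = line.partition(",")
        left = left.strip()
        if not left.isdigit():
            continue  # skip the 'data_ID, filename' header and blank lines
        mapping[int(left)] = right.strip()
    return mapping


def guess_benchmark(name):
    """Guess the source benchmark from the entry's form (assumption, see above)."""
    if name.endswith(".arff"):
        return "deric/clustering-benchmark"
    if "/" in name:
        return "gagolews/clustering_benchmarks_v1"
    return "unknown"


# A three-line excerpt of the listing, used here only as sample input.
sample = """data_ID, filename
19, DS-850.arff
110, graves/dense
144, wut/z3"""

listing = parse_listing(sample)
for data_id, name in sorted(listing.items()):
    print(data_id, name, guess_benchmark(name))
```

Passing the full listing instead of `sample` gives one entry per dataset from ID 19 to ID 144.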
If a dataset contained noise points, those points were removed from the original dataset. Duplicate data points were also removed from each dataset.
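The two cleaning steps can be illustrated as follows. This is a sketch under stated assumptions rather than the procedure actually used: it assumes noise points are identified by a known label value (`noise_label`, a hypothetical parameter, since the two benchmarks mark noise differently) and that duplicates are points with exactly equal coordinates, of which the first occurrence is kept.

```python
def clean_dataset(points, labels, noise_label):
    """Drop noise-labelled points, then drop exact duplicate points.

    Assumptions (not stated in the source): noise is marked by
    `noise_label`, and duplicates are compared by exact coordinate equality.
    """
    seen = set()
    kept_points, kept_labels = [], []
    for point, label in zip(points, labels):
        if label == noise_label:
            continue  # noise point: remove
        key = tuple(point)
        if key in seen:
            continue  # duplicate of an earlier point: remove
        seen.add(key)
        kept_points.append(list(point))
        kept_labels.append(label)
    return kept_points, kept_labels


# Toy data: one duplicated point and one noise point (label 0).
points = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
labels = [1, 2, 2, 0]

clean_points, clean_labels = clean_dataset(points, labels, noise_label=0)
print(clean_points)
print(clean_labels)
```

Because of these removals, the number of points in a cleaned dataset can be smaller than in the corresponding original benchmark file.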