Teaching programming in VSCode
Section One
Enter the following command in the terminal:
conda create -n tensorFML python=3.6
This creates a new environment called tensorFML. Then we activate it:
conda activate tensorFML
Next we use pip.
First we install TensorFlow:
pip install tensorflow
And then we install Keras with the following command:
pip install keras
After this step, we run PyCharm.
Next we need a sample dataset to work with.
We search for "student" on the UCI website, open the description of the dataset, and from the data folder download and save the dataset. I type:
import pandas as pd
import numpy as np
import sklearn
In the VSCode terminal, I type:
pip install pandas
data = pd.read_csv("student-mat.csv", sep=";")
print(data.head())
This command checks that our data has loaded correctly. After this we need to clean our data.
We decide which columns we want; there are 33 different attributes here.
We take only a few of them, for example:
data = data[["G1", "G2", "G3", "studytime", "failures", "absences"]]
These are the features we want, and they are a good set to test with. After this we say what we want to predict, the students' final grade:
predict = "G3"
Here we want to create two arrays from the selected data.
If you don't have NumPy installed, enter this in the VSCode terminal:
pip install numpy
x = np.array(data.drop(columns=[predict]))
Here we say: put every column except G3 into x. G3 itself goes into y:
y = np.array(data[predict])
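As a small illustration of the two arrays, here is a sketch on a made-up three-row table (the values are invented for the example and are not taken from student-mat.csv). It uses the columns= keyword of DataFrame.drop, because newer pandas versions no longer accept the axis as a second positional argument.

```python
import numpy as np
import pandas as pd

# Toy data with invented values, only to show the shapes involved.
toy = pd.DataFrame({
    "G1": [10, 12, 14],
    "G2": [11, 13, 15],
    "G3": [12, 13, 15],
})
predict = "G3"

x = np.array(toy.drop(columns=[predict]))  # every column except G3
y = np.array(toy[predict])                 # only G3

print(x.shape)     # (3, 2)
print(y.tolist())  # [12, 13, 15]
```

Each row of x lines up with the entry of y at the same position, which is what the later split relies on.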
Now we can build the model we want to use for prediction. First we have to divide our data into four parts. So far we have two lists: the list of features (x) and the list of final answers (y).
We divide each of these two lists into two parts again, because we train the model on the first part and then test it on the remaining data. The test share is usually 10% to 20%: the data is split either 20% to 80% or 10% to 90%.
Depending on the amount of data, you can choose different ratios. For this purpose we use train_test_split from sklearn.model_selection:
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)
The argument test_size=0.1 tells it how to divide the data: 10% goes to the test set.
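To make the split concrete, here is a minimal sketch on made-up data (50 invented samples, not the student dataset). The random_state argument is an addition for repeatability only; the notes do not use it.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 50 made-up samples with 2 features each (placeholder values).
x = np.arange(100).reshape(50, 2)
y = np.arange(50)

# random_state is set only so the example is repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.1, random_state=0
)

print(len(x_train), len(x_test))  # 45 5
```

With test_size=0.1, 5 of the 50 rows go to the test set and 45 stay in the training set, and each feature row keeps its matching answer.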
print(sklearn.__version__)
print(dir(sklearn))
acc = linear_score(x_test, y_test)
print(acc)
The full script for this section:
import pandas as pd
import numpy as np
import sklearn
import sklearn.model_selection as ms
def linear_score(x, y):
    # Placeholder: no model has been trained yet, so this returns a fixed value.
    # Replace it with a real score calculation once a model exists.
    score = 0.85
    return score
data = pd.read_csv('F:/NEW/importent folder/main folder/projects for github/AI and AI in Python/codes and files/student-mat.csv', sep=';')
print(data.head())
data = data[['G1', 'G2', 'G3', 'studytime', 'failures', 'absences']]
predict = "G3"
# Assuming you have your data defined here
x = np.array(data.drop(labels=[predict], axis=1))
y = np.array(data[predict])
x_train, x_test, y_train, y_test = ms.train_test_split(x, y, test_size=0.1)
acc = linear_score(x_test, y_test)
print(acc)
Section Two
Linear regression algorithm (from YouTube): it works with the data we have. Each point is one student's data profile, and the graph has several dimensions, not just a single flat plane. What the fit algorithm does is use this data to find the line that has the smallest total distance to the set of points (pictured here in two-dimensional space). Once this line is found, the final answers are read off this line.
In the next method we learn classification, where the data falls into different categories. If the data is scattered everywhere, we cannot find a good line that passes close to all the points.
Therefore, we should use linear regression on datasets that show a trend, for example: as a student's number of absences increases, the student learns less and the grade is likely to decrease.
When the line of best fit has been found, what is its formula?
y = mx + b
With several features the diagram is multi-dimensional: the algorithm tries to find a coefficient m for each feature and a single intercept b, and together these make up the model. We do this with the libraries we load:
import pandas as pd
import numpy as np
import sklearn
import sklearn.model_selection
from sklearn import linear_model
data = pd.read_csv("student-mat.csv", sep=";")
print(data.head())
data = data[["G1", "G2", "G3", "studytime", "failures", "absences"]]
predict = "G3"
x = np.array(data.drop(columns=[predict]))
y = np.array(data[predict])
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)
linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)
In fit, we give the model the training examples: for every row of features in x_train there is a matching value in y_train, and the model builds the line from all the values it has seen. After fitting, once the model has found that line, we want to find the accuracy of this model.
For this, we have to give it data that it has never seen before:
acc = linear.score(x_test, y_test)
print(acc)
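The fit and score steps can be tried without the dataset. The following is a minimal sketch on synthetic data that follows an assumed rule y = 3*x1 - 2*x2 + 5 exactly; the rule, the sample count and random_state are illustrative choices, not part of the lesson. For LinearRegression, score returns the R^2 coefficient of determination rather than a percentage of correct answers.

```python
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Synthetic data that follows y = 3*x1 - 2*x2 + 5 exactly (no noise).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 2))
y = 3 * x[:, 0] - 2 * x[:, 1] + 5

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.1, random_state=0
)

linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)

# score returns R^2 on data the model has not seen during fit.
acc = linear.score(x_test, y_test)
print(acc)
print(linear.coef_, linear.intercept_)
```

Because the synthetic data has no noise, the fitted coefficients come out very close to 3 and -2, the intercept close to 5, and the score close to 1. Real data such as the student grades gives a lower score.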
The accuracy it gave was 0.779457. Training a large model without a GPU can take a long time, but this small model trains quickly. If we run it several times it may give different values, because train_test_split selects different rows each time, so different accuracies are possible. Let's repeat this process several times, measure the accuracy each time, and see which run gives the best accuracy so that we can save it.
for _ in range(10):
    x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)
    linear = linear_model.LinearRegression()
    linear.fit(x_train, y_train)
    acc = linear.score(x_test, y_test)
    print(acc)
import pandas as pd
import numpy as np
import sklearn
import sklearn.model_selection as ms
from sklearn import linear_model
data = pd.read_csv('F:/NEW/importent folder/main folder/projects for github/AI and AI in Python/codes and files/student-mat.csv', sep=';')
print(data.head())
data = data[['G1', 'G2', 'G3', 'studytime', 'failures', 'absences']]
predict = "G3"
# Assuming you have your data defined here
x = np.array(data.drop(labels=[predict], axis=1))
y = np.array(data[predict])
for _ in range(10):
    # A new random split on every pass, so the score changes from run to run
    x_train, x_test, y_train, y_test = ms.train_test_split(x, y, test_size=0.1)
    linear = linear_model.LinearRegression()
    linear.fit(x_train, y_train)
    # Score the fitted model on the held-out test data
    acc = linear.score(x_test, y_test)
    print(acc)
Section Three
We did this to see what the best accuracy was, so that we can save the best model:
best = 0
for _ in range(10):
    x_train, x_test, y_train, y_test = ms.train_test_split(x, y, test_size=0.1)
    linear = linear_model.LinearRegression()
    linear.fit(x_train, y_train)
    acc = linear.score(x_test, y_test)
    print(acc)
    if acc > best:
        best = acc
Note: the file mode "wb" means write in binary mode.
Now what we want to do is save the best model we found. We use the pickle module, which is part of Python itself. At the beginning of the code:
import pickle
And then we extend the code above, inside the if block:
if acc > best:
    best = acc
    with open("studentmodel.pickle", "wb") as f:
        pickle.dump(linear, f)
This saves the linear model we made in this file.
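As a self-contained sketch of saving and reloading a model with pickle, the example below trains on four invented points and writes to a temporary folder; the data and the temporary path are assumptions made for the illustration, while the file name studentmodel.pickle follows the notes.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn import linear_model

# Tiny made-up training set following y = 2*x + 1.
x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
linear = linear_model.LinearRegression().fit(x, y)

# Save into a temporary folder so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "studentmodel.pickle")
with open(path, "wb") as f:  # "wb": write, binary
    pickle.dump(linear, f)

with open(path, "rb") as f:  # "rb": read, binary
    restored = pickle.load(f)

# The restored model gives the same predictions as the original one.
same = bool(np.allclose(restored.predict(x), linear.predict(x)))
print(same)
```

Pickle files can run code when they are loaded, so only load pickle files that you created yourself or that come from a source you trust.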
The model is saved once, and we don't need to train it again every time. Because this is a small dataset and the model is simple, training it is very easy.
It takes only a fraction of a second. For models with large amounts of data, however, training is very costly, and since the accuracy changes every time training is repeated and can sometimes be low, you want to save the best model you found. We run the training 10 times and get different accuracies, and we can increase the range to 100 times to get a better accuracy.
After that, we comment out the whole training loop so that the model is no longer retrained.
Prediction: a trend of the future estimated from past data.
The best accuracy it chose was 0.946.
The model has to guess the output answers itself.
In some places the predicted value differs from the actual value.
With good accuracy, the model was able to predict and find that line of fit.
import pandas as pd
import numpy as np
import sklearn
import sklearn.model_selection as ms
from sklearn import linear_model
import pickle
data = pd.read_csv('F:/NEW/importent folder/main folder/projects for github/AI and AI in Python/codes and files/student-mat.csv', sep=';')
print(data.head())
data = data[['G1', 'G2', 'G3', 'studytime', 'failures', 'absences']]
predict = "G3"
# Assuming you have your data defined here
x = np.array(data.drop(columns=[predict]))
y = np.array(data[predict])
best = 0
for _ in range(100):
    x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)
    linear = linear_model.LinearRegression()
    linear.fit(x_train, y_train)
    acc = linear.score(x_test, y_test)
    print(acc)
    if acc > best:
        best = acc
        # Save the best model found so far
        with open("studentmodel.pickle", "wb") as f:
            pickle.dump(linear, f)
print("Best=", best)
The above code implements a linear regression model using pandas, numpy and scikit-learn libraries in Python. A data file named "student-mat.csv" is assumed to exist in the specified path.
In this code, first the data is read from the CSV file and converted into a DataFrame. Then the columns needed for modeling are selected based on the column names and stored in the "data" variable.
Then the column "G3" is set as the predict variable. Column "G3" is the variable that we intend to predict using other features (other columns).
Next, the data is divided into two parts, training and testing, and the linear regression model is trained on the training data. Then the accuracy of the model is calculated on the test data and printed. For LinearRegression, score returns the R^2 coefficient of determination.
This process is repeated 100 times and the best accuracy is stored in the "best" variable. At each step, if the new accuracy is greater than the previous best accuracy, the trained model is saved with pickle in a file named "studentmodel.pickle".
Finally, the best accuracy value is printed.
Let's execute this code. With range(10) it is executed 10 times and we take the best accuracy.
If we change the number from 10 to 100, there are more chances to find a split with a better accuracy.
Best = 0.8037949390451238
Now we read the saved model back.
The file mode "rb" means read in binary mode.
import pandas as pd
import numpy as np
import sklearn
import sklearn.model_selection as ms
from sklearn import linear_model
import pickle
data = pd.read_csv('F:/NEW/importent folder/main folder/projects for github/AI and AI in Python/codes and files/student-mat.csv', sep=';')
print(data.head())
data = data[['G1', 'G2', 'G3', 'studytime', 'failures', 'absences']]
predict = "G3"
# Assuming you have your data defined here
x = np.array(data.drop(columns=[predict]))
y = np.array(data[predict])
#best=0
#for _ in range(100):
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)
linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)
acc = linear.score(x_test, y_test)
print(acc)
#if (acc > best):
best = acc
with open("studentmodel.pickle", "wb") as f:
    pickle.dump(linear, f)
# Load the saved model back from the file
with open("studentmodel.pickle", "rb") as f:
    newModel = pickle.load(f)
print("coefficient:", newModel.coef_)
print("Intercept:", newModel.intercept_)
The above code uses a linear regression model for prediction. It loads the trained model back from the pickle file and then prints the model's coefficients and intercept. The parts of the code that are commented out (lines starting with #), such as the loop and the initialization of "best", are not used in this run.
Here pickle.load retrieves the model from the "studentmodel.pickle" file into the "newModel" variable. Then the coefficients of the model are printed using the "coef_" attribute and the intercept using the "intercept_" attribute. The code assumes that "studentmodel.pickle" is in the current working directory.
Note that the commented-out sections are code from the earlier version that is not needed in this run.
In this code we selected 6 columns. After removing G3, the one we predict, 5 features remain, so we get 5 coefficients in the output, one per feature. The intercept is the point where the fitted line crosses the y-axis. These are the two values that we can see.
The output of the above program is:
coefficient: [ 0.15746908 0.97644574 -0.17808798 -0.26615408 0.03426006]
Intercept: -1.5161799258584558
import pandas as pd
import numpy as np
import sklearn
import sklearn.model_selection as ms
from sklearn import linear_model
import pickle
data = pd.read_csv('F:/NEW/importent folder/main folder/projects for github/AI and AI in Python/codes and files/student-mat.csv', sep=';')
print(data.head())
data = data[['G1', 'G2', 'G3', 'studytime', 'failures', 'absences']]
predict = "G3"
# Assuming you have your data defined here
x = np.array(data.drop(columns=[predict]))
y = np.array(data[predict])
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)
#best=0
#for _ in range(100):
linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)
acc = linear.score(x_test, y_test)
print(acc)
#if (acc > best):
best = acc
with open("studentmodel.pickle", "wb") as f:
    pickle.dump(linear, f)
# Load the saved model back from the file
with open("studentmodel.pickle", "rb") as f:
    newModel = pickle.load(f)
print("coefficient:", newModel.coef_)
print("Intercept:", newModel.intercept_)
results = newModel.predict(x_test)
# Print the prediction, the input features and the actual grade for each test row
for i in range(len(results)):
    print(results[i], x_test[i], y_test[i])
The above code uses a predictive linear regression model and is used to make predictions based on test data.
First, the data is read from the CSV file and converted into a DataFrame. Then the columns needed for modeling are selected based on the column names and stored in the "data" variable.
Then the column "G3" is set as the predict variable. Column "G3" is the variable that we intend to predict using other features (other columns).
Then the data is divided into two parts, training and test, so that we can measure the accuracy of the model on the test data. The training and test data are then split into features and labels, and then the linear regression model is trained.
Next, the trained model is retrieved from the pickle file. Then the coefficients of the model are printed using the "coef_" property and the expression defined for the model using the "intercept_" property.
Finally, the model predicts on the test data and prints the prediction results. For each sample in the test data, the value predicted by the model and the actual feature value and label of that sample are printed.
Here we store the predictions in the results variable: we ask the model we built what it predicts for our test data, and the model has to guess the output answers itself.
For each of the predicted results we print a line:
14.978897424933432 [14 15 2 0 0] 15
Here the model predicted about 14.98, and the grade the student actually got was 15. The input features were:
G1: 14
G2: 15
studytime: 2
failures: 0
absences: 0
We can see that the results are predicted with very good accuracy, and that we were able to find the line of fit.
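The printed prediction can be reproduced by hand from the coefficients and intercept shown in the earlier output. The sketch below copies those printed values and the test row [14 15 2 0 0]; because the printed coefficients are rounded to 8 decimals, the result agrees with the model's output only to about 7 decimal places.

```python
import numpy as np

# Values copied from the output printed earlier in this section.
coef = np.array([0.15746908, 0.97644574, -0.17808798, -0.26615408, 0.03426006])
intercept = -1.5161799258584558

# One test row: G1, G2, studytime, failures, absences.
row = np.array([14, 15, 2, 0, 0])

# y = m1*x1 + m2*x2 + ... + m5*x5 + b
prediction = float(np.dot(coef, row) + intercept)
print(round(prediction, 4))  # 14.9789
```

This is the y = mx + b formula with one coefficient per feature: each feature value is multiplied by its coefficient, the products are added up, and the intercept is added at the end.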
Section Four
pip install numpy
pip install pandas
pip install scikit-learn
data = pd.read_csv("student-mat.csv", sep=";")
print(data.head())
data = data[["G1", "G2", "G3", "studytime", "failures", "absences"]]
predict = "G3"
X = np.array(data.drop(columns=[predict])) # Features
y = np.array(data[predict]) # Labels
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size = 0.1)
import pandas as pd
import numpy as np
import sklearn
import sklearn.model_selection
from sklearn import linear_model
from sklearn.utils import shuffle
data = pd.read_csv("student-mat.csv", sep=";")
data = data[["G1", "G2", "G3", "studytime", "failures", "absences"]]
predict = "G3"
X = np.array(data.drop(columns=[predict]))
y = np.array(data[predict])
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size = 0.1)
Classification: given the characteristics of something, for example an animal or a car, which category does it belong to?
For a car: is it a good model to buy at that price?
We download the data.
We have 1728 car samples in this dataset.
The data given to us is: maintenance cost, number of doors, size of the trunk, number of people it can fit, and the safety category. The question is: is the car acceptable for the price we pay or not?
We open the data and see that each record has these characteristics.
We need to convert the data we have into numbers (a pre-processing step converts the lists into numerical values).
We have raw data, and we need to make two lists from it: a set of features and a set of labels.
We want to make our model using these two.
We divide these two lists into a training part and a test part.
The test data must not appear in the training part.
What we want to find out is in which of the categories unacc, acc, good, vgood
each of these cars is placed.
The attributes each car has:
The purchase price, the maintenance cost, the number of doors, the number of people that can fit in it, the size of the trunk, and the safety category of the car.
Open the car.data file. Each record lists these characteristics in order.
For example, one of the records:
vhigh,vhigh,2,2,small,med,unacc
Here the first 2 is the number of car doors, and the last value is the final classification. This record is in the unacc category.
What we have to do is open this file and add the following line at the top:
buying,maint,doors,persons,lug_boot,safety,class
Because we want to use pandas, we need this header line so that each column has a name.
What is each of these items?
The data is not all numerical, and machine learning, artificial intelligence and neural networks need numbers to work with.
This means that we have to convert every type of data into numbers, so that we can put them in matrices, build the neural network and connect its parts.
So we need to index this data and convert it into numbers.
Anything that has two states, for example on and off, we can turn into 0 and 1.
For attributes that have several different values, we could do the conversion manually, but sklearn's preprocessing module provides transformations that convert these lists into numerical values: we give it the list, it looks at which values occur in it, and it assigns a single index value to each of them.
First, we load the data.
To test that the data has loaded correctly, we print the head.
Now convert values to numbers:
le = preprocessing.LabelEncoder()
This command does it for us.
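To see what LabelEncoder does, here is a minimal sketch on a short hand-written list of values of the kind found in the buying column (the list itself is made up for the example).

```python
from sklearn import preprocessing

# A few values of the kind found in the "buying" column.
values = ["vhigh", "high", "med", "low", "vhigh"]

le = preprocessing.LabelEncoder()
codes = le.fit_transform(values)

print(le.classes_.tolist())                   # ['high', 'low', 'med', 'vhigh']
print(codes.tolist())                         # [3, 0, 2, 1, 3]
print(le.inverse_transform([3, 1]).tolist())  # ['vhigh', 'low']
```

The codes follow the alphabetical order of the labels, not their natural order, so "low" becomes 1 and "vhigh" becomes 3. The encoder remembers only the labels from its most recent fit, so a separate encoder per column is needed if the codes should be translated back later.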
import sklearn
from sklearn.utils import shuffle
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np
from sklearn import preprocessing
data = pd.read_csv("F:/NEW/importent folder/main folder/projects for github/AI and AI in Python/codes and files/car.data")
print(data.head())
le = preprocessing.LabelEncoder()
buying = le.fit_transform(list(data["buying"]))
print(buying)
The above code is the first step towards a K-nearest-neighbors classifier on the car data (car.data): it loads the data and encodes one column. No model is trained yet.
First, the libraries needed to run the code are imported. The data is then read from the CSV file into a DataFrame, and the first rows are printed with the head() function to check the data.
Then we take the values of the "buying" column and convert them into a list. We use LabelEncoder to convert these labels to integers: fit_transform maps each distinct label to an integer, and the result is stored in the "buying" variable. Then the "buying" values are printed to see the converted numbers.
The output obtained:
buying maint doors persons lug_boot safety class
0 vhigh vhigh 2 2 small low unacc
1 vhigh vhigh 2 2 small med unacc
2 vhigh vhigh 2 2 small high unacc
3 vhigh vhigh 2 2 med low unacc
4 vhigh vhigh 2 2 med med unacc
[3 3 3 ... 1 1 1]
Using the column name buying, we can refer to that column of the data.
In the output we can see that the labels have been replaced by integer codes such as 3 and 1.
Let's do this conversion one by one for all the different columns.
As we can see, a large part of machine learning and artificial intelligence is about how we process our data, so that it is better prepared and we can achieve better results in the output.
Now we need two lists. One list is the collection of our features and the other holds the class each record belongs to. For the features we use list and zip to combine the encoded columns we defined above. We want to make our model using these two lists.
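As a small sketch of how zip builds the feature list, the example below uses three encoded columns with invented values for three cars; the real script combines six columns in the same way.

```python
# Encoded columns for three made-up cars (placeholder values).
buying = [3, 3, 1]
maint = [3, 2, 1]
safety = [1, 2, 0]
cls = [2, 2, 0]

x = list(zip(buying, maint, safety))  # one tuple of features per car
y = list(cls)                         # one label per car

print(x)  # [(3, 3, 1), (3, 2, 2), (1, 1, 0)]
print(y)  # [2, 2, 0]
```

zip takes the first value of every column for the first car, the second value of every column for the second car, and so on, so x[i] and y[i] always describe the same car.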
To test our model, we split the data into two parts: train and test.
The test data must never appear in the training part.
The following command creates the four sets for us:
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)
The values in these sets are the same data, converted into numbers.
import sklearn
from sklearn.utils import shuffle
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np
from sklearn import preprocessing
data = pd.read_csv("F:/NEW/importent folder/main folder/projects for github/AI and AI in Python/codes and files/car.data")
print(data.head())
le = preprocessing.LabelEncoder()
buying = le.fit_transform(list(data["buying"]))
maint = le.fit_transform(list(data["maint"]))
doors = le.fit_transform(list(data["doors"]))
persons = le.fit_transform(list(data["persons"]))
lug_boot = le.fit_transform(list(data["lug_boot"]))
safety = le.fit_transform(list(data["safety"]))
cls = le.fit_transform(list(data["class"]))
x=list(zip(buying, maint, doors, persons, lug_boot, safety)) #features
y=list(cls) #labels
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x,y,test_size=0.1)
print(x_train)
#print(x_test)
#print(y_train)
#print(y_test)
The above code prepares the car data (car.data) for a K-nearest-neighbors classifier. The model itself is not trained yet.
First, the libraries needed to run the code are imported. The data is then read from the CSV file into a DataFrame, and the first rows are printed with the head() function to check the data.
A LabelEncoder is then created so that we can convert the labels to integers.
Then we convert each column of the data into a list and encode it as integers using fit_transform. We store the converted numbers in the corresponding variables.
Then we combine the encoded feature columns with the zip function into the list x and store the class labels in the list y. These two lists are divided into training and test parts and placed in the x_train, x_test, y_train and y_test variables.
Finally, we print x_train to see the features that will be used to train the model.
Output (the first rows of the data and the column names):
buying maint doors persons lug_boot safety class
0 vhigh vhigh 2 2 small low unacc
1 vhigh vhigh 2 2 small med unacc
2 vhigh vhigh 2 2 small high unacc
3 vhigh vhigh 2 2 med low unacc
4 vhigh vhigh 2 2 med med unacc
['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
Section Five
import sklearn
from sklearn.utils import shuffle
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np
from sklearn import preprocessing
data = pd.read_csv("F:/NEW/importent folder/main folder/projects for github/AI and AI in Python/codes and files/car.data")
print(data.head())
le = preprocessing.LabelEncoder()
buying = le.fit_transform(list(data["buying"]))
maint = le.fit_transform(list(data["maint"]))
doors = le.fit_transform(list(data["doors"]))
persons = le.fit_transform(list(data["persons"]))
lug_boot = le.fit_transform(list(data["lug_boot"]))
safety = le.fit_transform(list(data["safety"]))
cls = le.fit_transform(list(data["class"]))
x=list(zip(buying, maint, doors, persons, lug_boot, safety)) #features
y=list(cls) #labels
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x,y,test_size=0.1)
#print(data.columns.tolist())
print(x_train)
#print(x_test)
#print(y_train)
#print(y_test)
The above code prepares the car data (car.data) for a K-nearest-neighbors classifier. The model itself is not trained yet.
First, the libraries needed to run the code are imported. The data is then read from the CSV file into a DataFrame, and the first rows are printed with the head() function to check the data.
A LabelEncoder is then created so that we can convert the labels to integers.
Then we convert each column of the data into a list and encode it as integers using fit_transform. We store the converted numbers in the corresponding variables.
Then we combine the encoded feature columns with the zip function into the list x and store the class labels in the list y. These two lists are divided into training and test parts and placed in the x_train, x_test, y_train and y_test variables.
The line that prints the names of the data columns is commented out here; uncomment it to see which columns are present in the data. We then print x_train to see the features that will be used to train the model.
buying maint doors persons lug_boot safety class
0 vhigh vhigh 2 2 small low unacc
1 vhigh vhigh 2 2 small med unacc
2 vhigh vhigh 2 2 small high unacc
3 vhigh vhigh 2 2 med low unacc
4 vhigh vhigh 2 2 med med unacc
import sklearn
from sklearn.utils import shuffle
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np
from sklearn import preprocessing
data = pd.read_csv("F:/NEW/importent folder/main folder/projects for github/AI and AI in Python/codes and files/car.data")
print(data.head())
le = preprocessing.LabelEncoder()
buying = le.fit_transform(list(data["buying"]))
maint = le.fit_transform(list(data["maint"]))
doors = le.fit_transform(list(data["doors"]))
persons = le.fit_transform(list(data["persons"]))
lug_boot = le.fit_transform(list(data["lug_boot"]))
safety = le.fit_transform(list(data["safety"]))
cls = le.fit_transform(list(data["class"]))
x=list(zip(buying, maint, doors, persons, lug_boot, safety)) #features
y=list(cls) #labels
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x,y,test_size=0.1)
#print(data.columns.tolist())
#print(x_train)
#print(x_test)
print(y_train)
#print(y_test)
The above code prepares the car data (car.data) for a K-nearest-neighbors classifier. The model itself is not trained yet.
First, the libraries needed to run the code are imported. The data is then read from the CSV file into a DataFrame, and the first rows are printed with the head() function to check the data.
A LabelEncoder is then created so that we can convert the labels to integers.
Then we convert each column of the data into a list and encode it as integers using fit_transform. We store the converted numbers in the corresponding variables.
Then we combine the encoded feature columns with the zip function into the list x and store the class labels in the list y. These two lists are divided into training and test parts and placed in the x_train, x_test, y_train and y_test variables.
In this section, we print y_train to see the labels that go with the training features. The printed values are the integer codes of the training labels.
buying maint doors persons lug_boot safety class
0 vhigh vhigh 2 2 small low unacc
1 vhigh vhigh 2 2 small med unacc
2 vhigh vhigh 2 2 small high unacc
3 vhigh vhigh 2 2 med low unacc
4 vhigh vhigh 2 2 med med unacc
['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
(base) PS C:\Users\ClassicPCs> & C:/Users/ClassicPCs/anaconda3/python.exe "f:/NEW/importent folder/main folder/projects for github/AI and AI in Python/codes and files/Lesson6.py"
buying maint doors persons lug_boot safety class
0 vhigh vhigh 2 2 small low unacc
1 vhigh vhigh 2 2 small med unacc
2 vhigh vhigh 2 2 small high unacc
3 vhigh vhigh 2 2 med low unacc
4 vhigh vhigh 2 2 med med unacc
(base) PS C:\Users\ClassicPCs> & C:/Users/ClassicPCs/anaconda3/python.exe "f:/NEW/importent folder/main folder/projects for github/AI and AI in Python/codes and files/Lesson7.py"
buying maint doors persons lug_boot safety class
0 vhigh vhigh 2 2 small low unacc
1 vhigh vhigh 2 2 small med unacc
2 vhigh vhigh 2 2 small high unacc
3 vhigh vhigh 2 2 med low unacc
4 vhigh vhigh 2 2 med med unacc
Section Six
import sklearn
from sklearn.utils import shuffle
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np
from sklearn import preprocessing
data = pd.read_csv("F:/NEW/importent folder/main folder/projects for github/AI and AI in Python/codes and files/car.data")
print(data.head())
le = preprocessing.LabelEncoder()
buying = le.fit_transform(list(data["buying"]))
maint = le.fit_transform(list(data["maint"]))
doors = le.fit_transform(list(data["doors"]))
persons = le.fit_transform(list(data["persons"]))
lug_boot = le.fit_transform(list(data["lug_boot"]))
safety = le.fit_transform(list(data["safety"]))
cls = le.fit_transform(list(data["class"]))
x=list(zip(buying, maint, doors, persons, lug_boot, safety)) #features
y=list(cls) #labels
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x,y,test_size=0.1)
#print(data.columns.tolist())
#print(x_train)
#print(x_test)
#print(y_train)
print(y_test)
The above code prepares the car data (car.data) for a K-nearest-neighbors classifier. The model itself is not trained yet.
First, the libraries needed to run the code are imported. The data is then read from the CSV file into a DataFrame, and the first rows are printed with the head() function to check the data.
A LabelEncoder is then created so that we can convert the labels to integers.
Then we convert each column of the data into a list and encode it as integers using fit_transform. We store the converted numbers in the corresponding variables.
Then we combine the encoded feature columns with the zip function into the list x and store the class labels in the list y. These two lists are divided into training and test parts and placed in the x_train, x_test, y_train and y_test variables.
In this section, we print y_test to see the labels that will be used to test the model. The printed values are the integer codes of the test labels.
0 vhigh vhigh 2 2 small low unacc
1 vhigh vhigh 2 2 small med unacc
2 vhigh vhigh 2 2 small high unacc
3 vhigh vhigh 2 2 med low unacc
4 vhigh vhigh 2 2 med med unacc
[2, 2, 0, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 0, 2, 0, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 3, 2, 2, 2, 2, 0, 0, 2, 3, 2, 1, 2, 2, 2, 0, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 0, 2, 2, 1, 2, 2, 0, 2, 2, 2, 3, 0, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 0, 0, 2, 2, 3, 2, 0, 0, 2, 2, 2, 1, 2, 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 3, 2, 2, 3, 2, 2, 2, 0, 2, 2, 0, 2, 2, 2, 1, 0, 2, 3, 2, 2, 2, 2, 2, 2, 1, 2, 2, 0, 2, 2]
K-nearest neighbors (KNN)
For each data point we know its features.
We give the algorithm labeled data.
K is the number of nearest neighbors that vote on the class of a new point.
The data is not always cleanly separated; points from different classes may overlap.
This method is computationally very heavy, because to find the closest points it has to measure the distance from the new point to every stored point.
Prediction is therefore slow, and in addition we must keep all the training data available.
For many classification problems, placing an unknown item into one of the existing categories can help us a lot.
The hardest part of building an artificial intelligence model is collecting, cleaning and preparing the data.
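The scripts above import KNeighborsClassifier but stop before using it. As a minimal sketch of how it could be applied, the example below uses two invented clusters of points and n_neighbors=3; both are assumptions made for the illustration, not values from the lesson.

```python
from sklearn.neighbors import KNeighborsClassifier

# Two made-up clusters of points, labelled 0 and 1.
x_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = [0, 0, 0, 1, 1, 1]

# K = 3: each prediction is a vote among the 3 closest training points.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(x_train, y_train)

predicted = model.predict([(1, 1), (6, 6)])
print(predicted.tolist())  # [0, 1]
```

With the car data, the same two calls would take x_train and y_train from train_test_split, and model.score(x_test, y_test) would then report the share of test cars that were classified correctly.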