---
editor_options:
markdown:
wrap: sentence
---
# Meeting Julia
## Why Julia
```{julia, echo=FALSE}
using Markdown
using InteractiveUtils
```
People have asked us why we wrote this book using Julia instead of another programming language like Python or R, which are the current standards in the data science world.
While Python and R are also great choices, Julia is an up-and-coming language that we expect to have a growing impact in the coming years.
It typically runs faster than pure R and Python, often approaching the speed of C, while remaining just as readable, which lets us write highly performant code in a simple way.
Julia is already being used in many top-tier tech companies and scientific research projects: there are plenty of scientists and engineers from different disciplines collaborating with Julia, which gives us a wide range of possibilities to approach different problems.
Often, languages like Python or R offer libraries that are optimized for performance, but these libraries are usually written in other languages better suited for this task, such as C or Fortran, and they also require code to manage the communication between the high-level language and the low-level one.
In Julia, performant libraries can be developed in plain Julia code, following some basic coding guidelines to get the most out of it.
This enables people without extensive programming or Computer Science expertise to create and use libraries for their work.
In this sense, Julia expands the possibilities of domain experts who need to solve problems that involve a lot of computation.
## Julia introduction
Julia is a free and open-source general-purpose language, designed and developed by Jeff Bezanson, Alan Edelman, Viral B. Shah and Stefan Karpinski at MIT.
Julia was created from scratch to be both fast and easy to understand, even for people who are not programmers or computer scientists.
It has abstraction capabilities of high-level languages, while also being really fast, as its slogan promises:
> "Julia looks like Python, feels like Lisp, runs like Fortran".
Before Julia, designers of programming languages had to choose between a simple syntax with good abstraction capabilities (and therefore a user-friendly language) and high performance, which was necessary to solve resource-intensive computations.
It was not possible to have both.
This required applied scientists not only to learn two different languages (a user-friendly one and a highly performant one), but also to learn how to make them communicate with one another.
This difficulty is called the two-language problem, which Julia's creators aim to overcome.
Julia is dynamically typed and great for interactive use.
It also uses multiple dispatch as a core design concept, which adds to the composability of the language.
In conventional, single-dispatch programming languages, when invoking a method, one of the arguments receives special treatment, since it alone determines which of the methods contained in a function is going to be applied.
Multiple dispatch generalizes this to all the arguments of the function, so the method applied is the one that best matches the number and types of all the arguments in the function call.
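To make the idea of multiple dispatch more concrete, here is a minimal sketch.
The types `Animal`, `Dog` and `Cat` and the function `meet` are hypothetical names made up for this illustration; they are not part of Julia or of any package.
Don't worry if the syntax is unfamiliar at this point.

```julia
# Hypothetical types, introduced only for this illustration
abstract type Animal end
struct Dog <: Animal end
struct Cat <: Animal end

# One function with several methods.
# Julia chooses the method by looking at the types of *both* arguments.
meet(a::Animal, b::Animal) = "they ignore each other"   # generic fallback
meet(a::Dog, b::Cat) = "the dog chases the cat"
meet(a::Cat, b::Dog) = "the cat hisses at the dog"

println(meet(Dog(), Cat()))  # uses the (Dog, Cat) method
println(meet(Cat(), Dog()))  # uses the (Cat, Dog) method
println(meet(Dog(), Dog()))  # no specific method, so the fallback is used
```

In a single-dispatch language only the first argument would take part in this choice; here swapping the arguments selects a different method.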
## Installation
For the installation process, we recommend you follow the instructions provided by the Julia team: [Platform Specific Instructions for Official Binaries](https://julialang.org/downloads/platform/).
These instructions will get you through a fresh installation of Julia depending on the specifications of your computer.
It is a bare-bones installation, so it will only include the basic Julia packages.
Throughout the book, we are going to use specific Julia packages that you have to install before calling them in your code.
Julia has a built-in package manager that makes the task of installing new packages and checking compatibilities very easy.
First, you will need to start a Julia session.
For this, type in your terminal:
``` julia
~ julia
julia>
```
At this point, your Julia session will have started.
What you see right now is a **Julia REPL** (read-eval-print loop), an interactive command line prompt.
Here you can quickly evaluate Julia expressions, get help about different Julia functionalities and much more.
The REPL has a set of different modes you can activate with different keybindings.
The *Julian mode* is the default, where you can directly type any Julia expression and press the Enter key to evaluate and print it.
The *help mode* is activated with a question mark `?`.
You will notice that the prompt changes:
``` julia
julia> ?
help?>
```
By typing the name of a function or a Julia package, you will get information about it as well as usage examples.
Another available mode is the *shell mode*.
This is just a way to input terminal commands in your Julia REPL. You can access this mode by typing a semicolon `;`.
``` julia
julia> ;
shell>
```
Perhaps the most used mode, along with the default Julian mode, is the *package manager mode*.
When in this mode, you can perform tasks such as adding and updating packages.
It is also useful for managing project environments and controlling package versions.
To switch to the package manager, type a closing square bracket `]`.
``` julia
julia> ]
(@v1.5) pkg>
```
If you see the abbreviation `pkg` in the prompt, it means you accessed the package manager successfully.
To add a new package, you just need to write
``` julia
(@v1.5) pkg> add NewPackage
```
It's as simple as that!
All Julia commands are case-sensitive, so be sure to write the package name ---and in the future, all functions and variables too--- correctly.
## First steps into the Julia world
As with every programming language, it is useful to know some of the basic operations and functionalities.
We encourage you to open a Julia REPL session and start experimenting with all the code written in this section to start developing an intuition about the things that make Julia code special.
The common arithmetic operators are all available in Julia:
- $+$: Add operator
- $-$: Subtract operator
- $*$: Product operator
- $/$: Division operator
Julia code is intended to be very similar to math.
So instead of doing something like:
``` julia
julia> 2*x
```
you can simply do:
``` julia
julia> 2x
```
For this same purpose, Julia supports a great variety of Unicode characters, which enable us to write things like Greek letters and subscripts/superscripts, making our code much more beautiful and easy to read in a mathematical form.
In general, Unicode characters are entered by typing a backslash `\` followed by the name of the character and then pressing the **Tab** key.
For example:
``` julia
julia> \beta # and next we press tab
julia> β
```
You can add subscripts by typing `\_` and superscripts by typing `\^`, followed by the character(s) you want to modify and then pressing **Tab**.
For example:
``` julia
julia> L\_0 # and next we press Tab
julia> L₀
```
Unicode characters behave just like any other character on your keyboard.
You can use them inside strings or as variable names and assign them a value.
``` julia
julia> β = 5
5
julia> "The ⌀ of the circle is $β "
"The ⌀ of the circle is 5 "
```
Some popular Greek letters already have their values assigned.
``` julia
julia> \pi # and next we press Tab
julia> π
π = 3.1415926535897...
julia> \euler # and next we press Tab
julia> ℯ
ℯ = 2.7182818284590...
```
You can see all the Unicode characters supported by Julia [here](https://docs.julialang.org/en/v1/manual/unicode-input/).
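As a small sketch of how these features combine, the following lines use the predefined constant `π`, a numeric coefficient written directly in front of it, and a Greek letter as a variable name.
The variable names `r`, `circumference`, `area` and `θ` are chosen here only for illustration.

```julia
r = 3.0
circumference = 2π * r      # the coefficient 2 is written right next to π, as with 2x
area = π * r^2

θ = π / 4                   # typed as \theta followed by Tab
identity_holds = sin(θ)^2 + cos(θ)^2 ≈ 1   # ≈ is typed as \approx followed by Tab

println(circumference)
println(area)
println(identity_holds)
```

The `≈` operator compares floating point numbers up to a small tolerance, which is usually what you want when checking a mathematical identity numerically.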
The basic number types are also supported in Julia.
We can explore this with the function `typeof()`, which outputs the type of its argument, as it is represented in Julia.
Let's see some examples:
``` julia
julia> typeof(2)
Int64
julia> typeof(2.0)
Float64
julia> typeof(3 + 5im)
Complex{Int64}
```
These were examples of integers, floats and complex numbers.
All data types in Julia start with a capital letter.
Notice that in Julia, dividing two integers with the `/` operator always results in a floating point number:
``` julia
julia> 10/2
5.0
```
You can convert from one data type to another like this:
``` julia
julia> Int64(5.0)
5
```
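A conversion like the one above only works when the value can be represented exactly in the target type.
The sketch below illustrates this with a few Base functions (`round`, `div`, `Float64`); the `try`/`catch` block is only there so that the example keeps running after the failed conversion, and the variable name `result` is an arbitrary choice.

```julia
println(Int64(5.0))          # 5.0 has no fractional part, so the conversion is exact
println(Float64(3))          # integers can be converted to floats
println(round(Int64, 5.7))   # round to the nearest integer and convert in one step
println(div(10, 3))          # integer division, discarding the remainder

# Int64(5.7) cannot be represented exactly, so Julia raises an InexactError
result = try
    Int64(5.7)
catch err
    err
end
println(result isa InexactError)
```

If you need an integer from a number with a fractional part, you have to say explicitly how to get rid of it, for example with `round`, `floor` or `ceil`.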
Continuing with the basics, let's take a look at how logical or Boolean operations are done in Julia.
Booleans are written as `true` and `false`.
The most important Boolean operators for our purposes are the following:
``` julia
!: "not" logical operator
&: "and" logical operator
|: "or" logical operator
==: "equal to" logical operator
!=: "different to" logical operator
>: "greater than" operator
<: "less than" operator
>=: "greater or equal to" operator
<=: "less or equal to" operator
```
Some examples of these,
``` julia
julia> true & true
true
julia> true & false
false
julia> true & !false
true
julia> 3 == 3
true
julia> 4 == 5
false
julia> 7 <= 7
true
```
Comparisons can be chained to have a simpler mathematical readability, like so:
``` julia
julia> 10 <= 11 < 24
true
julia> 5 > 2 < 1
false
```
## Strings
The next important topic in these Julia programming basics is the string data type and its basic manipulation.
As in many other programming languages, strings are created between quotation marks `"`:
``` julia
julia> "This is a Julia string!"
"This is a Julia string!"
```
You can access a specific character in a string by writing the index of that character in the string between brackets right next to the string name. Likewise, you can access a substring by writing the first and the last index of the substring you want, separated by a colon, all this between brackets.
This is called *slicing*, and it will be very useful later when working with arrays. Here's an example:
``` julia
julia> "This is a Julia string!"[1] # this will output the first character of the string and other related information.
'T': ASCII/Unicode U+0054 (category Lu: Letter, uppercase)
julia> "This is a Julia string!"[1:4] # this will output the substring obtained by going from the first index to the fourth
"This"
```
A really useful tool when using strings is *string interpolation*, which is a way to evaluate an expression inside a string and print it. This is usually done by writing a dollar symbol `$` followed by the expression between parentheses.
For example:
``` julia
julia> "The product between 4 and 5 is $(4 * 5)"
"The product between 4 and 5 is 20"
```
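Interpolation also works with variables, in which case the parentheses can be omitted.
Here is a short sketch; the variable names `language`, `year` and `sentence` are just illustrative choices.

```julia
language = "Julia"
year = 2012

# Plain variables are interpolated with $name, expressions with $(...)
sentence = "$language was released in $year, $(2020 - year) years before 2020"
println(sentence)
```

Parentheses are needed as soon as you interpolate anything more complex than a single variable name.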
This wouldn’t be a programming introduction if we didn’t include printing `Hello World!`. Printing in Julia is very easy. There are two functions for printing: `print()` will print the string without creating a new line, while `println()` will create a new line each time it is called. To show this difference, we will execute two print actions in one console line. To execute multiple actions in one line you just need to separate them with a `;`.
``` julia
julia> print("Hello"); print(" world!")
Hello world!
julia> println("Hello"); println("world!")
Hello
world!
```
## Data collections
It's time now to start introducing collections of data in Julia. We will start with **arrays**. As in many other programming languages, arrays in Julia can be created by listing objects between square brackets separated by commas. For example:
``` julia
julia> int_array = [1, 2, 3]
3-element Array{Int64,1}:
1
2
3
julia> str_array = ["Hello", "World"]
2-element Array{String,1}:
"Hello"
"World"
```
As you can see, arrays can store any type of data.
If all the data in the array is of the same type, it will be stored as an array of that data type.
You can see that in the pattern that the Julia REPL prints out:

1)  Firstly, it displays how many elements there are in the collection.
    In our case, 3 elements in `int_array` and 2 elements in `str_array`.
    When dealing with higher-dimensional arrays, the shape is reported instead.
2)  Secondly, the output shows the type and dimensionality of the array.
    The first element inside the curly braces specifies the type of every member of the array, if they are all the same.
    If this is not the case, the type `Any` will appear, meaning that the collection of objects inside the array is not homogeneous in its type.
    Julia code tends to run faster when arrays have a defined element type, so it is recommended to use homogeneous types when possible.
    The second element inside the curly braces tells us how many dimensions there are in the array.
    Our example shows two one-dimensional arrays, hence a 1 is printed.
    Later, we will introduce matrices and, naturally, a 2 will appear in this place instead of a 1.
3)  Finally, the content of the array is printed in a columnar way.
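The element type and the number of dimensions can also be queried directly, without reading them off the printed output.
The following sketch uses the Base functions `eltype` and `ndims`; the array `mixed_array` is a made-up example of a non-homogeneous collection.

```julia
int_array = [1, 2, 3]
mixed_array = [1, "two", 3.0]    # elements of different types

println(eltype(int_array))       # the integer type of your machine, Int64 on 64-bit systems
println(eltype(mixed_array))     # Any, because the elements are not homogeneous
println(ndims(int_array))        # 1, a one-dimensional array
println(ndims([1 2; 3 4]))       # 2, a matrix
```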
When building Julia, the convention has been set so that it has column-major ordering.
So you can think of standard one-dimensional arrays as column vectors, and in fact this will be mandatory when doing calculations between vectors or matrices.
A row vector (or a $1 \times n$ array), on the other hand, can be defined using whitespaces instead of commas,
``` julia
julia> [3 2 1 4]
1×4 Array{Int64,2}:
3 2 1 4
```
In contrast to other languages, where matrices are expressed as 'arrays of arrays', in Julia we write the numbers of each row in succession separated by whitespaces, just like we saw in the example of a row vector, and we use a semicolon to indicate the end of the row.
For example,
``` julia
julia> [1 1 2; 4 1 0; 3 3 1]
3×3 Array{Int64,2}:
1 1 2
4 1 0
3 3 1
```
The length and shape of arrays can be obtained using the `length()` and `size()` functions respectively.
``` julia
julia> length([1, -1, 2, 0])
4
julia> size([1 0; 0 1])
(2, 2)
julia> size([1 0; 0 1], 2) # you can also specify the dimension where you want the shape to be computed
2
```
An interesting feature in Julia is *broadcasting*.
Suppose you wanted to add the number 2 to every element of an array.
You might be tempted to do
``` julia
julia> [1, 1, 1] + 2
ERROR: MethodError: no method matching +(::Array{Int64,1}, ::Int64)
For element-wise addition, use broadcasting with dot syntax: array .+ scalar
Closest candidates are:
+(::Any, ::Any, ::Any, ::Any...) at operators.jl:538
+(::Complex{Bool}, ::Real) at complex.jl:301
+(::Missing, ::Number) at missing.jl:115
...
Stacktrace:
[1] top-level scope at REPL[18]:1
```
As you can see, the expression returns an error.
If you look at this error message closely, it gives you a good suggestion about what to do.
If we now try writing a period '.' right before the plus sign, we get
``` julia
julia> 2 .+ [1, 1, 1]
3-element Array{Int64,1}:
3
3
3
```
What we did was broadcast the sum operator '+' over the entire array.
This is done by adding a period before the operator we want to broadcast.
In this way we can write complicated expressions in a much cleaner, simpler and more compact way.
This can be done with any of the operators we have already seen,
``` julia
julia> 3 .> [2, 4, 5] # this will output a bit array with 0s as false and 1s as true
3-element BitArray{1}:
1
0
0
```
If we do a broadcasting operation between two arrays with the same shape, whatever operation you are broadcasting will be done element-wise.
For example,
``` julia
julia> [7, 2, 1] .* [10, 4, 8]
3-element Array{Int64,1}:
70
8
8
julia> [10 2 35] ./ [5 2 7]
1×3 Array{Float64,2}:
2.0 1.0 5.0
julia> [5 2; 1 4] .- [2 1; 2 3]
2×2 Array{Int64,2}:
3 1
-1 1
```
If we use the broadcast operator between a column vector and a row vector instead, the broadcast is done for every row of the first vector and every column of the second vector, returning a matrix,
``` julia
julia> [1, 0, 1] .+ [3 1 4]
3×3 Array{Int64,2}:
4 2 5
3 1 4
4 2 5
```
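Several broadcast operators can be combined in a single expression.
As a sketch, the lines below evaluate the polynomial $2x^2 + 1$ on every element of an array, first with explicit dots and then with the `@.` macro from Base, which adds the dots for us.
The variable names are arbitrary.

```julia
x = [1, 2, 3]

y = 2 .* x .^ 2 .+ 1       # every operator is broadcast explicitly
z = @. 2 * x^2 + 1         # @. inserts the dots automatically

println(y)
println(y == z)

# Broadcasting a column vector against a row vector produces a matrix
println(size(x .+ [10 20]))
```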
Another useful tool when dealing with arrays is concatenation.
Given two arrays, you can concatenate them horizontally or vertically.
This is best seen in an example:
``` julia
julia> vcat([1, 2, 3], [4, 5, 6]) # this concatenates the two arrays vertically, giving us a new long array
6-element Array{Int64,1}:
1
2
3
4
5
6
julia> hcat([1, 2, 3], [4, 5, 6]) # this stacks the two arrays one next to the other, returning a matrix
3×2 Array{Int64,2}:
1 4
2 5
3 6
```
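The same concatenations can be written with the literal array syntax we used for matrices: a semicolon stacks vertically and a whitespace stacks horizontally.
A brief sketch, with arbitrary variable names `a` and `b`:

```julia
a = [1, 2, 3]
b = [4, 5, 6]

println([a; b] == vcat(a, b))   # semicolon: vertical concatenation
println([a b] == hcat(a, b))    # whitespace: horizontal concatenation
println(size([a b]))            # three rows and two columns
```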
With some of these basic tools to start getting your hands dirty in Julia, we can get going into some other functionalities like loops and function definitions.
Let's start with the for loop.
For loops are started with the `for` keyword, followed by the name of the iterator and the range of iterations we want our loop to cover.
Below this `for` statement we write what we want to be performed in each iteration, and we finish with the `end` keyword.
Here is a simple example,
``` julia
julia> for i in 1:100
           println(i)
       end
```
The syntax `1:100` is the Julian way to define a range of all the numbers from 1 to 100, with a step of 1.
We could have set `1:2:100` if we wanted to jump between numbers with a step size of 2.
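Ranges are lazy objects: they store only the start, step and stop values.
If you want to see the numbers a range contains, the Base function `collect` turns it into an array, as this short sketch shows (the names `r` and `countdown` are illustrative).

```julia
r = 1:2:10                      # start at 1, step by 2, stop at or before 10
println(collect(r))             # the numbers actually visited by the range
println(length(r))

countdown = collect(10:-3:1)    # a negative step counts downwards
println(countdown)
```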
We can also iterate over collections of data, like arrays.
Consider the next block of code where we define an array and then iterate over it,
``` julia
julia> arr = [1, 3, 2, 2]
julia> for element in arr
           println(element)
       end
1
3
2
2
```
As you can see, the loop was done for each element of the array.
It might be convenient sometimes to iterate over a collection.
Conditional statements in Julia are very similar to most languages.
Essentially, a conditional statement starts with the `if` keyword, followed by the condition that must be evaluated to true or false, and then the body of the action to apply if the condition evaluates to true.
Then, optional `elseif` keywords may be used to check for additional conditions, and an optional `else` keyword at the end to execute a piece of code if all of the conditions above evaluate to false.
Finally, as usual in Julia, the conditional statement block finishes with an `end` keyword.
``` julia
julia> x = 3
julia> if x > 2
           println("x is greater than 2")
       elseif 1 <= x <= 2
           println("x is between 1 and 2")
       else
           println("x is less than 1")
       end
x is greater than 2
```
Now consider the code block below, where we define a function to calculate a certain number of steps of the Fibonacci sequence,
``` julia
julia> n1 = 0
julia> n2 = 1
julia> m = 10
julia> function fibonacci(n1, n2, m)
           fib = Array{Int64,1}(undef, m)
           fib[1] = n1
           fib[2] = n2
           for i in 3:m
               fib[i] = fib[i-1] + fib[i-2]
           end
           return fib
       end
fibonacci (generic function with 1 method)
```
Here, we first made some variable assignments: the variables $n1$, $n2$ and $m$ were assigned the values 0, 1 and 10.
Variables are assigned simply by writing the name of the variable followed by an 'equal' sign, and followed finally by the value you want to store in that variable.
There is no need to declare the data type of the value you are going to store.
Then, we defined the function body for the Fibonacci sequence computation.
Function blocks start with the `function` keyword, followed by the name of the function and its arguments between parentheses, separated by commas.
In this function, the arguments will be the first two numbers of the sequence and the total length of the Fibonacci sequence.
Inside the body of the function, everything is indented.
Although this is not strictly necessary for the code to run, it is a good practice to have from the beginning, since we want our code to be readable.
At first, we initialize a one-dimensional array of integers of length $m$ by allocating memory.
This way of initializing an array is not strictly necessary: you could have initialized an empty array and started filling it later in the code.
But it is definitely a good practice to learn for a situation like this, where we know in advance how long our array is going to be and we want to optimize code performance in Julia.
The memory allocation for this array is done with the `Array` type we have already seen in the REPL output earlier.
`{Int64,1}` just means we want a one-dimensional array of integers.
The new part is the one between parentheses, `(undef, m)`.
This just means we are initializing the array with undefined values (which we will overwrite later), and that there will be $m$ of them.
Don't worry too much if you don't understand all this right now, though.
We then proceed to assign the two first elements of the sequence and calculate the rest with a for loop.
Finally, an `end` keyword is necessary at the end of the for loop and another one to end the definition of the function.
Evaluating our function on the variables $n1$, $n2$ and $m$ already defined gives us:
``` julia
julia> fibonacci(n1, n2, m)
10-element Array{Int64,1}:
0
1
1
2
3
5
8
13
21
34
```
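For comparison, here is a sketch of the alternative mentioned above, where we start from an empty array and grow it as we go.
The function name `fibonacci_push` is made up for this example, and it assumes $m$ is at least 2.
It relies on `push!`, a Base function that appends an element to the end of an array.

```julia
# Hypothetical variant of the function above: grow an empty array with push!
function fibonacci_push(n1, n2, m)
    fib = Int64[]                 # an empty one-dimensional array of integers
    push!(fib, n1)
    push!(fib, n2)
    for i in 3:m
        push!(fib, fib[i-1] + fib[i-2])
    end
    return fib
end

println(fibonacci_push(0, 1, 10))
```

Both versions return the same sequence; the preallocated one avoids resizing the array while the loop runs, which tends to matter more as the array gets longer.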
Remember the broadcasting operation, that dot we added to the beginning of another operator to apply it on an entire collection of objects?
It turns out that this can be done with functions as well!
Consider the following function,
``` julia
julia> function isPositive(x)
           if x >= 0
               return true
           elseif x < 0
               return false
           end
       end
isPositive (generic function with 1 method)
julia> isPositive(3)
true
julia> isPositive.([-1, 1, 3, -5])
4-element BitArray{1}:
0
1
1
0
```
As you can see, we broadcast the `isPositive()` function over every element of an array by adding a dot right after the function name.
It is as easy as that!
Once you start using this feature, you will notice how useful it is.
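The dot syntax is not limited to functions we define ourselves.
As a sketch, here it is applied to a few functions from Base (`sqrt`, `max` and `uppercase`); the variable names are arbitrary.

```julia
roots = sqrt.([1, 4, 9])                     # one argument, applied element by element
largest = max.([1, 5, 3], [4, 2, 6])         # two arrays, compared element-wise
shouted = uppercase.(["hello", "world"])     # works on arrays of strings too

println(roots)
println(largest)
println(shouted)
```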
One thing to know about functions in Julia is the 'bang' (`!`) convention.
Functions whose names end with an exclamation mark (or bang) are functions that change their inputs in-place.
Consider the example of the `pop!` function from Julia's Base module.
Watch closely what happens to the array over which we apply the function.
``` julia
julia> arr = [1, 2, 3]
julia> n = pop!(arr)
3
julia> arr
2-element Array{Int64,1}:
1
2
julia> n
3
```
Did you understand what happened?
First, we defined an array.
Then, we applied the `pop!()` function, which removes the last element of the array and returns it, and we assigned that element to `n`.
But notice that when we call our `arr` variable to see what it is storing, the number 3 is gone.
This is what functions with a bang do, and it is what we mean by modifying *in-place*.
Try to follow this convention whenever you define a function that will modify other objects in-place!
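Here is a minimal sketch of what following the convention might look like.
The function `double!` is a hypothetical helper written for this example; `sort` and `sort!` are Base functions that show the two flavors side by side.

```julia
# Hypothetical helper: doubles every element of the array it receives, in-place
function double!(v)
    for i in eachindex(v)
        v[i] = 2 * v[i]
    end
    return v
end

numbers = [1, 2, 3]
double!(numbers)
println(numbers)                 # the original array has changed

unsorted = [3, 1, 2]
sorted_copy = sort(unsorted)     # no bang: returns a new array
println(unsorted)                # still in the original order
sort!(unsorted)                  # bang: reorders the array itself
println(unsorted)
```

Note that the bang is only a naming convention: Julia does not enforce it, so it is up to us to name our functions honestly.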
Sometimes, you will be in a situation where you need to use some function, but you don't really need to give it a name and store it, because it's not very relevant to your code.
For these kinds of situations, an *anonymous* or *lambda* function may be what you need.
Typically, anonymous functions will be used as arguments to higher-order functions.
This is just a fancy name for functions that accept other functions as arguments, which is what makes them higher-order.
We can create an anonymous function and apply it to each element of a collection by using the `map()` function.
You can think of the `map()` function as a way to broadcast any function over a collection.
Anonymous functions are created using the arrow `->` syntax.
On the left-hand side of the arrow, you specify the names of the function's arguments.
On the right-hand side of the arrow, you write the recipe of the things to do with these arguments.
Let's use an anonymous function to define a not-anonymous function, just to illustrate the point.
``` julia
julia> f = (x,y) -> x + y
#1 (generic function with 1 method)
julia> f(2,3)
5
```
You can think about what we did as if $f$ were a variable that is storing some function.
Then, when calling $f(2,3)$ Julia understands we want to evaluate the function it is storing with the values 2 and 3.
Let's see now how the higher-order function `map()` uses anonymous functions.
We will broadcast our anonymous function `x^2 + 5` over all the elements of an array.
``` julia
julia> map(x -> x^2 + 5, [2, 4, 6, 3, 3])
5-element Array{Int64,1}:
9
21
41
14
14
```
The first argument of the map function is another function.
You can define new functions and then use them inside map, but with the help of anonymous functions you can simply create a throw-away function inside map's arguments.
This function we pass as an argument, is then applied to every member of the array we input as the second argument.
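`map()` is not the only higher-order function in Base.
The sketch below also uses `filter`, which keeps the elements for which the given function returns `true`, and shows that a named function can be passed just like an anonymous one.
The names `data` and `square_plus_five` are made up for this example.

```julia
data = [2, 4, 6, 3, 3]

# filter keeps only the elements for which the anonymous function returns true
println(filter(x -> x > 2, data))

# A named function can be passed to map in exactly the same way
square_plus_five(x) = x^2 + 5
println(map(square_plus_five, data) == map(x -> x^2 + 5, data))

# With a two-argument function, map walks over two collections in parallel
println(map((x, y) -> x + y, [1, 2, 3], [10, 20, 30]))
```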
Now let's introduce another data collection: Dictionaries.
A dictionary is a collection of key-value pairs.
You can think of them as arrays, but instead of being indexed by a sequence of numbers they are indexed by keys, each one linked to a value.
To create a dictionary we use the function `Dict()` with the key-value pairs as arguments.
`Dict(key1 => value1, key2 => value2)`.
``` julia
julia> Dict("A" => 1, "B" => 2)
Dict{String,Int64} with 2 entries:
"B" => 2
"A" => 1
```
So we created our first dictionary.
Let's review what the Julia REPL prints out:
`Dict{String,Int64}` tells us the dictionary data type that Julia automatically assigns to the pair (key,value).
In this example, the keys will be strings and the values, integers.
Finally, it prints all the `key => value` elements of the dictionary.
In Julia, the keys and values of a dictionary can be of any type.
``` julia
julia> Dict("x" => 1.4, "y" => 5.3)
Dict{String,Float64} with 2 entries:
"x" => 1.4
"y" => 5.3
julia> Dict(1 => 10.0, 2 => 100.0)
Dict{Int64,Float64} with 2 entries:
2 => 100.0
1 => 10.0
```
Letting Julia automatically assign the data type can cause bugs or errors when adding new elements.
Thus, it is a good practice to assign the data type of the dictionary ourselves.
To do it, we just need to indicate it in between curly braces `{ }` after the `Dict` keyword:
`Dict{key type, value type}(key1 => value1, key2 => value2)`
``` julia
julia> Dict{Int64,String}(1 => "Hello", 2 => "World")
Dict{Int64,String} with 2 entries:
2 => "World"
1 => "Hello"
```
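To see the kind of error that an automatically assigned type can cause, consider this sketch.
The names `inferred`, `flexible` and `outcome` are arbitrary, and the `try`/`catch` block is only there to capture the error instead of stopping the program.

```julia
inferred = Dict("A" => 1, "B" => 2)    # Julia infers String keys and integer values
inferred["C"] = 3                      # fine, the value is an integer

# Storing a string as a value fails, because it cannot be converted to an integer
outcome = try
    inferred["D"] = "four"
    "stored"
catch err
    "rejected: $(typeof(err))"
end
println(outcome)

# Declaring the value type as Any accepts values of any type
flexible = Dict{String,Any}("A" => 1)
flexible["D"] = "four"
println(flexible["D"])
```

Declaring the types up front makes it explicit which kinds of keys and values the dictionary is meant to hold.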
Now let's see the dictionary's basic functions.
First, we will create a dictionary called "languages" that contains the names of programming languages as keys and their release year as values.
``` julia
julia> languages = Dict{String,Int64}("Julia" => 2012, "Java" => 1995, "Python" => 1990)
Dict{String,Int64} with 3 entries:
"Julia" => 2012
"Python" => 1990
"Java" => 1995
```
To grab a key's value we need to indicate it in between brackets [].
``` julia
julia> languages["Julia"]
2012
```
We can easily add an element to the dictionary.
``` julia
julia> languages["C++"] = 1980
1980
julia> languages
Dict{String,Int64} with 4 entries:
"Julia" => 2012
"Python" => 1990
"Java" => 1995
"C++" => 1980
```
We do something similar to modify a key's value:
``` julia
julia> languages["Python"] = 1991
1991
julia> languages
Dict{String,Int64} with 4 entries:
"Julia" => 2012
"Python" => 1991
"Java" => 1995
"C++" => 1980
```
Notice that the ways of adding and modifying a value are identical.
That is because keys of a dictionary can never be repeated or modified.
Since each key is unique, assigning a new value for a key overrides the previous one.
To delete an element we use the `delete!` method.
``` julia
julia> delete!(languages,"Java")
Dict{String,Int64} with 3 entries:
"Julia" => 2012
"Python" => 1991
"C++" => 1980
```
To finish, let's see how to iterate over a dictionary.
``` julia
julia> for (key, value) in languages
           println("$key was released in $value")
       end
Julia was released in 2012
Python was released in 1991
C++ was released in 1980
```
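A few more Base functions are handy when working with dictionaries.
The sketch below rebuilds the `languages` dictionary in its final state and queries it with `haskey`, `get` and `keys`; the default value `0` passed to `get` is an arbitrary choice.

```julia
languages = Dict{String,Int64}("Julia" => 2012, "Python" => 1991, "C++" => 1980)

println(haskey(languages, "Julia"))       # is this key present?
println(haskey(languages, "Java"))        # it was deleted earlier
println(get(languages, "Java", 0))        # returns the default instead of raising an error
println(sort(collect(keys(languages))))   # all the keys, sorted alphabetically
println(length(languages))                # number of key-value pairs
```

Accessing a missing key with square brackets raises an error, so `haskey` or `get` are the safer options when you are not sure a key exists.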
Now that we have discussed the most important details of Julia's syntax, let's focus our attention on some of the packages in Julia's ecosystem.
## Julia's Ecosystem: Basic plotting and manipulation of DataFrames
Julia's ecosystem is composed of a variety of libraries which focus on technical domains such as Data Science (DataFrames.jl, CSV.jl, JSON.jl), Machine Learning (MLJ.jl, Flux.jl, Turing.jl) and Scientific Computing (DifferentialEquations.jl), as well as more general purpose programming (HTTP.jl, Dash.jl).
We will now consider one of the libraries that will be accompanying us throughout the book to make visualizations, Plots.jl.
To install the Plots.jl library we need to enter the package manager (Pkg) mode, as we saw earlier.
``` julia
julia> ]
(@v1.5) pkg>
(@v1.5) pkg> add Plots.jl
```
There are some other great packages like Gadfly.jl and VegaLite.jl, but Plots will be the best to get you started.
Let's import the library with the `using` keyword and start making some plots.
We will plot the first ten numbers of the Fibonacci sequence using the `scatter()` function.
```{julia chap_2_plot_1}
begin
using Plots
sequence = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
scatter(sequence, xlabel="n", ylabel="Fibonacci(n)", color="purple", label=false, size=(450, 300))
end
```
### Plotting with Plots.jl
The plot above shows the first ten numbers of the Fibonacci sequence, drawn with the `scatter()` function.
The only really important argument of `scatter()` in that example is the first one, `sequence`, which tells the function what data we want to plot.
The other arguments are just details to make the visualization prettier.
Here we have used the scatter function because we want a discrete plot for our sequence.
In case we wanted a continuous one, we could have used `plot()`.
Let's see this applied to our Fibonacci sequence, first on its own and then with the scatter points drawn on top of the line:
```{julia chap_2_plot_2}
plot(sequence, xlabel="x", ylabel="Fibonacci", linewidth=3, label=false, color="green", size=(450, 300))
```
```{julia chap_2_plot_3}
begin
plot(sequence, xlabel="x", ylabel="Fibonacci", linewidth=3, label=false, color="green", size=(450, 300))
scatter!(sequence, label=false, color="purple", size=(450, 300))
end
```
In the last example, the figure is created when we call the `plot()` function.
The `scatter!()` call then modifies that current plot in place, adding the points on top of the line.
Had we called `scatter()` without the bang, a new figure would have been created and the two plots would not have been drawn together.
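If you prefer not to rely on the implicit "current plot", the same result can be obtained by keeping a handle to the figure and passing it explicitly. The following is a small sketch under the assumption that Plots.jl is installed; the variable name `p` is our own choice:

``` julia
using Plots

sequence = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Keep a handle to the figure instead of relying on the current plot.
p = plot(sequence, xlabel="x", ylabel="Fibonacci", linewidth=3, label=false, color="green")

# Passing p as the first argument adds the points to that specific figure.
scatter!(p, sequence, label=false, color="purple")
```

This form is handy when several figures are being built at the same time, because it makes explicit which one is being modified.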
A nice feature of the Plots.jl package is the ability to switch plotting backends.
There exist various plotting packages in Julia, and each one has its own special features and aesthetic flavour.
The Plots.jl package integrates these plotting libraries and acts as an interface to communicate with them in an easy way.
By default, the `GR` backend is the one used.
In fact, this was the plotting engine that generated the plots we have already done.
The most widely used and best maintained plotting backends to date are the already mentioned `GR`, together with `Plotly/PlotlyJS`, `PyPlot`, `UnicodePlots` and `InspectDR`.
The backend you choose will depend on the particular situation you are facing.
For a detailed explanation of backends, we recommend you visit the Julia Plots [documentation](https://docs.juliaplots.org/latest/backends/).
Throughout the book we will focus on the `GR` backend, but as a demonstration of how easy it is to switch from one backend to another, consider the code below.
The only addition to the plotting code we have already used is the `pyplot()` call, which changes the backend.
If you have already coded in Python, you will feel familiar with this plotting backend.
```{julia chap_2_plot_4}
begin
pyplot()
plot(sequence, xlabel="x", ylabel="Fibonacci", linewidth=3, label=false, color="green", size=(450, 300))
scatter!(sequence, label=false, color="purple", size=(450, 300))
end
```
Analogously, we can use the `plotlyjs` backend, which is especially well suited for interactivity.
```{julia chap_2_plot_5}
begin
plotlyjs()
plot(sequence, xlabel="x", ylabel="Fibonacci", linewidth=3, label=false, color="green", size=(450, 300))
scatter!(sequence, label=false, color="purple", size=(450, 300))
end
```
Each of these backends has its own scope, so there may be plots that one backend can produce and others can't.
For example, 3D plots are not supported by all backends.
The details are well explained in the Julia Plots documentation.
### Introducing DataFrames.jl
When dealing with any type of data in large quantities, it is essential to have a framework to organize and manipulate it in an efficient way.
If you have previously used Python, you probably came across the Pandas package and dataframes.
In Julia, the DataFrames.jl package follows the same idea.
Dataframes are objects with the purpose of structuring tabular data in a smart way.
You can think of them as a table, a matrix or a spreadsheet.
In the dataframe convention, each row is an observation of a vector-type variable, and each column is the complete set of values of a given variable, across all observations.
In other words, for a single row, each column represents a realization of a variable.
Let's see how to construct and load data into a dataframe.
There are many ways you can accomplish this.
Suppose we have some data in a matrix and we want to organize it in a dataframe.
First, we are going to create some 'fake data' and load it into a Julia DataFrame:
```{julia}
begin
using DataFrames, Random
Random.seed!(123)
fake_data = rand(5, 5) # a 5x5 matrix with random values between 0 and 1
# :auto asks DataFrames.jl to generate the column names x1, x2, ...
# (this argument is required in DataFrames.jl 1.0 and later)
df = DataFrame(fake_data, :auto)
end
```
As you can see, the column names were initialized to `x1`, `x2`, and so on.
We would probably want to give them more meaningful names.
To do this, we have the `rename!()` function.
Remember that this function has a bang, so it modifies the dataframe in place. Be careful!
Below we rename the columns of our dataframe:
```{julia}
rename!(df, ["one", "two", "three", "four", "five"])
```
The first argument of the function is the dataframe we want to modify, and the second is an array of strings, one for each column, given in column order.
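When only some of the columns need a new name, `rename!()` also accepts `old => new` pairs. The sketch below assumes DataFrames.jl 1.0 or later and uses a small throwaway dataframe (`df_demo` and the name `second` are our own, purely illustrative choices):

``` julia
using DataFrames

# A small dataframe with automatically generated names x1 and x2.
df_demo = DataFrame(rand(3, 2), :auto)

# Rename only the column x2, in place; x1 is left untouched.
rename!(df_demo, :x2 => :second)

println(names(df_demo))
```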
Another way to create a dataframe is by passing the columns as keyword arguments, where each value is an array or any other collection of data.
For example,
```{julia}
DataFrame(column1=1:10, column2=2:2:20, column3=3:3:30)
```
As you can see, the keyword names are used as the column names of the dataframe.
Furthermore, you can initialize an empty dataframe and start adding data later if you want,
```{julia}
begin
df_ = DataFrame(Names = String[],
Countries = String[],
Ages = Int64[])
df_ = vcat(df_, DataFrame(Names="Juan", Countries="Argentina", Ages=28))
end
```
We have used the `vcat()` function seen earlier to append new data to the dataframe.
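As an alternative sketch, rows can also be appended in place with `push!`, which DataFrames.jl supports for tuples and named tuples. The dataframe name `people` and the second row are made up for illustration only:

``` julia
using DataFrames

people = DataFrame(Names = String[], Countries = String[], Ages = Int64[])

# A tuple is matched to the columns by position.
push!(people, ("Juan", "Argentina", 28))

# A named tuple is matched to the columns by name (illustrative row).
push!(people, (Names = "Ana", Countries = "Chile", Ages = 31))

println(people)
```

Unlike `vcat()`, which builds a new dataframe, `push!` modifies the existing one, so there is no need to reassign the variable.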
You can also add a new column very easily,
```{julia}
begin
df_.height = [1.72]
df_
end
```
You can access data in a dataframe in various ways.
One way is by the column name.
For example,
```{julia}
df.three
```
```{julia}
df."three"
```
But you can also access the data in a dataframe as if it were a matrix.
Rows are selected by their number, and columns can be referred to either by their number or by their name,
```{julia}
df[1,:]
```
```{julia}
df[1:2, "one"]
```
```{julia}
df[3:5, ["two", "four", "five"]]
```
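One detail worth knowing when indexing columns is the difference between `:` and `!` as the row selector. In DataFrames.jl, `df[:, col]` returns a copy of the column, while `df[!, col]` returns the stored vector itself, so changes made through it affect the dataframe. A minimal sketch on a small dataframe built just for this example (`df_demo` is a hypothetical name):

``` julia
using DataFrames

df_demo = DataFrame(one = [0.1, 0.2, 0.3], two = [1.0, 2.0, 3.0])

col_copy = df_demo[:, "one"]   # an independent copy of the column
col_ref = df_demo[!, "one"]    # the vector stored inside the dataframe

# Writing through col_ref changes the dataframe; col_copy is unaffected.
col_ref[1] = 9.9

println(df_demo.one[1], " ", col_copy[1])
```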
The column names can be accessed by the `names()` function,
```{julia}
names(df)
```
Another useful tool for having a quick overview of the dataframe, typically when in an exploratory process, is the `describe()` function.
It outputs some information about each column, as you can see below,
```{julia}
describe(df)
```
To select data following certain conditions, you can use the `filter()` function.
Given some condition, this function keeps only the rows for which the condition evaluates to true and throws away the rest.
The condition is expressed as an anonymous function that receives one row at a time, and it is passed as the first argument.
The second argument is the dataframe to which the filtering is applied.
In the example below, all the rows whose 'one' column value is $0.5$ or greater are filtered out, so only the rows with a value below $0.5$ remain.
```{julia}
filter(row -> row.one < 0.5, df)
```
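The same selection can be written in a couple of other ways. The sketch below compares three equivalent forms on a small dataframe created only for this example (`df_demo` and its values are illustrative): the row-wise anonymous function, the `column => predicate` form of `filter()`, and plain boolean indexing.

``` julia
using DataFrames

df_demo = DataFrame(one = [0.2, 0.7, 0.4], two = [1, 2, 3])

# Row-wise predicate: the function receives one row at a time.
kept_a = filter(row -> row.one < 0.5, df_demo)

# Column => predicate: the function receives the values of column one.
kept_b = filter(:one => x -> x < 0.5, df_demo)

# Boolean indexing with a broadcasted comparison.
kept_c = df_demo[df_demo.one .< 0.5, :]

println(kept_a)
```

All three return a new dataframe and leave `df_demo` untouched. The `column => predicate` form is generally the faster of the two `filter()` variants on large tables, since it works on a single column rather than on whole rows.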
A very common use of dataframes is working with CSV data.
In case you are new to the term, CSV stands for Comma Separated Values.
As the name indicates, these are files where each line is a data record, composed of values separated by commas.
In essence, it is a way to store tabular data.
A lot of the datasets around the internet are available in this format, and naturally, the DataFrames.jl package is well integrated with it.
As an example, consider the popular Iris flower dataset.
This dataset consists of samples of three different species of plants.
The samples correspond to four measured features of the flowers: length and width of the sepals and petals.
To work with CSV files, the package CSV.jl is your best choice in Julia.