Data Structures
Arrays
Julia has a nice and flexible array interface. Arrays can have an arbitrary number of dimensions. Let's define a one-dimensional array (i.e. a vector):
vector = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

6-element Array{Float64,1}:
1.0
2.0
3.0
4.0
5.0
6.0

The first index of an array in Julia is 1:

vector[1]

1.0

You can use end to access the last element of an array:

vector[end]

6.0

Use ranges (start:end) to get a slice of the array:

vector[2:4]

3-element Array{Float64,1}:
2.0
3.0
4.0

Ranges in Julia are iterable objects:

indexes = 2:4

2:4

for i in indexes
    @show i
end

i = 2
i = 3
i = 4

Julia arrays, like strings and ranges, are also iterable:

for element in vector
    println(element)
end

1.0
2.0
3.0
4.0
5.0
6.0

Exercise 1
Write a function that returns the distance between two three-dimensional points, i.e. two vectors of three elements each. You should use a for loop over a range and index the vectors.

# function distance(a, b...

using Test
A = [1.25, 2.0, 3.6]
B = [-3.5, 4.7, 5.0]
@test distance(A, B) ≈ hypot((A - B)...)

The ... operator "splats" the values contained in an iterable collection into a function call as individual arguments, e.g.:

vector = [1, 2, 3]
hypot(vector...) # hypot(1, 2, 3)

3.7416573867739413

You can use push! to add one element to the end of an array:

vector = [1,2,3]

3-element Array{Int64,1}:
1
2
3

push!(vector, 4)

4-element Array{Int64,1}:
1
2
3
4

There are other useful dequeue functions defined in Julia, e.g. pop! and append!.
In Julia, by convention, all functions that modify their arguments should end with a bang (exclamation mark), !; see the style guide.
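As a small illustration of these functions and of the bang convention, here is a minimal sketch. The variable names (numbers, removed, sorted) are made up for this example, and sort / sort! are added only to contrast a non-mutating function with its mutating counterpart:

```julia
numbers = [3, 1, 2]

push!(numbers, 5)         # numbers is now [3, 1, 2, 5]
removed = pop!(numbers)   # removes and returns the last element
append!(numbers, [9, 7])  # adds every element of another collection

sorted = sort(numbers)    # no bang: returns a sorted copy, numbers is unchanged
sort!(numbers)            # bang: sorts numbers in place
```

After running this, removed holds the element taken off the end, and numbers and sorted contain the same sorted values, but only the sort! call changed numbers itself.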
Vectorized operations
You can use a dot, ., to indicate that a function, e.g. log.(x), or operator, e.g. x .^ y, should be applied element by element, see dot syntax:
a = [1, -2, -3]
b = [-2, -4, 0]
a .* b

3-element Array{Int64,1}:
-2
8
0

This notation allows vectorizing any function, even user-defined functions that operate on single values:

f(x) = 3.45x + 4.76
f.(sin.(a))

3-element Array{Float64,1}:
7.663074897587243
1.6229238774513979
4.273135972193458

Multiple vectorized operations get fused into a single loop without temporary arrays.
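To make the fusion a bit more concrete, here is a minimal sketch; the names x, fused, with_macro and out are made up for this example. The @. macro adds a dot to every call and operator in an expression, and .= writes the result into an existing array instead of allocating a new one:

```julia
x = [1.0, 2.0, 3.0]

# Every dotted call and operator below is fused into a single loop over x.
fused = 2 .* sin.(x) .+ 1

# @. adds the dots for you; this is the same computation.
with_macro = @. 2 * sin(x) + 1

# .= stores the result in a preallocated array.
out = similar(x)
out .= 2 .* sin.(x) .+ 1
```

All three arrays should hold the same values; the last form is handy when you want to reuse a buffer inside a loop.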
Comprehensions
You can use comprehensions to create arrays while applying an operation to each element:

[ 2x for x in 1:10 ]

10-element Array{Int64,1}:
2
4
6
8
10
12
14
16
18
20

result = [ 2x for x in 1:10 if x % 2 == 0 ]

5-element Array{Int64,1}:
4
8
12
16
20

Exercise 2
Write the equivalent of the previous expression using a for loop and push!.

# result = []
# for ...

Matrices
Matrices, i.e. two-dimensional arrays, can be defined with the following notation:
matrix = [ 1.0 4.0 7.0
           2.0 5.0 8.0
           3.0 6.0 9.0 ]

3×3 Array{Float64,2}:
1.0 4.0 7.0
2.0 5.0 8.0
3.0 6.0 9.0

You can use linear indexing (Julia arrays are stored in column-major order) to access an element:

matrix[2]

2.0

Or you can use one index per dimension, i.e. matrix[row_index, col_index]:

matrix[2, 1]

2.0

You can also use ranges and end. The colon, :, means that all the indices of that dimension should be used:

matrix[2:end, :]

2×3 Array{Float64,2}:
2.0 5.0 8.0
3.0 6.0 9.0

Comprehensions
You can also use comprehensions to create matrices. In fact, you can create arrays with any number of dimensions:

[ x + y for x in 1:5, y in 1:10 ]

5×10 Array{Int64,2}:
2 3 4 5 6 7 8 9 10 11
3 4 5 6 7 8 9 10 11 12
4 5 6 7 8 9 10 11 12 13
5 6 7 8 9 10 11 12 13 14
6 7 8 9 10 11 12 13 14 15

Dictionaries and pairs
Dictionaries (hash tables) store key => value pairs:

dictionary = Dict('A' => 'T', 'C' => 'G', 'T' => 'A', 'G' => 'C')

Dict{Char,Char} with 4 entries:
'A' => 'T'
'G' => 'C'
'T' => 'A'
'C' => 'G'

You can get a value by indexing with the key:

dictionary['A'] # getindex(dictionary, 'A')

'T': ASCII/Unicode U+0054 (category Lu: Letter, uppercase)

If the key is not present in the dictionary, a KeyError is raised:

dictionary['N'] # getindex(dictionary, 'N')

The function get lets you specify a default value that is returned if the key is absent from the dictionary:

get(dictionary, 'N', '-')

'-': ASCII/Unicode U+002d (category Pd: Punctuation, dash)

A nice thing about hash tables (dictionary keys, sets) is that testing membership is $O(1)$ on average, while it is $O(N)$ in lists/vectors/arrays:

'N' in keys(dictionary)

false

A dictionary gives pairs when it is iterated:
for pair in dictionary
    println("pair: ", pair) # each pair is key => value
    println("key: ", pair.first) # pair.first == pair[1]
    println("value: ", pair.second) # pair.second == pair[2]
end

pair: 'A' => 'T'
key: A
value: T
pair: 'G' => 'C'
key: G
value: C
pair: 'T' => 'A'
key: T
value: A
pair: 'C' => 'G'
key: C
value: G

Tuples
Tuples are immutable collections, while arrays are mutable:
point = [1.0, 2.0, 3.0] # vector
point[1] = 10.0
point

3-element Array{Float64,1}:
10.0
2.0
3.0

point = (1.0, 2.0, 3.0) # tuple

(1.0, 2.0, 3.0)

point[1] = 10.0 # throws a MethodError: tuples are immutable

You can index a tuple, like a vector, to get the stored element(s):

point[1:2]

(1.0, 2.0)

Tuples, vectors, pairs and other iterables can be easily unpacked using an assignment:
x, y, z = point
y

2.0

You can use this unpacking when iterating a dictionary:

for (key, value) in dictionary
    println("key: ", key, " value: ", value)
end

key: A value: T
key: G value: C
key: T value: A
key: C value: G

Exercise 3
Write a function that returns the reverse complement of a DNA sequence (string) using a dictionary, the join function and the Base.Iterators.reverse iterator. It should use 'N' as the complement of any base other than 'A', 'C', 'T' or 'G':

# function reverse_complement(...

using Test
@test reverse_complement("ACTGGTCCCNT") == "ANGGGACCAGT"

Named tuples
Named tuples can be an easy and fast way to store data:

point = (x=1.0, y=2.0, z=3.0) # named tuple

(x = 1.0, y = 2.0, z = 3.0)

You can use namedtuple.name to access a particular element:

point.y

2.0

Sets
You can use Set to represent a set of unique elements:
set = Set([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

Set([4, 2, 3, 1])

Testing membership is $O(1)$:

4 in set

true

You can get the intersection of two sets using intersect or ∩ (\cap<TAB>):
set_a = Set([1, 2, 3])
set_b = Set([2, 3, 4])
set_a ∩ set_b # intersect(set_a, set_b)

Set([2, 3])

And the union of two sets using union or ∪ (\cup<TAB>):

set_a ∪ set_b # union(set_a, set_b)

Set([4, 2, 3, 1])

The symmetric difference, i.e. disjunctive union, of two sets:

symdiff(set_a, set_b)

Set([4, 1])

This page was generated using Literate.jl.