Data Structures
Arrays
Julia has a nice and flexible array interface. Arrays can have an arbitrary number of dimensions. Let's define a one-dimensional array (i.e. a vector):
vector = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
6-element Array{Float64,1}:
1.0
2.0
3.0
4.0
5.0
6.0
The first index of an array in Julia is 1:
vector[1]
1.0
You can use end to access the last element of an array:
vector[end]
6.0
Use ranges (start:stop) to get a slice of the array:
vector[2:4]
3-element Array{Float64,1}:
2.0
3.0
4.0
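A few more slicing patterns, as a minimal sketch assuming Julia 1.x: end can take part in arithmetic inside the brackets, and a range can carry a step (start:step:stop), including a negative one.

```julia
vector = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# end can take part in arithmetic inside the brackets
second_to_last = vector[end - 1]   # 5.0
# start:step:stop with a step of 2 picks every other element
every_other = vector[1:2:end]      # [1.0, 3.0, 5.0]
# A negative step walks backwards and returns a reversed copy
reversed = vector[end:-1:1]        # [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
```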
Ranges in Julia are iterable objects:
indexes = 2:4
2:4
for i in indexes
@show i
end
i = 2
i = 3
i = 4
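Ranges only store their start, step and stop, not every element, so collect is needed when an actual array is wanted. The sketch below (assuming Julia 1.1 or later for the range(start, stop; length) form) shows a custom step and a range built from a length.

```julia
# start:step:stop gives a range with a custom step
odds = 1:2:9
odd_values = collect(odds)   # materializes the range: [1, 3, 5, 7, 9]
n = length(odds)             # 5
# range can build a range from a number of points instead of a step
grid = collect(range(0, 1; length = 5))   # [0.0, 0.25, 0.5, 0.75, 1.0]
```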
Julia arrays, like strings and ranges, are also iterable:
for element in vector
println(element)
end
1.0
2.0
3.0
4.0
5.0
6.0
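When the loop needs the index as well as the element, eachindex and enumerate are the usual tools. This is a sketch; sum_by_index is a hypothetical helper, and the loop is wrapped in a function because assigning to a global variable inside a top-level loop is ambiguous when Julia runs a script file.

```julia
# Hypothetical helper: sums a vector by looping over its indices
function sum_by_index(v)
    total = zero(eltype(v))
    # eachindex returns the valid indices of v (1:length(v) for a Vector)
    for i in eachindex(v)
        total += v[i]
    end
    return total
end

vector = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
total = sum_by_index(vector)   # 21.0

# enumerate yields (index, element) tuples, unpacked here into i and element
for (i, element) in enumerate(vector)
    println(i, " => ", element)
end
```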
Exercise 1
Write a function that returns the distance between two three-dimensional points, i.e. two vectors of three elements. You should use a for loop over a range and index the vectors.
# function distance(a, b...
using Test
A = [1.25, 2.0, 3.6]
B = [-3.5, 4.7, 5.0]
@test distance(A, B) ≈ hypot((A - B)...)
The ... operator "splats" the values contained in an iterable collection into a function call as individual arguments, e.g.:
vector = [1, 2, 3]
hypot(vector...) # hypot(1, 2, 3)
3.7416573867739413
You can use push! to add one element to the end of an array:
vector = [1,2,3]
3-element Array{Int64,1}:
1
2
3
push!(vector, 4)
4-element Array{Int64,1}:
1
2
3
4
There are other useful dequeue functions defined in Julia, e.g. pop! and append!.
By convention, the names of Julia functions that modify their arguments end with a bang (exclamation mark), !; see the style guide.
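A short sketch of those dequeue operations, plus pushfirest!'s counterpart pair pushfirst! and popfirst! for the front of the array (all part of Base in Julia 1.x):

```julia
queue = [1, 2, 3]
last_item = pop!(queue)        # removes and returns 3; queue is now [1, 2]
append!(queue, [4, 5])         # adds every element of [4, 5]; queue is [1, 2, 4, 5]
pushfirst!(queue, 0)           # inserts at the front; queue is [0, 1, 2, 4, 5]
first_item = popfirst!(queue)  # removes and returns 0; queue is [1, 2, 4, 5]
```

Note that push! adds a single element while append! adds every element of a collection.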
Vectorized operations
You can use a dot, ., to indicate that a function, e.g. log.(x), or an operator, e.g. x .^ y, should be applied element by element; see dot syntax:
a = [1, -2, -3]
b = [-2, -4, 0]
a .* b
3-element Array{Int64,1}:
-2
8
0
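As a minimal sketch of other dotted forms: a scalar operand is broadcast against every element, and both functions and operators can be dotted.

```julia
a = [1, -2, -3]
shifted = a .+ 1       # scalar 1 is broadcast to every element: [2, -1, -2]
magnitudes = abs.(a)   # function applied element by element: [1, 2, 3]
squares = a .^ 2       # element-wise power: [1, 4, 9]
```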
This notation can vectorize any function, including scalar functions defined by the user:
f(x) = 3.45x + 4.76
f.(sin.(a))
3-element Array{Float64,1}:
7.663074897587243
1.6229238774513979
4.273135972193458
Nested dot calls are fused into a single loop, without allocating temporary arrays.
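The sketch below illustrates fusion, assuming Julia 1.x: the @. macro adds a dot to every call and operator in an expression, and .= writes the fused result into an existing array instead of allocating a new one.

```julia
x = [0.0, 0.5, 1.0]
# These dotted calls are fused into one loop; no temporary array stores sin.(x)
y = 2 .* sin.(x) .+ 1
# @. adds a dot to every function call and operator in the expression
z = @. 2 * sin(x) + 1
# .= fills a preallocated array in place with the fused result
out = similar(x)
out .= 2 .* sin.(x) .+ 1
```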
Comprehensions
You can use comprehensions to create arrays by applying an operation to each element of an iterable:
[ 2x for x in 1:10 ]
10-element Array{Int64,1}:
2
4
6
8
10
12
14
16
18
20
An if clause filters the values that are used:
result = [ 2x for x in 1:10 if x % 2 == 0 ]
5-element Array{Int64,1}:
4
8
12
16
20
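Two related details, sketched here for Julia 1.x: a type prefix fixes the element type of a comprehension (an empty [] has element type Any, while Int[] is an empty Vector{Int}), and dropping the brackets gives a lazy generator that does not allocate an array.

```julia
# Prefixing a comprehension with a type converts every element to that type
as_floats = Float64[x for x in 1:4]   # [1.0, 2.0, 3.0, 4.0]
any_type = eltype([])                 # Any
int_type = eltype(Int[])              # Int
# Without brackets the same syntax is a generator, consumed here by sum
total = sum(2x for x in 1:10 if x % 2 == 0)   # 4 + 8 + 12 + 16 + 20 = 60
```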
Exercise 2
Write the equivalent of the previous expression using a for loop and push!.
# result = []
# for ...
Matrices
Matrices, i.e. two-dimensional arrays, can be defined with the following notation:
matrix = [ 1.0 4.0 7.0
2.0 5.0 8.0
3.0 6.0 9.0 ]
3×3 Array{Float64,2}:
1.0 4.0 7.0
2.0 5.0 8.0
3.0 6.0 9.0
You can access an element with a single linear index (Julia arrays are stored in column-major order):
matrix[2]
2.0
Or using one index per dimension, i.e. matrix[row_index, col_index]:
matrix[2, 1]
2.0
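Column-major order means that, in a matrix with nrows rows, element (i, j) sits at linear index (j - 1) * nrows + i. A small sketch checking this by hand and with Base's LinearIndices and CartesianIndices:

```julia
matrix = [1.0 4.0 7.0
          2.0 5.0 8.0
          3.0 6.0 9.0]
nrows = size(matrix, 1)
i, j = 2, 3
# Manual column-major conversion: (3 - 1) * 3 + 2 == 8
by_hand = matrix[(j - 1) * nrows + i]       # 8.0, same as matrix[2, 3]
# Base performs the same conversions in both directions
linear = LinearIndices(matrix)[i, j]        # 8
cartesian = CartesianIndices(matrix)[8]     # CartesianIndex(2, 3)
# vec flattens the matrix following the same column-major order
flat = vec(matrix)                          # [1.0, 2.0, ..., 9.0]
```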
You can also use ranges and end. The colon, :, means that all the indices from that dimension should be used:
matrix[2:end, :]
2×3 Array{Float64,2}:
2.0 5.0 8.0
3.0 6.0 9.0
Comprehensions
You can also use comprehensions to create matrices. In fact, you can create arrays with any number of dimensions:
[ x + y for x in 1:5, y in 1:10 ]
5×10 Array{Int64,2}:
2 3 4 5 6 7 8 9 10 11
3 4 5 6 7 8 9 10 11 12
4 5 6 7 8 9 10 11 12 13
5 6 7 8 9 10 11 12 13 14
6 7 8 9 10 11 12 13 14 15
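For instance, one for clause per dimension gives a three-dimensional array. In this sketch the first variable runs along rows, the second along columns and the third along the third axis:

```julia
# Each digit of the value records one index: x + 10y + 100z
cube = [x + 10y + 100z for x in 1:2, y in 1:3, z in 1:4]
dims = size(cube)         # (2, 3, 4)
corner = cube[2, 3, 4]    # 2 + 30 + 400 == 432
```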
Dictionaries and pairs
Dictionaries (hash tables) store key => value pairs:
dictionary = Dict('A' => 'T', 'C' => 'G', 'T' => 'A', 'G' => 'C')
Dict{Char,Char} with 4 entries:
'A' => 'T'
'G' => 'C'
'T' => 'A'
'C' => 'G'
You can get a value by indexing with the key:
dictionary['A'] # getindex(dictionary, 'A')
'T': ASCII/Unicode U+0054 (category Lu: Letter, uppercase)
If the key is not present in the dictionary, a KeyError is thrown:
dictionary['N'] # getindex(dictionary, 'N'); throws a KeyError
The get function lets you specify a default value that is returned if the key is absent from the dictionary:
get(dictionary, 'N', '-')
'-': ASCII/Unicode U+002d (category Pd: Punctuation, dash)
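A common use of a get default is counting. In this sketch (the sequence is an arbitrary example), get returns 0 the first time a base is seen, so no KeyError is ever thrown:

```julia
sequence = "GATTACA"
counts = Dict{Char,Int}()
for base in sequence
    # get falls back to 0 for a base that is not yet a key
    counts[base] = get(counts, base, 0) + 1
end
# counts now maps 'A' => 3, 'T' => 2, 'G' => 1, 'C' => 1
```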
A nice property of hash tables (dictionary keys, sets) is that testing membership takes $O(1)$ time on average, while it takes $O(N)$ time for vectors and other arrays:
'N' in keys(dictionary)
false
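Base also offers haskey as a direct membership test on a dictionary's keys. The sketch contrasts it with the same query on a vector of the keys, which has to scan the elements one by one:

```julia
dictionary = Dict('A' => 'T', 'C' => 'G', 'T' => 'A', 'G' => 'C')
# Hash lookups on the keys
has_a = haskey(dictionary, 'A')   # true
has_n = haskey(dictionary, 'N')   # false
# Linear search: every element is checked before returning false
bases = collect(keys(dictionary))
in_vector = 'N' in bases          # false
```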
Iterating over a dictionary yields pairs:
for pair in dictionary
println("pair: ", pair) # each pair is key => value
println("key: ", pair.first) # pair.first == pair[1]
println("value: ", pair.second) # pair.second == pair[2]
end
pair: 'A' => 'T'
key: A
value: T
pair: 'G' => 'C'
key: G
value: C
pair: 'T' => 'A'
key: T
value: A
pair: 'C' => 'G'
key: C
value: G
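The => operator builds a Pair object on its own, outside any dictionary, and a Dict can be built from any collection of pairs. A minimal sketch:

```julia
p = 'A' => 'T'                   # a Pair{Char,Char}
key_part = first(p)              # 'A', same as p.first
value_part = last(p)             # 'T', same as p.second
# A Dict can be constructed from a vector of pairs
d = Dict(['A' => 'T', 'C' => 'G'])
```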
Tuples
Tuples are immutable collections, while arrays are mutable:
point = [1.0, 2.0, 3.0] # vector
point[1] = 10.0
point
3-element Array{Float64,1}:
10.0
2.0
3.0
point = (1.0, 2.0, 3.0) # tuple
(1.0, 2.0, 3.0)
point[1] = 10.0 # throws a MethodError: tuples cannot be modified
You can index a tuple, like a vector, to get the stored element(s):
point[1:2]
(1.0, 2.0)
Tuples, vectors, pairs and other iterables can be easily unpacked using destructuring assignment:
x, y, z = point
y
2.0
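Unpacking pairs naturally with functions that return several values as a tuple, such as Base's extrema, and it also swaps variables without a temporary. A short sketch:

```julia
# extrema returns (minimum, maximum) as a tuple, unpacked into two names
smallest, largest = extrema([3, 1, 4, 1, 5])   # 1 and 5
# The right-hand tuple is built first, so this swaps a and b
a, b = 1, 2
a, b = b, a                                    # now a == 2 and b == 1
```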
You can use this unpacking when iterating a dictionary:
for (key, value) in dictionary
println("key: ", key, " value: ", value)
end
key: A value: T
key: G value: C
key: T value: A
key: C value: G
Exercise 3
Write a function that returns the reverse complement of a DNA sequence (string) using a dictionary, the join function and the Base.Iterators.reverse iterator. It should use 'N' as the complement of any base other than 'A', 'C', 'T' or 'G':
# function reverse_complement(...
using Test
@test reverse_complement("ACTGGTCCCNT") == "ANGGGACCAGT"
Named tuples
Named tuples are an easy and fast way to store data with named fields:
point = (x=1.0, y=2.0, z=3.0) # named tuple
(x = 1.0, y = 2.0, z = 3.0)
You can use namedtuple.name to access a particular element:
point.y
2.0
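Named tuples also keep the positional behaviour of plain tuples and can be indexed by field name as a Symbol. Like tuples they are immutable, so Base's merge is one way to get an updated copy. A sketch, assuming Julia 1.x:

```julia
point = (x=1.0, y=2.0, z=3.0)
by_position = point[2]      # 2.0, positional indexing still works
by_symbol = point[:z]       # 3.0, indexing with the field name
names = keys(point)         # (:x, :y, :z)
# merge returns a new named tuple; point itself is unchanged
moved = merge(point, (x = 10.0,))   # (x = 10.0, y = 2.0, z = 3.0)
```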
Sets
You can use Set to represent a set of unique elements:
set = Set([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])
Set([4, 2, 3, 1])
Testing membership takes $O(1)$ time on average:
4 in set
true
You can get the intersection of two sets using intersect or ∩ (typed as \cap<TAB>):
set_a = Set([1, 2, 3])
set_b = Set([2, 3, 4])
set_a ∩ set_b # intersect(set_a, set_b)
Set([2, 3])
And the union of two sets using union or ∪ (typed as \cup<TAB>):
set_a ∪ set_b # union(set_a, set_b)
Set([4, 2, 3, 1])
The symmetric difference, i.e. the disjunctive union, of two sets contains the elements that belong to exactly one of them:
symdiff(set_a, set_b)
Set([4, 1])
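A few more set operations from Base, sketched for Julia 1.x: setdiff, the subset test issubset (also written ⊆, typed as \subseteq<TAB>), and the in-place push! and union!.

```julia
set_a = Set([1, 2, 3])
set_b = Set([2, 3, 4])
only_in_a = setdiff(set_a, set_b)   # Set([1])
is_subset = Set([2, 3]) ⊆ set_a     # true, same as issubset(Set([2, 3]), set_a)
# Sets are mutable: adding an existing element leaves the set unchanged
push!(set_a, 3)
# union! merges set_b into set_a in place
union!(set_a, set_b)                # set_a is now Set([1, 2, 3, 4])
```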
This page was generated using Literate.jl.