Data Structures

Arrays

Julia has a flexible array interface. Arrays can have an arbitrary number of dimensions. Let's define a one-dimensional array (i.e. a vector):

vector = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
6-element Array{Float64,1}:
 1.0
 2.0
 3.0
 4.0
 5.0
 6.0

The first index of an array in Julia is 1:

vector[1]
1.0

You can use end to access the last element of an array:

vector[end]
6.0

Use ranges (start:end) to get a slice of the array:

vector[2:4]
3-element Array{Float64,1}:
 2.0
 3.0
 4.0
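Ranges can also carry a step, and end can take part in index arithmetic. The sketch below uses a throwaway vector named v (a name chosen here only for illustration) to show a few common slicing patterns, and that a slice is a copy of the data:

```julia
v = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

every_other = v[1:2:end]   # start:step:stop, here indices 1, 3 and 5
reversed    = v[end:-1:1]  # a negative step walks the vector backwards
last_two    = v[end-1:end] # `end` can be used in arithmetic

# Slicing copies the data, so changing the slice leaves `v` untouched.
last_two[1] = 0.0
@show every_other reversed last_two v
```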

Ranges in Julia are iterable objects:

indexes = 2:4
2:4
for i in indexes
	@show i
end
i = 2
i = 3
i = 4

Julia arrays, like strings and ranges, are also iterable:

for element in vector
	println(element)
end
1.0
2.0
3.0
4.0
5.0
6.0
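When both the position and the value are needed, enumerate and eachindex are two options from Base. The following is a minimal sketch; the vector v and the helper sum_by_index are hypothetical names used only for this illustration:

```julia
v = [1.0, 2.0, 3.0]

# enumerate yields (index, element) tuples
for (i, element) in enumerate(v)
    println("element ", i, " is ", element)
end

# eachindex yields the valid indices of the array
function sum_by_index(v)
    total = zero(eltype(v))
    for i in eachindex(v)
        total += v[i]
    end
    return total
end

sum_by_index(v)
```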

Exercise 1

Write a function that returns the distance between two three-dimensional points, i.e. two vectors of three elements. You should use a for loop over a range and index the vectors.

# function distance(a, b...
using Test
A = [1.25, 2.0, 3.6]
B = [-3.5, 4.7, 5.0]
@test distance(A, B) ≈ hypot((A - B)...)

The ... operator "splats" the values contained in an iterable collection into a function call as individual arguments, e.g.:

vector = [1, 2, 3]
hypot(vector...)  # hypot(1, 2, 3)
3.7416573867739413
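The same ... syntax in a function definition collects any number of arguments. As a small sketch, sum_of_squares below is a hypothetical helper (not part of Base) that accepts a variable number of arguments, and it is then called by splatting a tuple:

```julia
# Hypothetical helper: `xs...` collects all positional arguments into a tuple.
sum_of_squares(xs...) = sum(x^2 for x in xs)

coords = (3, 4)
from_tuple  = sum_of_squares(coords...)  # same as sum_of_squares(3, 4)
from_vector = max([1, 5, 2]...)          # same as max(1, 5, 2)
```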

You can use push! to add one element to the end of an array:

vector = [1,2,3]
3-element Array{Int64,1}:
 1
 2
 3
push!(vector, 4)
4-element Array{Int64,1}:
 1
 2
 3
 4

There are other useful dequeue functions defined in Julia, e.g. pop! and append!.

In Julia, by convention, the names of functions that modify their arguments end with an exclamation mark (a "bang"), !; see the style guide.
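A short sketch of some of these functions, all from Base, may help. The variable names (queue, data and so on) are illustrative only. The sort / sort! pair shows the naming convention: the version without the bang returns a new array, the version with the bang modifies its argument.

```julia
queue = [1, 2, 3]

append!(queue, [4, 5])         # add several elements: [1, 2, 3, 4, 5]
last_item = pop!(queue)        # remove and return the last element
pushfirst!(queue, 0)           # add one element at the front
first_item = popfirst!(queue)  # remove and return the first element

data = [3, 1, 2]
sorted_copy = sort(data)       # returns a sorted copy, `data` is unchanged
unchanged = copy(data)
sort!(data)                    # sorts `data` in place
```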

Vectorized operations

You can use a dot, ., to indicate that a function, e.g. log.(x), or an operator, e.g. x .^ y, should be applied element by element (see dot syntax):

a = [1, -2, -3]
b = [-2, -4, 0]
a .* b
3-element Array{Int64,1}:
 -2
  8
  0

This notation allows vectorizing any function, including functions defined by the user:

f(x) = 3.45x + 4.76

f.(sin.(a))
3-element Array{Float64,1}:
 7.663074897587243 
 1.6229238774513979
 4.273135972193458 

Multiple vectorized operations are fused into a single loop, without allocating temporary arrays.
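As an illustration of fusion, the sketch below writes the same element-wise expression in two ways. The arrays x, y and z are hypothetical; .= stores the fused result in an existing array, and the @. macro adds a dot to every call and operator in the expression:

```julia
x = [0.0, 0.5, 1.0]
y = similar(x)          # uninitialized array with the same shape and type as x

# All dotted operations run in one loop; `.=` writes the result into `y`.
y .= 2 .* x .^ 2 .+ 1

# `@.` dots every operator and function call in the expression.
z = @. 2 * x^2 + 1
```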

Comprehensions

You can use comprehensions to create arrays while applying an operation to each element, optionally filtering the elements with if:

[ 2x for x in 1:10 ]
10-element Array{Int64,1}:
  2
  4
  6
  8
 10
 12
 14
 16
 18
 20
result = [ 2x for x in 1:10 if x % 2 == 0 ]
5-element Array{Int64,1}:
  4
  8
 12
 16
 20

Exercise 2

Write the equivalent of the previous expression using a for loop and push!.

# result = []
# for ...

Matrices

Matrices, i.e. two-dimensional arrays, can be defined with the following notation:

matrix = [ 1.0 4.0 7.0
           2.0 5.0 8.0
           3.0 6.0 9.0 ]
3×3 Array{Float64,2}:
 1.0  4.0  7.0
 2.0  5.0  8.0
 3.0  6.0  9.0

You can use linear indexing (Julia arrays are stored in column-major order) to access an element:

matrix[2]
2.0

Or use one index per dimension, i.e. matrix[row_index, col_index]:

matrix[2, 1]
2.0
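To make the relation between the two indexing styles concrete, here is a small sketch using LinearIndices and CartesianIndices from Base. The matrix m is a hypothetical copy of the one above. Because storage is column-major, the linear index runs down each column before moving to the next:

```julia
m = [1.0 4.0 7.0
     2.0 5.0 8.0
     3.0 6.0 9.0]

same_element = m[8] == m[2, 3]     # 8th element in column-major order
linear       = LinearIndices(m)[2, 3]   # linear index of row 2, column 3
cartesian    = CartesianIndices(m)[8]   # (row, column) of linear index 8
flattened    = vec(m)              # columns stacked one after the other
```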

You can also use ranges and end. The colon, :, means that all the indices from that dimension should be used:

matrix[2:end, :]
2×3 Array{Float64,2}:
 2.0  5.0  8.0
 3.0  6.0  9.0

Comprehensions

You can also use comprehensions to create matrices. In fact, you can create arrays with any desired number of dimensions:

[ x + y for x in 1:5, y in 1:10 ]
5×10 Array{Int64,2}:
 2  3  4  5   6   7   8   9  10  11
 3  4  5  6   7   8   9  10  11  12
 4  5  6  7   8   9  10  11  12  13
 5  6  7  8   9  10  11  12  13  14
 6  7  8  9  10  11  12  13  14  15
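For example, adding a third comma-separated iterator gives a three-dimensional array. The sketch below (with illustrative names cube and flat) also contrasts this with chaining for clauses without commas, which produces a flat vector instead:

```julia
# One dimension per comma-separated iterator
cube = [x + y + z for x in 1:2, y in 1:3, z in 1:4]
size(cube)   # (2, 3, 4)
ndims(cube)  # 3

# Chained `for` clauses behave like nested loops and give a vector
flat = [x + y for x in 1:2 for y in 1:3]
```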

Dictionaries and pairs

Dictionaries (hash tables) store key => value pairs:

dictionary = Dict('A' => 'T', 'C' => 'G', 'T' => 'A', 'G' => 'C')
Dict{Char,Char} with 4 entries:
  'A' => 'T'
  'G' => 'C'
  'T' => 'A'
  'C' => 'G'

You can get a value by indexing with the key:

dictionary['A'] # getindex(dictionary, 'A')
'T': ASCII/Unicode U+0054 (category Lu: Letter, uppercase)

If the key is not present in the dictionary, an error is raised:

dictionary['N'] # getindex(dictionary, 'N'), throws a KeyError

The function get lets you specify a default value that is returned if the key is absent from the dictionary:

get(dictionary, 'N', '-')
'-': ASCII/Unicode U+002d (category Pd: Punctuation, dash)

A nice thing about hash tables (dictionary keys, sets) is that testing membership is $O(1)$ on average, while it is $O(N)$ for lists/vectors/arrays:

'N' in keys(dictionary)
false
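Base also provides haskey for the same key check. A minimal sketch, using a hypothetical dictionary named complement with the same entries as above; note that in applied to the dictionary itself tests for key => value pairs rather than keys:

```julia
complement = Dict('A' => 'T', 'C' => 'G', 'T' => 'A', 'G' => 'C')

has_a = haskey(complement, 'A')          # key lookup through the hash table
has_n = haskey(complement, 'N')
pair_present = ('A' => 'T') in complement  # `in` on a Dict checks pairs

# On a vector, `in` has to scan the elements one by one.
in_vector = 'N' in ['A', 'C', 'G', 'T']
```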

A dictionary gives pairs when it is iterated:

for pair in dictionary
	println("pair: ", pair)  # each pair is key => value
	println("key: ", pair.first)  # pair.first == pair[1]
	println("value: ", pair.second)  # pair.second == pair[2]
end
pair: 'A' => 'T'
key: A
value: T
pair: 'G' => 'C'
key: G
value: C
pair: 'T' => 'A'
key: T
value: A
pair: 'C' => 'G'
key: C
value: G

Tuples

Tuples are immutable collections, while arrays are mutable:

point = [1.0, 2.0, 3.0]  # vector
point[1] = 10.0
point
3-element Array{Float64,1}:
 10.0
  2.0
  3.0
point = (1.0, 2.0, 3.0)  # tuple
(1.0, 2.0, 3.0)
point[1] = 10.0  # error: tuples are immutable (throws a MethodError)

You can index a tuple, like a vector, to get the stored element(s):

point[1:2]
(1.0, 2.0)

Tuples, vectors, pairs and other iterables can be easily unpacked using assignment:

x, y, z = point
y
2.0
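A few more unpacking patterns, sketched with illustrative variable names. Unpacking takes elements in iteration order, and any extra elements on the right-hand side are ignored:

```julia
key, value = 'A' => 'T'                  # unpack a pair
first_item, second_item = [10, 20, 30]   # only the first two elements are used

a, b = 1, 2
a, b = b, a                              # swap two variables via a tuple
```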

You can use this unpacking when iterating a dictionary:

for (key, value) in dictionary
	println("key: ", key, " value: ", value)
end
key: A value: T
key: G value: C
key: T value: A
key: C value: G

Exercise 3

Write a function that returns the reverse complement of a DNA sequence (string) using a dictionary, the join function and the Base.Iterators.reverse iterator. It should use 'N' as the complement of any base other than 'A', 'C', 'T' or 'G':

# function reverse_complement(...
using Test
@test reverse_complement("ACTGGTCCCNT") == "ANGGGACCAGT"

Named tuples

Named tuples can be an easy and fast way to store data:

point = (x=1.0, y=2.0, z=3.0)  # named tuple
(x = 1.0, y = 2.0, z = 3.0)

You can use namedtuple.name to access a particular element:

point.y
2.0
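Named tuples also support positional indexing and the usual keys and values functions. The sketch below uses a hypothetical named tuple p; since named tuples are immutable, merge is used to build a modified copy:

```julia
p = (x = 1.0, y = 2.0, z = 3.0)

second = p[2]                    # positional indexing still works
names  = keys(p)                 # tuple of symbols
vals   = values(p)               # plain tuple of the stored values

# Named tuples are immutable; merge returns a new one with `z` replaced.
moved = merge(p, (z = 10.0,))

px, py, pz = p                   # unpacking yields the values in order
```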

Sets

You can use Set to represent a set of unique elements:

set = Set([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])
Set([4, 2, 3, 1])

Testing membership is $O(1)$:

4 in set
true

You can get the intersection of two sets using intersect or ∩ (\cap<TAB>):

set_a = Set([1, 2, 3])
set_b = Set([2, 3, 4])
set_a ∩ set_b  # intersect(set_a, set_b)
Set([2, 3])

And the union of two sets using union or ∪ (\cup<TAB>):

set_a ∪ set_b  # union(set_a, set_b)
Set([4, 2, 3, 1])

The symmetric difference, i.e. disjunctive union, of two sets:

symdiff(set_a, set_b)
Set([4, 1])
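Other set operations from Base follow the same pattern. A brief sketch, redefining set_a and set_b so that it runs on its own; the remaining variable names are illustrative:

```julia
set_a = Set([1, 2, 3])
set_b = Set([2, 3, 4])

only_in_a = setdiff(set_a, set_b)     # elements of set_a that are not in set_b
is_subset = issubset(Set([2, 3]), set_a)
also_subset = Set([2, 3]) ⊆ set_a     # ⊆ is typed as \subseteq<TAB>

push!(set_a, 3)                       # adding an existing element has no effect
n_elements = length(set_a)
```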

This page was generated using Literate.jl.