Counting residues

Counting residues

MIToS was designed to perform fast counting of residues. To achieve that, each Residue is encoded as an integer that can be used to index.

using MIToS.MSA
res = Residue('C')
Int(res)

Let's count residues in the UBA domain Pfam family.

using MIToS.Pfam
const msa_file = downloadpfam("PF16577")
const msa = read(msa_file, Stockholm)

We can access the MSA as a matrix to get a particular column or sequence:

msa[:, column]
msa[sequence, :]
col_i = msa[:, 10]

MIToS Information module has functions to get residue frequencies (counts or fractions) and to return contingency tables:

using MIToS.Information
Ni = count(col_i)
Pi = probabilities(col_i)

Counts and Probabilities objects wrap a ContingencyTable:

Pi.table

They can be quickly indexed by using Residues:

Pi[Residue('P')]

You can create a bidimensional ContingencyTable by using two MSA columns or sequences (there is no limit to the number of dimensions):

col_j = msa[:, 15]

Nij = count(col_i, col_j)
Nij[Residue('P'), Residue('R')]

You can also get probabilities by normalizing counts:

Pij = normalize(Nij.table)
Pij[Residue('P'), Residue('R')]

Plotting counts

We use a heatmap to plot a bidimensional table:

using Plots
gr()

We can list the available color libraries:

clibraries()

Visualize the color maps defined in a library:

showlibrary(:Plots)

And select the color library we want to use:

clibrary(:cmocean)

For this plot, we use the interactive Plotly backend

plotlyjs()

We need a vector of strings for the ticks labels...

x = string.(Residue.(UngappedAlphabet()))

...because xticks and xticks keyword arguments take a tuple: (positions, labels)

heatmap(Nij,
		color=:tempo,
		ratio=:equal,
		xticks=(1:length(x), x),
		yticks=(1:length(x), x))

Functions taking counts and probabilities

Information defines different functions that take contingency tables as an input, for example:

entropy(Ni)
mutual_information(Nij)

Conservation

You can use the MIToS' Information module to calculate the Shannon entropy of each MSA column:

using MIToS.Information

const alphabet = UngappedAlphabet()

You also have GappedAlphabet and ReducedAlphabet:

ReducedAlphabet("(AILMV)(RHK)(NQST)(DE)(FWY)CGP")

Empty contingency tables can be defined by using ContingencyTable(element_type, Val{dimensions}, alphabet):

const table = ContingencyTable(Int, Val{1}, alphabet)

The mapcolfreq!, mapseqfreq!, mapcolpairfreq! and mapseqpairfreq! functions from the Information module apply a function on the table filled using each column, sequence, column pair or sequence pair of the MSA, respectively:

Hx = mapcolfreq!(entropy, msa, Counts(table))

We use Plots.jl to visualize the entropy of each column to find variable and conserved regions

gr()
plot_entropy = plot(vec(Hx),
					color=:orange,
					fill=0,
					fillalpha=0.5,
					legend=false,
					ylab="Entropy")
plot_msa = plot(msa, legend=false)
plot(plot_entropy,
	 plot_msa,
	 layout=grid(2, 1, heights=[0.2, 0.8]),
	 link=:x)

Multivariate mutual information

We are going to measure mutual information in MSA column triplets. We use Julia parallelism because the number of combinations is high.

using Distributed
nprocs()
1
addprocs(3)
3-element Array{Int64,1}:
 2
 3
 4
nprocs()
4

We are going to use Combinatorics everywhere (in all the process) to get a generator of the combinations and ProgressMeter to see the progress of the parallel loop.

@everywhere using Combinatorics
@everywhere using ProgressMeter
@everywhere using MIToS.MSA
@everywhere using MIToS.Information

Exercise 1

Fill the missing parts of the function to calculate mutual information between MSA columns i, j and k.

function parallel_mmi(msa)
	ncols = ncolumns(msa)
	triplets = combinations(1:ncols, 3)
	@showprogress pmap(triplets) do (i, j, k)
#           ...to complete...
#           (i, j, k) => ...to complete...
	end
end
mi_values = parallel_mmi(msa)

This page was generated using Literate.jl.