MSA

MIToS.MSA — Module

The MSA module of MIToS has utilities for working with Multiple Sequence Alignments of protein Sequences (MSA).

Features

Read and write MSAs in Stockholm, FASTA, A3M, PIR, or Raw format
Handle MSA annotations
Edit the MSA, e.g. delete columns or sequences, change sequence order, shuffling...
Keep track of positions and annotations after modifications on the MSA
Describe a MSA, e.g. mean percent identity, sequence coverage, gap percentage...

using MIToS.MSA

source

Types

MIToS.MSA.AbstractAlignedObject — Type

MIToS MSA and aligned sequences (aligned objects) are subtypes of AbstractMatrix{Residue}, because MSAs and sequences are stored as Matrix of Residues.

source

MIToS.MSA.AbstractAlignedSequence — Type

A MIToS aligned sequence is an AbstractMatrix{Residue} with only 1 row/sequence.

source

MIToS.MSA.AbstractMultipleSequenceAlignment — Type

MSAs are stored as Matrix{Residue}. It's possible to use a NamedResidueMatrix{Array{Residue,2}} as the most simple MSA with sequence identifiers and column names.

source

MIToS.MSA.AbstractSequence — Type

A MIToS (unaligned) sequence is an AbstractMatrix{Residue} with only 1 row/sequence.

source

MIToS.MSA.AlignedSequence — Type

An AlignedSequence wraps a NamedResidueMatrix{Array{Residue,2}} with only 1 row/sequence. The NamedArray stores the sequence name and original column numbers as Strings.

source

MIToS.MSA.AnnotatedAlignedSequence — Type

This type represent an aligned sequence, similar to AlignedSequence, but It also stores its Annotations.

source

MIToS.MSA.AnnotatedMultipleSequenceAlignment — Type

This type represent an MSA, similar to MultipleSequenceAlignment, but It also stores Annotations. This annotations are used to store residue coordinates (i.e. mapping to UniProt residue numbers).

source

MIToS.MSA.AnnotatedSequence — Type

An AnnotationSequence wraps a NamedResidueMatrix{Array{Residue,2}} with only 1 row/sequence and its Annotations. The NamedArray stores the sequence name and original position numbers as Strings.

source

MIToS.MSA.Annotations — Type

The Annotations type is basically a container for Dicts with the annotations of a multiple sequence alignment. Annotations was designed for storage of annotations of the Stockholm format.

MIToS also uses MSA annotations to keep track of:

Modifications of the MSA (MIToS_...) as deletion of sequences or columns.
Positions numbers in the original MSA file (column mapping: ColMap)
Position of the residues in the sequence (sequence mapping: SeqMap)

source

MIToS.MSA.Clusters — Type

Data structure to represent sequence clusters. The sequence data itself is not included.

source

MIToS.MSA.GappedAlphabet — Type

This type defines the usual alphabet of the 20 natural residues and a gap character.

julia> using MIToS.MSA

julia> GappedAlphabet()
GappedAlphabet of length 21. Residues : res"ARNDCQEGHILKMFPSTWYV-"

source

MIToS.MSA.GappedXAlphabet — Type

GappedXAlphabet

Residue alphabet including the 20 natural amino acids, a gap character and the X residue used for unknown or ambiguous cases.

source

MIToS.MSA.MultipleSequenceAlignment — Type

This MSA type include a NamedArray wrapping a Matrix of Residues. The use of NamedArray allows to store sequence names and original column numbers as Strings, and fast indexing using them.

source

MIToS.MSA.NoClustering — Type

Use NoClustering() to avoid the use of clustering where a Clusters type is needed.

source

MIToS.MSA.ReducedAlphabet — Type

ReducedAlphabet allows the construction of reduced residue alphabets, where residues inside parenthesis belong to the same group.

julia> using MIToS.MSA

julia> ab = ReducedAlphabet("(AILMV)(RHK)(NQST)(DE)(FWY)CGP")
ReducedAlphabet of length 8 : "(AILMV)(RHK)(NQST)(DE)(FWY)CGP"

julia> ab[Residue('K')]
2

source

MIToS.MSA.Residue — Type

Most of the MIToS design is created around the Residue bitstype. It has representations for the 20 natural amino acids, a value representing insertions and deletions (GAP, '-') and one representing unknown, ambiguous and non standard residues (XAA, 'X'). Each Residue is encoded as an integer number, with the same bit representation and size than a Int. This allows fast indexing operation of probability or frequency matrices.

Residue creation and conversion

Creation and conversion of Residues should be treated carefully. Residue is encoded as a 32 or 64 bits type similar to Int, to get fast indexing using Int(x::Residue). Int simply calls reinterpret without checking if the residue is valid. Valid residues have integer values in the closed interval [1,22]. convert from Int and Char always returns valid residues, however it's possible to find invalid residues (they are shown using the character '�') after the creation of uninitialized Residue arrays (i.e. using Array). You can use zeros, ones or rand to get initialized Residue arrays with valid residues. Conversions to and from Chars changes the bit representation and allows the use of the usual character representation of residues and amino acids. This conversions are used in IO operations and always return valid residues. In conversions from Char, lowercase letters, '*', '-' and '.' are translated to GAP, letters representing the 20 natural amino (ARNDCQEGHILKMFPSTWYV) acids are translated to their corresponding Residue and any other character is translated to XAA. Since lowercase letters and dots are translated to gaps, Pfam MSA insert columns are converted to columns full of gaps.

julia> using MIToS.MSA

julia> alanine = Residue('A')
A

julia> Char(alanine)
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

julia> for residue in res"ARNDCQEGHILKMFPSTWYV-X"
           println(residue, " ", Int(residue))
       end
A 1
R 2
N 3
D 4
C 5
Q 6
E 7
G 8
H 9
I 10
L 11
K 12
M 13
F 14
P 15
S 16
T 17
W 18
Y 19
V 20
- 21
X 22

source

MIToS.MSA.ResidueAlphabet — Type

Abstract type to define residue alphabet types.

source

MIToS.MSA.UngappedAlphabet — Type

This type defines the usual alphabet of the 20 natural residues, without the gap character.

julia> using MIToS.MSA

julia> UngappedAlphabet()
UngappedAlphabet of length 20. Residues : res"ARNDCQEGHILKMFPSTWYV"

source

Constants

MIToS.MSA.GAP — Constant

GAP is the Residue representation on MIToS for gaps ('-', insertions and deletions). Lowercase residue characters, dots and '*' are encoded as GAP in conversion from Strings and Chars. This Residue constant is encoded as Residue(21).

source

MIToS.MSA.WeightTypes — Type

The WeightTypes type is the same as Union{Weights,NoClustering,Clusters}. This type is used to represent weights. Most of the functions taking the weights kerword argument in the Information module accept instances of WeightTypes.

source

MIToS.MSA.XAA — Constant

XAA is the Residue representation for unknown, ambiguous and non standard residues. This Residue constant is encoded as Residue(22).

source

Macros

MIToS.MSA.@res_str — Macro

The MIToS macro @res_str takes a string and returns a Vector of Residues (sequence).

julia> using MIToS.MSA

julia> res"MIToS"
5-element Vector{Residue}:
 M
 I
 T
 -
 S

source

Methods and functions

Base.isvalid — Method

isvalid(res::Residue)

It returns true if the encoded integer is in the closed interval [1,22].

source

Base.names — Method

It returns the name of each group. The name is a string with the one letter code of each residue that belong to the group.

julia> using MIToS.MSA

julia> ab = ReducedAlphabet("(AILMV)(RHK)(NQST)(DE)(FWY)CGP")
ReducedAlphabet of length 8 : "(AILMV)(RHK)(NQST)(DE)(FWY)CGP"

julia> names(ab)
8-element Vector{String}:
 "AILMV"
 "RHK"
 "NQST"
 "DE"
 "FWY"
 "C"
 "G"
 "P"

source

Base.rand — Method

It chooses from the 20 natural residues (it doesn't generate gaps).

julia> using MIToS.MSA

julia> using Random

julia> Random.seed!(1); # Reseed the random number generator.

julia> rand(Residue)
R

julia> rand(Residue, 4, 4)
4×4 Matrix{Residue}:
 E  D  D  A
 F  S  K  K
 M  S  I  M
 Y  F  E  D

source

MIToS.MSA.adjustreference — Function

Creates a new matrix of residues. This function deletes positions/columns of the MSA with gaps in the reference (first) sequence.

source

MIToS.MSA.adjustreference! — Function

It removes positions/columns of the MSA with gaps in the reference (first) sequence.

source

MIToS.MSA.annotate_modification! — Method

Annotates on file annotations the modifications realized by MIToS on the MSA. It always returns true, so It can be used in a boolean context.

source

MIToS.MSA.annotations — Method

The annotations function returns the Annotations of an annotated MSA or aligned sequence. If the object is not annotated, it returns an empty Annotations object.

source

MIToS.MSA.column_index — Method

column_index(msa, col_name)

Return the index (integer position) of the column with name col_name in the MSA msa. A KeyError is thrown if the column name does not exist. If col_name is an integer, the same integer is returned without checking if it is a valid index.

source

MIToS.MSA.columngapfraction — Method

Fraction of gaps per column/position on the MSA

source

MIToS.MSA.columnname_iterator — Method

columnname_iterator(msa)

It returns an iterator that returns the column names of the msa. If the msa is a Matrix{Residue} this function returns the actual column numbers as strings. Otherwise it returns the column number of the original MSA through the wrapped NamedArray column names.

source

MIToS.MSA.columnnames — Method

columnnames(msa)

It returns a Vector{String} with the sequence names/identifiers. If the msa is a Matrix{Residue} this function returns the actual column numbers as strings. Otherwise it returns the column number of the original MSA through the wrapped NamedArray column names.

source

MIToS.MSA.columnpairsmatrix — Method

Initialize an empty PairwiseListMatrix for a pairwise measure in sequence pairs. It uses the sequence names if they are available, otherwise it uses the actual sequence numbers. You can use the positional argument to indicate the number Type (default: Float64), if the PairwiseListMatrix should store the diagonal values on the list (default: false) and a default value for the diagonal (default: NaN).

source

MIToS.MSA.coverage — Method

Coverage of the sequences with respect of the number of positions on the MSA

source

MIToS.MSA.delete_annotated_modifications! — Method

Deletes all the MIToS annotated modifications

source

MIToS.MSA.deletefullgapcolumns! — Function

Deletes columns with 100% gaps, this columns are generated by inserts.

source

MIToS.MSA.filtercolumns! — Function

filtercolumns!(msa, mask[, annotate::Bool=true]) filtercolumns!(f, msa[, annotate::Bool=true])

It allows to filter MSA or aligned sequence columns/positions using a boolean mask or a function f applied to each column. Annotations are updated if annotate is true (default).

source

MIToS.MSA.filtercolumns! — Method

filtercolumns!(data::Annotations, mask)

It is useful for deleting column annotations (creating a subset in place).

source

MIToS.MSA.filtercolumns — Method

It's similar to filtercolumns! but for an AbstractMatrix{Residue}

source

MIToS.MSA.filtersequences! — Function

filtersequences!(msa, mask[, annotate::Bool=true]) filtersequences!(f, msa[, annotate::Bool=true])

It allows to filter msa sequences using a boolean mask or a function f (called once per sequence). Sequences with false values are removed. AnnotatedMultipleSequenceAlignment annotations are updated if annotate is true (default).

source

MIToS.MSA.filtersequences! — Method

filtersequences!(data::Annotations, ids::Vector{String}, mask::AbstractArray{Bool,1})

It is useful for deleting sequence annotations. ids should be a list of the sequence names and mask should be a logical vector.

source

MIToS.MSA.filtersequences — Method

It's similar to filtersequences! but for an AbstractMatrix{Residue}

source

MIToS.MSA.gapfraction — Method

It calculates the fraction of gaps on the Array (alignment, sequence, column, etc.). This function can take an extra dimension argument for calculation of the gap fraction over the given dimension.

source

MIToS.MSA.gapstrip! — Function

This functions deletes/filters sequences and columns/positions on the MSA on the following order:

Removes all the columns/position on the MSA with gaps on the reference (first) sequence.
Removes all the sequences with a coverage with respect to the number of columns/positions on the MSA less than a coveragelimit (default to 0.75: sequences with 25% of gaps).
Removes all the columns/position on the MSA with more than a gaplimit (default to 0.5: 50% of gaps).

source

MIToS.MSA.gapstrip — Method

Creates a new matrix of Residues (MSA) with deleted sequences and columns/positions. The MSA is edited in the following way:

Removes all the columns/position on the MSA with gaps on the reference (first) sequence
Removes all the sequences with a coverage with respect to the number of columns/positions on the MSA less than a coveragelimit (default to 0.75: sequences with 25% of gaps)
Removes all the columns/position on the MSA with more than a gaplimit (default to 0.5: 50% of gaps)

source

MIToS.MSA.getannotcolumn — Function

getannotcolumn(ann[, feature[,default]])

It returns per column annotation for feature

source

MIToS.MSA.getannotfile — Function

getannotfile(ann[, feature[,default]])

It returns per file annotation for feature

source

MIToS.MSA.getannotresidue — Function

getannotresidue(ann[, seqname, feature[,default]])

It returns per residue annotation for (seqname, feature)

source

MIToS.MSA.getannotsequence — Function

getannotsequence(ann[, seqname, feature[,default]])

It returns per sequence annotation for (seqname, feature)

source

MIToS.MSA.getcolumnmapping — Method

It returns a Vector{Int} with the original column number of each column on the actual MSA. The mapping is annotated in the ColMap file annotation of an AnnotatedMultipleSequenceAlignment or in the column names of an NamedArray or MultipleSequenceAlignment.

NOTE: When the MSA results from vertically concatenating MSAs using vcat, the column map annotations from the constituent MSAs (such as 1_ColMap, 2_ColMap, etc.) are not returned. Instead, the column numbers referenced in the column names are provided. To access the original annotations, utilize the getannotfile function.

source

MIToS.MSA.gethcatmapping — Method

It returns a vector of numbers from 1 to N for each column that indicates the source MSA. The mapping is annotated in the "HCat" file annotation of an AnnotatedMultipleSequenceAlignment or in the column names of an NamedArray or MultipleSequenceAlignment.

NOTE: When the MSA results from vertically concatenating MSAs using vcat, the "HCat" annotations from the constituent MSAs are renamed as "1_HCat", "2_HCat", etc. In that case, the MSA numbers referenced in the column names are provided. To access the original annotations, utilize the getannotfile function.

source

MIToS.MSA.getnamedict — Method

It takes a ResidueAlphabet and returns a dictionary from group name to group position.

julia> using MIToS.MSA

julia> ab = ReducedAlphabet("(AILMV)(RHK)(NQST)(DE)(FWY)CGP")
ReducedAlphabet of length 8 : "(AILMV)(RHK)(NQST)(DE)(FWY)CGP"

julia> getnamedict(ab)
OrderedCollections.OrderedDict{String, Int64} with 8 entries:
  "AILMV" => 1
  "RHK"   => 2
  "NQST"  => 3
  "DE"    => 4
  "FWY"   => 5
  "C"     => 6
  "G"     => 7
  "P"     => 8

source

MIToS.MSA.getresidues — Method

getresidues allows you to access the residues stored inside an MSA or aligned sequence as a Matrix{Residue} without annotations nor column/row names.

source

MIToS.MSA.getresiduesequences — Method

getresiduesequences returns a Vector{Vector{Residue}} with all the MSA sequences without annotations nor column/sequence names.

source

MIToS.MSA.getsequence — Function

getsequence takes an MSA and a sequence number or identifier and returns an aligned sequence object. If the MSA is an AnnotatedMultipleSequenceAlignment, it returns an AnnotatedAlignedSequence with the sequence annotations. From a MultipleSequenceAlignment, It returns an AlignedSequence object. If an Annotations object and a sequence identifier are used, this function returns the annotations related to the sequence.

source

MIToS.MSA.getsequencemapping — Method

It returns the sequence coordinates as a Vector{Int} for an MSA sequence. That vector has one element for each MSA column. If the number if 0 in the mapping, there is a gap in that column for that sequence.

source

MIToS.MSA.getweight — Method

getweight(c[, i::Int])

This function returns the weight of the sequence number i. getweight should be defined for any type used for frequencies!/frequencies in order to use his weigths. If i isn't used, this function returns a vector with the weight of each sequence.

source

MIToS.MSA.hobohmI — Method

hobohmI(msa, threshold)

This method allows to cluster the sequences contained in msa using percentidentity as the clustering predicate.

source

MIToS.MSA.hobohmI — Method

hobohmI(within_cluster, msa, threshold)

This method allows clustering the aligned sequences in msa using the within_cluster predicate. It converts the alignment into a vector of residue sequences and forwards the call to the general method.

source

MIToS.MSA.hobohmI — Method

hobohmI(within_cluster, items, threshold)

Cluster items using the Hobohm I algorithm from Hobohm et al. within_cluster is a predicate that receives two elements and threshold and returns true when they should be clustered together.

References

Hobohm, Uwe, et al. "Selection of representative protein data sets." Protein Science 1.3 (1992): 409-417.

source

MIToS.MSA.join_msas — Method

join_msas(msa_a::AnnotatedMultipleSequenceAlignment,
    msa_b::AnnotatedMultipleSequenceAlignment;
    kind::Symbol=:outer,
    axis::Int=1)::AnnotatedMultipleSequenceAlignment

join_msas(msa_a::AnnotatedMultipleSequenceAlignment,
    msa_b::AnnotatedMultipleSequenceAlignment,
    pairing;
    kind::Symbol=:outer,
    axis::Int=1)::AnnotatedMultipleSequenceAlignment

join_msas(msa_a::AnnotatedMultipleSequenceAlignment,
    msa_b::AnnotatedMultipleSequenceAlignment,
    positions_a,
    positions_b;
    kind::Symbol=:outer,
    axis::Int=1)::AnnotatedMultipleSequenceAlignment

Join two Multiple Sequence Alignments (MSAs), msa_a and msa_b, based on specified matching positions or names. If no explicit pairing or position lists are provided, sequences or columns with the same name are paired automatically. The function therefore supports three calling conventions: using only the two MSAs, providing a pairing iterator, or passing positions_a and positions_b separately. This function allows for various types of join operations (:inner, :outer, :left, :right) and can merge MSAs by sequences (axis 1) or by columns (axis 2).

Parameters:

msa_a::AnnotatedMultipleSequenceAlignment: The first MSA.
msa_b::AnnotatedMultipleSequenceAlignment: The second MSA.
pairing: An iterable where each element is a pair of sequence or column positions (Ints) or names (Strings) to match between msa_a and msa_b. For example, it can be a list of two-element tuples or pairs, or and OrderedDict.
positions_a, positions_b: Separate lists of positions or names in msa_a and msa_b, respectively.
kind::Symbol: Type of join operation. Default is :outer.
axis::Int: The axis along which to join (1 to match sequences, 2 to match columns).

Returns:

AnnotatedMultipleSequenceAlignment: A new MSA resulting from the join operation.

Behavior and Sequence Ordering:

The order of sequences or columns in the resulting MSA depends on the kind of join operation and the order of elements in the pairing or positions_a and positions_b lists.

For :inner joins, the function returns an MSA containing only those sequences/columns that are paired in both msa_a and msa_b. The order of elements in the output MSA follows the order in the pairing or position lists.
For :outer joins, the output MSA includes all sequences/columns from both msa_a and msa_b. Unpaired sequences/columns are filled with gaps as needed. The sequences/columns from msa_a are placed first. If the pairing or position lists are sorted, the output MSA columns and sequences will keep the same order as in the inputs. That's nice for situations such as profile alignments where the order of columns is important. If the pairing or position lists are not sorted, then the order of sequences/columns in the output MSA is not guaranteed to be the same as in the inputs. In particular, the matched sequences or columns will be placed first, followed by the unmatched ones.
For :left joins, all sequences/columns from msa_a are included in the output MSA keeping the same order as in msa_a. Sequences/columns from msa_b are added where matches are found, with gaps filling the unmatched positions.
For :right joins, the output MSA behaves like :left joins but with roles of msa_a and msa_b reversed.

Warning: When using Dict for pairing, the order of elements might not be preserved as expected. Dict in Julia does not maintain the order of its elements, which might lead to unpredictable order of sequences/columns in the output MSA. To preserve order, it is recommended to use an OrderedDict or a list of Pairs objects.

source

MIToS.MSA.meanpercentidentity — Function

Returns the mean of the percent identity between the sequences of a MSA. If the MSA has 300 sequences or less, the mean is exact. If the MSA has more sequences and the exact keyword is false (defualt), 44850 random pairs of sequences are used for the estimation. The number of samples can be changed using the second argument. Use exact=true to perform all the pairwise comparison (the calculation could be slow).

source

MIToS.MSA.n_effective — Function

n_effective(msa[, threshold=80.0])

Calculate the effective number of sequences (Neff/Meff) of an alignment using the weighting scheme described by Morcos et al.. For each sequence the weight is the inverse of the number of sequences (including itself) whose percent identity is greater than or equal to threshold. The sum of the weights is returned.

References

Morcos, Faruck, et al. "Direct-coupling analysis of residue coevolution captures native contacts across many protein families." Proceedings of the National Academy of Sciences 108.49 (2011): E1293–E1301.

source

MIToS.MSA.namedmatrix — Method

The namedmatrix function returns the NamedResidueMatrix{Array{Residue,2}} stored in an MSA or aligned sequence.

source

MIToS.MSA.ncolumns — Method

ncolumns returns the number of MSA columns or positions.

source

MIToS.MSA.ncolumns — Method

ncolumns(ann::Annotations) returns the number of columns/residues with annotations. This function returns -1 if there is not annotations per column/residue.

source

MIToS.MSA.nsequences — Method

nsequences returns the number of sequences on the MSA.

source

MIToS.MSA.percentidentity — Method

percentidentity(seq1, seq2, threshold)

Computes quickly if two aligned sequences have a identity value greater than a given threshold value. Returns a boolean value. Positions with gaps in both sequences doesn't count to the length of the sequences. Positions with a XAA in at least one sequence aren't counted.

source

MIToS.MSA.percentidentity — Method

percentidentity(seq1, seq2)

Calculates the fraction of identities between two aligned sequences. The identity value is calculated as the number of identical characters in the i-th position of both sequences divided by the length of both sequences. Positions with gaps in both sequences doesn't count to the length of the sequences. Positions with a XAA in at least one sequence aren't counted.

source

MIToS.MSA.percentidentity — Method

percentidentity(msa[, out::Type=Float64])

Calculates the identity between all the sequences on a MSA. You can indicate the output element type with the last optional parameter (Float64 by default). For a MSA with a lot of sequences, you can use Float32 or Flot16 in order to avoid the OutOfMemoryError().

source

MIToS.MSA.percentpositive — Method

percentpositive(msa[, out::Type=Float64]; matrix=ResidueSubstitutionMatrices.BLOSUM62)

Compute the percentage of positives (as in BLAST) between all pairs of sequences in the multiple sequence alignment msa. The substitution matrix is supplied via the matrix keyword argument and defaults to BLOSUM62. The output element type is selected with the optional positional argument out (Float64 by default). You can use Float32 or Float16 to avoid OutOfMemoryError() for MSAs with a large number of sequences.

source

MIToS.MSA.percentsimilarity — Function

Calculates the similarity percent between two aligned sequences. The 100% is the length of the aligned sequences minus the number of columns with gaps in both sequences and the number of columns with at least one residue outside the alphabet. So, columns with residues outside the alphabet (other than the specially treated GAP) aren't counted to the protein length. Two residues are considered similar if they below to the same group in a ReducedAlphabet. The alphabet (third positional argument) by default is:

ReducedAlphabet("(AILMV)(NQST)(RHK)(DE)(FWY)CGP")

The first group is composed of the non polar residues (AILMV), the second group is composed of polar residues, the third group are positive residues, the fourth group are negative residues, the fifth group is composed by the aromatic residues (FWY). C, G and P are considered unique residues.

Other residue groups/alphabets:

SMS (Sequence Manipulation Suite) Ident and Sim (Stothard Paul. 2000):

ReducedAlphabet("(GAVLI)(FYW)(ST)(KRH)(DENQ)P(CM)")

Stothard P (2000) The Sequence Manipulation Suite: JavaScript programs for analyzing and formatting protein and DNA sequences. Biotechniques 28:1102-1104.

Bio3D 2.2 seqidentity (Grant, Barry J., et al. 2006):

ReducedAlphabet("(GA)(MVLI)(FYW)(ST)(KRH)(DE)(NQ)PC")

References

source

MIToS.MSA.percentsimilarity — Method

Calculates the similarity percent between all the sequences on a MSA. You can indicate the output element type with the out keyword argument (Float64 by default). For an MSA with a lot of sequences, you can use out=Float32 or out=Flot16 in order to avoid the OutOfMemoryError().

source

MIToS.MSA.printmodifications — Method

Prints MIToS annotated modifications

source

MIToS.MSA.rename_columns! — Method

rename_columns!(msa, newnames::Vector{T}) where {T<:AbstractString}
rename_columns!(msa, old2new::AbstractDict)
rename_columns!(msa, old2new::Pair...)

Rename the columns of an MSA given a vector of new names, a dictionary mapping old names to new names, or one or more pairs going from old to new names. If the msa is an AnnotatedMultipleSequenceAlignment, the annotations are also updated. The function modifies the msa in place and returns it.

source

MIToS.MSA.rename_columns — Method

rename_columns(msa, newnames::Vector{T}) where {T<:AbstractString}
rename_columns(msa, old2new::AbstractDict)
rename_columns(msa, old2new::Pair...)

source

MIToS.MSA.rename_sequences! — Method

rename_sequences!(msa, newnames::Vector{T}) where {T<:AbstractString}
rename_sequences!(msa, old2new::AbstractDict)
rename_sequences!(msa, old2new::Pair...)

Rename the sequences of an MSA given a vector of new names, a dictionary mapping old names to new names, or one or more pairs going from old to new names. If the msa is an AnnotatedMultipleSequenceAlignment, the annotations are also updated. The function modifies the msa in place and returns it.

source

MIToS.MSA.rename_sequences — Method

rename_sequences(msa, newnames::Vector{T}) where {T<:AbstractString}
rename_sequences(msa, old2new::AbstractDict)
rename_sequences(msa, old2new::Pair...)

source

MIToS.MSA.residue2three — Method

This function returns the three letter name of the Residue.

julia> using MIToS.MSA

julia> residue2three(Residue('G'))
"GLY"

source

MIToS.MSA.residuefraction — Method

It calculates the fraction of residues (no gaps) on the Array (alignment, sequence, column, etc.). This function can take an extra dimension argument for calculation of the residue fraction over the given dimension

source

MIToS.MSA.sequence_id — Method

sequence_id(seq::Union{AbstractSequence,AbstractAlignedSequence})

It returns the sequence identifier of a sequence object.

source

MIToS.MSA.sequence_index — Method

sequence_index(msa, seq_name)

Return the index (integer position) of the sequence with name seq_name in the MSA msa. A KeyError is thrown if the sequence name does not exist. If seq_name is an integer, the same integer is returned without checking if it is a valid index.

source

MIToS.MSA.sequencename_iterator — Method

sequencename_iterator(msa)

It returns an iterator that returns the sequence names/identifiers of the msa.

source

MIToS.MSA.sequencenames — Method

sequencenames(msa)

It returns a Vector{String} with the sequence names/identifiers.

source

MIToS.MSA.sequencepairsmatrix — Method

Initialize an empty PairwiseListMatrix for a pairwise measure in column pairs. It uses the column mapping (column number in the input MSA file) if it’s available, otherwise it uses the actual column numbers. You can use the positional argument to indicate the number Type (default: Float64), if the PairwiseListMatrix should store the diagonal values on the list (default: false) and a default value for the diagonal (default: NaN).

source

MIToS.MSA.setannotcolumn! — Function

setannotcolumn!(ann, feature, annotation)

It stores per column annotation (1 char per column) for feature

source

MIToS.MSA.setannotfile! — Function

setannotfile!(ann, feature, annotation)

It stores per file annotation for feature

source

MIToS.MSA.setannotresidue! — Function

setannotresidue!(ann, seqname, feature, annotation)

It stores per residue annotation (1 char per residue) for (seqname, feature)

source

MIToS.MSA.setannotsequence! — Function

setannotsequence!(ann, seqname, feature, annotation)

It stores per sequence annotation for (seqname, feature)

source

MIToS.MSA.setreference! — Function

It puts the sequence i (name or position) as reference (first sequence) of the MSA. This function swaps the sequences 1 and i.

source

MIToS.MSA.shuffle_msa! — Method

shuffle_msa!([rng=default_rng(),] msa::AbstractMatrix{Residue}, subset=Colon(); dims=2, fixedgaps=true, fixed_reference=false)

In-place version of shuffle_msa. It randomly permute residues in the MSA msa along sequences (dims=1) or columns (dims=2, the default). The optional positional argument subset allows to shuffle only a subset of the sequences or columns. The optional keyword argument fixedgaps indicates if the gaps should remain their positions (true by default). The optional keyword argument fixed_reference indicates if the residues in the first sequence should remain in their positions (false by default).

source

MIToS.MSA.shuffle_msa — Method

shuffle_msa([rng=default_rng(),] msa::AbstractMatrix{Residue}, subset=Colon(); dims=2, fixedgaps=true, fixed_reference=false)

It randomly permute residues in the MSA msa along sequences (dims=1) or columns (dims=2, the default). The optional positional argument subset allows to shuffle only a subset of the sequences or columns. The optional keyword argument fixedgaps indicates if the gaps should remain their positions (true by default). The optional keyword argument fixed_reference indicates if the residues in the first sequence should remain in their positions (false by default). To shuffle in-place, see shuffle_msa!.

Example

using MIToS.MSA, Random
msa = hcat(res"RRE", res"DDK", res"G--")
Random.seed!(42);
shuffle_msa(msa, dims = 1, fixedgaps = true)
Random.seed!(42);
shuffle_msa(msa, dims = 1, fixedgaps = false)

source

MIToS.MSA.stringsequence — Method

stringsequence(seq)
stringsequence(msa, i::Int)
stringsequence(msa, id::String)

It returns the selected sequence as a String.

source

MIToS.MSA.sum_of_pairs_score — Method

sum_of_pairs_score(msa; gap_open=-10, gap_extend=-1,
                   matrix=ResidueSubstitutionMatrices.BLOSUM62)

Calculate the sum-of-pairs score of an alignment using a substitution matrix. By default, the BLOSUM62 matrix is used. The penalties for gap opening and extension can be adjusted with the gap_open and gap_extend keyword arguments. Pass a different matrix to use another substitution scheme.

source

MIToS.MSA.swapsequences! — Method

It swaps the sequences on the positions i and j of an MSA. Also it's possible to swap sequences using their sequence names/identifiers when the MSA object as names.

source

MIToS.MSA.three2residue — Method

It takes a three letter residue name and returns the corresponding Residue. If the name isn't in the MIToS dictionary, a XAA is returned.

julia> using MIToS.MSA

julia> three2residue("ALA")
A

source

MIToS.Utils.parse_file — Function

parse_file(io, format[, output; generatemapping, useidcoordinates, deletefullgaps])

The keyword argument generatemapping (false by default) indicates if the mapping of the sequences ("SeqMap") and columns ("ColMap") and the number of columns in the original MSA ("NCol") should be generated and saved in the annotations. If useidcoordinates is true (default: false) the sequence IDs of the form "ID/start-end" are parsed and used for determining the start and end positions when the mappings are generated. deletefullgaps (true by default) indicates if columns 100% gaps (generally inserts from a HMM) must be removed from the MSA.

source

Random.shuffle — Function

It's like shuffle but in-place. When a Matrix{Residue} or a AbstractAlignedObject (sequence or MSA) is used, you can indicate if the gaps should remain their positions using the last boolean argument.

DEPRECATED: This method is deprecated. Use shuffle_msa instead.

source

Random.shuffle! — Function

It's like Random.shuffle. When a Matrix{Residue} is used, you can indicate if the gaps should remain their positions using the last boolean argument. The previous argument should be the dimension to shuffle, 1 for shuffling residues in a sequence (row) or 2 for shuffling residues in a column.

DEPRECATED: This method is deprecated. Use shuffle_msa! instead.

source

MSA

Contents

Types

Constants

Macros

Methods and functions