Pfam
MIToS.Pfam — Module.
The Pfam module defines functions to measure the protein contact prediction performance of information measures between column pairs from a Pfam MSA.
Features
Read and download Pfam MSAs
Obtain PDB information from alignment annotations
Map between sequence/alignment residues/columns and PDB structures
Measure of AUC (ROC curve) for contact prediction of MI scores
using MIToS.Pfam
Contents
Types
Constants
Macros
Methods and functions
MIToS.Pfam.downloadpfam — Method.
Download a gzipped Stockholm full alignment for the pfamcode. The extension of the downloaded file is .stockholm.gz by default. The filename can be changed, but the .gz at the end is mandatory.
MIToS.Pfam.getcontactmasks — Method.
This function takes a msacontacts or its list of contacts contact_list, with 1.0 for true contacts and 0.0 for non-contacts (NaN or other numbers for missing values). It returns two BitVectors: the first is true where contact_list is 1.0 and the second is true where contact_list is 0.0. These are useful for AUC calculations.
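The mask logic described above can be sketched in plain Julia without MIToS. This is only an illustration of the documented behaviour, not the package's implementation; contact_list here is a hand-written stand-in vector.

```julia
# Stand-in for a contact list: 1.0 = contact, 0.0 = non-contact,
# NaN (or any other number) = missing value.
contact_list = [1.0, 0.0, NaN, 1.0, 0.5, 0.0]

# Broadcasting == over a Vector{Float64} yields a BitVector.
true_contacts  = contact_list .== 1.0
false_contacts = contact_list .== 0.0

# Entries that are neither 1.0 nor 0.0 are false in both masks.
missing_values = .!(true_contacts .| false_contacts)
```

Keeping two masks instead of one lets missing values drop out of both the positive and the negative class.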
MIToS.Pfam.getseq2pdb — Method.
Generates from a Pfam msa a Dict{String, Vector{Tuple{String,String}}}. Keys are sequence IDs and each value is a list of tuples containing PDB code and chain.
julia> getseq2pdb(msa)
Dict{String,Array{Tuple{String,String},1}} with 1 entry:
"F112_SSV1/3-112" => [("2VQC","A")]
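As a rough illustration of where such a mapping can come from, the sketch below builds the same kind of Dict from Stockholm sequence annotations. It assumes annotation lines of the form "#=GS <seqid> DR PDB; <code> <chain>; <range>;". The helper seq2pdb_sketch is hypothetical and not part of MIToS, and the residue range in the example line is illustrative only.

```julia
# Hypothetical helper, not part of MIToS: collect (PDB code, chain) tuples
# per sequence ID from "#=GS ... DR PDB; ..." annotation lines.
function seq2pdb_sketch(lines)
    mapping = Dict{String,Vector{Tuple{String,String}}}()
    pattern = r"^#=GS\s+(\S+)\s+DR\s+PDB;\s*(\S+)\s+([^;\s]+);"
    for line in lines
        m = match(pattern, line)
        m === nothing && continue  # skip lines without a PDB cross-reference
        seqid, pdbcode, chain = String(m[1]), String(m[2]), String(m[3])
        push!(get!(mapping, seqid, Tuple{String,String}[]), (pdbcode, chain))
    end
    mapping
end

# Illustrative input: the range "4-73" is an assumption for the example.
example_lines = [
    "# STOCKHOLM 1.0",
    "#=GS F112_SSV1/3-112 DR PDB; 2VQC A; 4-73;",
]
seq2pdb = seq2pdb_sketch(example_lines)
```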
MIToS.Pfam.hasresidues — Method.
Returns a BitVector where there is a true for each column with a PDB residue.
MIToS.Pfam.msacolumn2pdbresidue — Method.
msacolumn2pdbresidue(msa, seqid, pdbid, chain, pfamid, siftsfile; strict=false, checkpdbname=false, missings=true)
This function returns a Dict{Int,String} with MSA column numbers on the input file as keys and PDB residue numbers ("" for missings) as values. The mapping is performed using SIFTS. This function needs correct ColMap and SeqMap annotations. It checks the correspondence of the residues between the MSA sequence and SIFTS (it throws a warning if there are differences). Missing residues are included if the keyword argument missings is true (default: true). If the keyword argument strict is true (default: false), it throws an Error, instead of a Warning, when residues don't match. If the keyword argument checkpdbname is true (default: false), it throws an Error if the three-letter name of the PDB residue isn't the MSA residue. If you are working with a downloaded Pfam MSA without modifications, you should read it using generatemapping=true and useidcoordinates=true. If you don't indicate the path to the siftsfile used in the mapping, this function downloads the SIFTS file into the current folder. If you don't indicate the Pfam accession number (pfamid), this function tries to read the AC file annotation.
MIToS.Pfam.msacontacts — Function.
This function takes an AnnotatedMultipleSequenceAlignment with correct ColMap annotations and two dicts:
The first is an OrderedDict{String,PDBResidue} from PDB residue number to PDBResidue.
The second is a Dict{Int,String} from MSA column number on the input file to PDB residue number.
msacontacts returns a PairwiseListMatrix{Float64,false} of 0.0 and 1.0, where 1.0 indicates a residue contact. Two residues are in contact when the distance between any pair of their heavy atoms is less than or equal to distance_limit (default: 6.05) angstroms. NaN indicates a missing value.
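The contact definition above can be sketched in plain Julia. The names below (Coordinates, atom_distance, contact_value) are hypothetical and not part of MIToS. A residue is represented simply as a list of heavy-atom coordinates, and nothing stands in for a residue without structural data.

```julia
# Hypothetical stand-in for a residue: heavy-atom (x, y, z) coordinates in angstroms.
const Coordinates = NTuple{3,Float64}

atom_distance(a::Coordinates, b::Coordinates) = sqrt(sum(abs2, a .- b))

# 1.0 if any pair of heavy atoms is within distance_limit, 0.0 otherwise,
# NaN if either residue is missing (represented here by `nothing`).
function contact_value(res_a, res_b; distance_limit=6.05)
    (res_a === nothing || res_b === nothing) && return NaN
    any(atom_distance(a, b) <= distance_limit for a in res_a for b in res_b) ? 1.0 : 0.0
end

res1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
res2 = [(7.0, 0.0, 0.0), (9.0, 0.0, 0.0)]   # closest atom pair is 5.5 angstroms from res1
res3 = [(20.0, 0.0, 0.0)]                   # closest atom pair is 18.5 angstroms from res1
```

With these inputs, res1 and res2 count as a contact, while res1 and res3 do not.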
MIToS.Pfam.msaresidues — Method.
This function takes an AnnotatedMultipleSequenceAlignment with correct ColMap annotations and two dicts:
The first is an OrderedDict{String,PDBResidue} from PDB residue number to PDBResidue.
The second is a Dict{Int,String} from MSA column number on the input file to PDB residue number.
msaresidues returns an OrderedDict{Int,PDBResidue} from input column number (ColMap) to PDBResidue. Residues on inserts are not included.
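Conceptually, the result is the composition of the two input dicts. The sketch below shows that idea with plain Dicts and strings standing in for PDBResidue values. The function compose_mappings and the sample data are hypothetical, and skipping entries without a matching residue is an assumption of this sketch, not documented MIToS behaviour.

```julia
# Hypothetical stand-ins: residues are plain strings here, not PDBResidue values.
residues_by_number = Dict("4" => "LYS 4", "5" => "GLU 5", "7" => "ALA 7")
# "" marks a column without a PDB residue.
column2residue_number = Dict(10 => "4", 11 => "5", 12 => "", 14 => "7")

# Compose both mappings: column -> residue number -> residue.
function compose_mappings(column2number, number2residue)
    result = Dict{Int,String}()
    for (column, number) in column2number
        haskey(number2residue, number) || continue  # assumption: skip unmapped columns
        result[column] = number2residue[number]
    end
    result
end

column2residue = compose_mappings(column2residue_number, residues_by_number)
```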
ROCAnalysis.AUC — Method.
AUC(scores_list::Vector, true_contacts::BitVector, false_contacts::BitVector)
Returns the Area Under a ROC (Receiver Operating Characteristic) Curve (AUC) of the scores_list for true_contacts prediction. The three vectors should have the same length, and false_contacts should be true where there are no contacts.
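For intuition, the AUC equals the probability that a randomly chosen true contact receives a higher score than a randomly chosen non-contact, with ties counted as one half. The sketch below computes it that way in plain Julia. The function auc_sketch is hypothetical and is not the ROCAnalysis implementation.

```julia
# Hypothetical helper, not the ROCAnalysis implementation: AUC as the fraction of
# (contact, non-contact) pairs in which the contact has the higher score.
function auc_sketch(scores::Vector{Float64}, true_contacts::BitVector, false_contacts::BitVector)
    @assert length(scores) == length(true_contacts) == length(false_contacts)
    positives = scores[true_contacts]
    negatives = scores[false_contacts]
    wins = 0.0
    for p in positives, n in negatives
        wins += p > n ? 1.0 : (p == n ? 0.5 : 0.0)
    end
    wins / (length(positives) * length(negatives))
end

scores = [0.9, 0.8, 0.3, 0.1]
true_contacts  = BitVector([true, false, true, false])
false_contacts = BitVector([false, true, false, true])

# Three of the four contact/non-contact pairs are ranked correctly.
auc_value = auc_sketch(scores, true_contacts, false_contacts)  # 0.75
```

Positions that are false in both masks, such as missing values, take no part in the calculation.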
ROCAnalysis.AUC — Method.
AUC(scores::PairwiseListMatrix, msacontacts::PairwiseListMatrix)
Returns the Area Under a ROC (Receiver Operating Characteristic) Curve (AUC) of the scores for msacontacts prediction. The scores and msacontacts lists are linked (inner join) by their labels (i.e. column number in the file). msacontacts should have 1.0 for true contacts and 0.0 for non-contacts (NaN or other numbers for missing values).
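The inner join by labels can be pictured with two plain Dicts keyed by column pair, as in the sketch below. These Dicts are hypothetical stand-ins for the two PairwiseListMatrix inputs, and the values are made up for illustration. Only pairs present in both inputs survive the join.

```julia
# Hypothetical stand-ins for the two inputs: values keyed by (column, column) labels.
scores   = Dict((1, 2) => 0.7, (1, 3) => 0.2, (2, 3) => 0.9, (2, 5) => 0.4)
contacts = Dict((1, 2) => 1.0, (1, 3) => 0.0, (2, 3) => NaN, (3, 5) => 1.0)

# Inner join: keep only the column pairs that are labelled in both inputs.
shared_pairs = sort!([pair for pair in keys(scores) if haskey(contacts, pair)])

# Aligned lists, ready for building contact masks and computing an AUC.
score_list   = [scores[pair] for pair in shared_pairs]
contact_list = [contacts[pair] for pair in shared_pairs]
```

Here the pairs (2, 5) and (3, 5) are dropped because each appears in only one input, and the NaN for (2, 3) remains as a missing value.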
ROCAnalysis.AUC — Method.
AUC(scores::PairwiseListMatrix, true_contacts::BitVector, false_contacts::BitVector)
Returns the Area Under a ROC (Receiver Operating Characteristic) Curve (AUC) of the scores for true_contacts prediction. scores, true_contacts and false_contacts should have the same number of elements, and false_contacts should be true where there are no contacts.