Pfam
MIToS.Pfam — Module
The Pfam module defines functions to measure the protein contact prediction performance of information measures between column pairs from a Pfam MSA.
Features
- Read and download Pfam MSAs
- Obtain PDB information from alignment annotations
- Map between sequence/alignment residues/columns and PDB structures
- Measure the AUC (ROC curve) of MI scores for contact prediction
using MIToS.Pfam
Contents
Types
Constants
Macros
Methods and functions
MIToS.Pfam.downloadpfam — Method
It downloads a gzipped Stockholm alignment from InterPro for the Pfam family with the given pfamcode.
By default, it downloads the full Pfam alignment. You can use the alignment keyword argument to download the seed or the uniprot alignment instead. For example, downloadpfam("PF00069") will download the full alignment for the PF00069 Pfam family, while downloadpfam("PF00069", alignment="seed") will download the seed alignment of the family.
The extension of the downloaded file is .stockholm.gz by default; you can change it using the filename keyword argument, but the .gz at the end is mandatory.
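The .gz requirement on the filename keyword can be checked before starting a download. The following is a minimal sketch in plain Julia; check_gz_filename is a hypothetical helper written for this example, not a MIToS function, and the downloadpfam call is left commented out because it needs MIToS and network access.

```julia
# Hypothetical helper (not part of MIToS): reject file names that do not
# keep the mandatory ".gz" suffix before they are passed to downloadpfam.
function check_gz_filename(filename::AbstractString)
    endswith(filename, ".gz") ||
        throw(ArgumentError("the file name must end in .gz, got: $filename"))
    return filename
end

seed_file = check_gz_filename("PF00069_seed.stockholm.gz")
println(seed_file)

# With MIToS installed, the validated name could then be used like this:
# using MIToS.Pfam
# downloadpfam("PF00069"; alignment = "seed", filename = seed_file)
```

Validating the name up front gives a clear error message before any network request is made.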
MIToS.Pfam.getcontactmasks — Method
This function takes the output of msacontacts, or its list of contacts contact_list, with 1.0 for true contacts and 0.0 for non-contacts (NaN or other numbers for missing values). It returns two BitVectors, the first with trues where contact_list is 1.0 and the second with trues where contact_list is 0.0. These are useful for AUC calculations.
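To illustrate why the two masks are convenient for AUC calculations, here is a minimal sketch in plain Julia. It does not call MIToS: the masks are built with broadcasting, the scores are made-up numbers, and pairwise_auc is a hypothetical helper that computes the AUC as the fraction of contact/non-contact pairs ranked correctly (ties count as half).

```julia
# Made-up contact list: 1.0 = contact, 0.0 = non-contact, NaN = missing.
contact_list = [1.0, 0.0, NaN, 1.0, 0.0]
# Made-up scores for the same column pairs (e.g. MI values).
scores = [0.9, 0.2, 0.5, 0.7, 0.8]

# Two BitVectors, analogous to the masks described above.
# NaN compares false against both 1.0 and 0.0, so missing pairs drop out.
true_contacts = contact_list .== 1.0
false_contacts = contact_list .== 0.0

# Hypothetical helper: AUC as the probability that a contact
# receives a higher score than a non-contact.
function pairwise_auc(positives, negatives)
    wins = 0.0
    for p in positives, n in negatives
        wins += p > n ? 1.0 : (p == n ? 0.5 : 0.0)
    end
    return wins / (length(positives) * length(negatives))
end

auc = pairwise_auc(scores[true_contacts], scores[false_contacts])
println(auc)  # 3 of the 4 contact/non-contact pairs are ranked correctly: 0.75
```

The pairwise loop is quadratic in the number of pairs, so it is only meant to show the idea on small inputs.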
MIToS.Pfam.getseq2pdb — Method
Generates from a Pfam msa a Dict{String, Vector{Tuple{String,String}}}. Keys are sequence IDs and each value is a list of tuples containing PDB code and chain.
julia> getseq2pdb(msa)
Dict{String,Array{Tuple{String,String},1}} with 1 entry:
"F112_SSV1/3-112" => [("2VQC","A")]
MIToS.Pfam.hasresidues — Method
Returns a BitVector with a true for each column that has a PDB residue.
MIToS.Pfam.msacolumn2pdbresidue — Method
msacolumn2pdbresidue(msa, seqid, pdbid, chain, pfamid, siftsfile; strict=false, checkpdbname=false, missings=true)
This function returns an OrderedDict{Int,String} with MSA column numbers on the input file as keys and PDB residue numbers ("" for missings) as values. The mapping is performed using SIFTS. This function needs correct ColMap and SeqMap annotations. It checks the correspondence of the residues between the MSA sequence and SIFTS, and throws a warning if there are differences.
Missing residues are included if the keyword argument missings is true (default: true). If the keyword argument strict is true (default: false), it throws an Error, instead of a Warning, when residues don't match. If the keyword argument checkpdbname is true (default: false), it throws an Error if the three-letter name of the PDB residue isn't the MSA residue.
If you are working with a downloaded Pfam MSA without modifications, you should read it using generatemapping=true and useidcoordinates=true. If you don't indicate the path to the siftsfile used in the mapping, this function downloads the SIFTS file to the current folder. If you don't indicate the Pfam accession number (pfamid), this function tries to read the AC file annotation.
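Because missing residues appear as empty strings in the returned mapping, a common follow-up step is to keep only the columns that map to a residue. The sketch below assumes a made-up mapping of the documented shape and uses a plain Dict as a stand-in for the OrderedDict, so that it runs without extra packages; the residue number "11A" is an invented value with an insertion code.

```julia
# Made-up mapping: MSA column number => PDB residue number ("" = missing).
column2residue = Dict(1 => "10", 2 => "", 3 => "11A", 4 => "12")

# Keep only the columns that have a PDB residue.
mapped = filter(pair -> !isempty(pair.second), column2residue)

# A plain Dict has no guaranteed order, so sort the keys for display.
mapped_columns = sort(collect(keys(mapped)))
println(mapped_columns)  # [1, 3, 4]
```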
MIToS.Pfam.msacontacts — Function
This function takes an AnnotatedMultipleSequenceAlignment with correct ColMap annotations and two dicts:
- The first is an OrderedDict{String,PDBResidue} from PDB residue number to PDBResidue.
- The second is a Dict{Int,String} from MSA column number on the input file to PDB residue number.
msacontacts returns a PairwiseListMatrix{Float64,false} of 0.0 and 1.0 where 1.0 indicates a residue contact. Two residues are in contact when the distance between any pair of their heavy atoms is less than or equal to distance_limit (default: 6.05) angstroms. NaN indicates a missing value.
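The 1.0 / 0.0 / NaN encoding described above can be sketched in plain Julia as follows. This is not the MIToS implementation: contact_value is a hypothetical helper, and it assumes that the minimum heavy-atom distance for each residue pair has already been computed (NaN when a residue is missing).

```julia
# Hypothetical helper: encode one residue pair from its minimum
# heavy-atom distance in angstroms. NaN in, NaN out (missing value).
function contact_value(min_distance::Float64; distance_limit::Float64 = 6.05)
    isnan(min_distance) && return NaN
    return min_distance <= distance_limit ? 1.0 : 0.0
end

# Made-up distances: close pair, pair exactly at the limit,
# pair just over the limit, and a pair with a missing residue.
distances = [3.8, 6.05, 6.06, NaN]
contact_flags = contact_value.(distances)
println(contact_flags)  # [1.0, 1.0, 0.0, NaN]
```

The comparison is inclusive, so a pair exactly at the limit still counts as a contact.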
MIToS.Pfam.msaresidues — Method
This function takes an AnnotatedMultipleSequenceAlignment with correct ColMap annotations and two dicts:
- The first is an OrderedDict{String,PDBResidue} from PDB residue number to PDBResidue.
- The second is a Dict{Int,String} from MSA column number on the input file to PDB residue number.
msaresidues returns an OrderedDict{Int,PDBResidue} from input column number (ColMap) to PDBResidue. Residues on inserts are not included.
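Conceptually, the result combines the two input dicts: each column number is looked up in the second dict and the resulting residue number in the first. The sketch below shows that composition in plain Julia with made-up data. Strings stand in for PDBResidue objects and plain Dicts for the ordered ones, so this is an illustration of the idea and not the MIToS code; in particular it does not use the ColMap annotations.

```julia
# Made-up stand-ins: PDB residue number => residue (a String here,
# a PDBResidue in MIToS).
residues = Dict("10" => "ALA 10", "11" => "GLY 11")

# Made-up mapping: MSA column number => PDB residue number.
# Column 2 has no residue, and residue "99" is absent from `residues`.
column2residue = Dict(1 => "10", 2 => "", 3 => "11", 4 => "99")

# Compose the two mappings, skipping columns without a known residue.
column_residues = Dict{Int,String}()
for (column, residue_number) in column2residue
    if haskey(residues, residue_number)
        column_residues[column] = residues[residue_number]
    end
end

println(sort(collect(column_residues)))  # [1 => "ALA 10", 3 => "GLY 11"]
```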