Pfam

MIToS.Pfam - Module

The Pfam module defines functions to measure the protein contact prediction performance of information measures between column pairs from a Pfam MSA.

Features

  • Read and download Pfam MSAs
  • Obtain PDB information from alignment annotations
  • Map between sequence/alignment residues/columns and PDB structures
  • Measure the AUC (ROC curve) of MI scores for contact prediction

using MIToS.Pfam

Contents

Types

Constants

Macros

Methods and functions

MIToS.Pfam.downloadpfam - Method

It downloads a gzipped Stockholm alignment from InterPro for the Pfam family with the given pfamcode.

By default, it downloads the full Pfam alignment. You can use the alignment keyword argument to download the seed or the uniprot alignment instead. For example, downloadpfam("PF00069") will download the full alignment for the PF00069 Pfam family, while downloadpfam("PF00069", alignment="seed") will download the seed alignment of the family.

The extension of the downloaded file is .stockholm.gz by default; you can change it using the filename keyword argument, but the .gz at the end is mandatory.

MIToS.Pfam.getcontactmasks - Method

This function takes a msacontacts matrix, or its list of contacts contact_list, with 1.0 for true contacts and 0.0 for non-contacts (NaN or other numbers for missing values). It returns two BitVectors: the first is true where contact_list is 1.0 and the second is true where contact_list is 0.0. These are useful for AUC calculations.
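The masking rule described above can be illustrated with plain Julia, without MIToS. This is only a sketch of the described behavior on a hypothetical contact_list, not the library's implementation; the variable names are assumptions.

```julia
# Hypothetical contact list: 1.0 = contact, 0.0 = non-contact, NaN = missing.
contact_list = [1.0, 0.0, NaN, 1.0, 0.0, 0.0]

# Element-wise comparisons return BitVectors.
# NaN compares false against both 1.0 and 0.0, so missing values
# are excluded from both masks.
true_contacts = contact_list .== 1.0
false_contacts = contact_list .== 0.0

println(true_contacts)
println(false_contacts)
```

Each mask can then index a score vector of the same length to obtain the scores of contacts and non-contacts separately.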

MIToS.Pfam.getseq2pdb - Method

Generates a Dict{String, Vector{Tuple{String,String}}} from a Pfam msa. Keys are sequence IDs and each value is a list of tuples containing a PDB code and a chain.

julia> getseq2pdb(msa)
Dict{String,Array{Tuple{String,String},1}} with 1 entry:
  "F112_SSV1/3-112" => [("2VQC","A")]
MIToS.Pfam.msacolumn2pdbresidue - Method

msacolumn2pdbresidue(msa, seqid, pdbid, chain, pfamid, siftsfile; strict=false, checkpdbname=false, missings=true)

This function returns an OrderedDict{Int,String} with MSA column numbers on the input file as keys and PDB residue numbers ("" for missing residues) as values. The mapping is performed using SIFTS and requires correct ColMap and SeqMap annotations. The function checks that the residues in the MSA sequence correspond to those in SIFTS and throws a warning if there are differences.

Keyword arguments:

  • missings: if true (default: true), missing residues are included.
  • strict: if true (default: false), the function throws an Error, instead of a Warning, when residues don't match.
  • checkpdbname: if true (default: false), the function throws an Error if the three-letter name of the PDB residue doesn't correspond to the MSA residue.

If you are working with a downloaded Pfam MSA without modifications, you should read it using generatemapping=true and useidcoordinates=true. If you don't indicate the path to the siftsfile used in the mapping, this function downloads the SIFTS file to the current folder. If you don't indicate the Pfam accession number (pfamid), this function tries to read it from the AC file annotation.

MIToS.Pfam.msacontacts - Function

This function takes an AnnotatedMultipleSequenceAlignment with correct ColMap annotations and two dicts:

  1. The first is an OrderedDict{String,PDBResidue} from PDB residue number to PDBResidue.
  2. The second is a Dict{Int,String} from MSA column number on the input file to PDB residue number.

msacontacts returns a PairwiseListMatrix{Float64,false} of 0.0 and 1.0, where 1.0 indicates a residue contact. Two residues are in contact when the distance between any pair of their heavy atoms is less than or equal to distance_limit (default: 6.05 angstroms). NaN indicates a missing value.
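The contact definition above can be sketched in plain Julia. The Atom alias and the incontact and contactscore functions below are hypothetical stand-ins for illustration (a residue is reduced to a list of heavy-atom coordinates); they are not part of MIToS and do not reproduce how the library stores residues.

```julia
# Hypothetical stand-in for a heavy atom: x, y, z coordinates in angstroms.
const Atom = NTuple{3,Float64}

# Euclidean distance between two atoms.
distance(a::Atom, b::Atom) = sqrt(sum((a .- b) .^ 2))

# Two residues are in contact if any pair of their heavy atoms
# is at a distance less than or equal to distance_limit.
function incontact(res_a::Vector{Atom}, res_b::Vector{Atom}; distance_limit = 6.05)
    any(distance(a, b) <= distance_limit for a in res_a for b in res_b)
end

# 1.0 for a contact, 0.0 for a non-contact, NaN when a residue is missing.
function contactscore(res_a, res_b; distance_limit = 6.05)
    (res_a === nothing || res_b === nothing) && return NaN
    incontact(res_a, res_b; distance_limit = distance_limit) ? 1.0 : 0.0
end

res1 = Atom[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
res2 = Atom[(7.0, 0.0, 0.0)]   # closest atom pair with res1 is 5.5 angstroms apart
res3 = Atom[(20.0, 0.0, 0.0)]  # far from res1

println(contactscore(res1, res2))     # 1.0
println(contactscore(res1, res3))     # 0.0
println(contactscore(res1, nothing))  # NaN
```

Because a single close atom pair is enough, lowering distance_limit below 5.5 in this toy example turns the first pair into a non-contact.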

MIToS.Pfam.msaresidues - Method

This function takes an AnnotatedMultipleSequenceAlignment with correct ColMap annotations and two dicts:

  1. The first is an OrderedDict{String,PDBResidue} from PDB residue number to PDBResidue.
  2. The second is a Dict{Int,String} from MSA column number on the input file to PDB residue number.

msaresidues returns an OrderedDict{Int,PDBResidue} from input column number (ColMap) to PDBResidue. Residues in inserts are not included.
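Conceptually, the result composes the two input mappings over the columns kept in the MSA. The sketch below uses plain Julia with hypothetical data: residue names stand in for PDBResidue objects, colmap stands in for the ColMap annotation, and an ordered vector of pairs stands in for the OrderedDict. It is an assumption about the described behavior, not the library code.

```julia
# Hypothetical stand-ins: PDB residue number => residue name.
residues = Dict("10" => "ALA", "11" => "GLY", "12" => "SER", "13" => "THR")

# MSA column number on the input file => PDB residue number ("" if unmapped).
column2residue = Dict(1 => "10", 2 => "", 3 => "12", 4 => "13")

# Hypothetical ColMap: input-file columns kept in the MSA (column 4 is absent).
colmap = [1, 2, 3]

# Walk the MSA columns in order and keep those that map to a known residue.
column2pdb = [col => residues[column2residue[col]]
              for col in colmap
              if haskey(residues, get(column2residue, col, ""))]

println(column2pdb)  # [1 => "ALA", 3 => "SER"]
```

Column 2 is dropped because it has no residue number, and column 4 is dropped because it is not part of the MSA.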

ROCAnalysis.AUC - Method

AUC(scores::PairwiseListMatrix, msacontacts::PairwiseListMatrix)

Returns the Area Under the ROC (Receiver Operating Characteristic) Curve (AUC) of the scores for predicting msacontacts. The scores and msacontacts lists are linked (inner join) by their labels (i.e. column number in the file). msacontacts should have 1.0 for true contacts and 0.0 for non-contacts (NaN or other numbers for missing values). You need to load ROCAnalysis (using ROCAnalysis) before using this function.
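The inner join and the AUC itself can be illustrated in plain Julia. The sketch below uses hypothetical dictionaries keyed by column pairs in place of PairwiseListMatrix objects, and computes the AUC through its rank interpretation (the probability that a true contact scores higher than a non-contact). It is not how ROCAnalysis computes the value internally.

```julia
# Hypothetical labelled lists: (column i, column j) => value.
scores = Dict((1, 2) => 0.9, (1, 3) => 0.2, (2, 3) => 0.7,
              (2, 4) => 0.8, (3, 4) => 0.5)
contacts = Dict((1, 2) => 1.0, (1, 3) => 0.0, (2, 3) => 1.0,
                (2, 4) => 0.0, (3, 4) => NaN, (1, 4) => 0.0)

# Inner join: keep only the column pairs present in both lists.
shared = intersect(keys(scores), keys(contacts))

# Scores of true contacts (1.0) and non-contacts (0.0); NaN is ignored.
pos = [scores[k] for k in shared if contacts[k] == 1.0]
neg = [scores[k] for k in shared if contacts[k] == 0.0]

# AUC as the fraction of (contact, non-contact) pairs ranked correctly;
# ties count as half.
wins = sum(p > n ? 1.0 : (p == n ? 0.5 : 0.0) for p in pos for n in neg)
auc = wins / (length(pos) * length(neg))

println(auc)  # 0.75
```

Here the pair (1, 4) is dropped by the join because it has no score, and (3, 4) is dropped because its contact value is missing.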
