Pfam

MIToS.Pfam (module)

The Pfam module defines functions to measure the protein contact prediction performance of information measures between column pairs from a Pfam MSA.

Features

  • Read and download Pfam MSAs

  • Obtain PDB information from alignment annotations

  • Map between sequence/alignment residues/columns and PDB structures

  • Measure the AUC (ROC curve) of MI scores for contact prediction

using MIToS.Pfam

Contents

Types

Constants

Macros

Methods and functions

Downloads a gzipped Stockholm full alignment for the given pfamcode. The extension of the downloaded file is .stockholm.gz by default. The filename can be changed, but the .gz at the end is mandatory.

This function takes an msacontacts matrix or its list of contacts, contact_list, with 1.0 for true contacts and 0.0 for non-contacts (NaN or other numbers for missing values). It returns two BitVectors: the first is true where contact_list is 1.0 and the second is true where contact_list is 0.0. These are useful for AUC calculations.
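The masking rule described above can be sketched in plain Julia. This only illustrates the rule and is not the MIToS implementation; the name contact_masks_sketch and the example values are hypothetical.

```julia
# Hypothetical helper illustrating the masking rule; not part of MIToS.
function contact_masks_sketch(contact_list::AbstractVector{<:Real})
    # Broadcasting a comparison over a vector yields a BitVector.
    true_contacts = contact_list .== 1.0   # true only where the value is 1.0
    false_contacts = contact_list .== 0.0  # true only where the value is 0.0
    return true_contacts, false_contacts
end

# NaN marks a missing value, so it is false in both masks.
contact_list = [1.0, 0.0, NaN, 1.0, 0.0]
true_contacts, false_contacts = contact_masks_sketch(contact_list)
```

Positions that are false in both masks are the missing values, which is why two masks are needed instead of one.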

Generates a Dict{String, Vector{Tuple{String,String}}} from a Pfam MSA. Keys are sequence IDs and each value is a list of tuples containing a PDB code and chain.

julia> getseq2pdb(msa)
Dict{String,Array{Tuple{String,String},1}} with 1 entry:
  "F112_SSV1/3-112" => [("2VQC","A")]

Returns a BitVector with true for each column that has a PDB residue.

msacolumn2pdbresidue(msa, seqid, pdbid, chain, pfamid, siftsfile; strict=false, checkpdbname=false, missings=true)

This function returns a Dict{Int,String} with MSA column numbers on the input file as keys and PDB residue numbers ("" for missings) as values. The mapping is performed using SIFTS and requires correct ColMap and SeqMap annotations. The function checks the correspondence of residues between the MSA sequence and SIFTS, and throws a warning if there are differences.

Keyword arguments:

  • missings (default: true): if true, missing residues are included.

  • strict (default: false): if true, throws an Error instead of a Warning when residues don't match.

  • checkpdbname (default: false): if true, throws an Error if the three-letter name of the PDB residue doesn't match the MSA residue.

If you are working with an unmodified downloaded Pfam MSA, you should read it using generatemapping=true and useidcoordinates=true. If you don't indicate the path to the siftsfile used in the mapping, this function downloads the SIFTS file into the current folder. If you don't indicate the Pfam accession number (pfamid), this function tries to read the AC file annotation.

This function takes an AnnotatedMultipleSequenceAlignment with correct ColMap annotations and two dicts:

  1. The first is an OrderedDict{String,PDBResidue} from PDB residue number to PDBResidue.

  2. The second is a Dict{Int,String} from MSA column number on the input file to PDB residue number.

msacontacts returns a PairwiseListMatrix{Float64,false} of 0.0 and 1.0 values, where 1.0 indicates a residue contact. Two residues are in contact when the distance between any pair of their heavy atoms is less than or equal to distance_limit (default: 6.05 angstroms). NaN indicates a missing value.
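The contact criterion can be illustrated with a minimal Julia sketch. It assumes each residue is given as a plain list of heavy-atom coordinates (x, y, z) in angstroms; the names distance and in_contact_sketch and the coordinates are hypothetical and stand in for the PDBResidue machinery used by MIToS.

```julia
# Euclidean distance between two 3D points given as (x, y, z) tuples.
distance(a, b) = sqrt(sum((a .- b) .^ 2))

# Hypothetical helper: two residues are in contact when any pair of their
# heavy atoms lies within distance_limit angstroms of each other.
function in_contact_sketch(atoms_a, atoms_b; distance_limit = 6.05)
    return any(distance(a, b) <= distance_limit for a in atoms_a, b in atoms_b)
end

# Made-up heavy-atom coordinates for three residues.
res_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
res_b = [(7.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
res_c = [(8.0, 0.0, 0.0)]

ab = in_contact_sketch(res_a, res_b)  # closest atom pair is 5.5 angstroms apart
ac = in_contact_sketch(res_a, res_c)  # closest atom pair is 6.5 angstroms apart
```

Because the test uses the closest pair of atoms, a single nearby atom pair is enough to mark two residues as a contact.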

This function takes an AnnotatedMultipleSequenceAlignment with correct ColMap annotations and two dicts:

  1. The first is an OrderedDict{String,PDBResidue} from PDB residue number to PDBResidue.

  2. The second is a Dict{Int,String} from MSA column number on the input file to PDB residue number.

msaresidues returns an OrderedDict{Int,PDBResidue} from input column number (ColMap) to PDBResidue. Residues on inserts are not included.
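Conceptually, this composes the two mappings: column number to PDB residue number, then PDB residue number to residue. The sketch below illustrates that composition in plain Julia under stated assumptions: strings stand in for PDBResidue objects, a sorted vector of pairs stands in for the OrderedDict, and columns without a resolvable residue are assumed to be skipped. The name column2residue_sketch is hypothetical.

```julia
# Hypothetical stand-ins: strings replace PDBResidue objects.
resnum2residue = Dict("10" => "ALA", "12" => "GLY")
# Column 2 has a missing residue ("") and column 4 maps to an unknown number.
column2resnum = Dict(1 => "10", 2 => "", 3 => "12", 4 => "99")

# Compose column -> residue number -> residue, keeping column order.
function column2residue_sketch(column2resnum, resnum2residue)
    result = Pair{Int,String}[]
    for col in sort(collect(keys(column2resnum)))
        resnum = column2resnum[col]
        # Assumption: columns without a known residue are left out.
        if haskey(resnum2residue, resnum)
            push!(result, col => resnum2residue[resnum])
        end
    end
    return result
end

mapped = column2residue_sketch(column2resnum, resnum2residue)
```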

ROCAnalysis.AUC (method)

AUC(scores_list::Vector, true_contacts::BitVector, false_contacts::BitVector)

Returns the Area Under a ROC (Receiver Operating Characteristic) Curve (AUC) of the scores_list for true_contacts prediction. The three vectors should have the same length, and false_contacts should be true where there are no contacts.
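To make the role of the two masks concrete, here is a minimal sketch of an AUC computed as the fraction of (true contact, non-contact) pairs in which the true contact has the higher score, with ties counting one half. This pairwise formulation is a standard equivalent of the area under the ROC curve; it is not the ROCAnalysis or MIToS implementation, and auc_sketch and the example values are hypothetical.

```julia
# Hypothetical pairwise AUC; not the implementation used by ROCAnalysis.
function auc_sketch(scores::AbstractVector{<:Real},
                    true_contacts::BitVector,
                    false_contacts::BitVector)
    length(scores) == length(true_contacts) == length(false_contacts) ||
        throw(ArgumentError("the three vectors must have the same length"))
    pos = scores[true_contacts]   # scores of true contacts
    neg = scores[false_contacts]  # scores of non-contacts
    wins = 0.0
    for p in pos, n in neg
        # A tie counts as half a win.
        wins += p > n ? 1.0 : (p == n ? 0.5 : 0.0)
    end
    return wins / (length(pos) * length(neg))
end

scores = [0.9, 0.4, 0.5, 0.2, 0.3]
true_contacts = BitVector([true, false, false, true, false])
false_contacts = BitVector([false, true, false, false, true])
# The third position is false in both masks, so it is ignored as missing.
auc = auc_sketch(scores, true_contacts, false_contacts)
```

The double loop is quadratic in the number of pairs, which is fine for illustration but slower than a rank-based computation on large MSAs.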

ROCAnalysis.AUC (method)

AUC(scores::PairwiseListMatrix, msacontacts::PairwiseListMatrix)

Returns the Area Under a ROC (Receiver Operating Characteristic) Curve (AUC) of the scores for msacontacts prediction. The scores and msacontacts lists are matched (inner join) by their labels (i.e. column number in the file). msacontacts should have 1.0 for true contacts and 0.0 for non-contacts (NaN or other numbers for missing values).
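The inner join by labels means that only column pairs present in both inputs are compared. The following sketch shows the idea with plain dictionaries keyed by column-pair labels; the dictionaries are hypothetical stand-ins for PairwiseListMatrix objects, and inner_join_sketch is not a MIToS function.

```julia
# Hypothetical stand-ins for the two matrices: column-pair label => value.
scores = Dict((1, 2) => 0.8, (1, 3) => 0.1, (2, 3) => 0.4, (2, 5) => 0.9)
contacts = Dict((1, 2) => 1.0, (1, 3) => 0.0, (2, 3) => NaN, (3, 4) => 1.0)

# Keep only the labels found in both inputs, in a fixed (sorted) order.
function inner_join_sketch(scores, contacts)
    shared = sort([k for k in keys(scores) if haskey(contacts, k)])
    return shared, [scores[k] for k in shared], [contacts[k] for k in shared]
end

labels, joined_scores, joined_contacts = inner_join_sketch(scores, contacts)
```

Pairs such as (2, 5) and (3, 4) appear in only one input and are dropped, while (2, 3) is kept with its NaN value, which marks it as missing and excludes it from both contact masks.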

ROCAnalysis.AUC (method)

AUC(scores::PairwiseListMatrix, true_contacts::BitVector, false_contacts::BitVector)

Returns the Area Under a ROC (Receiver Operating Characteristic) Curve (AUC) of the scores for true_contacts prediction. scores, true_contacts and false_contacts should have the same number of elements, and false_contacts should be true where there are no contacts.
