SIFTS
MIToS.SIFTS
— Module
The `SIFTS` module of MIToS lets you obtain the residue-level mapping between databases stored in the SIFTS XML files. It makes it easy to assign PDB residues to UniProt/Pfam positions. Because pairwise alignments can lead to misleading associations between the residues of two sequences, SIFTS offers a more reliable association between sequence and structure residue numbers.
Features
- Download and parse SIFTS XML files
- Store residue-level mapping in Julia
- Easy generation of `OrderedDict`s between residue numbers
using MIToS.SIFTS
Types
MIToS.SIFTS.SIFTSResidue
— Type
A `SIFTSResidue` object stores the SIFTS residue-level mapping for a residue. It has the following fields, which you can access at any time for queries:
- `PDBe` : A `dbPDBe` object; it is present in every `SIFTSResidue`.
- `UniProt` : A `dbUniProt` object or `missing`.
- `Pfam` : A `dbPfam` object or `missing`.
- `NCBI` : A `dbNCBI` object or `missing`.
- `InterPro` : An array of `dbInterPro` objects.
- `PDB` : A `dbPDB` object or `missing`.
- `SCOP` : A `dbSCOP` object or `missing`.
- `SCOP2` : An array of `dbSCOP2` objects.
- `SCOP2B` : A `dbSCOP2B` object or `missing`.
- `CATH` : A `dbCATH` object or `missing`.
- `Ensembl` : An array of `dbEnsembl` objects.
- `missing` : It's `true` if the residue is missing, i.e. not observed, in the structure.
- `sscode` : A string with the secondary structure code of the residue.
- `ssname` : A string with the secondary structure name of the residue.
MIToS.SIFTS.dbCATH
— Type
`dbCATH` stores the residue `id`, `number`, `name` and `chain` in CATH as strings.
MIToS.SIFTS.dbEnsembl
— Type
`dbEnsembl` stores the residue (gene) accession `id`, the `transcript`, `translation` and `exon` ids in Ensembl as strings, together with the residue `number` and `name` using the UniProt coordinates.
MIToS.SIFTS.dbInterPro
— Type
`dbInterPro` stores the residue `id`, `number`, `name` and `evidence` in InterPro as strings.
MIToS.SIFTS.dbNCBI
— Type
`dbNCBI` stores the residue `id`, `number` and `name` in NCBI as strings.
MIToS.SIFTS.dbPDB
— Type
`dbPDB` stores the residue `id`, `number`, `name` and `chain` in PDB as strings.
MIToS.SIFTS.dbPDBe
— Type
`dbPDBe` stores the residue `number` and `name` in PDBe as strings.
MIToS.SIFTS.dbPfam
— Type
`dbPfam` stores the residue `id`, `number` and `name` in Pfam as strings.
MIToS.SIFTS.dbSCOP
— Type
`dbSCOP` stores the residue `id`, `number`, `name` and `chain` in SCOP as strings.
MIToS.SIFTS.dbSCOP2
— Type
`dbSCOP2` stores the residue `id`, `number`, `name` and `chain` in SCOP2 as strings.
MIToS.SIFTS.dbSCOP2B
— Type
`dbSCOP2B` stores the residue `id`, `number`, `name` and `chain` in SCOP2B as strings. SCOP2B is an expansion of SCOP2 domain annotations at the superfamily level to every PDB entry with the same UniProt accession that has at least 80% SCOP2 domain coverage.
MIToS.SIFTS.dbUniProt
— Type
`dbUniProt` stores the residue `id`, `number` and `name` in UniProt as strings.
Constants
Macros
Methods and functions
MIToS.SIFTS.downloadsifts
— Method
`downloadsifts(pdbcode::String; filename::String, source::String="https")`
Download the gzipped SIFTS XML file for the provided `pdbcode`. The downloaded file has the `.xml.gz` extension by default. You can change the `filename`, but it must keep the `.xml.gz` ending. The `source` keyword argument defaults to `"https"`, which downloads the file from the EBI PDBe server at https://www.ebi.ac.uk/pdbe/files/sifts/. Alternatively, set `source` to `"ftp"` to retrieve the file from the EBI FTP server at ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/. Using `"https"` is highly recommended.
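A minimal sketch of a download call, assuming network access to the EBI servers and that `downloadsifts` returns the path of the saved file. The PDB code `1IVO` and the file name `1ivo_sifts.xml.gz` are illustrative choices, not values taken from this page.

```julia
using MIToS.SIFTS

# Default file name and the default "https" source.
siftsfile = downloadsifts("1IVO")

# Illustrative custom file name; it must keep the .xml.gz ending.
custom = downloadsifts("1IVO"; filename = "1ivo_sifts.xml.gz")
```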
MIToS.SIFTS.siftsmapping
— Method
Parses a SIFTS XML file and returns an `OrderedDict` between residue numbers of two `DataBase`s with the given identifiers. A `chain` can be specified (`All` by default). If `missings` is `true` (default), all residues are used, even those without coordinates in the PDB file.
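The description above can be sketched as follows, assuming the positional order is file name, source database type, source identifier, target database type, target identifier. The entry `1IVO` and chain `"A"` are illustrative.

```julia
using MIToS.SIFTS

siftsfile = downloadsifts("1IVO")

# Map PDBe residue numbers to PDB residue numbers for chain A.
siftsmap = siftsmapping(siftsfile, dbPDBe, "1ivo", dbPDB, "1ivo"; chain = "A")

# Same mapping restricted to residues observed in the structure.
observed = siftsmapping(siftsfile, dbPDBe, "1ivo", dbPDB, "1ivo";
                        chain = "A", missings = false)

# Keys and values are residue numbers stored as strings.
for (pdbe_number, pdb_number) in Iterators.take(siftsmap, 5)
    println(pdbe_number, " => ", pdb_number)
end
```

Because the residue numbers are strings, convert them with `parse(Int, ...)` only when you are sure they carry no insertion codes.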
MIToS.Utils.parse_file
— Method
`parse_file(document::LightXML.XMLDocument, ::Type{SIFTSXML}; chain=All, missings::Bool=true)`
Returns a `Vector{SIFTSResidue}` parsed from a `SIFTSXML` file. By default, parses all the `chain`s and includes missing residues.
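A hedged sketch of getting the `Vector{SIFTSResidue}` from a file on disk. It assumes that `read_file` (from `MIToS.Utils`) is available after `using MIToS.SIFTS`, that it opens the gzipped file and forwards the `chain` keyword to `parse_file`; the entry `1IVO` is illustrative.

```julia
using MIToS.SIFTS

siftsfile = downloadsifts("1IVO")

# Assumption: read_file handles the .xml.gz file and calls parse_file.
residues = read_file(siftsfile, SIFTSXML; chain = "A")

# Keep residues observed in the structure and mapped to UniProt.
mapped = filter(r -> !r.missing && !ismissing(r.UniProt), residues)

for r in Iterators.take(mapped, 3)
    println(r.PDBe.number, " ", r.PDBe.name, " => UniProt ",
            r.UniProt.id, " ", r.UniProt.number)
end
```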