SIFTS
MIToS.SIFTS — Module
The SIFTS module of MIToS retrieves the residue-level mapping between databases stored in the SIFTS XML files. It makes it easy to assign PDB residues to UniProt/Pfam positions. Because pairwise alignments can produce misleading associations between the residues of two sequences, SIFTS offers a more reliable association between sequence and structure residue numbers.
Features
- Download and parse SIFTS XML files
- Store residue-level mapping in Julia
- Easy generation of `OrderedDict`s between residue numbers
using MIToS.SIFTS
Contents
Types
MIToS.SIFTS.SIFTSResidue — Type
A `SIFTSResidue` object stores the SIFTS residue-level mapping for a residue. It has the following fields, which you can access at any time for queries:
- `PDBe` : A `dbPDBe` object, it's present in all the `SIFTSResidue`s.
- `UniProt` : A `dbUniProt` object or `missing`.
- `Pfam` : A `dbPfam` object or `missing`.
- `NCBI` : A `dbNCBI` object or `missing`.
- `InterPro` : An array of `dbInterPro` objects.
- `PDB` : A `dbPDB` object or `missing`.
- `SCOP` : A `dbSCOP` object or `missing`.
- `SCOP2` : An array of `dbSCOP2` objects.
- `SCOP2B` : A `dbSCOP2B` object or `missing`.
- `CATH` : A `dbCATH` object or `missing`.
- `Ensembl` : An array of `dbEnsembl` objects.
- `missing` : It's `true` if the residue is missing, i.e. not observed, in the structure.
- `sscode` : A string with the secondary structure code of the residue.
- `ssname` : A string with the secondary structure name of the residue.
MIToS.SIFTS.dbCATH — Type
`dbCATH` stores the residue id, number, name and chain in CATH as strings.
MIToS.SIFTS.dbEnsembl — Type
`dbEnsembl` stores the residue (gene) accession id, the transcript, translation and exon ids in Ensembl as strings, together with the residue number and name using the UniProt coordinates.
MIToS.SIFTS.dbInterPro — Type
`dbInterPro` stores the residue id, number, name and evidence in InterPro as strings.
MIToS.SIFTS.dbNCBI — Type
`dbNCBI` stores the residue id, number and name in NCBI as strings.
MIToS.SIFTS.dbPDB — Type
`dbPDB` stores the residue id, number, name and chain in PDB as strings.
MIToS.SIFTS.dbPDBe — Type
`dbPDBe` stores the residue number and name in PDBe as strings.
MIToS.SIFTS.dbPfam — Type
`dbPfam` stores the residue id, number and name in Pfam as strings.
MIToS.SIFTS.dbSCOP — Type
`dbSCOP` stores the residue id, number, name and chain in SCOP as strings.
MIToS.SIFTS.dbSCOP2 — Type
`dbSCOP2` stores the residue id, number, name and chain in SCOP2 as strings.
MIToS.SIFTS.dbSCOP2B — Type
`dbSCOP2B` stores the residue id, number, name and chain in SCOP2B as strings. SCOP2B is an expansion of the SCOP2 domain annotations at the superfamily level to every PDB entry that has the same UniProt accession and at least 80% SCOP2 domain coverage.
MIToS.SIFTS.dbUniProt — Type
`dbUniProt` stores the residue id, number and name in UniProt as strings.
Constants
Macros
Methods and functions
MIToS.SIFTS.downloadsifts — Method
`downloadsifts(pdbcode::AbstractString; filename::AbstractString, source::AbstractString="ftp")`
Download the gzipped SIFTS XML file for the provided `pdbcode`. The downloaded file will have the default extension `.xml.gz`. While you can change the `filename`, it must include the `.xml.gz` ending. The `source` keyword argument is set to `"ftp"` by default, downloading from the HTTPS mirror at https://ftp.ebi.ac.uk/pub/databases/msd/sifts/split_xml/. Alternatively, you can choose `"https"` as the source to download directly from the EBI PDBe server at https://www.ebi.ac.uk/pdbe/files/sifts/.
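The call can be sketched as follows. This is a minimal example that assumes network access to the EBI servers and that `downloadsifts` returns the name of the file it wrote; the PDB code `1IVO` and the custom file name are illustrative choices only.

```julia
using MIToS.SIFTS

# Download the SIFTS XML file for PDB entry 1IVO with the default source.
# Assumption: the return value is the path of the downloaded file.
siftsfile = downloadsifts("1IVO")

# Pick the destination explicitly; the name must keep the .xml.gz ending.
# The file name below is a hypothetical example.
custom_path = joinpath(tempdir(), "1ivo_sifts.xml.gz")
downloadsifts("1IVO"; filename = custom_path, source = "https")
```

Writing to a temporary directory avoids leaving the file in the working directory; drop the `filename` keyword to use the default name.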
MIToS.SIFTS.siftsmapping — Method
Parses a SIFTS XML file and returns an `OrderedDict` between the residue numbers of two databases with the given identifiers. A chain can be specified (`All` by default). If `missings` is `true` (default), all the residues are used, even those that have no coordinates in the PDB file.
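A hedged usage sketch for `siftsmapping` follows. It assumes the positional argument order is file name, source database type, source identifier, target database type, target identifier, and that `chain` accepts a chain name as a string. The entry `1ivo` and the Pfam family `PF00757` are illustrative and require network access to download.

```julia
using MIToS.SIFTS

# Fetch the SIFTS XML for the example entry (assumes the path is returned).
siftsfile = downloadsifts("1IVO")

# Map the residue numbers stored for Pfam family PF00757 to the PDB residue
# numbers of chain A, skipping residues without coordinates in the structure.
siftsmap = siftsmapping(
    siftsfile,
    dbPfam, "PF00757",   # source database and identifier
    dbPDB, "1ivo";       # target database and identifier
    chain = "A",
    missings = false,
)

# Iterate over the mapping in file order.
for (pfam_number, pdb_number) in siftsmap
    println(pfam_number, " => ", pdb_number)
end
```

Because the result is an ordered dictionary, iteration follows the order in which the residues were read from the file.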
MIToS.Utils.parse_file — Method
`parse_file(document::LightXML.XMLDocument, ::Type{SIFTSXML}; chain=All, missings::Bool=true)`
Returns a `Vector{SIFTSResidue}` parsed from a SIFTSXML file. By default, parses all the chains and includes missing residues.
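In practice this method is usually reached through a file-level reader rather than called on an XML document directly. The sketch below assumes that `read_file(path, SIFTSXML; ...)` is available and forwards the `chain` and `missings` keywords, and that the database objects expose `number` and `name` fields; treat those names as assumptions to check against your installed MIToS version.

```julia
using MIToS.SIFTS

# Download an example entry (assumes the file path is returned).
siftsfile = downloadsifts("1IVO")

# Assumption: read_file opens the gzipped XML and dispatches to parse_file.
residues = read_file(siftsfile, SIFTSXML; chain = "A", missings = false)

for res in residues
    # PDBe is present for every SIFTSResidue; UniProt may be `missing`.
    uniprot = ismissing(res.UniProt) ? "-" : res.UniProt.number
    println(res.PDBe.number, " ", res.PDBe.name, " -> UniProt ", uniprot)
end
```

Checking `ismissing` before reading a cross-reference is needed because most of the `SIFTSResidue` fields can hold `missing`.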