PDB
MIToS.PDB
— ModuleThe module PDB
defines types and methods to work with protein structures inside Julia. It is useful to link structural and sequential information, and needed for measure the predictive performance at protein contact prediction of mutual information scores.
Features
- Read and parse PDF and PDBML files
- Calculate distance and contacts between atoms or residues
- Determine interaction between residues
using MIToS.PDB
Contents
Types
MIToS.PDB.Coordinates
— TypeA Coordinates
object is a fixed size vector with the coordinates x,y,z.
MIToS.PDB.PDBAtom
— TypeA PDBAtom
object contains the information from a PDB atom, without information of the residue. It has the following fields that you can access at any moment for query purposes:
- `coordinates` : x,y,z coordinates, e.g. `Coordinates(109.641,73.162,42.7)`.
- `atom` : Atom name, e.g. `"CA"`.
- `element` : Element type of the atom, e.g. `"C"`.
- `occupancy` : A float number with the occupancy, e.g. `1.0`.
- `B` : B factor as a string, e.g. `"23.60"`.
- `alt_id` : Alternative location ID, e.g. `"A"`.
- `charge` : Charge of the atom, e.g. `"0"`.
MIToS.PDB.PDBFile
— TypePDBFile <: FileFormat
Protein Data Bank (PDB) format. It provides a standard representation for macromolecular structure data derived from X-ray diffraction and NMR studies.
MIToS.PDB.PDBML
— TypePDBML <: FileFormat
Protein Data Bank Markup Language (PDBML), a representation of PDB data in XML format.
MIToS.PDB.PDBResidue
— TypeA PDBResidue
object contains all the information about a PDB residue. It has the following fields that you can access at any moment for query purposes:
- `id` : A `PDBResidueIdentifier` object.
- `atoms` : A vector of `PDBAtom`s.
MIToS.PDB.PDBResidueIdentifier
— TypeA PDBResidueIdentifier
object contains the information needed to identity PDB residues. It has the following fields that you can access at any moment for query purposes:
- `PDBe_number` : It's only used when a PDBML is readed (PDBe number as a string).
- `number` : PDB residue number, it includes insertion codes, e.g. `"34A"`.
- `name` : Three letter residue name in PDB, e.g. `"LYS"`.
- `group` : It can be `"ATOM"` or `"HETATM"`.
- `model` : The model number as a string, e.g. `"1"`.
- `chain` : The chain as a string, e.g. `"A"`.
Constants
MIToS.PDB.covalentradius
— ConstantCovalent radius in Å of each element from the Additional file 1 of PICCOLO [1]. Hydrogen was updated using the value on Table 2 from Cordero et. al. [2].
- Bickerton, G. R., Higueruelo, A. P., & Blundell, T. L. (2011). Comprehensive, atomic-level characterization of structurally characterized protein-protein interactions: the PICCOLO database. BMC bioinformatics, 12(1), 313.
- Cordero, B., Gómez, V., Platero-Prats, A. E., Revés, M., Echeverría, J., Cremades, E., ... & Alvarez, S. (2008). Covalent radii revisited. Dalton Transactions, (21), 2832-2838.
MIToS.PDB.vanderwaalsradius
— Constantvan der Waals radius in Å from the Additional file 1 of Bickerton et. al. 2011
- Bickerton, G. R., Higueruelo, A. P., & Blundell, T. L. (2011). Comprehensive, atomic-level characterization of structurally characterized protein-protein interactions: the PICCOLO database. BMC bioinformatics, 12(1), 313.
Macros
MIToS.PDB.@atoms
— Macro@atoms ... model ... chain ... group ... residue ... atom ...
These return a vector of PDBAtom
s with the selected subset of atoms from a list of residues. You can use the type All
to avoid filtering that option.
DEPRECATED: This macro is deprecated. Use the select_atoms
function instead.
MIToS.PDB.@residues
— Macro@residues ... model ... chain ... group ... residue ...
These return a new vector with the selected subset of residues from a list of residues. You can use the type All
to avoid filtering that option.
DEPRECATED: This macro is deprecated. Use the select_residues
function instead.
MIToS.PDB.@residuesdict
— Macro@residuesdict ... model ... chain ... group ... residue ...
This macro returns a dictionary (using PDB residue numbers as keys) with the selected subset of residues from a list of residues. You can use the type All
to avoid filtering that option.
DEPRECATED: This macro is deprecated. Use the residuesdict
function instead.
Methods and functions
Base.angle
— Methodangle(a::Coordinates, b::Coordinates, c::Coordinates)
Angle (in degrees) at b
between a-b
and b-c
Base.any
— Methodany(f::Function, a::PDBResidue, b::PDBResidue, criteria::Function)
Test if the function f
is true for any pair of atoms between the residues a
and b
. This function only test atoms that returns true
for the fuction criteria
.
Base.any
— Methodany(f::Function, a::PDBResidue, b::PDBResidue)
Test if the function f
is true for any pair of atoms between the residues a
and b
MIToS.PDB.CAmatrix
— MethodReturns a matrix with the x, y and z coordinates of the Cα with best occupancy for each PDBResidue
of the ATOM group. If a residue doesn't have a Cα, its Cα coordinates are NaNs.
MIToS.PDB.aromatic
— MethodThere's an aromatic interaction if centriods are at 6.0 Å or less.
MIToS.PDB.aromaticsulphur
— MethodReturns true
if an sulphur and an aromatic atoms are 5.3 Å or less"
MIToS.PDB.bestoccupancy
— MethodTakes a Vector
of PDBAtom
s and returns a Vector
of the PDBAtom
s with best occupancy.
MIToS.PDB.center!
— Methodcenter!(A::AbstractMatrix{Float64})
Takes a set of points A
as an NxD matrix (N: number of points, D: dimension). Translates A
in place so that its centroid is at the origin of coordinates
MIToS.PDB.centeredcoordinates
— FunctionReturns a Matrix{Float64}
with the centered coordinates of all the atoms in residues
. An optional positional argument CA
(default: true
) defines if only Cα carbons should be used to center the matrix.
MIToS.PDB.centeredresidues
— FunctionReturns a new Vector{PDBResidue}
with the PDBResidue
s having centered coordinates. An optional positional argument CA
(default: true
) defines if only Cα carbons should be used to center the matrix.
MIToS.PDB.change_coordinates
— Functionchange_coordinates(residue::PDBResidue, coordinates::AbstractMatrix{Float64}, offset::Int=1)
Returns a new PDBResidues
with (x,y,z) from a coordinates AbstractMatrix{Float64}
You can give an offset
indicating in wich matrix row starts the (x,y,z) coordinates of the residue.
MIToS.PDB.change_coordinates
— Methodchange_coordinates(residues::AbstractVector{PDBResidue}, coordinates::AbstractMatrix{Float64})
Returns a new Vector{PDBResidues}
with (x,y,z) from a coordinates Matrix{Float64}
MIToS.PDB.change_coordinates
— Methodchange_coordinates(atom::PDBAtom, coordinates::Coordinates)
Returns a new PDBAtom
but with a new coordinates
MIToS.PDB.check_atoms_for_interactions
— MethodThis function takes a PDBResidue
and returns true
only if all the atoms can be used for checking interactions.
MIToS.PDB.contact
— Methodcontact(a::Coordinates, b::Coordinates, limit::AbstractFloat)
It returns true if the distance is less or equal to the limit. It doesn't call sqrt
because it does squared_distance(a,b) <= limit^2
.
MIToS.PDB.contact
— Methodcontact(A::PDBResidue, B::PDBResidue, limit::AbstractFloat; criteria::String="All")
Returns true
if the residues A
and B
are at contact distance (limit
). The available distance criteria
are: Heavy
, All
, CA
, CB
(CA
for GLY
)
MIToS.PDB.contact
— Methodcontact(residues::Vector{PDBResidue}, limit::AbstractFloat; criteria::String="All")
If contact
takes a Vector{PDBResidue}
, It returns a matrix with all the pairwise comparisons (contact map).
MIToS.PDB.coordinatesmatrix
— MethodReturns a matrix with the x, y, z coordinates of each atom in each PDBResidue
MIToS.PDB.covalent
— MethodReturns true
if the distance between atoms is less than the sum of the covalentradius
of each atom.
MIToS.PDB.distance
— MethodIt calculates the squared euclidean distance.
MIToS.PDB.distance
— Methoddistance(residues::Vector{PDBResidue}; criteria::String="All")
If distance
takes a Vector{PDBResidue}
returns a PairwiseListMatrix{Float64, false}
with all the pairwise comparisons (distance matrix).
MIToS.PDB.disulphide
— MethodReturns true
if two CYS
's S
are at 2.08 Å or less
MIToS.PDB.download_alphafold_structure
— Methoddownload_alphafold_structure(uniprot_accession::String; format::Type{T}=MMCIFFile) where T<:FileFormat
This function downloads the structure file (PDB or mmCIF) for a given UniProt Accession from AlphaFoldDB. The uniprot_accession
parameter specifies the UniProt Accession of the protein, e.g. "P00520"
. The format
parameter specifies the file format to download, with the default being mmCIF, i.e. MMCIFFile
. You can set format
to PDBFile
if you want to download a PDB file.
MIToS.PDB.downloadpdb
— Methoddownloadpdb(pdbcode::String; format::Type{T} = MMCIFFile, filename, baseurl, kargs...)
It downloads a gzipped PDB file from PDB database. It requires a four character pdbcode
. Its default format
is MMCIFFile
(mmCIF) and It uses the baseurl
"http://www.rcsb.org/pdb/files/". filename
is the path/name of the output file. This function calls MIToS.Utils.download_file
that calls Downloads.download
. So, you can use keyword arguments, such as headers
, from that function.
MIToS.PDB.downloadpdbheader
— MethodIt downloads a JSON file containing the PDB header information.
MIToS.PDB.findCB
— MethodReturns a vector of indices for CB
(CA
for GLY
)
MIToS.PDB.findatoms
— Methodfindatoms(res::PDBResidue, atom::String)
Returns a index vector of the atoms with the given atom
name.
MIToS.PDB.findheavy
— MethodReturns a list with the index of the heavy atoms (all atoms except hydrogen) in the PDBResidue
MIToS.PDB.getCA
— MethodReturns the Cα with best occupancy in the PDBResidue
. If the PDBResidue
has no Cα, missing
is returned.
MIToS.PDB.getpdbdescription
— MethodAccess general information about a PDB entry (e.g., Header information) using the GraphQL interface of the PDB database. It parses the JSON answer into a JSON3.Object
that can be used as a dictionary.
MIToS.PDB.hydrogenbond
— MethodThis function only works if there are hydrogens in the structure. The criteria for a hydrogen bond are:
- d(Ai, Aj) < 3.9Å
- d(Ah, Aacc) < 2.5Å
- θ(Adon, Ah, Aacc) > 90°
- θ(Adon, Aacc, Aacc-antecedent) > 90°
- θ(Ah, Aacc, Aacc-antecedent) > 90°
Where Ah is the donated hydrogen atom, Adon is the hydrogen bond donor atom, Aacc is the hydrogen bond acceptor atom and Aacc-antecednt is the atom antecedent to the hydrogen bond acceptor atom.
MIToS.PDB.hydrophobic
— MethodThere's an hydrophobic interaction if two hydrophobic atoms are at 5.0 Å or less.
MIToS.PDB.ionic
— MethodThere's an ionic interaction if a cationic and an anionic atoms are at 6.0 Å or less.
MIToS.PDB.is_aminoacid
— Methodis_aminoacid(residue::PDBResidue)
is_aminoacid(residue_id::PDBResidueIdentifier)
This function returns true
if the PDB residue is an amino acid residue. It checks if the residue's three-letter name exists in the MIToS.Utils.THREE2ONE
dictionary, and returns false
otherwise.
MIToS.PDB.isanionic
— MethodReturns true if the atom, e.g. ("GLU","CD")
, is an anionic atom in the residue.
MIToS.PDB.isaromatic
— MethodReturns true if the atom, e.g. ("HIS","CG")
, is an aromatic atom in the residue.
MIToS.PDB.isatom
— MethodIt tests if the atom has the indicated atom name.
MIToS.PDB.iscationic
— MethodReturns true if the atom, e.g. ("ARG","NE")
, is a cationic atom in the residue.
MIToS.PDB.ishbondacceptor
— MethodReturns true if the atom, e.g. ("ARG","O")
, is an acceptor in H bonds.
MIToS.PDB.ishbonddonor
— MethodReturns true if the atom, e.g. ("ARG","N")
, is a donor in H bonds.
MIToS.PDB.isresidue
— Method isresidue(res; model=All, chain=All, group=All, residue=All)
This function tests if a PDBResidue
has the indicated model
, chain
, group
and residue
names/numbers. You can use the type All
(default value) to avoid filtering that level.
MIToS.PDB.kabsch
— Methodkabsch(A::AbstractMatrix{Float64}, B::AbstractMatrix{Float64})
This function takes two sets of points, A
(refrence) and B
as NxD matrices, where D is the dimension and N is the number of points. Assumes that the centroids of A
and B
are at the origin of coordinates. You can call center!
on each matrix before calling kabsch
to center the matrices in the (0.0, 0.0, 0.0)
. Rotates B
so that rmsd(A,B)
is minimized. Returns the rotation matrix. You should do B * RotationMatrix
to get the rotated B.
MIToS.PDB.mean_coordinates
— MethodCalculates the average/mean position of each atom in a set of structure. The function takes a vector (AbstractVector
) of vectors (AbstractVector{PDBResidue}
) or matrices (AbstractMatrix{Float64}
) as first argument. As second (optional) argument this function can take an AbstractVector{Float64}
of matrix/structure weights to return a weighted mean. When a AbstractVector{PDBResidue} is used, if the keyword argument calpha
is false
the RMSF is calculated for all the atoms. By default only alpha carbons are used (default: calpha=true
).
MIToS.PDB.modelled_sequences
— Methodmodelled_sequences(residue_list::AbstractArray{PDBResidue,N};
model::Union{String,Type{All}}=All, chain::Union{String,Type{All}}=All,
group::Union{String,Regex,Type{All}}=All) where N
This function returns an OrderedDict
where each key is a named tuple (containing the model and chain identifiers), and each value is the protein sequence corresponding to the modelled residues in those chains. Therefore, the obtained sequences do not contain missing residues. All modelled residues are included by default, but those that don't satisfy specified criteria based on the model
, chain
, or group
keyword arguments are excluded. One-letter residue names are obtained from the MIToS.Utils.THREE2ONE
dictionary for all residue names that return true
for is_aminoacid
.
MIToS.PDB.pication
— MethodThere's a Π-Cation interaction if a cationic and an aromatic atoms are at 6.0 Å or less
MIToS.PDB.proximitymean
— Methodproximitymean
calculates the proximity mean/average for each residue as the average score (from a scores
list) of all the residues within a certain physical distance to a given amino acid. The score of that residue is not included in the mean unless you set include
to true
. The default values are 6.05 for the distance threshold/limit
and "Heavy"
for the criteria
keyword argument. This function allows to calculate pMI (proximity mutual information) and pC (proximity conservation) as in Buslje et. al. 2010.
References
MIToS.PDB.query_alphafolddb
— Methodquery_alphafolddb(uniprot_accession::String)
This function queries the AlphaFoldDB API to retrieve structure information for a given uniprot_accession
, e.g. "P00520"
. This function returns the structure information as a JSON3.Object
.
MIToS.PDB.residuepairsmatrix
— MethodIt creates a NamedArray
containing a PairwiseListMatrix
where each element (column, row) is identified with a PDBResidue
from the input vector. You can indicate the value type of the matrix (default to Float64
), if the list should have the diagonal values (default to Val{false}
) and the diagonal values (default to NaN
).
MIToS.PDB.residues
— MethodThe residues
function for AbstractArray{PDBResidue,N}
is deprecated. Use the select_residues
function instead. So, residues(residue_list, model, chain, group, residue)
becomes select_residues(residue_list; model=model, chain=chain, group=group, residue=residue)
.
MIToS.PDB.residuesdict
— Method residuesdict(residue_list; model=All, chain=All, group=All, residue=All)
This function returns a dictionary (using PDB residue numbers as keys) with the selected subset of residues. The residues are selected using the keyword arguments model
, chain
, group
and residue
. You can use the type All
(default value) to avoid filtering at a particular level.
MIToS.PDB.rmsd
— Methodrmsd(A::AbstractMatrix{Float64}, B::AbstractMatrix{Float64})
Return RMSD between two sets of points A
and B
, given as NxD matrices (N: number of points, D: dimension).
MIToS.PDB.rmsd
— Methodrmsd(A::AbstractVector{PDBResidue}, B::AbstractVector{PDBResidue}; superimposed::Bool=false)
Returns the Cα RMSD value between two PDB structures: A
and B
. If the structures are already superimposed between them, use superimposed=true
to avoid a new superimposition (superimposed
is false
by default).
MIToS.PDB.rmsf
— MethodCalculates the RMSF (Root Mean-Square-Fluctuation) between an atom and its average position in a set of structures. The function takes a vector (AbstractVector
) of vectors (AbstractVector{PDBResidue}
) or matrices (AbstractMatrix{Float64}
) as first argument. As second (optional) argument this function can take an AbstractVector{Float64}
of matrix/structure weights to return the root weighted mean-square-fluctuation around the weighted mean structure. When a Vector{PDBResidue} is used, if the keyword argument calpha
is false
the RMSF is calculated for all the atoms. By default only alpha carbons are used (default: calpha=true
).
MIToS.PDB.select_atoms
— Methodselect_atoms(residue_list; model=All, chain=All, group=All, residue=All, atom=All, alt_id=All, charge=All)
This function returns a vector of PDBAtom
s with the selected subset of atoms from a list of residues. The atoms are selected using the keyword arguments model
, chain
, group
, residue
, atom
, alt_id
, and charge
. You can use the type All
(default value) to avoid filtering at a particular level.
MIToS.PDB.select_residues
— Methodselect_residues(residue_list; model=All, chain=All, group=All, residue=All)
This function returns a new vector with the selected subset of residues from a list of residues. You can use the keyword arguments model
, chain
, group
and residue
to select the residues. You can use the type All
(default value) to avoid filtering at a particular level.
MIToS.PDB.selectbestoccupancy
— MethodTakes a PDBResidue
and a Vector
of atom indices. Returns the index value of the Vector
with maximum occupancy.
MIToS.PDB.squared_distance
— MethodIt calculates the squared euclidean distance, i.e. it doesn't spend time in sqrt
MIToS.PDB.squared_distance
— Methodsquared_distance(A::PDBResidue, B::PDBResidue; criteria::String="All")
Returns the squared distance between the residues A
and B
. The available criteria
are: Heavy
, All
, CA
, CB
(CA
for GLY
)
MIToS.PDB.superimpose
— FunctionAsuper, Bsuper, RMSD = superimpose(A, B, matches=nothing)
This function takes A::AbstractVector{PDBResidue}
(reference) and B::AbstractVector{PDBResidue}
. Translates A
and B
to the origin of coordinates, and rotates B
so that rmsd(A,B)
is minimized with the Kabsch algorithm (using only their α carbons). Returns the rotated and translated versions of A
and B
, and the RMSD value.
Optionally provide matches
which iterates over matched index pairs in A
and B
, e.g., matches = [(3, 5), (4, 6), ...]
. The alignment will be constructed using just the matching residues.
MIToS.PDB.vanderwaals
— MethodTest if two atoms or residues are in van der Waals contact using: distance(a,b) <= 0.5 + vanderwaalsradius[a] + vanderwaalsradius[b]
. It returns distance <= 0.5
if the atoms aren't in vanderwaalsradius
.
MIToS.PDB.vanderwaalsclash
— MethodReturns true
if the distance between the atoms is less than the sum of the vanderwaalsradius
of the atoms. If the atoms aren't on the list (i.e. OXT
), the vanderwaalsradius
of the element is used. If there is not data in the dict, distance 0.0
is used.
MIToS.Utils.parse_file
— Methodparse_file(pdbml, ::Type{PDBML}; chain=All, model=All, group=All, atomname=All, onlyheavy=false, label=true, occupancyfilter=false)
Reads a LightXML.XMLDocument
representing a pdb file. Returns a list of PDBResidue
s (view MIToS.PDB.PDBResidues
). Setting chain
, model
, group
, atomname
and onlyheavy
values can be used to select of a subset of all residues. If not set, all residues are returned. If the keyword argument label
(default: true
) is false
,the auth_ attributes will be use instead of the label_ attributes for chain
, atom
and residue name
fields. The auth_ attributes are alternatives provided by an author in order to match the identification/values used in the publication that describes the structure. If the keyword argument occupancyfilter
(default: false
) is true
, only the atoms with the best occupancy are returned.
MIToS.Utils.parse_file
— Methodparse_file(io, ::Type{MMCIFFile}; chain=All, model=All, group=All, atomname=All, onlyheavy=false, label=true, occupancyfilter=false)
Parse an mmCIF file and returns a list of PDBResidue
s. Setting chain
, model
, group
, atomname
and onlyheavy
values can be used to select a subset of residues. Group can be "ATOM"
or "HETATM"
. If those keyword arguments are not set, all residues are returned. If the keyword argument label
(default: true
) is false
, the auth_ attributes will be used instead of the label_ attributes for chain
, atom
, and residue name
fields. The auth_ attributes are alternatives provided by an author in order to match the identification/values used in the publication that describes the structure. If the keyword argument occupancyfilter
(default: false
) is true
, only the atoms with the best occupancy are returned.
MIToS.Utils.parse_file
— Methodparse_file(io, ::Type{PDBFile}; chain=All, model=All, group=All, atomname=All, onlyheavy=false, occupancyfilter=false)
Reads a text file of a PDB entry. Returns a list of PDBResidue
(view MIToS.PDB.PDBResidues
). Setting chain
, model
, group
, atomname
and onlyheavy
values can be used to select of a subset of all residues. Group can be "ATOM"
or "HETATM"
. If not set, all residues are returned. If the keyword argument occupancyfilter
(default: false
) is true
, only the atoms with the best occupancy are returned.
MIToS.Utils.print_file
— Functionprint_file(io, res, format::Type{PDBFile})
print_file(res, format::Type{PDBFile})
Print a PDBResidue
or a vector of PDBResidue
s in PDB format.