PDB

MIToS.PDBModule

The module PDB defines types and methods to work with protein structures inside Julia. It is useful to link structural and sequential information, and needed for measure the predictive performance at protein contact prediction of mutual information scores.

Features

  • Read and parse mmCIF, PDB and PDBML files.
  • Calculate distance and contacts between atoms or residues
  • Determine interaction between residues
using MIToS.PDB
source

Contents

Types

BioStructures.MolecularStructureMethod
BioStructures.MolecularStructure(residues::Vector{PDBResidue})

Build a BioStructures.MolecularStructure from a vector of PDBResidue objects without creating an intermediate mmCIF dictionary.

source
MIToS.PDB.PDBAtomType

A PDBAtom object contains the information from a PDB atom, without information of the residue. It has the following fields that you can access at any moment for query purposes:

- `coordinates` : x,y,z coordinates, e.g. `Coordinates(109.641,73.162,42.7)`.
- `atom` : Atom name, e.g. `"CA"`.
- `element` : Element type of the atom, e.g. `"C"`.
- `occupancy` : A float number with the occupancy, e.g. `1.0`.
- `B` : B factor as a string, e.g. `"23.60"`.
- `alt_id` : Alternative location ID, e.g. `"A"`.
- `charge` : Charge of the atom, e.g. `"0"`.
source
MIToS.PDB.PDBFileType

PDBFile <: FileFormat

Protein Data Bank (PDB) format. It provides a standard representation for macromolecular structure data derived from X-ray diffraction and NMR studies.

source
MIToS.PDB.PDBMLType

PDBML <: FileFormat

Protein Data Bank Markup Language (PDBML), a representation of PDB data in XML format.

source
MIToS.PDB.PDBResidueType

A PDBResidue object contains all the information about a PDB residue. It has the following fields that you can access at any moment for query purposes:

- `id` : A `PDBResidueIdentifier` object.
- `atoms` : A vector of `PDBAtom`s.
source
MIToS.PDB.PDBResidueIdentifierType

A PDBResidueIdentifier object contains the information needed to identity PDB residues. It has the following fields that you can access at any moment for query purposes:

- `PDBe_number` : It's only used when a PDBML is readed (PDBe number as a string).
- `number` : PDB residue number, it includes insertion codes, e.g. `"34A"`.
- `name` : Three letter residue name in PDB, e.g. `"LYS"`.
- `group` : It can be `"ATOM"` or `"HETATM"`.
- `model` : The model number as a string, e.g. `"1"`.
- `chain` : The chain as a string, e.g. `"A"`.
source

Constants

MIToS.PDB.COVALENT_RADIIConstant

Covalent radii (Å) for the chemical elements, taken from Table 2 of Cordero et al. (2008). When multiple values are reported for an element, we select a single value; typically the largest to minimize false positives when looking for covalent bonds. Specifically, for C, we use the sp3 value, and for Mn, Fe, and Co, we use the high-spin (h.s.) values.

References

- [Cordero, Beatriz, et al. "Covalent radii revisited." Dalton Transactions 
  21 (2008): 2832-2838.](@cite 10.1039/B801115J)
source
MIToS.PDB.VAN_DER_WAALS_RADIIConstant

A dictionary mapping element symbols to their van der Waals radii (in Å), as reported by Alvarez (2013, Table S1).

References

- [Alvarez, Santiago. "A cartography of the van der Waals territories." Dalton 
  Transactions 42.24 (2013): 8617-8636.](@cite C3DT50599E)
source
MIToS.PDB.vanderwaalsradiusConstant

van der Waals radius in Å from the Additional file 1 of Bickerton et al.

Warning

This variable is deprecated and will be removed in future releases. Please use VAN_DER_WAALS_RADII instead.

References

- [Bickerton, George R., Alicia P. Higueruelo, and Tom L. Blundell. "Comprehensive, 
  atomic-level characterization of structurally characterized protein-protein 
  interactions: the PICCOLO database." BMC bioinformatics 
  12 (2011): 1-15.](@cite 10.1186/1471-2105-12-313)
source

Macros

MIToS.PDB.@atomsMacro

@atoms ... model ... chain ... group ... residue ... atom ...

These return a vector of PDBAtoms with the selected subset of atoms from a list of residues. You can use the type All to avoid filtering that option.

DEPRECATED: This macro is deprecated. Use the select_atoms function instead.

source
MIToS.PDB.@residuesMacro

@residues ... model ... chain ... group ... residue ...

These return a new vector with the selected subset of residues from a list of residues. You can use the type All to avoid filtering that option.

DEPRECATED: This macro is deprecated. Use the select_residues function instead.

source
MIToS.PDB.@residuesdictMacro

@residuesdict ... model ... chain ... group ... residue ...

This macro returns a dictionary (using PDB residue numbers as keys) with the selected subset of residues from a list of residues. You can use the type All to avoid filtering that option.

DEPRECATED: This macro is deprecated. Use the residuesdict function instead.

source

Methods and functions

Base.angleMethod

angle(a::Coordinates, b::Coordinates, c::Coordinates)

Angle (in degrees) at b between a-b and b-c

source
Base.anyMethod

any(f::Function, a::PDBResidue, b::PDBResidue, criteria::Function; kwargs...)

Test if the function f is true for any pair of atoms between the residues a and b. This function only tests atoms that return true for the function criteria. Any keyword arguments provided are forwarded to f. The function f must return a boolean value and have the following signature:

f(a::PDBAtom, b::PDBAtom, resname_a::String, resname_b::String; kwargs...)

For example, the functions covalent, vanderwaals, vanderwaalsclash, disulphide, aromaticsulphur, pication, ionic, and hydrophobic can be used with any.

The function criteria should also return a boolean value and it should have the following signature:

criteria(a::PDBAtom, resname_a::String)
source
Base.anyMethod

any(f::Function, a::PDBResidue, b::PDBResidue)

Test if the function f is true for any pair of atoms between the residues a and b

source
MIToS.PDB.CAmatrixMethod

Returns a matrix with the x, y and z coordinates of the Cα with best occupancy for each PDBResidue of the ATOM group. If a residue doesn't have a Cα, its Cα coordinates are NaNs.

source
MIToS.PDB.center!Method

center!(A::AbstractMatrix{Float64})

Takes a set of points A as an NxD matrix (N: number of points, D: dimension). Translates A in place so that its centroid is at the origin of coordinates

source
MIToS.PDB.centeredcoordinatesFunction

Returns a Matrix{Float64} with the centered coordinates of all the atoms in residues. An optional positional argument CA (default: true) defines if only Cα carbons should be used to center the matrix.

source
MIToS.PDB.centeredresiduesFunction

Returns a new Vector{PDBResidue} with the PDBResidues having centered coordinates. An optional positional argument CA (default: true) defines if only Cα carbons should be used to center the matrix.

source
MIToS.PDB.change_b_factor!Method
change_b_factor!(residue::PDBResidue, value; atom=All)

Change the B-factor of the selected atoms in residue in place and return the modified residue. By default all atoms in the residue are updated. Atom selection follows the same conventions as select_atoms.

source
MIToS.PDB.change_b_factorMethod
change_b_factor(atom::PDBAtom, value)

Return a new PDBAtom with the B-factor changed to value. value is formatted using the PDB Real(6.2) specification and must not exceed six characters once formatted.

source
MIToS.PDB.change_b_factorMethod
change_b_factor(residue::PDBResidue, value; atom=All)

Return a new PDBResidue with the B-factor of the selected atoms changed to value. By default all atoms in the residue are updated. Atom selection follows the same conventions as select_atoms.

source
MIToS.PDB.change_coordinatesFunction

change_coordinates(residue::PDBResidue, coordinates::AbstractMatrix{Float64}, offset::Int=1)

Returns a new PDBResidues with (x,y,z) from a coordinates AbstractMatrix{Float64} You can give an offset indicating in which matrix row starts the (x,y,z) coordinates of the residue.

source
MIToS.PDB.change_coordinatesMethod

change_coordinates(residues::AbstractVector{PDBResidue}, coordinates::AbstractMatrix{Float64})

Returns a new Vector{PDBResidues} with (x,y,z) from a coordinates Matrix{Float64}

source
MIToS.PDB.contactMethod

contact(a::Coordinates, b::Coordinates, limit::AbstractFloat)

It returns true if the distance is less or equal to the limit. It doesn't call sqrt because it does squared_distance(a,b) <= limit^2.

source
MIToS.PDB.contactMethod

contact(A::PDBResidue, B::PDBResidue, limit::AbstractFloat; criteria::String="All")

Returns true if the residues A and B are at contact distance (limit). The available distance criteria are: Heavy, All, CA, CB (CA for GLY)

source
MIToS.PDB.contactMethod

contact(residues::Vector{PDBResidue}, limit::AbstractFloat; criteria::String="All")

If contact takes a Vector{PDBResidue}, It returns a matrix with all the pairwise comparisons (contact map).

source
MIToS.PDB.covalentMethod
covalent(
    a::PDBAtom,
    b::PDBAtom;
    tolerance_factor::Float64 = 1.1,
)

Return true if the distance between atoms is less than the sum of the COVALENT_RADII of each atom taking into account the tolerance factor. By default, the tolerance factor is set to 1.1, allowing for a 10% increase in the sum of the covalent radii. This multiplicative factor can be adjusted using the tolerance_factor keyword argument. This function is based on equation 5 from Kim and Kim (2015). Covalent radii are obtained from Cordero et al. (2008). If the element is not listed in COVALENT_RADII, the radius is set to 0.0, and this function will return false.

Warning

This check considers only interatomic distances; it does not evaluate bond angles, coordination, or other chemical context. Any atom pair closer than the sum of their covalent radii (taking into account the tolerance factor) is flagged as a covalent contact, even if this corresponds to a steric clash rather than a true bond.

References

- [Kim, Y. and Kim, W.Y. (2015), Universal Structure Conversion Method for Organic
  Molecules: From Atomic Connectivity to Three-Dimensional Geometry. 
  Bull. Korean Chem. Soc., 36: 1769-1777.](@cite https://doi.org/10.1002/bkcs.10334)
source
MIToS.PDB.distanceMethod

distance(residues::Vector{PDBResidue}; criteria::String="All")

If distance takes a Vector{PDBResidue} returns a PairwiseListMatrix{Float64, false} with all the pairwise comparisons (distance matrix).

source
MIToS.PDB.download_alphafold_structureMethod
download_alphafold_structure(uniprot_accession::AbstractString; format::Type{T}=MMCIFFile) where T<:FileFormat

This function downloads the structure file (PDB or mmCIF) for a given UniProt Accession from AlphaFoldDB. The uniprot_accession parameter specifies the UniProt Accession of the protein, e.g. "P00520". The format parameter specifies the file format to download, with the default being mmCIF, i.e. MMCIFFile. You can set format to PDBFile if you want to download a PDB file.

source
MIToS.PDB.downloadpdbMethod
downloadpdb(pdbcode::String; format::Type{T} = MMCIFFile, filename, baseurl, kargs...)

It downloads a gzipped PDB file from PDB database. It requires a four character pdbcode. Its default format is MMCIFFile (mmCIF) and It uses the baseurl "http://www.rcsb.org/pdb/files/". filename is the path/name of the output file. This function calls MIToS.Utils.download_file that calls Downloads.download. So, you can use keyword arguments, such as headers, from that function.

source
MIToS.PDB.findatomsMethod

findatoms(res::PDBResidue, atom::String)

Returns a index vector of the atoms with the given atom name.

source
MIToS.PDB.findheavyMethod

Returns a list with the index of the heavy atoms (all atoms except hydrogen) in the PDBResidue

source
MIToS.PDB.getCAMethod

Returns the Cα with best occupancy in the PDBResidue. If the PDBResidue has no Cα, missing is returned.

source
MIToS.PDB.getpdbdescriptionMethod

Access general information about a PDB entry (e.g., Header information) using the GraphQL interface of the PDB database. It parses the JSON answer into a JSON3.Object that can be used as a dictionary.

source
MIToS.PDB.hydrogenbondMethod

This function only works if there are hydrogens in the structure. The criteria for a hydrogen bond are:

  • d(Ai, Aj) < 3.9Å
  • d(Ah, Aacc) < 2.5Å
  • θ(Adon, Ah, Aacc) > 90°
  • θ(Adon, Aacc, Aacc-antecedent) > 90°
  • θ(Ah, Aacc, Aacc-antecedent) > 90°

Where Ah is the donated hydrogen atom, Adon is the hydrogen bond donor atom, Aacc is the hydrogen bond acceptor atom and Aacc-antecednt is the atom antecedent to the hydrogen bond acceptor atom.

source
MIToS.PDB.ionicMethod

There's an ionic interaction if a cationic and an anionic atoms are at 6.0 Å or less.

source
MIToS.PDB.is_aminoacidMethod
is_aminoacid(residue::PDBResidue)
is_aminoacid(residue_id::PDBResidueIdentifier)

This function returns true if the PDB residue is an amino acid residue. It checks if the residue's three-letter name exists in the MIToS.Utils.THREE2ONE dictionary, and returns false otherwise.

source
MIToS.PDB.isresidueMethod
 isresidue(res; model=All, chain=All, group=All, residue=All)

This function tests if a PDBResidue has the indicated model, chain, group and residue names/numbers. You can use the type All (default value) to avoid filtering that level.

source
MIToS.PDB.kabschMethod

kabsch(A::AbstractMatrix{Float64}, B::AbstractMatrix{Float64})

This function takes two sets of points, A (refrence) and B as NxD matrices, where D is the dimension and N is the number of points. Assumes that the centroids of A and B are at the origin of coordinates. You can call center! on each matrix before calling kabsch to center the matrices in the (0.0, 0.0, 0.0). Rotates B so that rmsd(A,B) is minimized. Returns the rotation matrix. You should do B * RotationMatrix to get the rotated B.

source
MIToS.PDB.mean_coordinatesMethod

Calculates the average/mean position of each atom in a set of structure. The function takes a vector (AbstractVector) of vectors (AbstractVector{PDBResidue}) or matrices (AbstractMatrix{Float64}) as first argument. As second (optional) argument this function can take an AbstractVector{Float64} of matrix/structure weights to return a weighted mean. When a AbstractVector{PDBResidue} is used, if the keyword argument calpha is false the RMSF is calculated for all the atoms. By default only alpha carbons are used (default: calpha=true).

source
MIToS.PDB.modelled_sequencesMethod
modelled_sequences(residue_list::AbstractArray{PDBResidue,N}; 
    model::Union{String,Type{All}}=All, chain::Union{String,Type{All}}=All, 
    group::Union{String,Regex,Type{All}}=All) where N

This function returns an OrderedDict where each key is a named tuple (containing the model and chain identifiers), and each value is the protein sequence corresponding to the modelled residues in those chains. Therefore, the obtained sequences do not contain missing residues. All modelled residues are included by default, but those that don't satisfy specified criteria based on the model, chain, or group keyword arguments are excluded. One-letter residue names are obtained from the MIToS.Utils.THREE2ONE dictionary for all residue names that return true for is_aminoacid.

source
MIToS.PDB.peptide_bondFunction
peptide_bond(
    res_a::PDBResidue,
    a::PDBAtom,
    res_b::PDBResidue,
    b::PDBAtom,
    cutoff = 1.38,
)

Return true if the atoms form a peptide bond.

The bond is recognized when the atoms are named "C" and "N", belong to the same chain and model, and are separated by at most cutoff Å. If neither atom is "C" nor "N", the function returns missing.

Peptide bonds have been reported in the 1.28–1.38 Å range, so the maximum value is used to be as inclusive as possible.

References

source
MIToS.PDB.picationMethod

There's a Π-Cation interaction if a cationic and an aromatic atoms are at 6.0 Å or less

source
MIToS.PDB.proximitymeanMethod

proximitymean calculates the proximity mean/average for each residue as the average score (from a scores list) of all the residues within a certain physical distance to a given amino acid. The score of that residue is not included in the mean unless you set include to true. The default values are 6.05 for the distance threshold/limit and "Heavy" for the criteria keyword argument. This function allows to calculate pMI (proximity mutual information) and pC (proximity conservation) as in Buslje et al..

References

source
MIToS.PDB.query_alphafolddbMethod
query_alphafolddb(uniprot_accession::String)

This function queries the AlphaFoldDB API to retrieve structure information for a given uniprot_accession, e.g. "P00520". This function returns the structure information as a JSON3.Object.

source
MIToS.PDB.residuepairsmatrixMethod

It creates a NamedArray containing a PairwiseListMatrix where each element (column, row) is identified with a PDBResidue from the input vector. You can indicate the value type of the matrix (default to Float64), if the list should have the diagonal values (default to Val{false}) and the diagonal values (default to NaN).

source
MIToS.PDB.residuesMethod

The residues function for AbstractArray{PDBResidue,N} is deprecated. Use the select_residues function instead. So, residues(residue_list, model, chain, group, residue) becomes select_residues(residue_list; model=model, chain=chain, group=group, residue=residue).

source
MIToS.PDB.residuesdictMethod
 residuesdict(residue_list; model=All, chain=All, group=All, residue=All)

This function returns a dictionary (using PDB residue numbers as keys) with the selected subset of residues. The residues are selected using the keyword arguments model, chain, group and residue. You can use the type All (default value) to avoid filtering at a particular level.

source
MIToS.PDB.rmsdMethod

rmsd(A::AbstractMatrix{Float64}, B::AbstractMatrix{Float64})

Return RMSD between two sets of points A and B, given as NxD matrices (N: number of points, D: dimension).

source
MIToS.PDB.rmsdMethod

rmsd(A::AbstractVector{PDBResidue}, B::AbstractVector{PDBResidue}; superimposed::Bool=false)

Returns the Cα RMSD value between two PDB structures: A and B. If the structures are already superimposed between them, use superimposed=true to avoid a new superimposition (superimposed is false by default).

source
MIToS.PDB.rmsfMethod

Calculates the RMSF (Root Mean-Square-Fluctuation) between an atom and its average position in a set of structures. The function takes a vector (AbstractVector) of vectors (AbstractVector{PDBResidue}) or matrices (AbstractMatrix{Float64}) as first argument. As second (optional) argument this function can take an AbstractVector{Float64} of matrix/structure weights to return the root weighted mean-square-fluctuation around the weighted mean structure. When a Vector{PDBResidue} is used, if the keyword argument calpha is false the RMSF is calculated for all the atoms. By default only alpha carbons are used (default: calpha=true).

source
MIToS.PDB.select_atomsMethod
select_atoms(residue_list; model=All, chain=All, group=All, residue=All, atom=All, alt_id=All, charge=All)

This function returns a vector of PDBAtoms with the selected subset of atoms from a list of residues. The atoms are selected using the keyword arguments model, chain, group, residue, atom, alt_id, and charge. You can use the type All (default value) to avoid filtering at a particular level.

source
MIToS.PDB.select_residuesMethod
select_residues(residue_list; model=All, chain=All, group=All, residue=All)

This function returns a new vector with the selected subset of residues from a list of residues. You can use the keyword arguments model, chain, group and residue to select the residues. You can use the type All (default value) to avoid filtering at a particular level.

source
MIToS.PDB.squared_distanceMethod

squared_distance(A::PDBResidue, B::PDBResidue; criteria::String="All")

Returns the squared distance between the residues A and B. The available criteria are: Heavy, All, CA, CB (CA for GLY)

source
MIToS.PDB.superimposeFunction
Asuper, Bsuper, RMSD = superimpose(A, B, matches=nothing)

This function takes A::AbstractVector{PDBResidue} (reference) and B::AbstractVector{PDBResidue}. Translates A and B to the origin of coordinates, and rotates B so that rmsd(A,B) is minimized with the Kabsch algorithm (using only their α carbons). Returns the rotated and translated versions of A and B, and the RMSD value.

Optionally provide matches which iterates over matched index pairs in A and B, e.g., matches = [(3, 5), (4, 6), ...]. The alignment will be constructed using just the matching residues.

source
MIToS.PDB.vanderwaalsMethod
vanderwaals(a::PDBAtom, b::PDBAtom)

Test if two atoms or residues are in van der Waals contact using the criterion in Alvarez (2013). That means, if the distance between two atoms is between 0.7 Å less and 0.7 Å more than the sum of their van der Waals radii. If the element is not listed in VAN_DER_WAALS_RADII, the radius is set to 0.0, and this function will return false.

References

- [Alvarez, Santiago. "A cartography of the van der Waals territories." Dalton 
  Transactions 42.24 (2013): 8617-8636.](@cite C3DT50599E)
source
MIToS.PDB.vanderwaalsclashMethod
vanderwaalsclash(a::PDBAtom, b::PDBAtom; tolerance_value::Float64 = -0.7)

This function considered a van der Waals clash if the distance between two atoms falls within the van der Waals gap as defined by Alvarez (2013). That means, if the distance between two atoms that are not covalently bonded is less than or equal to the sum of their van der Waals radii less 0.7 Å (as the default value for tolerance_value is -0.7). Here we use the covalent function to check for covalent bonds instead of fixing a distance threshold as in Alvarez (2013) — that paper suggest that "distances shorter than the radii sum by more than 1.3 Å correspond most likely to a chemical bond". Unknown elements fall back to 0.0 returning false. Only distances are checked; no chemical context is considered.

Warning

The tolerance_value as been set to -0.7 by default in MIToS 3.2.0 and later. Previous versions used 0.0 as default, matching the definition on Bickerton et al. (2012). However, that value is too high and results in many van der Waals contacts being classified as clashes. MIToS 3.2.0 also replaced the van der Waals radii from Bickerton et al. (2012) with the values reported by Alvarez (2013). Alvarez (2013) suggests that "the position of the shortest non-bonded distance can be roughly estimated to be 0.7 Å shorter than the radii sum."

References

- [Alvarez, Santiago. "A cartography of the van der Waals territories." Dalton 
  Transactions 42.24 (2013): 8617-8636.](@cite C3DT50599E)
- [Bickerton, George R., Alicia P. Higueruelo, and Tom L. Blundell. "Comprehensive, 
  atomic-level characterization of structurally characterized protein-protein 
  interactions: the PICCOLO database." BMC bioinformatics 
  12 (2011): 1-15.](@cite 10.1186/1471-2105-12-313)
source
MIToS.Utils.parse_fileMethod

parse_file(pdbml, ::Type{PDBML}; chain=All, model=All, group=All, atomname=All, onlyheavy=false, label=true, occupancyfilter=false)

Reads a LightXML.XMLDocument representing a pdb file. Returns a list of PDBResidues (view MIToS.PDB.PDBResidues). Setting chain, model, group, atomname and onlyheavy values can be used to select of a subset of all residues. If not set, all residues are returned. If the keyword argument label (default: true) is false,the auth_ attributes will be use instead of the label_ attributes for chain, atom and residue name fields. The auth_ attributes are alternatives provided by an author in order to match the identification/values used in the publication that describes the structure. If the keyword argument occupancyfilter (default: false) is true, only the atoms with the best occupancy are returned.

source
MIToS.Utils.parse_fileMethod

parse_file(io, ::Type{MMCIFFile}; chain=All, model=All, group=All, atomname=All, onlyheavy=false, label=true, occupancyfilter=false)

Parse an mmCIF file and returns a list of PDBResidues. Setting chain, model, group, atomname and onlyheavy values can be used to select a subset of residues. Group can be "ATOM" or "HETATM". If those keyword arguments are not set, all residues are returned. If the keyword argument label (default: true) is false, the auth_ attributes will be used instead of the label_ attributes for chain, atom, and residue name fields. The auth_ attributes are alternatives provided by an author in order to match the identification/values used in the publication that describes the structure. If the keyword argument occupancyfilter (default: false) is true, only the atoms with the best occupancy are returned.

source
MIToS.Utils.parse_fileMethod

parse_file(io, ::Type{PDBFile}; chain=All, model=All, group=All, atomname=All, onlyheavy=false, occupancyfilter=false)

Reads a text file of a PDB entry. Returns a list of PDBResidue (view MIToS.PDB.PDBResidues). Setting chain, model, group, atomname and onlyheavy values can be used to select of a subset of all residues. Group can be "ATOM" or "HETATM". If not set, all residues are returned. If the keyword argument occupancyfilter (default: false) is true, only the atoms with the best occupancy are returned.

source
MIToS.Utils.print_fileFunction

print_file(io, res, format::Type{PDBFile}) print_file(res, format::Type{PDBFile})

Print a PDBResidue or a vector of PDBResidues in PDB format.

source