MSA and structures

For this example, we use the Multiple Sequence Alignment (MSA) of the Mu DNA-binding domain.

using MIToS.Pfam
using MIToS.MSA

const pfam_file = downloadpfam("PF02316")
const msa = read(pfam_file,
				 Stockholm,
				 generatemapping=true,
				 useidcoordinates=true)

We use the getseq2pdb function to look into the Pfam annotations for PDBs:

const seq2pdb = getseq2pdb(msa)

For this example, we use the crystallographic structure 4FCY...

selected = [ (seq, pdb, chain) for (seq, pdbs) in seq2pdb
				for (pdb, chain) in pdbs if pdb == "4FCY" ]

...and we take the first one

seq_id, pdb_id, chain = selected[1]
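
The selection above can be tried without downloading anything. The following is a minimal sketch on a toy dictionary, assuming the mapping has the shape the comprehension expects (sequence name => list of (PDB code, chain) pairs). Every name and entry in the toy data is made up for illustration.

```julia
# Toy mapping: sequence name => list of (PDB code, chain) pairs.
# All keys and all entries other than the code "4FCY" are invented.
toy_seq2pdb = Dict(
    "SEQ_A/1-60" => [("1ABC", "A"), ("4FCY", "A")],
    "SEQ_B/5-70" => [("2XYZ", "A")],
)

# Flatten the mapping and keep only the pairs that point to 4FCY.
toy_selected = [(seq, pdb, chain) for (seq, pdbs) in toy_seq2pdb
                for (pdb, chain) in pdbs if pdb == "4FCY"]

# Destructure the first match into three variables.
seq_name, pdb_code, chain_name = toy_selected[1]
```

The `if` clause filters the inner loop, so a sequence mapped to several structures contributes only its 4FCY entries.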

We download and read the PDB file:

using MIToS.PDB
pdb_file = downloadpdb(pdb_id, format=PDBFile)
const pdb_res = read(pdb_file, PDBFile)

Each PDBResidue contains the information in the ATOM and HETATM PDB lines:

first_residue = pdb_res[1]

You can access this information through the field names:

first_residue.id.name

Exercise 1

How many chains are in the PDB?

Hint: You can use Set or unique

There are 5 chains: A, B, C, D, E
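
One way to follow the hint, sketched on stand-in data so that it runs without the PDB file. The named tuples below only mimic the `id.chain` field path of a residue; they are not real `PDBResidue` objects.

```julia
# Stand-in residues: only the `id.chain` field path is mimicked.
residues = [(id = (chain = c,),) for c in ["A", "A", "B", "C", "C", "D", "E"]]

# `unique` keeps the first occurrence of each chain, preserving order.
chains = unique(res.id.chain for res in residues)

# `Set` gives the same elements without any ordering guarantee.
chain_set = Set(res.id.chain for res in residues)

length(chains)
```

With the structure loaded, the same expression applied to `pdb_res` instead of `residues` should list the chains of the file.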

Distances and contacts

The MIToS PDB module has functions to measure distances between residues and to identify contacts:

res_i = pdb_res[1]
res_j = pdb_res[4]
distance(res_i, res_j)

contact(i, j, threshold) is faster than distance(i, j) < threshold

contact(res_i, res_j, 8.0)
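
A plausible reason for the speed difference is early termination: a contact test can stop at the first atom pair within the threshold, while a distance has to visit every pair to find the minimum. The sketch below illustrates that idea on plain coordinate tuples. The names `atom_distance`, `min_distance` and `in_contact` are hypothetical, and this is an assumption about the strategy, not the MIToS implementation.

```julia
# A "residue" here is just a vector of (x, y, z) atom coordinates in Å.
atom_distance(a, b) = sqrt(sum(abs2, a .- b))

# Minimum inter-atomic distance: every atom pair is visited.
min_distance(ri, rj) = minimum(atom_distance(a, b) for a in ri for b in rj)

# Contact test: returns as soon as one pair is within the threshold.
function in_contact(ri, rj, threshold)
    for a in ri, b in rj
        atom_distance(a, b) <= threshold && return true
    end
    return false
end

ri = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
rj = [(4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]

min_distance(ri, rj)      # closest pair is (1, 0, 0) and (4, 0, 0)
in_contact(ri, rj, 3.0)
```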

distance and contact accept a criteria keyword argument with one of the following values: Heavy, All, CA or CB (CB falls back to CA for GLY, which has no beta carbon).

distance(res_i, res_j, criteria="CB")
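
The glycine fallback can be pictured with a small self-contained sketch. The named tuples and the helpers `representative_atom` and `cb_distance` are hypothetical stand-ins and the coordinates are invented; the code only illustrates the rule described above, not how MIToS stores atoms.

```julia
# Stand-in residues: a residue name plus a Dict from atom name to (x, y, z) in Å.
gly = (name = "GLY", atoms = Dict("CA" => (0.0, 0.0, 0.0)))
ala = (name = "ALA", atoms = Dict("CA" => (3.0, 0.0, 0.0), "CB" => (3.0, 4.0, 0.0)))

# Use the CB coordinates, or the CA coordinates when the residue is a glycine.
representative_atom(res) = res.atoms[res.name == "GLY" ? "CA" : "CB"]

# Euclidean distance between the representative atoms of two residues.
cb_distance(a, b) = sqrt(sum(abs2, representative_atom(a) .- representative_atom(b)))

cb_distance(gly, ala)
```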

Exercise 2

Write a function that returns the Set of residues in the first vector that are in contact with the residues in the second vector:

function get_contacts(residues_i, residues_j; threshold::Float64=8.0)
	result = Set{PDBResidue}()
	# ...your code here...
	result
end
# Output: get_contacts (generic function with 1 method)
using Test
@test get_contacts(pdb_res[[1, end]], pdb_res[2:3], threshold=6.0) == Set([pdb_res[1]])
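
If you get stuck, here is one possible shape for the loop, sketched on toy data so that it runs on its own. It is not the reference solution: `collect_contacts` and its `within` predicate are hypothetical names, and the "residues" are positions on a line.

```julia
# `within` is any predicate (a, b) -> Bool. With MIToS loaded it could be
# (a, b) -> contact(a, b, 6.0); here a toy predicate is used instead.
function collect_contacts(within, residues_i, residues_j)
    result = Set{eltype(residues_i)}()
    for res_i in residues_i
        # Keep res_i if it is in contact with at least one element of residues_j.
        if any(res_j -> within(res_i, res_j), residues_j)
            push!(result, res_i)
        end
    end
    result
end

# Toy data: residues reduced to positions on a line (not real coordinates).
positions_i = [0.0, 50.0]
positions_j = [3.0, 5.0]
close_enough(a, b) = abs(a - b) <= 6.0

collect_contacts(close_enough, positions_i, positions_j)
```

Passing the predicate as an argument keeps the loop independent of how contacts are defined, so the same structure can be reused with different thresholds or criteria.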

Exercise 3

Use the get_contacts function to get all the residues from chain A (protein) that are in contact with chains C, D and E (DNA).

# ...your solution...

PDB Plots

You can use Plots to get an idea of where your residues are:

using Plots
plotlyjs()
plot(pdb_res)

Or use Bio3DView to get an interactive view (but we need to save the residues in an uncompressed PDB file first):

write("selected_residues.pdb", pdb_res, PDBFile)
using Bio3DView
viewfile("selected_residues.pdb", "pdb")

SIFTS

We can download the SIFTS XML file for the PDB entry using the SIFTS module of MIToS. It provides a residue-level mapping between databases, along with information about the structure.

using MIToS.SIFTS

const sifts_file = downloadsifts(pdb_id)
const sifts_res = read(sifts_file, SIFTSXML)

Similar to PDBResidue, a SIFTSResidue contains information about a single residue, and its fields can be accessed using dots:

sifts_res[1]

In this case, the first residue is missing in the structure:

sifts_res[1].missing
sifts_res[end]
sifts_res[end].missing

Exercise 4

How many missing residues are in the chain A?

# ...your code here...

sum(res.missing for res in sifts_res if res.PDB.chain == "A")

This page was generated using Literate.jl.