Root Mean Squared Fluctuation (RMSF)

md # md #

Problem description

The Root Mean Squared Fluctuation (RMSF) is a common way to measure residue flexibility in a structural ensemble. It is a measure of how far is the residue moving from its average position in the group of structures. Usually, we represent a residue position with the spatial coordinates of its alpha carbon.

The protein structures should be previously superimposed to calculate the RMSF, for example, by using the superimpose function of the PDB module of MIToS. In this example, we are going to measure the RMSF of each residue from an NMR ensemble using the rmsf function.

The structure superimposition could be the most complicated step of the process, depending on the input data. In particular, it structures come from different PDB structures or homologous proteins can require the use of external programs, as MAMMOTH-mult or MUSTANG among others, tailored for this task.

In this case, we are going to use an NMR ensemble. Therefore, we are not going to need to superimpose the structures as NMR models have the same protein sequence and are, usually, well-aligned.

MIToS solution

import MIToS
using MIToS.PDB
using Plots

Lets read the NMR ensemble:

pdb_file = abspath(pathof(MIToS), "..", "..", "test", "data", "1AS5.pdb")
pdb_res = read_file(pdb_file, PDBFile, occupancyfilter = true)

We set occupancyfilter to true to ensure that we have one single set of coordinates for each atom. That filter isn't essential for NMR structures, but It can avoid multiple alpha carbons in crystallographic structures with disordered atoms. We can get an idea of the alpha carbon positions by plotting these residues:

scatter(pdb_res, legend = false)

As we saw in the previous plot, the structure doesn't need to be superimposed. Now, we are going to separate each model into different vectors, storing each vector into a Dict:

models = Dict{String,Vector{PDBResidue}}()
for res in pdb_res
    push!(get!(models, res.id.model, []), res)
end

Then, we simply need to collect all the PDB models in the values of the Dict, to get the vector of PDBResidues vectors required to calculate the RMSF.

pdb_models = collect(values(models))

And, finally, call the rmsf function on the list of structures. It is important that all the vectors has the same number of PDBResidues. This function assumes that the nth element of each vector corresponds to the same residue:

RMSF = rmsf(pdb_models)

21-element Vector{Float64}:
 0.6008049320091593
 0.6562466637427343
 0.5868752127208325
 0.7275663200011392
 1.6512606331440922
 1.3117607364118795
 0.6982103470842509
 0.7109607828100489
 0.6927992280655076
 0.8222912705808504
 ⋮
 0.5339744031929865
 0.5653746094224534
 0.7416600691443928
 0.7898229264221448
 0.8414943371097368
 0.8242922195831439
 1.004681790419235
 1.4029641961626411
 3.0733292325145656

This return the vector of RMSF values for each residue, calculated using the coordinates of the alpha carbons. You can plot this vector to get an idea of the which are the most flexible position in your structure:

plot(RMSF, legend = false, xlab = "Residue", ylab = "RMSF [Å]")

This page was generated using Literate.jl.