Root Mean Squared Fluctuation (RMSF)
Problem description
The Root Mean Squared Fluctuation (RMSF) is a common way to measure residue flexibility in a structural ensemble. It is a measure of how far is the residue moving from its average position in the group of structures. Usually, we represent a residue position with the spatial coordinates of its alpha carbon.
The protein structures should be previously superimposed to calculate the RMSF, for example, by using the superimpose
function of the PDB
module of MIToS
. In this example, we are going to measure the RMSF of each residue from an NMR ensemble using the rmsf
function.
The structure superimposition could be the most complicated step of the process, depending on the input data. In particular, it structures come from different PDB structures or homologous proteins can require the use of external programs, as MAMMOTH-mult or MUSTANG among others, tailored for this task.
In this case, we are going to use an NMR ensemble. Therefore, we are not going to need to superimpose the structures as NMR models have the same protein sequence and are, usually, well-aligned.
MIToS solution
import MIToS
using MIToS.PDB
using Plots
Lets read the NMR ensemble:
pdb_file = abspath(pathof(MIToS), "..", "..", "test", "data", "1AS5.pdb")
pdb_res = read_file(pdb_file, PDBFile, occupancyfilter = true)
We set occupancyfilter
to true
to ensure that we have one single set of coordinates for each atom. That filter isn't essential for NMR structures, but It can avoid multiple alpha carbons in crystallographic structures with disordered atoms. We can get an idea of the alpha carbon positions by plotting these residues:
scatter(pdb_res, legend = false)
As we saw in the previous plot, the structure doesn't need to be superimposed. Now, we are going to separate each model into different vectors, storing each vector into a Dict
:
models = Dict{String,Vector{PDBResidue}}()
for res in pdb_res
push!(get!(models, res.id.model, []), res)
end
Then, we simply need to collect all the PDB models in the values of the Dict
, to get the vector of PDBResidues
vectors required to calculate the RMSF.
pdb_models = collect(values(models))
And, finally, call the rmsf
function on the list of structures. It is important that all the vectors has the same number of PDBResidue
s. This function assumes that the nth element of each vector corresponds to the same residue:
RMSF = rmsf(pdb_models)
21-element Vector{Float64}:
0.6008049320091593
0.6562466637427343
0.5868752127208325
0.7275663200011392
1.6512606331440922
1.3117607364118795
0.6982103470842509
0.7109607828100489
0.6927992280655076
0.8222912705808504
⋮
0.5339744031929865
0.5653746094224534
0.7416600691443928
0.7898229264221448
0.8414943371097368
0.8242922195831439
1.004681790419235
1.4029641961626411
3.0733292325145656
This return the vector of RMSF values for each residue, calculated using the coordinates of the alpha carbons. You can plot this vector to get an idea of the which are the most flexible position in your structure:
plot(RMSF, legend = false, xlab = "Residue", ylab = "RMSF [Å]")
This page was generated using Literate.jl.