Structural alignment

MolSimToolkit.jl provides tools to perform rigid-body structural alignment throughout trajectories.

Two types of alignments are avaialble: the "standard" rigid-body alignment, frequently used to compute RMSDs and RMSFs in MD trajectories, and a robust structural alignment method, mdlovofit. The robust alignment method provides better alignments of the rigid (or core) fractions of the structures, while magnifying the variability of the flexibile regions.

Conventional rigid-body alignment

Conventional rigid-body aligments routines are:

MolSimToolkitShared.rmsdFunction
rmsd(simulation::Simulation, indices::AbstractVector{Int}; mass = nothing, reference_frame = nothing, show_progress = true)

Computes the root mean square deviation (RMSD) between two sets of points in along a trajectory.

Arguments

  • indices vector contains the indices of the atoms to be considered.
  • mass argument can be used to provide the mass of the atoms if they are not the same.
  • reference_frame argument can be used to provide a reference frame to align the trajectory to:
    • If reference_frame == nothing, the first frame will be used (default behavior).
    • If reference_frame == :average, the average structure will be used.
    • If reference_frame is an integer, the frame at that index will be used as reference.

Examples

Computing the rmsd along a trajectory

julia> using MolSimToolkit, MolSimToolkit.Testing

julia> using PDBTools

julia> atoms = readPDB(Testing.namd_pdb);

julia> simulation = Simulation(Testing.namd_pdb, Testing.namd_traj);

julia> cas = findall(sel"name CA", atoms); # CA indices

julia> rmsd(simulation, cas; show_progress=false)
5-element Vector{Float64}:
 0.0
 2.8388710154609034
 2.9776998440690385
 2.4621444212469483
 3.8035683196100796

julia> rmsd(simulation, cas; reference_frame=:average, show_progress=false)
5-element Vector{Float64}:
 1.8995986972454748
 2.1512244220536973
 1.5081703191869376
 1.1651111324544219
 2.757039151265317
source
rmsd(x::AbstractVector,y::AbstractVector)

Calculate the root mean square deviation between two vectors of coordinates.

Arguments

  • x::AbstractVector: A vector of coordinates.
  • y::AbstractVector: A vector of coordinates.

Returns

  • rmsd::Real: The root mean square deviation between the two vectors.
julia> import MolSimToolkitShared: rmsd

julia> x = [ [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0] ];

julia> y = [ [2.0, 3.0, 4.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0] ];

julia> rmsd(x, y)
1.0
source
MolSimToolkit.rmsd_matrixFunction
rmsd_matrix(
    simulation::Simulation, 
    indices::AbstractVector{Int}; 
    mass::Union{AbstractVector{Int}, Nothing} = nothing,
    align::Bool = true,
    show_progress = true,
)

Computes the RMSD matrix for a set of atoms along a trajectory.

The indices vector contains the indices of the atoms to be considered. The mass argument can be used to provide the mass of the atoms if they are not the same. The align argument can be used to align the frames before computing the RMSD.

The show_progress argument can be used to show a progress bar.

Returns

A symetric matrix with the RMSD values between each pair of frames. For example, in a trajectory with 5 frames, the matrix will be a 5x5 matrix with the RMSD values between the structures of each pair of frames.

Example

julia> using MolSimToolkit, MolSimToolkit.Testing

julia> using PDBTools

julia> atoms = readPDB(Testing.namd_pdb);

julia> cas = findall(Select("name CA"), atoms); # CA indices

julia> simulation = Simulation(Testing.namd_pdb, Testing.namd_traj);

julia> rmsd_matrix(simulation, cas; show_progress=false)
5×5 Matrix{Float64}:
 0.0      2.83887  2.9777   2.46214  3.80357
 2.83887  0.0      2.35492  2.64463  4.68028
 2.9777   2.35492  0.0      2.08246  3.46149
 2.46214  2.64463  2.08246  0.0      2.97835
 3.80357  4.68028  3.46149  2.97835  0.0
source
MolSimToolkitShared.center_of_massFunction
center_of_mass(
    indices::AbstractVector{Int};
    simulation::Simulation,
    positions::FramePositions,
    iref::Union{Nothing,Int} = max(1, div(length(indices),2)),
)

Calculate the center of mass of a selection of atoms in a simulation given the positions. The selection is defined by the indices vector, which is the indices of the atoms.

The iref parameter is the index of the reference atom. The center of mass is calculated by first computing the minimum-image of all atoms relative to this atom. By default, it is the atom closest to the middle of the indices vector. If iref is nothing, the center of mass is calculated without wrapping the coordinates.

julia> using PDBTools 

julia> using MolSimToolkit, MolSimToolkit.Testing

julia> simulation = Simulation(Testing.namd_pdb, Testing.namd_traj);

julia> protein_indices = findall(sel"protein", atoms(simulation));

julia> first_frame!(simulation); # move simulation to the first frame

julia> coor = positions(current_frame(simulation));

julia> cm = center_of_mass(protein_indices, simulation, coor)
3-element Point3D{Float64} with indices SOneTo(3):
 -3.7290442807974906
 -1.5339226637687564
  1.960640754560446
Compat

The iref=nothing option was added in version 1.22.0.

source
center_of_mass(x::AbstractVector{<:AbstractVector}[, mass::AbstractVector=nothing])

Calculate the center of mass of a set of points.

Arguments

  • x::AbstractVector{<:AbstractVector}: A vector of coordinates.
  • mass::AbstractVector: A vector of masses. If not provided, all masses are assumed to be equal.

Example

julia> import MolSimToolkitShared: center_of_mass

julia> x = [ [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0] ];

julia> center_of_mass(x)
3-element Vector{Float64}:
 4.0
 5.0
 6.0

julia> center_of_mass(x, [1.0, 2.0, 3.0]) # providing masses
3-element Vector{Float64}:
 5.0
 6.0
 7.0
source
MolSimToolkitShared.alignFunction
align(x, y; mass = nothing)
align!(x, y; mass = nothing)

Aligns two structures (sets of points in 3D space). Solves the "Procrustes" problem, which is to find the best translation, and rotation, that aligns the two structures, minimizing the RMSD between them.

Structures are expected to be of the same size, and the correspondence is assumed from the vector indices.

align returns a new vector containing the coordinates of x aligned to y. align! modifies the input vector x in place.

source
MolSimToolkitShared.align!Function
align(x, y; mass = nothing)
align!(x, y; mass = nothing)

Aligns two structures (sets of points in 3D space). Solves the "Procrustes" problem, which is to find the best translation, and rotation, that aligns the two structures, minimizing the RMSD between them.

Structures are expected to be of the same size, and the correspondence is assumed from the vector indices.

align returns a new vector containing the coordinates of x aligned to y. align! modifies the input vector x in place.

source

Robust rigid-body alignment

MDLovoFit is a package for the analysis of the mobility and structural fluctuation in Molecular Dynamics simulations. It allows the automatic identification of rigid and mobile regions of protein structures.

For example, it is possible to automatically identifiy a stable region of a protein in simulation in which the protein displays high structural flexibility, as illustrated in the example. The regions of low mobility are automatically detected by the method.

The software provides, as output, the Root-Mean-Square Deviations of the conserved structures, and of the divergent structures. A trajectory PDB file is output for the visualization of the results.

The execution of the MDLovoFit procedure typically has three steps:

  • Step1: Identify the length of the structure that can be aligned with a precision lower than a threshold. This is done with the map_fractions function.
  • Step 2: Align the trajectory for the fraction of atoms desired, with the mdlovofit function.
  • Step 3: Visualize the aligned trajectory and aligned atoms.
Note

Please cite the following references if these functions were useful:

L. Martínez, Automatic identification of mobile and rigid substructures in molecular dynamics simulations and fractional structural fluctuation analysis. PLoS One 10(3): e0119264, 2015. Full text.

L. Martínez. R. Andreani, J. M. Martínez. Convergent Algorithms for Protein Structural Alignment. BMC Bioinformatics, 8, 306, 2007. Full text.

It is possible to use mdlovofit as a standalone program by downloading it here.

Step 1: Map fractions of aligned atoms and overlap precision

MolSimToolkit.map_fractionsFunction
map_fractions(simulation::Simulation; maxframes=nothing)

Run MDLovoFit on a trajectory, to obtain the RMSD of the fractions of the structure with the lowest RMSD, the fraction with the highest RMSD, and the whole structure.

Only CA atoms of protein residues are considered.

Argument:

  • simulation is a Simulation object, with the trajectory of the system and the atom data.

Keyword arguments:

  • maxframes=nothing is the number of frames to be considered. If nothing, 100 frames are considered. Set to length(simulation) to consider all frames.

Example

julia> using MolSimToolkit, MolSimToolkit.Testing

julia> sim = Simulation(Testing.namd_pdb, Testing.namd_traj);

julia> mf = map_fractions(sim)
-------------------------------------------------------------------
MapFractionsResult: Simulation(structure.pdb, trajectory.dcd)

-------------------------------------------------------------------
Fields: 
- fraction: fraction of atoms considered in the alignment.
- frame_indices: the frame index of each frame considered.
- rmsd_low: RMSD of the fraction of the structure with the lowest RMSD.
- rmsd_high: RMSD of the fraction not considered for the alignment.
- rmsd_all: RMSD of the whole structure.

Greatest fraction for which the RMSD-low is smaller than 1.0: 0.79
                                                         2.0: 0.9
                                                         3.0: 0.99
-------------------------------------------------------------------
source
MolSimToolkit.MapFractionsResultType
MapFractionsResult

Data structure to store the output of the map_fractions function.

Fields:

  • simulation is a Simulation object with the trajectory of the system and the atom data.
  • frame_indices contains the frame index of each frame considered.
  • fraction contains the fraction of atoms considered in the alignment.
  • rmsd_low contains the RMSD of the fraction of the structure with the lowest RMSD.
  • rmsd_high contains the RMSD of the fraction not considered for the alignment.
  • rmsd_all contains the RMSD of the whole structure.
source

The map_fractions function scans the trajectory and tries to align the C$\alpha$ atoms of the protein structures, but considering only a fraction of the atoms for the alignment. The alignment is performed such that only the fraction of atoms of smaller displacements interfere with the overlap. Three RMSDs are then obtained: the RMSD of the best aligned atoms, the RMSD of the remaining (highly mobile) atoms, and the RMSD of all atoms. For example:

using MolSimToolkit, MolSimToolkit.Testing

# Create Simulation object from pdb and trajectory file names
sim = Simulation(Testing.namd_pdb, Testing.namd_traj)

# Run map_fractions
mf = map_fractions(sim)
-------------------------------------------------------------------
MapFractionsResult: Simulation(structure.pdb, trajectory.dcd)

-------------------------------------------------------------------
Fields: 
- fraction: fraction of atoms considered in the alignment.
- frame_indices: the frame index of each frame considered.
- rmsd_low: RMSD of the fraction of the structure with the lowest RMSD.
- rmsd_high: RMSD of the fraction not considered for the alignment.
- rmsd_all: RMSD of the whole structure.

Greatest fraction for which the RMSD-low is smaller than 1.0: 0.79
                                                         2.0: 0.9
                                                         3.0: 0.99
-------------------------------------------------------------------

In the above run, we can readily see that 79% of the atoms of the structure can be aligned with RMSD smaller than 1.0Å. The RMSDs of the low and high mobility, and all-C$\alpha$ RMSDs are available as the fields of the mf object generated.

A convenience recipe can plot the results:

using Plots
plot(mf)
Example block output

Where we see that about 80% of atoms can be aligned to less than 1.0Å RMSD, while at the same time the 20% more flexible atoms will display an RMSD of about 7Å.

We will now consider that, in this case, 1.0Å is a good threshold for defining the conserved structural core of the protein, and proceed with the next analysis, where we will ask mdlovofit to obtain the best alignment of 80% of the atoms, of each frame relative to the first frame.

Step 2: Align the trajectory for a given fraction of atoms

MolSimToolkit.mdlovofitFunction
mdlovofit(
    simulation::Simulation; 
    fraction::AbstractFloat, 
    output_name::Union{String,Nothing} = nothing, 
    reference_frame::Int = 1, 
    maxframes=nothing
)

Run MDLovoFit on a trajectory, aligning only the atoms which are in the fraction of the structure with the lowest RMSD. Only CA atoms of protein residues are considered.

Arguments:

  • simulation is a Simulation object, with the trajectory of the system and the atom data.
  • fraction is the fraction of atoms to be considered in the alignment.

Optional arguments:

  • output_name=nothing is the base name of the output files. If nothing, default names will be used.
  • reference_frame=1 is the index of the frame to be used as reference for the alignment.
  • maxframes=nothing is the number of frames to be considered. If nothing, 100 frames are considered. Set to length(simulation) to consider all frames.

Output:

  • MDLovoFitResult data structure with the results of the alignment.

and the output files:

  • output_name_rmsf.dat with the RMSF data.
  • output_name_rmsd.dat with the RMSD data.
  • output_name_aligned.pdb with the aligned structure.

Example

julia> using MolSimToolkit, MolSimToolkit.Testing

julia> sim = Simulation(Testing.namd_pdb, Testing.namd_traj);

julia> md = mdlovofit(sim, fraction=0.7, output_name="mdlovofit_50")
-------------------------------------------------------------------
MDLovoFitResult: Simulation(structure.pdb, trajectory.dcd)

-------------------------------------------------------------------

Aligned pdb file: mdlovofit_50_aligned.pdb
RMSF data file: mdlovofit_50_rmsf.dat
RMSD data file: mdlovofit_50_rmsd.dat

Number of frames considered: 5
Average RMSD of all atoms: 1.79
Average RMSD of the 70.0% atoms of lowest RMSD: 0.53
Average RMSD of the 30.0% atoms of highest RMSD: 4.71

Frame indices availabe in field: frame_indices
RMSD data availabe in fields: rmsd_low, rmsd_high, and rmsd_all

RMSF data availabe in field: rmsf (Number of atoms: 43)
-------------------------------------------------------------------
source
MolSimToolkit.MDLovoFitResultType
MDLovoFitResult

Data structure to store the output of the mdlovofit function.

Fields:

  • simulation: Simulation object with the trajectory of the system and the atom data.
  • frame_indices: vector with the frame index of each frame.
  • rmsd_low: RMSD of the fraction of the structure with the lowest RMSD.
  • rmsd_high: RMSD of the fraction of the structure with the highest RMSD.
  • rmsd_all: RMSD of the whole structure.
  • rmsf: RMSF as a function of the residue or atom index.
  • rmsf_file: name of the file with the RMSF data.
  • rmsd_file: name of the file with the RMSD data.
  • aligned_pdb: name of the PDB file with the aligned structure.
source

Continuing with the above example, now we will call the mdlovofit function, to obtain the best aligment of 80% of the atoms in each frame relative to the first frame:

md = mdlovofit(sim; fraction=0.8, output_name="mysim")
-------------------------------------------------------------------
MDLovoFitResult: Simulation(structure.pdb, trajectory.dcd)

-------------------------------------------------------------------

Aligned pdb file: mysim_aligned.pdb
RMSF data file: mysim_rmsf.dat
RMSD data file: mysim_rmsd.dat

Number of frames considered: 5
Average RMSD of all atoms: 2.01
Average RMSD of the 80.0% atoms of lowest RMSD: 1.13
Average RMSD of the 20.0% atoms of highest RMSD: 5.31

Frame indices availabe in field: frame_indices
RMSD data availabe in fields: rmsd_low, rmsd_high, and rmsd_all

RMSF data availabe in field: rmsf (Number of atoms: 43)
-------------------------------------------------------------------

We now see that we can, in average, align 80% of the atoms in all frames to 1.13Å. The output RMSD and RMSF data is available in the md data structure, and additionally three files were generated, containing:

  1. mysim_aligned.pdb: a sequence of PDB files of the aligned trajectory.
  2. mysim_rmsd.dat: the RMSD data for low, high, and all atoms.
  3. mysim_rmsf.dat: the RMSF data for the atoms, resulting from the robust alignment.

We can plot an overview of the results with:

plot(md)
Example block output

And we can see that, in this short (5-frame) trajectory, the N-terminal region of the protein is the one responsible for the structural deviations, while there is a rigid structural core between residues I221 and K249.

Step 3: Visualization of the aligned trajectory

The PDB file created by mdlovofit contains the frames aligned to the reference frame, according to the robust alignment based on the least mobile substructure (determined by fraction). The file contains multiple models, separated by the END keyword.

The occupancy field of the atoms determines if the atom was used in the aligment, thus if it was found among the least mobile atoms of the simulation. The beta field contains the RMSF of the atom, in each frame.

Using, for example, VMD, it is possible to display all frames at once, colored by occupancy, providing an overview of the mobility of the structure:

mdlovofit.png

This structural superposition and the associated colors are correlatd with the RMSF plot above. Red atoms were not used for the aligment and display high mobility, and blue atoms represent the stable core of the structure and were aligned.