Secondary structures
This package provides convenience functions to analyze the protein secondary structure along molecular dynamics simulations.
Secondary structure map
The secondary structure map is the profile of the secondary structure computed for each frame of the trajectory. This computation may be costly, particularly with the DSSP algorithm, so it is recommended to save the result. See Saving and loading a map for further information.
MolSimToolkit.ss_map
— Functionss_map(
simulation::Simulation;
selection::Union{AbstractString,Function}=PDBTools.isprotein,
ss_method=stride_run,
show_progress=true
)
Calculates the secondary structure map of the trajectory. Returns a matrix of secondary structure codes, where each row is a residue and each column is a frame.
By default, all protein atoms are considered. The selection
keyword argument can be used to choose a different selection. The PDBTools
selection syntax can be used, for example selection="protein and chain A"
, or general Julia functions, like selection=at -> chain(at) in ('A', 'B')
.
The ss_method
keyword argument can be used to choose the secondary structure prediction method, which can be either stride_run
or dssp_run
. The default is stride_run
. STRIDE is a faster algorithm, while DSSP is the default one in PDB database.
The show_progress
keyword argument controls whether a progress bar is shown.
For the classes, refer to the ProteinSecondaryStructures.jl package documentation.
Example
julia> using MolSimToolkit, MolSimToolkit.Testing
julia> simulation = Simulation(Testing.namd_pdb, Testing.namd_traj);
julia> ssmap = ss_map(simulation; selection="residue >= 30 and residue <= 35", show_progress=false)
6×5 Matrix{Int64}:
5 9 5 5 5
5 9 5 5 5
5 1 5 5 5
5 1 5 5 5
5 1 5 5 5
9 9 9 9 9
julia> ss_name.(ssmap)
6×5 Matrix{String}:
"turn" "coil" "turn" "turn" "turn"
"turn" "coil" "turn" "turn" "turn"
"turn" "310 helix" "turn" "turn" "turn"
"turn" "310 helix" "turn" "turn" "turn"
"turn" "310 helix" "turn" "turn" "turn"
"coil" "coil" "coil" "coil" "coil"
A complete example for computing a secondary structure map is shown below:
julia> using MolSimToolkit, MolSimToolkit.Testing
julia> simulation = Simulation(Testing.namd_pdb, Testing.namd_traj);
julia> ssmap = ss_map(simulation; selection="residue >= 30 and residue <= 35", show_progress=false)
6×5 Matrix{Int64}:
5 9 5 5 5
5 9 5 5 5
5 1 5 5 5
5 1 5 5 5
5 1 5 5 5
9 9 9 9 9
Here we have computed the secondary structure map for 6 residues of the structure, along the 5 frames of the trajectory. The resulting map is a matrix, where each code represents a different class of secondary structure. The conversion between representations of the classes can be done with these three functions of the ProteinSecondaryStructures.jl package, which are reexported here:
ss_code
: convert the representation to one-letter codes likeH
,B
,C
, etc.ss_name
: convert the representation to secondary structure names likeAlpha-helix
,Beta-bridge
, etc.ss_number
: convert the representation to code numbers, like the ones used the matrix above.
The list of classes and code associations of is available here, in the ProteinSecondaryStructures.jl documentation.
For example, considering the secondary structure map matrix above, we can do:
julia> ss_name.(ssmap)
6×5 Matrix{String}:
"turn" "coil" "turn" "turn" "turn"
"turn" "coil" "turn" "turn" "turn"
"turn" "310 helix" "turn" "turn" "turn"
"turn" "310 helix" "turn" "turn" "turn"
"turn" "310 helix" "turn" "turn" "turn"
"coil" "coil" "coil" "coil" "coil"
Calculation methods: STRIDE and DSSP
The STRIDE or DSSP methods can be used to compute the secondary structure. STRIDE is faster, and DSSP is the default method used in the Protein Data Bank. The method is chosen with the method
keyword of ss_map
:
ssmap = ss_map(atoms, trajectory; method=stride_run)
ssmap = ss_map(atoms, trajectory; method=dssp_run)
Plotting the map
The ss_heatmap
function provides a convenient tool to plot the secondary structure along the trajectory:
MolSimToolkit.ss_heatmap
— Functionss_heatmap(ssmap::Matrix{<:Real}; scalex=1.0, kargs...)
Plots a heatmap of the secondary structure map.
This function requires loading the Plots
package. The residue_ticks
function is available in the PDBTools
` package.
The scalex
keyword argument can be used to scale the x-axis, which usually has the meaning of time in a simulation. By default, it is 1.0 and the x-axis is the frame number.
The kargs
keyword arguments are passed to the heatmap
function of the Plots package, to modify properties of the plot. In particular:
- the residue ticks can be set with
yticks
, and can be set to residue specific labels with theresidue_ticks
function of PDBTools. - the x-axis label can be set with
xlabel
to appropriate units, such as "time / ns", in combination withscalex
.
Example
julia> using MolSimToolkit, MolSimToolkit.Testing
julia> using Plots, PDBTools
julia> simulation = Simulation(Testing.namd_pdb, Testing.namd_traj);
julia> ssmap = ss_map(simulation; ss_method=stride_run, show_progress=false);
julia> protein = select(atoms(simulation), "protein");
julia> ss_heatmap(ssmap; scalex=0.1, xlabel="time / ns", yticks=residue_ticks(prot; stride=5))
Will plot a heatmap of the secondary structure map, with the x-axis scaled to 0.1, and residue ticks every 5 residues. The plot can be saved with the savefig
function of the Plots package.
This function requires loading the Plots
package, and residue_ticks
is provided by PDBTools
.
For example:
julia> using MolSimToolkit, MolSimToolkit.Testing
julia> using Plots, PDBTools
julia> simulation = Simulation(Testing.namd_pdb, Testing.namd_traj);
julia> ssmap = ss_map(simulation; ss_method=stride_run, show_progress=false);
julia> protein = select(atoms(simulation), "protein");
julia> ss_heatmap(ssmap; scalex=0.1, xlabel="time / ns", yticks=residue_ticks(prot; stride=5))
The above code will produce the following plot, which can be saved with savefig("plot.svg")
:
Saving and loading a map
The secondary structure map computed is just a matrix of integer codes. Thus, it can be saved or read in any preferred format. As a suggestion, it is possible to use writedlm
and readdlm
function from the DelimitedFiles
package:
using DelimitedFiles
# save data to ssmap.dat
writedlm("ssmap.dat", ssmap)
# load data
ssmat = readdlm("ssmap.dat", Int)
Average structure of each class
From a precomputed secondary structure map the ss_mean
helper functions will provide the content of a specific call of secondary structure along the simulation:
MolSimToolkit.ss_mean
— Functionss_mean(ssmap::AbstractMatrix{<:Integer}; class, dims=nothing)
Calculates the mean secondary structure class content of the trajectory, given the secondary structure map.
The secondary structure class to be considered must be defined by the class
keyword argument.
class
can be either a string, a character, or an integer, or a set of values, setting the class(es) of secondary structure to be consdiered. For example, for alpha helix
, use "H". It can also be a vector of classes, such as class=["H", "E"]
.
The mean can be calculated along the residues (default) or along the frames, by setting the dims
keyword argument.
dims=nothing
(default) calculates the mean occurence ofss_class
of the whole matrix.dims=1
calculates the mean occurence ofss_class
along the frames, for each residue.dims=2
calculates the mean occurence ofss_class
along the residues, for each frame.
The classes can be found in the ProteinSecondaryStructures.jl package documentation.
Example
julia> using MolSimToolkit, MolSimToolkit.Testing
julia> simulation = Simulation(Testing.namd_pdb, Testing.namd_traj);
julia> ssmap = ss_map(simulation; # 5 frames
selection="residue >= 30 and residue <= 35", # 6 residues
show_progress=false
);
julia> ss_mean(ssmap; class="C")
0.23333333333333334
julia> ss_mean(ssmap; class="C", dims=1) # mean coil content per residue
5-element Vector{Float64}:
0.16666666666666666
0.5
0.16666666666666666
0.16666666666666666
0.16666666666666666
julia> ss_mean(ssmap; class="C", dims=2) # mean coil content per frame
6-element Vector{Float64}:
0.2
0.2
0.0
0.0
0.0
1.0
julia> ss_mean(ssmap; class=["C", "T"]) # mean coil or turn
0.9
For example, given the ssmap
matrix of the examples above, compute the average content of alpha-helices with:
julia> ss_mean(ssmap; class="H")
0.6093023255813953
The average content per frame is computed by averaging over the first dimension of the matrix (the residues):
julia> h = ss_mean(ssmap; class="H", dims=1)
5-element Vector{Float64}:
0.627906976744186
0.627906976744186
0.5813953488372093
0.6046511627906976
0.6046511627906976
Which can be plotted with:
julia> plot(MolSimStyle, h,
xlabel="frame",
ylabel="helical content"
)
producing the time-dependence plot of the helical content:
Average structure per residue
And the average content per residue is obtained by averaging over the frames, that is, the columns of the matrix:
julia> h = ss_mean(ssmap; class="H", dims=2)
43-element Vector{Float64}:
0.0
0.0
0.0
⋮
1.0
0.4
0.0
This can be plotted, for example, with:
julia> using Plots, PDBTools
julia> ticks = residue_ticks(select(atoms(simulation), "protein"); stride=5)
(1:5:41, ["I211", "G216", "I221", "S226", "F231", "L236", "C241", "K246", "I251"])
julia> plot(MolSimStyle, h,
xlabel="residue", xticks=ticks, xrotation=60,
ylabel="helical content"
)
Which will generate the following figure: