Secondary structures

This package provides convenience functions to analyze the protein secondary structure along molecular dynamics simulations.

Secondary structure map

The secondary structure map is the profile of the secondary structure computed for each frame of the trajectory. This computation may be costly, particularly with the DSSP algorithm, so it is recommended to save the result. See Saving and loading a map for further information.

MolSimToolkit.ss_mapFunction
ss_map(
    simulation::Simulation; 
    selection::Union{AbstractString,Function}=PDBTools.isprotein,
    ss_method=stride_run,
    show_progress=true
)

Calculates the secondary structure map of the trajectory. Returns a matrix of secondary structure codes, where each row is a residue and each column is a frame.

By default, all protein atoms are considered. The selection keyword argument can be used to choose a different selection. The PDBTools selection syntax can be used, for example selection="protein and chain A", or general Julia functions, like selection=at -> chain(at) in ('A', 'B').

The ss_method keyword argument can be used to choose the secondary structure prediction method, which can be either stride_run or dssp_run. The default is stride_run. STRIDE is a faster algorithm, while DSSP is the default one in PDB database.

The show_progress keyword argument controls whether a progress bar is shown.

For the classes, refer to the ProteinSecondaryStructures.jl package documentation.

Example

julia> using MolSimToolkit, MolSimToolkit.Testing

julia> simulation = Simulation(Testing.namd_pdb, Testing.namd_traj);

julia> ssmap = ss_map(simulation; selection="residue >= 30 and residue <= 35", show_progress=false)
6×5 Matrix{Int64}:
 5  9  5  5  5
 5  9  5  5  5
 5  1  5  5  5
 5  1  5  5  5
 5  1  5  5  5
 9  9  9  9  9

julia> ss_name.(ssmap)
6×5 Matrix{String}:
 "turn"  "coil"       "turn"  "turn"  "turn"
 "turn"  "coil"       "turn"  "turn"  "turn"
 "turn"  "310 helix"  "turn"  "turn"  "turn"
 "turn"  "310 helix"  "turn"  "turn"  "turn"
 "turn"  "310 helix"  "turn"  "turn"  "turn"
 "coil"  "coil"       "coil"  "coil"  "coil"
source

A complete example for computing a secondary structure map is shown below:

julia> using MolSimToolkit, MolSimToolkit.Testing

julia> simulation = Simulation(Testing.namd_pdb, Testing.namd_traj);

julia> ssmap = ss_map(simulation; selection="residue >= 30 and residue <= 35", show_progress=false)
6×5 Matrix{Int64}:
 5  9  5  5  5
 5  9  5  5  5
 5  1  5  5  5
 5  1  5  5  5
 5  1  5  5  5
 9  9  9  9  9

Here we have computed the secondary structure map for 6 residues of the structure, along the 5 frames of the trajectory. The resulting map is a matrix, where each code represents a different class of secondary structure. The conversion between representations of the classes can be done with these three functions of the ProteinSecondaryStructures.jl package, which are reexported here:

  • ss_code: convert the representation to one-letter codes like H, B, C, etc.
  • ss_name: convert the representation to secondary structure names like Alpha-helix, Beta-bridge, etc.
  • ss_number: convert the representation to code numbers, like the ones used the matrix above.

The list of classes and code associations of is available here, in the ProteinSecondaryStructures.jl documentation.

For example, considering the secondary structure map matrix above, we can do:

julia> ss_name.(ssmap)
6×5 Matrix{String}:
 "turn"  "coil"       "turn"  "turn"  "turn"
 "turn"  "coil"       "turn"  "turn"  "turn"
 "turn"  "310 helix"  "turn"  "turn"  "turn"
 "turn"  "310 helix"  "turn"  "turn"  "turn"
 "turn"  "310 helix"  "turn"  "turn"  "turn"
 "coil"  "coil"       "coil"  "coil"  "coil"

Calculation methods: STRIDE and DSSP

The STRIDE or DSSP methods can be used to compute the secondary structure. STRIDE is faster, and DSSP is the default method used in the Protein Data Bank. The method is chosen with the method keyword of ss_map:

ssmap = ss_map(atoms, trajectory; method=stride_run)
ssmap = ss_map(atoms, trajectory; method=dssp_run)

Plotting the map

The ss_heatmap function provides a convenient tool to plot the secondary structure along the trajectory:

MolSimToolkit.ss_heatmapFunction
ss_heatmap(ssmap::Matrix{<:Real}; scalex=1.0, kargs...)

Plots a heatmap of the secondary structure map.

Note

This function requires loading the Plots package. The residue_ticks function is available in the PDBTools` package.

The scalex keyword argument can be used to scale the x-axis, which usually has the meaning of time in a simulation. By default, it is 1.0 and the x-axis is the frame number.

The kargs keyword arguments are passed to the heatmap function of the Plots package, to modify properties of the plot. In particular:

  • the residue ticks can be set with yticks, and can be set to residue specific labels with the residue_ticks function of PDBTools.
  • the x-axis label can be set with xlabel to appropriate units, such as "time / ns", in combination with scalex.

Example

julia> using MolSimToolkit, MolSimToolkit.Testing

julia> using Plots, PDBTools

julia> simulation = Simulation(Testing.namd_pdb, Testing.namd_traj);

julia> ssmap = ss_map(simulation; ss_method=stride_run, show_progress=false);

julia> protein = select(atoms(simulation), "protein");

julia> ss_heatmap(ssmap; scalex=0.1, xlabel="time / ns", yticks=residue_ticks(prot; stride=5))

Will plot a heatmap of the secondary structure map, with the x-axis scaled to 0.1, and residue ticks every 5 residues. The plot can be saved with the savefig function of the Plots package.

source
Note

This function requires loading the Plots package, and residue_ticks is provided by PDBTools.

For example:

julia> using MolSimToolkit, MolSimToolkit.Testing

julia> using Plots, PDBTools

julia> simulation = Simulation(Testing.namd_pdb, Testing.namd_traj);

julia> ssmap = ss_map(simulation; ss_method=stride_run, show_progress=false);

julia> protein = select(atoms(simulation), "protein");

julia> ss_heatmap(ssmap; scalex=0.1, xlabel="time / ns", yticks=residue_ticks(prot; stride=5))

The above code will produce the following plot, which can be saved with savefig("plot.svg"):

heatmap1

Saving and loading a map

The secondary structure map computed is just a matrix of integer codes. Thus, it can be saved or read in any preferred format. As a suggestion, it is possible to use writedlm and readdlm function from the DelimitedFiles package:

using DelimitedFiles
# save data to ssmap.dat
writedlm("ssmap.dat", ssmap)
# load data
ssmat = readdlm("ssmap.dat", Int)

Average structure of each class

From a precomputed secondary structure map the ss_mean helper functions will provide the content of a specific call of secondary structure along the simulation:

MolSimToolkit.ss_meanFunction
ss_mean(ssmap::AbstractMatrix{<:Integer}; class, dims=nothing)

Calculates the mean secondary structure class content of the trajectory, given the secondary structure map.

The secondary structure class to be considered must be defined by the class keyword argument.

class can be either a string, a character, or an integer, or a set of values, setting the class(es) of secondary structure to be consdiered. For example, for alpha helix, use "H". It can also be a vector of classes, such as class=["H", "E"].

The mean can be calculated along the residues (default) or along the frames, by setting the dims keyword argument.

  • dims=nothing (default) calculates the mean occurence of ss_class of the whole matrix.
  • dims=1 calculates the mean occurence of ss_class along the frames, for each residue.
  • dims=2 calculates the mean occurence of ss_class along the residues, for each frame.

The classes can be found in the ProteinSecondaryStructures.jl package documentation.

Example

julia> using MolSimToolkit, MolSimToolkit.Testing

julia> simulation = Simulation(Testing.namd_pdb, Testing.namd_traj);

julia> ssmap = ss_map(simulation; # 5 frames 
                   selection="residue >= 30 and residue <= 35", # 6 residues
                   show_progress=false
               );

julia> ss_mean(ssmap; class="C")
0.23333333333333334

julia> ss_mean(ssmap; class="C", dims=1) # mean coil content per residue
5-element Vector{Float64}:
 0.16666666666666666
 0.5
 0.16666666666666666
 0.16666666666666666
 0.16666666666666666 

julia> ss_mean(ssmap; class="C", dims=2) # mean coil content per frame
6-element Vector{Float64}:
 0.2
 0.2
 0.0
 0.0
 0.0
 1.0

julia> ss_mean(ssmap; class=["C", "T"]) # mean coil or turn
0.9
source

For example, given the ssmap matrix of the examples above, compute the average content of alpha-helices with:

julia> ss_mean(ssmap; class="H")
0.6093023255813953

The average content per frame is computed by averaging over the first dimension of the matrix (the residues):

julia> h = ss_mean(ssmap; class="H", dims=1)
5-element Vector{Float64}:
 0.627906976744186
 0.627906976744186
 0.5813953488372093
 0.6046511627906976
 0.6046511627906976

Which can be plotted with:

julia> plot(MolSimStyle, h, 
           xlabel="frame", 
           ylabel="helical content"
       )

producing the time-dependence plot of the helical content:

helical0

Average structure per residue

And the average content per residue is obtained by averaging over the frames, that is, the columns of the matrix:

julia> h = ss_mean(ssmap; class="H", dims=2)
43-element Vector{Float64}:
 0.0
 0.0
 0.0
 ⋮
 1.0
 0.4
 0.0

This can be plotted, for example, with:

julia> using Plots, PDBTools

julia> ticks = residue_ticks(select(atoms(simulation), "protein"); stride=5)
(1:5:41, ["I211", "G216", "I221", "S226", "F231", "L236", "C241", "K246", "I251"])

julia> plot(MolSimStyle, h, 
           xlabel="residue", xticks=ticks, xrotation=60,
           ylabel="helical content"
       )

Which will generate the following figure:

helical1