Density maps

2D density map per residue

The ResidueContributions object

One nice way to visualize the accumulation or depletion of a solvent around a macromolecule (a protein, for example), is to obtain a 2D map of the density as a function of the distance from its surface. For example, in the figure below the density of a solute (here, Glycerol), in the neighborhood of a protein is shown:

Here, one can see that Glycerol accumulates on Asp76 and on the proximity of hydrogen-bonding residues (Serine residues mostly). This figure was obtained by extracting from atomic contributions of the protein the contribution of each residue to the MDDF, coordination numbers or minimum-distance counts.

The computation of the contributions of each residue can be performed with the convenience function ResidueContributions, which creates an object containing the contributions of the residues to the mddf (or coordination numbers, or minimum-distance counts), the residue names, and distances:

ComplexMixtures.ResidueContributionsType
ResidueContributions (data structure)

Constructor function:

ResidueContributions(
    results::Result, atoms::AbstractVector{<:PDBTools.Atom};
    dmin=1.5, dmax=3.5,
    type=:mddf,
)

Compute the residue contributions to the solute-solvent pair distribution function. The function returns a ResidueContributions object that can be used to plot the residue contributions, or to perform arithmetic operations with other ResidueContributions objects.

Arguments

  • results::Result: The result object obtained from the mddf function.
  • atoms::AbstractVector{<:PDBTools.Atom}: The vector of atoms of the solute, or a part of it.

Optional arguments

  • dmin::Float64: The minimum distance to consider. Default is 1.5.
  • dmax::Float64: The maximum distance to consider. Default is 3.5.
  • type::Symbol: The type of the pair distribution function (:mddf, :md_count, or :coordination_number). Default is :mddf.
  • silent::Bool: If true, the progress bar is not shown. Default is false.

A structure of type ResultContributions can be used to plot the residue contributions to the solute-solvent pair distribution function, using the Plots.contourf function, and to perform arithmetic operations with other ResidueContributions objects, multiplying or dividing by a scalar, and slicing (see examples below).

A ResidueContributions object can be saved to a JSON file using the save function, and loaded back using the load function.

Examples

Constructing a ResidueContributions object

julia> using ComplexMixtures, PDBTools

julia> using ComplexMixtures.Testing: data_dir; ComplexMixtures._testing_show_method[] = true; # testing mode

julia> atoms = readPDB(data_dir*"/NAMD/Protein_in_Glycerol/system.pdb");

julia> results = load(data_dir*"/NAMD/Protein_in_Glycerol/protein_glyc.json");

julia> rc = ResidueContributions(results, select(atoms, "protein"); silent=true)

          Residue Contributions - 274 residues.
     3.51 █     █      █     █            █
     3.27 █              █   █
     3.03 █     █    █       █            █       █       █                █
     2.79 █    ██    █ █ █   █            █      ██          █        █    █
 d   2.55 █ █  ██    █ █ █   █            ██     ██ █  █  █  █  █     █    █
     2.31 █ █  ██    █ ███   █    ██      ██     ██ █  ██ █  ██ █     █    █
     2.07 █ █   █  █ █████   █    ██      ██     ██ █  ██ █  █ ██    █     █
     1.83 █   █ █  █ █████   █    ██      █      ██ █  ██    █ ██     █    █
     1.59
         A1      T33     T66     S98     S130    T162    A194    H226    G258     


julia> save("residue_contributions.json", rc)
"ResidueContributions saved in JSON file: residue_contributions.json"

julia> rc = load("residue_contributions.json", ResidueContributions);

Plotting

using ComplexMixtures, PDBTools, Plots
...
result = mddf(trajectory_file, solute, solvent, options)
rc = ResidueContributions(result, select(atoms, "protein"))
contourf(rc) # plots a contour map

Slicing

A slice of the residue contributions returns a new ResidueContributions object with the selected residues:

using ComplexMixtures, PDBTools, Plots
...
result = mddf(trajectory_file, solute, solvent, options)
rc = ResidueContributions(result, select(atoms, "protein"))
rc_range = rc[10:50] # slice the residue contributions
contourf(rc_range) # plots a contour map of the selected residues

Single-residue contributions

When the contributions of a single residue are computed, or a single-residue contribution is retrieved from a ResidueContributions object, the indexing and iteration over that object occurs over the contributions of that residue:

using ComplexMixtures, PDBTools, Plots
...
result = mddf(trajectory_file, solute, solvent, options)
rc = ResidueContributions(result, select(atoms, "protein"))
rc60 = rc[60] # contributions of residue 60 only
# index and iterate over the contributions of residue 7
rc60[1] # contribution at the first distance (equivalent to first(rc60))
rc60[end] # contribution at the last distance (equivalent to last(rc60))
count(>(0.01), rc60) # number of distances with contributions greater than 0.01
findmax(rc60) # index and value of maximum contribution

This is particular useful to retrieve the contributions from all residues at a given distance:

rc = ResidueContributions(result, select(atoms, "protein"))
rc_last_distance = [ c[end] for c in rc ]
# or, equivalently
rc_last_distance = last.(rc)
# index and value of maximum contribution at each distance, for all residues
findmax.(rc60)

Arithmetic operations

using ComplexMixtures, PDBTools, Plots
...
# first simulation (for example, low temperature):
result1 = mddf(trajectory_file1, solute, solvent, options)
rc1 = ResidueContributions(result1, select(atoms, "protein"))
# second simulation (for example, high temperature):
result2 = mddf(trajectory_file2, solute, solvent, options)
rc2 = ResidueContributions(result2, select(atoms, "protein"))
# difference of the residue contributions between the two simulations:
rc_diff = rc2 - rc1
contourf(rc_diff) # plots a contour map of the difference
# multiply and divide by scalars
rc3 = 3 * rc1
rc4 = rc2 / 2
Compat

Slicing, indexing, and multiplication and divison by scalars were introduces in v2.7.0. Saving and loading was introduced in v2.8.0. Iterators were introduced in v2.10.0.

source

The output of ResidueContributions is by default shown as a simple unicode plot:

The ResidueContribution object can be used to produce a high-quality contour plot using the Plots.contourf (or contour, or heatmap) function:

Plots.contourfMethod
contourf(
    rc::ResidueContributions; 
    step::Int=1, 
    oneletter=false, 
    xlabel="Residue", 
    ylabel="r / Å", 
    kargs...
)
contour(rc::ResidueContributions; kargs...)
heatmap(rc::ResidueContributions; kargs...)

Plot the contribution of each residue to the solute-solvent pair distribution function as a 2D density map. This function requires loading the Plots package. The calling syntax for contour and heatmap is the same as for contourf.

Arguments

  • rc::ResidueContributions: The residue contributions to the solute-solvent pair distribution function, as computed by the ResidueContributions function.

Optional arguments

  • step: The step of the residue ticks in the x-axis of the plot. Default is 1 or will be set to show at most 50 ticks labels.
  • oneletter::Bool: Use one-letter residue codes. Default is false. One-letter codes are only available for the 20 standard amino acids.
  • xlabel and ylabel: Labels for the x and y axes. Default is "Residue" and "r / Å".

This function will set color limits and choose the scale automatically. These parameters can be set with the clims and color arguments of Plots.contourf. Other plot customizations can be done by passing other keyword arguments to this function, which will be passed to Plots.contourf.

Example

julia> using ComplexMixtures, Plots, PDBTools

julia> results = load("mddf.json")

julia> atoms = readPDB("system.pdb", "protein")

julia> rc = ResidueContributions(results, atoms; oneletter=true)

julia> plt = contourf(rc; step=5)

This will produce a plot with the contribution of each residue to the solute-solvent pair distribution function, as a contour plot, with the residues in the x-axis and the distance in the y-axis.

To customize the plot, use the Plot.contourf keyword parameters, for example:

julia> plt = contourf(rc; step=5, size=(800,400), title="Title", clims=(-0.1, 0.1))
Compat

This function requires loading the Plots package.

Support for all Plots.contourf parameters was introduced in ComplexMixtures v2.6.0, and support for contour and heatmap was introduced in ComplexMixtures v2.11.0.

source

A complete example of its usage can be seen here.

Compat

All features described in this section are only available in v2.10.0 or greater.

Contributions of subgroups of residues

Residue contributions can also be computed for subgroups of the residues. For example, as a continuation of the protein in glycerol example , one can compute the map of residue contributions, but splitting the contributions of backbone and side-chains of the residues:

rc_bb = ResidueContributions(
  results, 
  select(protein, "backbone and resnum >= 70 and resnum <= 110")
)
rc_st = ResidueContributions(
  results, 
  select(protein, "sidechain and resnum >= 70 and resnum <= 110")
)

And we plot the difference between these two maps:

using Plots
contourf(rc_st - rc_bb; oneletter=true)

obtaining the following figure:

which shows that the side-chains contribute mostly to these densities, except here expectedly, for some Gly residues.

Indexing, slicing, arithmetic operations

The ResidueContributions object can be indexes and sliced, for the analysis of the contributions of specific residues or range of residues:

rc = ResidueContributions(results1, select(atoms, "protein")); 
rc_7 = rc[7] # contributions of residue 7
rc_range = rc[20:50] # contributions of a range of residues

Slicing will return a new ResidueContributions object.

Additionally, these ResidueContributions objects can be subtracted, divided, summed, or multiplied, to compare contributions of residues among different simulations. Typically, if one wants to compare the solvation of residues in two different simulations, one can do:

# first simulation (for example, low temperature)
rc1 = ResidueContributions(results1, select(atoms, "protein")); 

# second simulation (for example, high temperature)
rc2 = ResidueContributions(results2, select(atoms, "protein"));

# difference in residue contributions to solvation
rc_diff = rc2 - rc1

# Plot difference
using Plots
contourf(rc_diff; title="Density difference", step=2, colorbar=:left)

Which will produce a plot similar to the one below (the data of this plot is just illustrative):

which will return a new ResidueContributions object.

Finally, it is also possible to renormalize the contributions by multiplication or division by scalars,

rc2 = rc / 15
rc2 = 2 * rc

When the contributions of a single residue are computed, or a single-residue contribution is retrieved from a ResidueContributions object, the indexing and iteration over that object occurs over the contributions of that residue:

using ComplexMixtures, PDBTools, Plots
...
result = mddf(trajectory_file, solute, solvent, options)
rc = ResidueContributions(result, select(atoms, "protein"))
rc7 = rc[7] # contributions of residue 7
# iterate over the contributions of residue 7
rc7[1] # contribution of the first distance
rc7[end] # contribution of the last distance

This is particular useful to retrieve the contributions from all residues at a given distance:

rc = ResidueContributions(result, select(atoms, "protein"))
rc_last_distance = [ r[end] for r in rc ] 
# or, equivalently
rc_last_distance = last.(rc)
# compute the maximum contribution of each residue:
max_c = maximum.(rc)

Saving and loading a ResidueContributions object

The ResidueContributions object can be saved and loaded for easier data analysis. In particular, this is important for very large structures, where its computation can be costly. The saving and loading functions can be use with:

rc = ResidueContributions(results1, select(atoms, "protein")); 
# Save rc objecto to a file (json format):
save("residue_contributions.json", rc) 
# Load json file into a new rc_loaded object:
rc_loaded = load("residue_contributions.json", ResidueContributions)

Note that the load function requires, as a second argument, the ResidueContributions type, to differentiate the method from the loading of the Result data structure.

ComplexMixtures.loadMethod
load(filename::String, ResidueContributions)

Function to load the residue contributions saved into a JSON file into the ResidueContributions data structure.

Example

using ComplexMixtures
rc = ResidueContributions(resutls, SoluteGroup(protein))
save("residue_contributions.json", rc)
rc = load("residue_contributions.json", ResidueContributions)
source
ComplexMixtures.saveMethod
save(filename::String, rc::ResidueContributions)

Save the ResidueContributions object to a JSON file.

using ComplexMixtures
rc = ResidueContributions(resutls, SoluteGroup(protein))
save("residue_contributions.json", rc)
rc = load("residue_contributions.json", ResidueContributions)
source
Tip

These ResidueContributions methods are convenience functions only.

Basically, we are extracting the contribution of each residue independently and building a matrix where each row represents a distance and each column a residue. Using PDBTools, this can be done with, for example:

residues = collect(eachresidue(protein))
residue_contributions = zeros(length(R.d),length(residues))
for (i,residue) in pairs(residues)
  c = contributions(results, SoluteGroup(residue)) 
  residue_contributions[:,i] .= c
end

The above produces a matrix with a number of columns equal to the number of residues and a number of rows equal to the number of MDDF points. That matrix can be plotted as a contour map with adequate plotting software.

3D density map around a macromolecule

Three-dimensional representations of the distribution functions can also be obtained from the MDDF results. These 3D representations are obtained from the fact that the MDDFs can be decomposed into the contributions of each solute atom, and that each point in space is closest to a single solute atom as well. Thus, each point in space can be associated to one solute atom, and the contribution of that atom to the MDDF at the corresponding distance can be obtained.

A 3D density map is constructed with the grid3D function:

ComplexMixtures.grid3DFunction
grid3D(
    result::Result, atoms, output_file::Union{Nothing,String} = nothing; 
    dmin=1.5, ddax=5.0, step=0.5, silent = false,
)

This function builds the grid of the 3D density function and fills an array of mutable structures of type Atom, containing the position of the atoms of grid, the closest atom to that position, and distance.

result is a ComplexMixtures.Result object atoms is a vector of PDBTools.Atoms with all the atoms of the system. output_file is the name of the file where the grid will be written. If nothing, the grid is only returned as a matrix.

dmin and dmax define the range of distance where the density grid will be built, and step defines how fine the grid must be. Be aware that fine grids involve usually a very large (hundreds of thousands points).

silent is a boolean to suppress the progress bar.

The output PDB has the following characteristics:

  • The positions of the atoms are grid points.
  • The identity of the atoms correspond to the identity of the protein atom contributing to the MDDF at that point (the closest protein atom).
  • The temperature-factor column (beta) contains the relative contribution of that atom to the MDDF at the corresponding distance.
  • The occupancy field contains the distance itself.

Example

julia> using ComplexMixtures, PDBTools

julia> atoms = readPDB("./system.pdb");

julia> R = ComplexMixtures.load("./results.json");

julia> grid = grid3D(R, atoms, "grid.pdb");

Examples of how the grid can be visualized are provided in the user guide of ComplexMixtures.

source

The call to grid3D will write an output a PDB file with the grid points, which loaded in a visualization software side-by-side with the protein structure, allows the production of the images shown. The grid.pdb file contains a regular PDB format where:

  • The positions of the atoms are grid points.
  • The identity of the atoms correspond to the identity of the protein atom contributing to the MDDF at that point (the closest protein atom).
  • The temperature-factor column (beta) contains the relative contribution of that atom to the MDDF at the corresponding distance.
  • The occupancy field contains the distance itself.

For example, the distribution function of a hydrogen-bonding liquid solvating a protein will display a characteristic peak at about 1.8Å. The MDDF at that distance can be decomposed into the contributions of all atoms of the protein which were found to form hydrogen bonds to the solvent. A 3D representation of these contributions can be obtained by computing, around a static protein (solute) structure, which are the regions in space which are closer to each atom of the protein. The position in space is then marked with the atom of the protein to which that region "belongs" and with the contribution of that atom to the MDDF at each distance within that region. A special function to compute this 3D distribution is provided here: grid3D.

This is better illustrated by a graphical representation. In the figure below we see a 3D representation of the MDDF of Glycerol around a protein, computed from a simulation of this protein in a mixture of water and Glycerol. A complete set of files and a script to reproduce this example is available here.

In the figure on the left, the points in space around the protein are selected with the following properties: distance from the protein smaller than 2.0Å and relative contribution to the MDDF at the corresponding distance of at least 10% of the maximum contribution. Thus, we are selecting the regions of the protein corresponding to the most stable hydrogen-bonding interactions. The color of the points is the contribution to the MDDF, from blue to red. Thus, the most reddish-points corresponds to the regions where the most stable hydrogen bonds were formed. We have marked two regions here, on opposite sides of the protein, with arrows.

Clicking on those points we obtain which are the atoms of the protein contributing to the MDDF at that region. In particular, the arrow on the right points to the strongest red region, which corresponds to an Aspartic acid. These residues are shown explicitly under the density (represented as a transparent surface) on the figure in the center.

The figure on the right displays, overlapped with the hydrogen-bonding residues, the most important contributions to the second peak of the distribution, corresponding to distances from the protein between 2.0 and 3.5Å. Notably, the regions involved are different from the ones forming hydrogen bonds, indicating that non-specific interactions with the protein (and not a second solvation shell) are responsible for the second peak.