m-values (protein transfer free energy) calculator

Warning

This is an experimental feature. Breaking changes may occur without a breaking package release.

Computing m-values for single structure pairs

MolSimToolkit.mvalue — Function

mvalue(; model=MoeserHorinek, cosolvent="urea", atoms:AbstractVector{<:PDBTools.Atom}, sasas, type=1)

Calculates the m-value (transfer free energy of a protein in 1M solution) using the Tanford transfer model, as implemented by Moeser and Horinek [1] or by Auton and Bolen [2,3].

Arguments

model: The model to be used. Must be MoeserHorinek or AutonBolen. MoeserHorinek is only implemented for cosolvent="urea", and should be more precise in that case. Other solvents are available for AutonBolen.
cosolvent::String: One of "Betaine", "Proline", "Sarcosine", "Sorbitol", "Sucrose", "Tmao", "tmao", "TMAO", "urea", "Urea", "UREA", "UreaApp", "UreaMH"
atoms::AbstractVector{<:PDBTools.Atom}: Vector containing the atoms of the structure.
sasas::Dict{String, Dict{Symbol, Float64}}: A dictionary containing the change in solvent accessible surface area (SASA) upon denaturation for each amino acid type. This data can be obtained from the m-value server or calculated using GROMACS:
- The output of the server can be parsed using the parse_mvalue_server_sasa function defined in this module.
- Compute the SASA with delta_sasa_per_restype, a SASA calculation utility implemented in PDBTools.jl.
- SASA values can be calculated using GROMACS with the gmx_delta_sasa_per_restype function defined in this module.
type::Int: Specifies which SASA value to use from the provided data, because the server provides minimum, average, and maximum values, according to different denatured models for the protein. The recommended value is 2 for comparison with experimental data. Normally, GROMACS calculations will provide a single value, so type=1 should be used in that case.

Returns

A named tuple with the following fields:

tot: Total transfer free energy (kcal/mol).
bb: Contribution from the backbone (kcal/mol).
sc: Contribution from the side chains (kcal/mol).
restype: A dictionary with the transfer free energy contributions per residue type (kcal/mol).

Each entry in the dictionary is a named tuple with bb and sc fields representing the backbone and side chain contributions, respectively.

Example calls

using MolSimToolkit
protein = read_pdb("protein.pdb")

# Using SASA values calculated with PDBTools.jl
sasas=delta_sasa_per_restype(native=read_pdb("native.pdb"), desnat=read_pdb("desnat.pdb"))
mvalue(; model=AutonBolen, cosolvent="TMAO", atoms=protein, sasas=sasas)

# Using SASA values from the m-value server
sasas_from_server=parse_mvalue_server_sasa(server_output)
mvalue(; model=MoeserHorinek, cosolvent="urea", atoms=protein, sasas=sasas_from_server, type=2)

# Using SASA values calculated with GROMACS
sasas_gmx=gmx_delta_sasa_per_restype(native_pdb="native.pdb", desnat_pdb="desnat.pdb")
mvalue(; model=AutonBolen, cosolvent="TMAO", atoms=protein, sasas=sasas_gmx)

References

https://doi.org/10.1021/acs.jpcb.7b02138
https://doi.org/10.1016/s0076-6879(07)28023-1
https://www.pnas.org/doi/10.1073/pnas.0706251104

source

MolSimToolkit.delta_sasa_per_restype — Function

delta_sasa_per_restype(; 
    native::AbstractVector{<:PDBTools.Atom}, 
    desnat::AbstractVector{<:PDBTools.Atom}
)

Calculates the change in solvent accessible surface area (SASA) upon denaturation for each amino acid type using PDBTools. Returns a dictionary that can be directly used as input to the mvalue function.

Arguments

native: Vector of PDBTools.Atom objects for the native structure.
desnat: Vector of PDBTools.Atom objects for the denatured structure.

Returns

A dictionary where each key is an amino acid three-letter code (e.g., "ALA", "PHE"), and the value is another dictionary with two keys: :sc for side chain SASA values and :bb for backbone SASA values. Each of these keys maps to a tuple containing a single Float64 value representing the change in SASA upon denaturation in Å².

Optional arguments

n_dots::Int=500: Sets the precision of the SASA calculation (greater is better).
backbone::Function = at -> name(at) in ("N", "CA", "C", "O"): Define what is a backbone atom.
sidechain::Function = at -> !(name(at) in ("N", "CA", "C", "O")): Define what is a sidechain atom.
ignore_hydrogen::Bool = true: By default, ignore all Hydrogen atoms of the structure.
unitcell=nothing: By default, do not use periodic boundary conditions. To use PBCs, define A unitcell by providing either a 3x3 matrix or, for orthorhombic cells, a vector of length 3 of cell sides.

source

MolSimToolkit.parse_mvalue_server_sasa — Function

parse_mvalue_server_sasa(string::AbstractString)

Parses the SASA output from the m-value calculator server (http://best.bio.jhu.edu/mvalue/), into a dictionary that can be directly used as input to the mvalue function.

The input string should contain lines formatted as follows, and correspond to the SASA values for each amino acid type:

sasa_from_server = """
ALA 	8 	 (    11.1)     79.1 [   147.1] | (   -13.0)     51.4 [   115.8] 
PHE 	3 	 (   166.9)    197.1 [   230.2] | (    29.4)     56.4 [    83.4] 
LEU 	7 	 (   475.2)    532.2 [   589.3] | (    89.3)    145.3 [   201.3] 
...
LYS 	6 	 (   171.5)    220.4 [   269.3] | (    -4.5)     42.0 [    88.5] 
ARG 	1 	 (   110.2)    124.4 [   138.6] | (    17.1)     25.0 [    33.0] 
CYS 	0 	 (     0.0)      0.0 [     0.0] | (     0.0)      0.0 [     0.0] 
"""

This data can be found in the output of the server, under the title "Sidechain and Backbone changes in Accessible Surface Area".

The function returns a dictionary where each key is an amino acid three-letter code (e.g., "ALA", "PHE"), and the value is another dictionary with two keys: :sc for side chain SASA values and :bb for backbone SASA values. Each of these keys maps to a tuple containing three Float64 values representing the minimum, average, and maximum SASA values in Å².

source

MolSimToolkit.gmx_delta_sasa_per_restype — Function

gmx_delta_sasa_per_restype(; native_pdb::AbstractString, desnat_pdb::AbstractString)

Calculates the change in solvent accessible surface area (SASA) upon denaturation for each amino acid type using GROMACS. Returns a dictionary that can be directly used as input to the mvalue function.

Note

This function requires GROMACS (gmx sasa executable) to be installed and accessible from the command line. The path to the gmx executable can be provided with the gmx keyword.

Arguments

native_pdb::AbstractString: Path to the PDB file of the native protein structure.
desnat_pdb::AbstractString: Path to the PDB file of the denatured protein structure.

Optional arguments

gmx: the path to the gmx GROMACS exectuable (by default it expects gmx to be on the path).
n_dots::Int: sets the precision of the SASA grid (greater is better).
backbone::Function = at -> name(at) in ("N", "CA", "C", "O"): Define what is a backbone atom.
sidechain::Function = at -> !(name(at) in ("N", "CA", "C", "O")): Define what is a sidechain atom.
ignore_hydrogen::Bool=true: By default, ignore all hydrogen atoms.

Returns

source

Computing m-values along a trajectory

Here we show an example of how to compute m-values along a trajectory. We define a function to iterate over the simulation frames, and compute the m-values for each frame, considering the first frame as the reference state:

using MolSimToolkit, PDBTools
function mvalue_traj(sim::Simulation, protein_ref::AbstractVector{<:PDBTools.Atom})
    # indices of the protein atoms, to fetch frame coordinates
    inds_protein = index.(protein_ref)
    # Create a copy to update coordinates at each frame
    protein_at_frame = copy.(protein_ref)
    # Create arrays to store total and bb and sc contributions
    tot = Float32[]
    bb = Float32[]
    sc = Float32[]
    for frame in sim
        # fetch protein coordinates and unitcell
        p = positions(frame)[inds_protein]
        uc = unitcell(frame)
        # update coordinates (note the dot for broadcast)
        set_position!.(protein_at_frame, p)
        # Compute variations in SASA relative to reference
        dsasa = delta_sasa_per_restype(;
            native=protein_ref,
            desnat=protein_at_frame,
        )
        # Compute mvalue
        m = mvalue(;
            model=MoeserHorinek,
            cosolvent="urea",
            atoms=protein_ref,
            sasas=dsasa,
        )
        # push total, backbone and sidechain mvalues to arrays
        push!(tot, m.tot)
        push!(bb, m.bb)
        push!(sc, m.sc)
    end
    return tot, bb, sc
end

mvalue_traj (generic function with 1 method)

Running the above function over a trajectory can be done with:

using MolSimToolkit.Testing # to load test files
# Build Simulation object
sim = Simulation(Testing.namd_pdb, Testing.namd_traj)
# We are interested only in protein atoms
protein = read_pdb(Testing.namd_pdb, "protein")
# Lets get the positions of the first frame for reference
firstframe!(sim)
p_ref = positions(current_frame(sim))[index.(protein)]
set_position!.(protein, p_ref) # note the dot
# Run the mvalue-traj function
tot, bb, sc = mvalue_traj(sim, protein)

(Float32[0.0, -0.021589363, -0.10957764, -0.09288491, -0.11828558], Float32[-0.0, -0.023671875, -0.068694666, -0.053313598, -0.07829739], Float32[0.0, 0.0020825122, -0.04088297, -0.03957131, -0.03998819])

Plotting the results, we obtain the $\Delta_{\textrm{ref}\rightarrow\textrm{target}}\Delta G^{T}$, the difference in transfer free energy from water to urea at 1M, of the target structure at each frame relative to the reference structure:

using Plots, Statistics
plt = plot(MolSimStyle, layout=(1,2))
plot!(plt, tot; label="Total", lw=2, sp=1)
plot!(plt, bb; label="Backbone", lw=2, sp=1)
plot!(plt, sc; label="Sidechains", lw=2, sp=1)
plot!(xlabel="frame", ylabel="m-value / (kJ/mol)", sp=1)
bar!(plt, [1  2  3],
    [mean(tot)  mean(bb)  mean(sc)],
    yerr=[std(tot)/5  std(bb)/5  std(sc)/5], # just illustrative
    xticks=([1,2,3],["Total","Backbone","Sidechain"]),
    sp=2, label="", xlabel="", ylims=(-0.08,0),
    ylabel="m-value (kJ/mol)",
)

Thus, it seems that urea is favoring the evolution of the conformations of the protein in time (target structures have more negative transfer free energies than the reference structure) with the interactions with the backbone being more relevant than those of the sidechains. The errors are of course only illustrative.