m-values (protein transfer free energy) calculator
This is an experimental feature. Breaking changes may occur without a breaking package release.
Computing m-values for single structure pairs
MolSimToolkit.mvalue — Functionmvalue(; model=MoeserHorinek, cosolvent="urea", atoms:AbstractVector{<:PDBTools.Atom}, sasas, type=1)Calculates the m-value (transfer free energy of a protein in 1M solution) using the Tanford transfer model, as implemented by Moeser and Horinek [1] or by Auton and Bolen [2,3].
Arguments
model: The model to be used. Must beMoeserHorinekorAutonBolen.MoeserHorinekis only implemented forcosolvent="urea", and should be more precise in that case. Other solvents are available forAutonBolen.cosolvent::String: One of "Betaine", "Proline", "Sarcosine", "Sorbitol", "Sucrose", "Tmao", "tmao", "TMAO", "urea", "Urea", "UREA", "UreaApp", "UreaMH"atoms::AbstractVector{<:PDBTools.Atom}: Vector containing the atoms of the structure.sasas::Dict{String, Dict{Symbol, Float64}}: A dictionary containing the change in solvent accessible surface area (SASA) upon denaturation for each amino acid type. This data can be obtained from the m-value server or calculated using GROMACS:- The output of the server can be parsed using the
parse_mvalue_server_sasafunction defined in this module. - Compute the SASA with
delta_sasa_per_restype, a SASA calculation utility implemented in PDBTools.jl. - SASA values can be calculated using GROMACS with the
gmx_delta_sasa_per_restypefunction defined in this module.
- The output of the server can be parsed using the
type::Int: Specifies which SASA value to use from the provided data, because the server provides minimum, average, and maximum values, according to different denatured models for the protein. The recommended value is2for comparison with experimental data. Normally, GROMACS calculations will provide a single value, sotype=1should be used in that case.
Returns
A named tuple with the following fields:
tot: Total transfer free energy (kcal/mol).bb: Contribution from the backbone (kcal/mol).sc: Contribution from the side chains (kcal/mol).restype: A dictionary with the transfer free energy contributions per residue type (kcal/mol).
Each entry in the dictionary is a named tuple with bb and sc fields representing the backbone and side chain contributions, respectively.
Example calls
using MolSimToolkit
protein = read_pdb("protein.pdb")
# Using SASA values calculated with PDBTools.jl
sasas=delta_sasa_per_restype(native=read_pdb("native.pdb"), desnat=read_pdb("desnat.pdb"))
mvalue(; model=AutonBolen, cosolvent="TMAO", atoms=protein, sasas=sasas)
# Using SASA values from the m-value server
sasas_from_server=parse_mvalue_server_sasa(server_output)
mvalue(; model=MoeserHorinek, cosolvent="urea", atoms=protein, sasas=sasas_from_server, type=2)
# Using SASA values calculated with GROMACS
sasas_gmx=gmx_delta_sasa_per_restype(native_pdb="native.pdb", desnat_pdb="desnat.pdb")
mvalue(; model=AutonBolen, cosolvent="TMAO", atoms=protein, sasas=sasas_gmx)References
- https://doi.org/10.1021/acs.jpcb.7b02138
- https://doi.org/10.1016/s0076-6879(07)28023-1
- https://www.pnas.org/doi/10.1073/pnas.0706251104
MolSimToolkit.delta_sasa_per_restype — Functiondelta_sasa_per_restype(;
native::AbstractVector{<:PDBTools.Atom},
desnat::AbstractVector{<:PDBTools.Atom}
)Calculates the change in solvent accessible surface area (SASA) upon denaturation for each amino acid type using PDBTools. Returns a dictionary that can be directly used as input to the mvalue function.
Arguments
native: Vector of PDBTools.Atom objects for the native structure.desnat: Vector of PDBTools.Atom objects for the denatured structure.
Returns
A dictionary where each key is an amino acid three-letter code (e.g., "ALA", "PHE"), and the value is another dictionary with two keys: :sc for side chain SASA values and :bb for backbone SASA values. Each of these keys maps to a tuple containing a single Float64 value representing the change in SASA upon denaturation in Ų.
Optional arguments
n_dots::Int=500: Sets the precision of the SASA calculation (greater is better).backbone::Function = at -> name(at) in ("N", "CA", "C", "O"): Define what is a backbone atom.sidechain::Function = at -> !(name(at) in ("N", "CA", "C", "O")): Define what is a sidechain atom.ignore_hydrogen::Bool = true: By default, ignore all Hydrogen atoms of the structure.unitcell=nothing: By default, do not use periodic boundary conditions. To use PBCs, define A unitcell by providing either a 3x3 matrix or, for orthorhombic cells, a vector of length 3 of cell sides.
MolSimToolkit.parse_mvalue_server_sasa — Functionparse_mvalue_server_sasa(string::AbstractString)Parses the SASA output from the m-value calculator server (http://best.bio.jhu.edu/mvalue/), into a dictionary that can be directly used as input to the mvalue function.
The input string should contain lines formatted as follows, and correspond to the SASA values for each amino acid type:
sasa_from_server = """
ALA 8 ( 11.1) 79.1 [ 147.1] | ( -13.0) 51.4 [ 115.8]
PHE 3 ( 166.9) 197.1 [ 230.2] | ( 29.4) 56.4 [ 83.4]
LEU 7 ( 475.2) 532.2 [ 589.3] | ( 89.3) 145.3 [ 201.3]
...
LYS 6 ( 171.5) 220.4 [ 269.3] | ( -4.5) 42.0 [ 88.5]
ARG 1 ( 110.2) 124.4 [ 138.6] | ( 17.1) 25.0 [ 33.0]
CYS 0 ( 0.0) 0.0 [ 0.0] | ( 0.0) 0.0 [ 0.0]
"""This data can be found in the output of the server, under the title "Sidechain and Backbone changes in Accessible Surface Area".
The function returns a dictionary where each key is an amino acid three-letter code (e.g., "ALA", "PHE"), and the value is another dictionary with two keys: :sc for side chain SASA values and :bb for backbone SASA values. Each of these keys maps to a tuple containing three Float64 values representing the minimum, average, and maximum SASA values in Ų.
MolSimToolkit.gmx_delta_sasa_per_restype — Functiongmx_delta_sasa_per_restype(; native_pdb::AbstractString, desnat_pdb::AbstractString)Calculates the change in solvent accessible surface area (SASA) upon denaturation for each amino acid type using GROMACS. Returns a dictionary that can be directly used as input to the mvalue function.
This function requires GROMACS (gmx sasa executable) to be installed and accessible from the command line. The path to the gmx executable can be provided with the gmx keyword.
Arguments
native_pdb::AbstractString: Path to the PDB file of the native protein structure.desnat_pdb::AbstractString: Path to the PDB file of the denatured protein structure.
Optional arguments
gmx: the path to thegmxGROMACS exectuable (by default it expectsgmxto be on the path).n_dots::Int: sets the precision of the SASA grid (greater is better).backbone::Function = at -> name(at) in ("N", "CA", "C", "O"): Define what is a backbone atom.sidechain::Function = at -> !(name(at) in ("N", "CA", "C", "O")): Define what is a sidechain atom.ignore_hydrogen::Bool=true: By default, ignore all hydrogen atoms.
Returns
A dictionary where each key is an amino acid three-letter code (e.g., "ALA", "PHE"), and the value is another dictionary with two keys: :sc for side chain SASA values and :bb for backbone SASA values. Each of these keys maps to a tuple containing a single Float64 value representing the change in SASA upon denaturation in Ų.
Computing m-values along a trajectory
Here we show an example of how to compute m-values along a trajectory. We define a function to iterate over the simulation frames, and compute the m-values for each frame, considering the first frame as the reference state:
using MolSimToolkit, PDBTools
function mvalue_traj(sim::Simulation, protein_ref::AbstractVector{<:PDBTools.Atom})
# indices of the protein atoms, to fetch frame coordinates
inds_protein = index.(protein_ref)
# Create a copy to update coordinates at each frame
protein_at_frame = copy.(protein_ref)
# Create arrays to store total and bb and sc contributions
tot = Float32[]
bb = Float32[]
sc = Float32[]
for frame in sim
# fetch protein coordinates and unitcell
p = positions(frame)[inds_protein]
uc = unitcell(frame)
# update coordinates (note the dot for broadcast)
set_position!.(protein_at_frame, p)
# Compute variations in SASA relative to reference
dsasa = delta_sasa_per_restype(;
native=protein_ref,
desnat=protein_at_frame,
)
# Compute mvalue
m = mvalue(;
model=MoeserHorinek,
cosolvent="urea",
atoms=protein_ref,
sasas=dsasa,
)
# push total, backbone and sidechain mvalues to arrays
push!(tot, m.tot)
push!(bb, m.bb)
push!(sc, m.sc)
end
return tot, bb, sc
endmvalue_traj (generic function with 1 method)Running the above function over a trajectory can be done with:
using MolSimToolkit.Testing # to load test files
# Build Simulation object
sim = Simulation(Testing.namd_pdb, Testing.namd_traj)
# We are interested only in protein atoms
protein = read_pdb(Testing.namd_pdb, "protein")
# Lets get the positions of the first frame for reference
firstframe!(sim)
p_ref = positions(current_frame(sim))[index.(protein)]
set_position!.(protein, p_ref) # note the dot
# Run the mvalue-traj function
tot, bb, sc = mvalue_traj(sim, protein)(Float32[0.0, -0.021589363, -0.10957764, -0.09288491, -0.11828558], Float32[-0.0, -0.023671875, -0.068694666, -0.053313598, -0.07829739], Float32[0.0, 0.0020825122, -0.04088297, -0.03957131, -0.03998819])Plotting the results, we obtain the $\Delta_{\textrm{ref}\rightarrow\textrm{target}}\Delta G^{T}$, the difference in transfer free energy from water to urea at 1M, of the target structure at each frame relative to the reference structure:
using Plots, Statistics
plt = plot(MolSimStyle, layout=(1,2))
plot!(plt, tot; label="Total", lw=2, sp=1)
plot!(plt, bb; label="Backbone", lw=2, sp=1)
plot!(plt, sc; label="Sidechains", lw=2, sp=1)
plot!(xlabel="frame", ylabel="m-value / (kJ/mol)", sp=1)
bar!(plt, [1 2 3],
[mean(tot) mean(bb) mean(sc)],
yerr=[std(tot)/5 std(bb)/5 std(sc)/5], # just illustrative
xticks=([1,2,3],["Total","Backbone","Sidechain"]),
sp=2, label="", xlabel="", ylims=(-0.08,0),
ylabel="m-value (kJ/mol)",
)Thus, it seems that urea is favoring the evolution of the conformations of the protein in time (target structures have more negative transfer free energies than the reference structure) with the interactions with the backbone being more relevant than those of the sidechains. The errors are of course only illustrative.