Computing the MDDF
Minimum-Distance Distribution Function
The main function of the ComplexMixtures package actually computes the MDDF between the solute and the solvent chosen.
ComplexMixtures.mddf
— Functionmddf(
trajectory_file::String,
solute::AtomSelection,
solvent::AtomSelection, # optional: if omitted, an auto-correlation will be computed
options::Options=Options(); # optional: if omitted, default options will be used
# optional keywords:
trajectory_format="",
frame_weights = Float64[],
coordination_number_only = false,
low_memory = false,
chemfiles = false,
)
Computes the minimum-distance distribution function, atomic contributions, and KB integrals, given the trajectory file name, and the definition of the solute and solvent groups of atoms as AtomSelection
data structures. If the solvent
parameter is omitted, a self-correlation will be computed.
This is the main function of the ComplexMixtures
package.
The options
parameter is optional. If not set, the default Options()
structure will be used.
Optional execution keywords
trajectory_format
is a string that defines the format of the trajectory file. If not provided, the format will be guessed from the file extension, and theChemfiles
library will be used to read the trajectory.frame_weights
is an array of weights for each frame of the trajectory. If this is provided, the MDDF will be computed as a weighted average of the MDDFs of each frame. This can be used to study the solvation dependency in perturbed ensembles.coordination_number_only
is a boolean that, if set totrue
, will compute only the site-counts and coordination numbers of the solvent molecules around the solute, and not the MDDF, RDF, or KB integrals. This is useful when the normalization of the distribution is not possible or needed, for instance when the bulk solution is not properly defined. The computation is much faster in this case. The call tomddf
with this option is equivalent to direct the call tocoordination_number
.low_memory
can be set totrue
to reduce the memory requirements of the computation. This will parallelize the computation of the minimum distances at a lower level, reducing the memory requirements at the expense of some performance.chemfiles
is a boolean that, if set totrue
, will use try to use theChemfiles
library to read the trajectory file. independently of the file extension. By default, PDB and DCD trajectories are read by specific readers.
The current call signature was introduced in version 2.9.0. The previous call signature with the previous construction of the Trajectory
object, is still available, but it is no longer recommended because it more prone to user-level solute and solvent configuration errors.
Examples
julia> using ComplexMixtures, PDBTools
julia> using ComplexMixtures.Testing: data_dir
julia> atoms = readPDB(joinpath(data_dir,"NAMD/structure.pdb"));
julia> solute = AtomSelection(select(atoms, "protein"), nmols=1);
julia> solvent = AtomSelection(select(atoms, "resname TMAO"), natomspermol=14);
julia> options = Options(lastframe=10, bulk_range=(10.0, 15.0));
julia> trajectory_file = joinpath(data_dir,"NAMD/trajectory.dcd");
julia> results = mddf(trajectory_file, solute, solvent, options);
The mddf
functions is run with, for example:
results = mddf(trajectory_file, solute, solvent, Options(bulk_range=(10.0, 15.0)))
The MDDF along with other results, like the corresponding KB integrals, are returned in the results
data structure, which is described in the next section.
It is possible to tune several options of the calculation, by setting the Options
data structure with user-defined values in advance. The most common parameters to be set by the user are bulk_range
and stride
.
stride
defines if some frames will be skip during the calculation (for speedup). For example, if stride=5
, only one in five frames will be considered. Adjust stride with:
options = Options(stride=5, bulk_range=(10.0, 15.0))
results = mddf(trajectory_file, solute, solvent, options)
bulk_range
defines the subset of the system, as defined according to a range of distances from the solute, that are to be considered as the bulk solution. Within this range of distances, the user believes that the reference solute molecule does not significantly affect anymore the structure of the solvent.
By default, all molecules above 10 Angstroms from the solute are considered bulk molecules (corresponding to Options(dbulk=10.0)
), but it is highly recommended to use a manual definition of bulk_range
.
The definition of a range of distances within the system to compute the bulk density is adequate because this system subset is then an open system with a solvent molecule reservoir. The adequate choice of bulk_range
can be inspected by the proper convergence of the distribution functions (which must converge to 1.0) and a proper convergence of the KB integrals.
The bulk_range
option was introduced in version 2.1.0.
See the Options section for further details and other options to set.
Coordination numbers only
The coordination number is the unnormalized count of how many molecules of the solvent are within a given distance to the solute. Coordination numbers can be computed for systems where the normalization of the distribution functions is not possible (or needed) because of an ill definition of an ideal-gas state. For example, in highly heterogeneous systems, on in systems with only a few molecules of the "solvent", the density of the bulk solution might not be properly defined.
In these cases, nevertheless, coordination numbers can be computed and still provide valuable information about the molecular structure of the system. Coordination number can be computed also from the results obtained from a mddf
run, as explained in the corresponding section of the Tools menu.
The coordination_number
function, called with the same arguments as the mddf
function, can be used to compute coordination numbers without the normalization required for the MDDF, providing (possibly much) faster computations when the normalization is not possible or required:
ComplexMixtures.coordination_number
— Methodcoordination_number(
trajectory_file::String,
solute::AtomSelection,
solvent::AtomSelection, # optional: if omitted, an auto-correlation will be computed
options::Options=Options(); # optional: if omitted, default options will be used
# optional keywords:
trajectory_format="",
frame_weights = Float64[],
low_memory = false,
)
Computes the minimum-distance coordination number and atomic contributions, given the trajectory file name, and the definition of the solute and solvent groups of atoms as AtomSelection
data structures. If the solvent
parameter is omitted, a self-coordination number will be computed.
The output is a Result
structure, which contains the data as the result of a call to mddf
, except that all counters which require normalization of the distribution will be zero. In summary, this result data structure can be used to compute the coordination numbers, but not the MDDF, RDF, or KB integrals.
The options
parameter is optional. If not set, the default Options()
structure will be used.
Optional execution keywords
trajectory_format
is a string that defines the format of the trajectory file. If not provided, the format will be guessed from the file extension, and theChemfiles
library will be used to read the trajectory.frame_weights
is an array of weights for each frame of the trajectory. If this is provided, the MDDF will be computed as a weighted average of the MDDFs of each frame. This can be used to study the solvation dependency in perturbed ensembles.low_memory
can be set totrue
to reduce the memory requirements of the computation. This will parallelize the computation of the minimum distances at a lower level, reducing the memory requirements at the expense of some performance.chemfiles
is a boolean that, if set totrue
, will use try to use theChemfiles
library to read the trajectory file. independently of the file extension. By default, PDB and DCD trajectories are read by specific readers.
The current call signature was introduced in version 2.9.0. The previous call signature with the previous construction of the Trajectory
object, is still available, but it is no longer recommended because it more prone to user-level solute and solvent configuration errors.
Examples
julia> using ComplexMixtures, PDBTools
julia> using ComplexMixtures.Testing: data_dir
julia> atoms = readPDB(joinpath(data_dir,"NAMD/structure.pdb"));
julia> solute = AtomSelection(select(atoms, "protein"), nmols=1);
julia> solvent = AtomSelection(select(atoms, "resname TMAO"), natomspermol=14);
julia> options = Options(lastframe=10, bulk_range=(10.0, 15.0));
julia> trajectory_file = joinpath(data_dir,"NAMD/trajectory.dcd");
julia> results = coordination_number(trajectory_file, solute, solvent, options);
The mddf
, kb
, and random count parameters will be filled with zeros when using this options, and are meaningless.