The package is accompanied by an example input file, which is found in
the topolink/input directory. It is mostly
self-explanatory. Possibly the best way to get started is to open
the input file and look at it. An example is available
[here].
The executable must be run with:
topolink inputfile.inp > topolink.log
Alternatively, the PDB file of the structure can be provided on the
command line as the second argument, overriding the definition
given in the input file:
The output file generated by TopoLink contains three sections which are of
interest to the investigator:
1. The result of the search of each topological distance.
2. The statistics of the links for each experiment.
3. The overall statistics of the links in the model.
Before these sections, the output file contains several messages from
input file checks, which can be ignored. The package also reports errors
and stops when inconsistencies in the input file are found, such as
duplicate definitions of observations, observations that are not
consistent with the defined reactivity, and others.
1. Result for the topological distance of each link
At some point, the output file of TopoLink will contain specific link
information, in the following form:
-------------------------------------------------------------------------------------------------------------------------
RESIDUE1 ATOM1 RESIDUE2 ATOM2 EUCLDIST TOPODIST OBSERVED DMIN DMAX RESULT OBSRES REACRES RA AA
-------------------------------------------------------------------------------------------------------------------------
LINK: MET A 1 N LYS A 17 CB 20.647 >16.900 YES 0.000 16.900 BAD: EUCL 0/ 1 0/ 1 YN YN
LINK: MET A 1 N LYS A 113 CB 8.955 9.058 YES 0.000 16.900 OK: FOUND 1/ 1 1/ 1 YY YY
LINK: GLU A 4 CB ASP A 22 CB 29.619 >15.600 NO >15.600 >15.600 OK: EUCL 0/ 1 0/ 1 YY YY
...
This is the result of the search for a topological distance for each
link. The results are to be interpreted in the following manner: First,
there is the definition of the atoms forming the possible link.
The 10th column contains the Euclidean distance between these atoms.
The 11th column contains the topological distance, reported as
follows: if a bound is reported, for example
>16.900, that means that TopoLink could not find any
topological distance smaller than that value. That bound is the
length of the longest linker that can bind the two atoms
involved, according to the link type definitions.
Therefore, there is no topological distance shorter than the
maximum linker length which could bind these atoms. According to the
model, the link should not be observed in this case.
The 12th column reports if that particular link was observed in some
experiment. In this case, the first link was observed.
The 13th and 14th columns contain the experimental minimum and maximum
distances for that link. That is, if the link was observed with a
linker of length 16.900, the distance between the atoms
must be shorter than that, defining a range [0.,16.9]. If,
additionally, the link was NOT observed with a shorter linker of, say,
length 12.0, that defines a lower bound for the distance,
and the range would be [12.0,16.9]. In the first link
example, the link was observed using a linker of length 16.9.
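The reasoning behind the DMIN and DMAX columns can be illustrated with a short sketch. This is not TopoLink's implementation: the function name `distance_range` and the representation of experiments as (linker length, observed) pairs are assumptions made only for this example.

```python
from typing import Iterable, Optional, Tuple


def distance_range(
    experiments: Iterable[Tuple[float, bool]],
) -> Tuple[float, Optional[float]]:
    """Return (dmin, dmax) implied by (linker_length, observed) pairs.

    Hypothetical helper for illustration; not part of TopoLink.
    """
    dmin = 0.0
    dmax: Optional[float] = None  # None: no observation sets an upper limit
    for length, observed in experiments:
        if observed:
            # An observed link means the atoms are within this linker's reach.
            dmax = length if dmax is None else min(dmax, length)
        else:
            # A link that was NOT observed suggests the distance exceeds this length.
            dmin = max(dmin, length)
    return dmin, dmax


# Observed with a 16.9 linker only: range [0, 16.9].
print(distance_range([(16.9, True)]))
# Additionally NOT observed with a 12.0 linker: range [12.0, 16.9].
print(distance_range([(16.9, True), (12.0, False)]))
```

When no experiment observed the link, this sketch returns `None` for the upper limit; how TopoLink itself reports that case should be read from its output.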
The 15th column contains the specification of the overall result.
BAD specifiers indicate that the topological distance
obtained is not consistent with the observations. OK
indicates that it is consistent. The possible results are:
Code
Description
OK: FOUND
A valid topological distance was found, and it is
consistent with all observations.
OK: LONG
A valid topological distance was found, it is longer
than the linker length, and the link was NOT observed, thus the result
is consistent with observations.
OK: EUCL
The euclidean distance is already too long, and the
link was NOT observed, thus the result is consistent with observations.
OK: NOTFOUND
No valid topological distance was found, but
the euclidean distance is shorter than the linker
length. The link was also
NOT observed, such that the result is consistent with observations.
BAD: SHORT
A topological distance was found, which is shorter
than a linker length for which the link was NOT observed. Since it was
NOT observed, that linker should be a lower bound to the distance, which
is violated.
BAD: LONG
A topological distance was found, and the link was
observed. However, the distance is too long for the linker length. The
link, therefore, is not consistent with observations.
BAD: EUCL
The link was observed, but the euclidean distance in
the model is already too long to be consistent with that observation.
BAD: NOTFOUND
The link was observed, the euclidean distance is fine,
but no valid topological distance was found. Thus, the link is not
consistent with the observation.
BAD: MISSING
The link was NOT observed, and the topological
distance is such that it should have been.
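To get a quick overview of how many links fall under each code, the LINK lines of a log can be tallied. The sketch below assumes that every link line starts with `LINK:` and contains the result as `OK:` or `BAD:` followed by an upper-case word, as in the example output above; the function name `tally_results` is hypothetical.

```python
import re
from collections import Counter

# Matches the RESULT column, e.g. "OK: FOUND" or "BAD: EUCL".
RESULT_RE = re.compile(r"\b(OK|BAD):\s+([A-Z]+)\b")


def tally_results(lines):
    """Count how often each result code appears in the LINK lines."""
    counts = Counter()
    for line in lines:
        if not line.lstrip().startswith("LINK:"):
            continue  # skip headers, separators and other output
        match = RESULT_RE.search(line)
        if match:
            counts[f"{match.group(1)}: {match.group(2)}"] += 1
    return counts


sample = [
    "LINK: MET A 1 N LYS A 17 CB 20.647 >16.900 YES 0.000 16.900 BAD: EUCL 0/ 1 0/ 1 YN YN",
    "LINK: MET A 1 N LYS A 113 CB 8.955 9.058 YES 0.000 16.900 OK: FOUND 1/ 1 1/ 1 YY YY",
    "LINK: GLU A 4 CB ASP A 22 CB 29.619 >15.600 NO >15.600 >15.600 OK: EUCL 0/ 1 0/ 1 YY YY",
]
for code, count in sorted(tally_results(sample).items()):
    print(code, count)
```

For a real run, the lines would come from the log file, for example `tally_results(open("topolink.log"))`.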
The 16th column contains the ratio between consistent observations and
expected observations, considering that reactive atoms are only those which
were experimentally observed to react (in a crosslink or dead-end).
That is, 1/3 means that the link was observed in one
experiment, but it should have been observed in three, according to the
linkers used and observed reactivity.
The 17th column contains the ratio between consistent observations and
expected observations, considering that reactive atoms are given by
residue types. That is, if the linker is expected to bind two Lysine
residues, all pairs of Lysine residues with consistent topological
distances should be observed to crosslink. The ratio reported is the number of actual
observations relative to the number of expected observations.
The 18th and 19th columns contain the information about solvent
accessibilities of residues (RA - Residue Accessibility) and specific
atoms (AA - Atom Accessibility). A "Y" indicates that a residue or atom
is accessible to solvent, and an "N" indicates that it is not accessible. A
residue is considered accessible to solvent if ANY of its atoms is
accessible to solvent. Each pair of "YY" or "YN", etc, refers to the
first and second residue or atom, respectively. Therefore a "YN" in the
"RA" column indicates that the first residue is accessible to solvent,
but the second residue is not.
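The column layout described above can be turned into a small parser for post-processing. This is a sketch under assumptions, not an official TopoLink tool: it assumes whitespace-separated fields as in the example output, a non-blank chain identifier, and integer residue numbers; the names `Link` and `parse_link` are hypothetical. DMIN and DMAX are kept as text because they may carry a `>` prefix.

```python
import re
from typing import NamedTuple, Optional

LINK_RE = re.compile(
    r"^\s*LINK:\s+"
    r"(\S+)\s+(\S+)\s+(-?\d+)\s+(\S+)\s+"   # residue 1: name, chain, number, atom
    r"(\S+)\s+(\S+)\s+(-?\d+)\s+(\S+)\s+"   # residue 2: name, chain, number, atom
    r"(\S+)\s+(\S+)\s+(YES|NO)\s+"          # EUCLDIST, TOPODIST, OBSERVED
    r"(\S+)\s+(\S+)\s+"                     # DMIN, DMAX
    r"(OK|BAD):\s+(\w+)\s+"                 # RESULT
    r"(\d+)\s*/\s*(\d+)\s+(\d+)\s*/\s*(\d+)\s+"  # OBSRES, REACRES
    r"([YN]{2})\s+([YN]{2})\s*$"            # RA, AA
)


class Link(NamedTuple):
    atom1: tuple
    atom2: tuple
    euclidean: float
    topological: float
    topological_is_bound: bool  # True when reported as ">value"
    observed: bool
    dmin: str
    dmax: str
    result: str
    observed_ratio: tuple
    type_ratio: tuple
    residue_access: str
    atom_access: str


def parse_link(line: str) -> Optional[Link]:
    """Parse one LINK line; return None if the line does not match."""
    m = LINK_RE.match(line)
    if m is None:
        return None
    g = m.groups()
    return Link(
        atom1=(g[0], g[1], int(g[2]), g[3]),
        atom2=(g[4], g[5], int(g[6]), g[7]),
        euclidean=float(g[8]),
        topological=float(g[9].lstrip(">")),
        topological_is_bound=g[9].startswith(">"),
        observed=(g[10] == "YES"),
        dmin=g[11],
        dmax=g[12],
        result=f"{g[13]}: {g[14]}",
        observed_ratio=(int(g[15]), int(g[16])),
        type_ratio=(int(g[17]), int(g[18])),
        residue_access=g[19],
        atom_access=g[20],
    )


link = parse_link(
    "LINK: MET A 1 N LYS A 17 CB 20.647 >16.900 YES 0.000 16.900 "
    "BAD: EUCL 0/ 1 0/ 1 YN YN"
)
print(link.result, link.euclidean, link.residue_access)
```

Lines that do not follow this layout (headers, separators, or files with blank chain identifiers) return `None` and would need separate handling.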
2. Statistics of the links for each experiment
For each experiment, a detailed account of the links is provided, in the
following form:
-------------------------------------------------------------------------------
Experiment: DSS
Number of type-reactive pairs of atoms: 105
Number of observed-reactive pairs of atoms: 21
Number of observed-reactive pairs of atoms within linker reach: 5
Number of observed-reactive pairs of atoms outside linker reach: 16
Missing links, according to the structure and observed-reactivity: 0
Number of observed links: 7
Number of observed links consistent with the structure: 5
Number of observed links NOT consistent with the structure: 2
Sum of scores of observed links: 0.0000000
Sensitivity of the cross-linking reaction: 1.0000000
False-assignment probability: 0.0190476
Likelihood of the experimental result: 0.9999953
Log-likelihood of the experimental result: -0.0000047
Likelihood using user-defined pbad and pgood: 0.9996471
Log-likelihood using user-defined pbad and pgood: -0.0003530
Using: pgood = 0.700; pbad = 0.010
-------------------------------------------------------------------------------
It is important here to understand what "observed-reactivity" means:
observed reactivity is defined by the experiment. If a residue is
observed, experimentally, to participate in a crosslink or form a
dead-end, it is an "observed-reactive" residue, meaning that one is sure
that it is solvent exposed and available for reaction. Type-reactivity,
on the other hand, is simply the expected reactivity as a function of
the chemical nature of the residue and of the linker.
The sensitivity of the crosslinking reaction is defined
here as the fraction of links that were observed relative to the links
that were expected to be formed given the reactivity and the structure. In
the example above, 5 observed-reactive pairs of atoms were
found to be within linker reach, and all of these links were
experimentally observed, thus the sensitivity is 1.00.
The false-assignment probability is the ratio between the
number of experimentally observed links that are NOT consistent with the
structure and the total number of type-reactive pairs of atoms. That is,
the probability of any pair of type-reactive atoms being incorrectly
reported as forming a link. In the example above, 2/105=0.019.
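The two ratios in the DSS block can be reproduced with a few lines of arithmetic. This sketch assumes that the sensitivity is the number of observed links consistent with the structure (5) divided by the observed-reactive pairs within linker reach (5), and that the 2 in 2/105 is the number of observed links not consistent with the structure; the function names are hypothetical and the formulas are a reading of the example, not TopoLink's source.

```python
def sensitivity(consistent_observed: int, pairs_within_reach: int) -> float:
    """Fraction of expected links (within linker reach) that were observed."""
    return consistent_observed / pairs_within_reach


def false_assignment_probability(
    inconsistent_observed: int, type_reactive_pairs: int
) -> float:
    """Fraction of type-reactive pairs reported as linked although the
    structure does not support the link."""
    return inconsistent_observed / type_reactive_pairs


# Values taken from the DSS experiment block above.
print(f"{sensitivity(5, 5):.7f}")                     # 1.0000000
print(f"{false_assignment_probability(2, 105):.7f}")  # 0.0190476
```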
The results above allow for the estimation of a likelihood for the
model, which is reported. The likelihood computed using user-defined
sensitivities and false-positive probabilities is also reported.
3. Overall crosslink statistics of the model
The final result of TopoLink is reported as follows:
------------------------------------------------------------------------------------
FINAL RESULTS:
RESULT0: 12 : Number of observations that are consistent with the structure.
RESULT1: 12 : Number of topological distances consistent with all observations.
RESULT2: 16 : Number of topological distances NOT consistent with observations.
RESULT3: 0 : Number of links with missing observations.
RESULT4: 0.00000 : Sum of scores of observed links of all experiments.
RESULT5: 0.99980 : Likelihood of the set of experimental results.
RESULT6: -0.00020 : Log-likelihood of the set of experimental results.
Using: pgood = 0.700; pbad = 0.010
RESULT7: 0.99965 : Likelihood of the set of experimental results.
RESULT8: -0.00035 : Log-likelihood of the set of experimental results.
------------------------------------------------------------------------------------
The results are reported with flags "RESULT0" to "RESULT8" to
facilitate grepping the data for multiple model evaluation. The results
are self-explanatory, but some points must be clarified:
The number of observations that are consistent with the structure is not
the same as the number of topological distances which are consistent
with observations. For a topological distance to be consistent with
observations, it must be consistent with all observations at the
same time. This is particularly important if different linkers that can
react with the same pairs of atoms were used in different experiments.
A topological distance is said to be NOT consistent with the
observations if it is inconsistent with at least one observation.
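When many models are evaluated, the RESULT flags can be collected from each log with a short script instead of grep. The sketch below assumes the `RESULTn: value : description` layout shown in the final results block; the function name `read_results` is hypothetical.

```python
import re

# Matches e.g. "RESULT5: 0.99980 : Likelihood of ..." and captures 5 and 0.99980.
RESULT_RE = re.compile(r"^\s*RESULT(\d+):\s*(\S+)\s*:")


def read_results(lines):
    """Return a dict mapping the RESULT index to its numeric value."""
    results = {}
    for line in lines:
        match = RESULT_RE.match(line)
        if match:
            results[int(match.group(1))] = float(match.group(2))
    return results


sample = [
    "FINAL RESULTS:",
    "RESULT0: 12 : Number of observations that are consistent with the structure.",
    "RESULT2: 16 : Number of topological distances NOT consistent with observations.",
    "RESULT6: -0.00020 : Log-likelihood of the set of experimental results.",
    "Using: pgood = 0.700; pbad = 0.010",
]
print(read_results(sample))
```

For a real run, each log would be read from disk, for example `read_results(open("topolink.log"))`, and the resulting dictionaries compared across models.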