How to use TOPOLINK
The package is accompanied by an example input file, which is found in the
topolink/input directory. It is mostly
self-explanatory. Possibly the best way to get started is to open
the input file and look at it. An example is available
[here].
The executable must be run with:
topolink inputfile.inp > topolink.log
Alternatively, the PDB file of the structure can be provided on the command line as the second argument, overriding the definition given in the input file:
topolink inputfile.inp model.pdb > topolink.log
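When many models have to be evaluated, the two invocations above can be scripted. The following is a minimal Python sketch, assuming the `topolink` executable is on the PATH. The helper names `topolink_command` and `run_topolink` are illustrative and are not part of the package.

```python
import subprocess
from pathlib import Path


def topolink_command(input_file, pdb_file=None):
    """Build the argument list for one TopoLink run (hypothetical helper)."""
    command = ["topolink", str(input_file)]
    if pdb_file is not None:
        # The optional second argument replaces the structure set in the input file.
        command.append(str(pdb_file))
    return command


def run_topolink(input_file, pdb_file=None, log_file="topolink.log"):
    """Run TopoLink and redirect its standard output to log_file."""
    with Path(log_file).open("w") as log:
        # check=False: how TopoLink signals input errors through its exit
        # status is not assumed here, so the caller inspects the result.
        return subprocess.run(topolink_command(input_file, pdb_file),
                              stdout=log, check=False)
```

Calling `run_topolink("inputfile.inp", "model.pdb", "model.log")` would correspond to the second command line shown above, with the log written to `model.log`.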
The output of TopoLink
The output file generated by TopoLink contains three sections which are of interest to the investigator:
1. The result of the search of each topological distance.
2. The statistics of the links for each experiment.
3. The overall statistics of the links in the model.
Before these sections, the output file contains several messages from the checking of the input file, which can be ignored. The package also reports errors and stops when inconsistencies are found in the input file, such as duplicate definitions of observations or observations that are not consistent with the defined reactivity.
1. Result for the topological distance of each link
At some point, the output file of TopoLink will contain specific link information, in the following form:
This is the result of the search for a topological distance for each link. The results are to be interpreted in the following manner: First, there is the definition of the atoms forming the possible link.
The 10th column contains the Euclidean distance between these atoms.
The 11th column contains the topological distance, reported with the following details: if a bound is reported, for example
>16.900, that means that TopoLink could not find any
topological distance smaller than that value. That bound is the
length of the longest linker that can bind the two atoms
involved, according to the link type definitions.
Therefore, there is no topological distance shorter than the
maximum linker length which could bind these atoms. The link should not
be observed in this case, according to the model.
The 12th column reports if that particular link was observed in some experiment. In this case, the first link was observed.
The 13th and 14th columns contain the experimental maximum and minimum distances for that linker. That is, if the link was observed with a linker of
16.900 length, the distance between the atoms
must be shorter than that, defining a range [0.,16.9]. If,
additionally, the link was NOT observed for a shorter linker of, let's say,
12.0 length, that defines a lower bound for the distance,
and the range would be [12.0,16.9]. In the first link
example, the link was observed using a linker of length 16.9.
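The way observations narrow the distance range can be sketched in a few lines of Python. This is a simplified illustration of the reasoning above, not TopoLink's implementation: the function name `experimental_range` is hypothetical, and representing a missing upper bound by infinity is an assumption of this sketch.

```python
def experimental_range(observations):
    """Return (lower, upper) bounds on the distance between two atoms.

    observations: iterable of (linker_length, observed) pairs, one per experiment.
    """
    lower, upper = 0.0, float("inf")
    for linker_length, observed in observations:
        if observed:
            # An observed link means the distance fits within this linker.
            upper = min(upper, linker_length)
        else:
            # An unobserved link means the distance exceeds this linker.
            lower = max(lower, linker_length)
    return lower, upper


print(experimental_range([(16.9, True)]))                 # (0.0, 16.9)
print(experimental_range([(16.9, True), (12.0, False)]))  # (12.0, 16.9)
```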
The 15th column contains the specification of the overall result.
BAD specifiers indicate that the topological distance
obtained is not consistent with the observations. OK
indicates that it is consistent. The possible results are:
| Code | Description |
|---|---|
| OK: FOUND | A valid topological distance was found, and it is consistent with all observations. |
| OK: LONG | A valid topological distance was found, it is longer than the linker length, and the link was NOT observed, thus the result is consistent with observations. |
| OK: EUCL | The Euclidean distance is already too long, and the link was NOT observed, thus the result is consistent with observations. |
| OK: NOTFOUND | No valid topological distance was found, but the Euclidean distance is shorter than the linker length. The link was also NOT observed, such that the result is consistent with observations. |
| BAD: SHORT | A topological distance was found, which is shorter than a linker length for which the link was NOT observed. Since it was NOT observed, that linker length should be a lower bound to the distance, which is violated. |
| BAD: LONG | A topological distance was found, and the link was observed. However, the distance is too long for the linker length. The link, therefore, is not consistent with observations. |
| BAD: EUCL | The link was observed, but the Euclidean distance in the model is already too long to be consistent with that observation. |
| BAD: NOTFOUND | The link was observed, the Euclidean distance is fine, but no valid topological distance was found. Thus, the link is not consistent with the observation. |
| BAD: MISSING | The link was NOT observed, and the topological distance is such that it should have been. |
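Since every code starts with either OK or BAD, a quick summary of a model can be obtained by tallying the prefixes. The sketch below assumes the codes have already been extracted from the link lines as strings. The function name `tally_results` and the sample list are hypothetical and are not taken from a real TopoLink run.

```python
from collections import Counter


def tally_results(codes):
    """Count consistent (OK) and inconsistent (BAD) links from result codes."""
    counts = Counter()
    for code in codes:
        # The part before the colon is the overall verdict: "OK" or "BAD".
        verdict = code.split(":", 1)[0].strip()
        counts[verdict] += 1
    return counts


# Illustrative codes only.
codes = ["OK: FOUND", "BAD: LONG", "OK: EUCL", "BAD: MISSING", "OK: FOUND"]
counts = tally_results(codes)
print(counts["OK"], counts["BAD"])  # 3 2
```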
The 16th column contains the ratio between consistent observations and expected observations, considering that reactive atoms are only those which were experimentally observed to react (in a crosslink or dead-end). That is,
1/3 means that the link was observed in one
experiment, but it should have been observed in three, according to the
linkers used and observed reactivity.
The 17th column contains the ratio between consistent observations and expected observations, considering that reactive atoms are given by residue types. That is, if the linker is expected to bind two Lysine residues, all pairs of Lysine residues with consistent topological distances should be observed to crosslink. The ratio reported is the number of actual observations relative to the number of expected observations.
The 18th and 19th columns contain the information about solvent accessibilities of residues (RA - Residue Accessibility) and specific atoms (AA - Atom Accessibility). A "Y" indicates that a residue or atom is accessible to solvent, and an "N" indicates that it is not accessible. A residue is considered accessible to solvent if ANY of its atoms is accessible to solvent. Each pair "YY", "YN", etc. refers to the first and second residue or atom, respectively. Therefore a "YN" in the "RA" column indicates that the first residue is accessible to solvent, but the second residue is not.
2. Statistics of the links for each experiment
For each experiment, a detailed account of the links is provided, in the following form:
-------------------------------------------------------------------------------
Experiment: DSS
Number of type-reactive pairs of atoms: 105
Number of observed-reactive pairs of atoms: 21
Number of observed-reactive pairs of atoms within linker reach: 5
Number of observed-reactive pairs of atoms outside linker reach: 16
Missing links, according to the structure and observed-reactivity: 0
Number of observed links: 7
Number of observed links consistent with the structure: 5
Number of observed links NOT consistent with the structure: 2
Sum of scores of observed links: 0.0000000
Sensitivity of the cross-linking reaction: 1.0000000
False-assignment probability: 0.0190476
Likelihood of the experimental result: 0.9999953
Log-likelihood of the experimental result: -0.0000047
Likelihood using user-defined pbad and pgood: 0.9996471
Log-likelihood using user-defined pbad and pgood: -0.0003530
Using: pgood = 0.700; pbad = 0.010
-------------------------------------------------------------------------------
It is important here to understand what "observed-reactivity" means: observed reactivity is defined by the experiment. If a residue is observed, experimentally, to participate in a crosslink or form a dead-end, it is an "observed-reactive" residue, meaning that one is sure that it is solvent exposed and available for reaction. Type-reactivity, on the other hand, is simply the expected reactivity as a function of the chemical nature of the residue and of the linker.
The
sensitivity of the crosslinking reaction is defined
here as the fraction of links that were observed relative to the links
that were expected to be formed from the reactivity and structure. In
the example above, 5 observed-reactive pairs of atoms were
found to be within linker reach, and all of these links were
experimentally observed, thus the sensitivity is 1.00.
The
false-assignment probability is the ratio between the
number of experimentally observed links that are not consistent with the structure and the total number of
type-reactive pairs of atoms. That is, the probability that any pair of
type-reactive atoms is incorrectly reported as forming a link. In the
example above, 2/105=0.019.
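Both quantities can be checked by hand from the counts in the DSS example report above. The short Python sketch below does that arithmetic. The variable names are illustrative, and reading the numerator of the false-assignment probability as the number of observed links NOT consistent with the structure is inferred from the 2/105 example.

```python
# Counts taken from the DSS example report above.
type_reactive_pairs = 105      # type-reactive pairs of atoms
pairs_within_reach = 5         # observed-reactive pairs within linker reach
observed_consistent = 5        # observed links consistent with the structure
observed_inconsistent = 2      # observed links NOT consistent with the structure

# Fraction of expected links that were actually observed.
sensitivity = observed_consistent / pairs_within_reach

# Inconsistent observed links relative to all type-reactive pairs.
false_assignment = observed_inconsistent / type_reactive_pairs

print(f"{sensitivity:.7f}")       # 1.0000000
print(f"{false_assignment:.7f}")  # 0.0190476
```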
The results above allow for the estimation of a likelihood for the model, which is reported. The likelihood computed using user-defined sensitivities and false-positive probabilities is also reported.
3. Overall crosslink statistics of the model
The final result of TopoLink is reported as follows:
------------------------------------------------------------------------------------
FINAL RESULTS:
RESULT0: 12 : Number of observations that are consistent with the structure.
RESULT1: 12 : Number of topological distances consistent with all observations.
RESULT2: 16 : Number of topological distances NOT consistent with observations.
RESULT3: 0 : Number of links with missing observations.
RESULT4: 0.00000 : Sum of scores of observed links of all experiments.
RESULT5: 0.99980 : Likelihood of the set of experimental results.
RESULT6: -0.00020 : Log-likelihood of the set of experimental results.
Using: pgood = 0.700; pbad = 0.010
RESULT7: 0.99965 : Likelihood of the set of experimental results.
RESULT8: -0.00035 : Log-likelihood of the set of experimental results.
------------------------------------------------------------------------------------
The results are reported with the flags "RESULT0" to "RESULT8" to
facilitate grepping the data when evaluating multiple models. The results
are self-explanatory, but some points need clarification:
The number of observations that are consistent with the structure is not the same as the number of topological distances which are consistent with observations. For a topological distance to be consistent with observations, it must be consistent with all observations at the same time. This is particularly important if different linkers that can react with the same pairs of atoms were used in different experiments.
A topological distance is said to be NOT consistent with the observations if it is inconsistent with at least one observation.
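As an alternative to grep, the RESULT lines can be collected programmatically when many log files have to be compared. The following Python sketch assumes each line has the form "RESULTn: value : description", as in the final report shown in this section. The exact spacing in real log files may differ, and the function name `parse_results` is hypothetical.

```python
import re

# Matches e.g. "RESULT6: -0.00020 : Log-likelihood ..." at the start of a line.
RESULT_LINE = re.compile(r"^[ \t]*RESULT(\d+):[ \t]*(\S+)[ \t]*:", re.MULTILINE)


def parse_results(log_text):
    """Map each RESULT index to its numeric value."""
    return {int(m.group(1)): float(m.group(2))
            for m in RESULT_LINE.finditer(log_text)}


# A few lines copied from the final report shown in this section.
sample = """\
FINAL RESULTS:
RESULT0: 12 : Number of observations that are consistent with the structure.
RESULT2: 16 : Number of topological distances NOT consistent with observations.
RESULT6: -0.00020 : Log-likelihood of the set of experimental results.
"""
results = parse_results(sample)
print(results[0], results[2], results[6])  # 12.0 16.0 -0.0002
```

Applying `parse_results` to the log of each model gives one dictionary per model, which can then be sorted or tabulated by any of the RESULT values.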

