Martínez Molecular Modeling Group

Institute of Chemistry and Center for Computing in Engineering and Science

University of Campinas


How to use TOPOLINK

The package is accompanied by an example input file, found in the topolink/input directory. It is mostly self-explanatory, and probably the best way to get started is to open that file and read through it. An example is available [here].

The executable must be run with:

topolink inputfile.inp > topolink.log

Alternatively, the PDB file of the structure can be provided on the command line as the second argument, overriding the definition given in the input file:

topolink inputfile.inp model.pdb > topolink.log
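When several candidate models are evaluated against the same input file, the second form of the command can be scripted. The following is a minimal sketch, assuming the `topolink` executable is on the PATH. The helper names `build_command`, `run_topolink` and `log_name_for`, and the choice of one log file per model, are hypothetical and not part of the package.

```python
import subprocess
from pathlib import Path


def build_command(input_file, pdb_file=None):
    """Return the argument list for one TopoLink run.

    The optional PDB file is passed as the second argument, which
    replaces the structure defined in the input file.
    """
    command = ["topolink", str(input_file)]
    if pdb_file is not None:
        command.append(str(pdb_file))
    return command


def log_name_for(pdb_file):
    """Hypothetical naming scheme: model.pdb -> model.log."""
    return Path(pdb_file).with_suffix(".log")


def run_topolink(input_file, pdb_file, log_file):
    """Run TopoLink and write its standard output to log_file.

    Assumes the executable is called `topolink` and is on the PATH.
    check=True raises an exception if the process exits with an error code.
    """
    with open(log_file, "w") as log:
        subprocess.run(build_command(input_file, pdb_file), stdout=log, check=True)
```

With these helpers, `run_topolink("inputfile.inp", "model.pdb", log_name_for("model.pdb"))` would correspond to `topolink inputfile.inp model.pdb > model.log`.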


The output of TopoLink

The output file generated by TopoLink contains three sections of interest to the investigator: 1. The result of the search for each topological distance. 2. The statistics of the links for each experiment. 3. The overall statistics of the links in the model.
Before these sections, the output file contains several messages from the input file checks, which can be ignored. The package also reports errors and stops when inconsistencies are found in the input file, such as duplicate definitions of observations, observations that are not consistent with the defined reactivity, and others.

1. Result for the topological distance of each link

At some point, the output file of TopoLink will contain specific link information, in the following form:

-------------------------------------------------------------------------------------------------------------------------
      RESIDUE1   ATOM1  RESIDUE2   ATOM2  EUCLDIST  TOPODIST  OBSERVED      DMIN      DMAX  RESULT      OBSRES  REACRES  RA  AA
-------------------------------------------------------------------------------------------------------------------------
LINK: MET A   1      N  LYS A  17     CB    20.647   >16.900       YES     0.000    16.900  BAD: EUCL     0/ 1     0/ 1  YN  YN
LINK: MET A   1      N  LYS A 113     CB     8.955     9.058       YES     0.000    16.900  OK: FOUND     1/ 1     1/ 1  YY  YY
LINK: GLU A   4     CB  ASP A  22     CB    29.619   >15.600        NO   >15.600   >15.600  OK: EUCL      0/ 1     0/ 1  YY  YY
...

This is the result of the search for a topological distance for each link. The results are to be interpreted in the following manner: first, columns 2 to 9 define the two atoms forming the possible link.

The 10th column contains the Euclidean distance between these atoms.

The 11th column contains the topological distance. If the value is reported with a ">" prefix, for example >16.900, TopoLink could not find any topological distance smaller than that value. The value shown is the length of the longest linker that can bind the two atoms involved, according to the link type definitions. Therefore, there is no topological path shorter than the maximum linker length that could connect these atoms, and according to the model the link should not be observed.

The 12th column reports whether that particular link was observed in some experiment. In the example, the first link was observed.

The 13th and 14th columns contain the experimental minimum and maximum distances (DMIN and DMAX) for that link. That is, if the link was observed with a linker of length 16.9, the distance between the atoms must be shorter than that, defining the range [0.0, 16.9]. If, additionally, the link was NOT observed with a shorter linker of, say, length 12.0, that defines a lower bound for the distance, and the range becomes [12.0, 16.9]. In the first link of the example, the link was observed using a linker of length 16.9.

The 15th column contains the specification of the overall result. BAD specifiers indicate that the topological distance obtained is not consistent with the observations. OK indicates that it is consistent. The possible results are:

Code	Description
`OK: FOUND`	A valid topological distance was found, and it is consistent with all observations.
`OK: LONG`	A valid topological distance was found, it is longer than the linker length, and the link was NOT observed, thus the result is consistent with observations.
`OK: EUCL`	The euclidean distance is already too long, and the link was NOT observed, thus the result is consistent with observations.
`OK: NOTFOUND`	No valid topological distance was found, but the euclidean distance is shorter than the linker length. The link was also NOT observed, such that the result is consistent with observations.
`BAD: SHORT`	A topological distance was found, which is shorter than a linker length for which the link was NOT observed. Since it was NOT observed, that linker should be a lower bound to the distance, which is violated.
`BAD: LONG`	A topological distance was found, and the link was observed. However, the distance is too long for the linker length. The link, therefore, is not consistent with observations.
`BAD: EUCL`	The link was observed, but the euclidean distance in the model is already too long to be consistent with that observation.
`BAD: NOTFOUND`	The link was observed, the euclidean distance is fine, but no valid topological distance was found. Thus, the link is not consistent with the observation.
`BAD: MISSING`	The link was NOT observed, and the topological distance is such that it should have been.

The 16th column contains the ratio between consistent observations and expected observations, considering that reactive atoms are only those which were experimentally observed to react (in a crosslink or dead-end). That is, 1/3 means that the link was observed in one experiment, but it should have been observed in three, according to the linkers used and the observed reactivity.

The 17th column contains the ratio between consistent observations and expected observations, considering that reactive atoms are given by residue types. That is, if the linker is expected to bind two Lysine residues, all pairs of Lysine residues with consistent topological distances should be observed to crosslink. The ratio reported is the number of actual observations relative to the number of expected observations.

The 18th and 19th columns contain information about the solvent accessibility of residues (RA - Residue Accessibility) and of specific atoms (AA - Atom Accessibility). A "Y" indicates that a residue or atom is accessible to solvent, and an "N" indicates that it is not. A residue is considered accessible to solvent if ANY of its atoms is accessible to solvent. Each pair "YY", "YN", etc., refers to the first and second residue or atom, respectively. Therefore, a "YN" in the "RA" column indicates that the first residue is accessible to solvent, but the second residue is not.
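The column layout described above can be read programmatically. The sketch below assumes that every LINK line has the same whitespace-separated fields as the example (including a non-blank chain identifier and the "0/ 1" style for ratios). The function name `parse_link_line` and the dictionary keys are hypothetical, and the regular expression has only been checked against the example lines shown in this page.

```python
import re

# Assumed layout: the fields of the example LINK lines, separated by whitespace.
LINK_RE = re.compile(
    r"""^\s*LINK:\s+
    (?P<res1>\S+)\s+(?P<chain1>\S+)\s+(?P<num1>-?\d+)\s+(?P<atom1>\S+)\s+
    (?P<res2>\S+)\s+(?P<chain2>\S+)\s+(?P<num2>-?\d+)\s+(?P<atom2>\S+)\s+
    (?P<eucl>[\d.]+)\s+
    (?P<topo_gt>>?)\s*(?P<topo>[\d.]+)\s+
    (?P<observed>YES|NO)\s+
    (?P<dmin>>?\s*[\d.]+)\s+(?P<dmax>>?\s*[\d.]+)\s+
    (?P<status>OK|BAD):\s+(?P<code>\S+)\s+
    (?P<obs_ok>\d+)\s*/\s*(?P<obs_exp>\d+)\s+
    (?P<type_ok>\d+)\s*/\s*(?P<type_exp>\d+)\s+
    (?P<ra>[YN]{2})\s+(?P<aa>[YN]{2})\s*$
    """,
    re.VERBOSE,
)


def parse_link_line(line):
    """Parse one LINK line into a dict, or return None if it does not match."""
    match = LINK_RE.match(line)
    if match is None:
        return None
    g = match.groupdict()
    return {
        "atom1": (g["res1"], g["chain1"], int(g["num1"]), g["atom1"]),
        "atom2": (g["res2"], g["chain2"], int(g["num2"]), g["atom2"]),
        "euclidean": float(g["eucl"]),
        "topological": float(g["topo"]),
        # True when the value was printed as ">x": no shorter path was found.
        "topological_is_lower_bound": g["topo_gt"] == ">",
        "observed": g["observed"] == "YES",
        "dmin": g["dmin"],  # kept as text because it may carry a ">" prefix
        "dmax": g["dmax"],
        "consistent": g["status"] == "OK",
        "result": f'{g["status"]}: {g["code"]}',
        "obs_ratio": (int(g["obs_ok"]), int(g["obs_exp"])),
        "type_ratio": (int(g["type_ok"]), int(g["type_exp"])),
        "residue_accessible": g["ra"],
        "atom_accessible": g["aa"],
    }


example = ("LINK: MET A 1 N LYS A 17 CB 20.647 >16.900 YES 0.000 16.900 "
           "BAD: EUCL 0/ 1 0/ 1 YN YN")
link = parse_link_line(example)
print(link["result"], link["euclidean"], link["topological_is_lower_bound"])
```

Returning `None` for non-matching lines lets the same function be applied to every line of the log, keeping only the LINK records.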

2. Statistics of the links for each experiment

For each experiment, a detailed account of the links is provided, in the following form:

-------------------------------------------------------------------------------

 Experiment:  DSS

   Number of type-reactive pairs of atoms:      105

   Number of observed-reactive pairs of atoms:       21
   Number of observed-reactive pairs of atoms within linker reach:        5
   Number of observed-reactive pairs of atoms outside linker reach:       16
   Missing links, according to the structure and observed-reactivity:        0

   Number of observed links:        7
   Number of observed links consistent with the structure:        5
   Number of observed links NOT consistent with the structure:        2

   Sum of scores of observed links:    0.0000000

   Sensitivity of the cross-linking reaction:    1.0000000
   False-assignment probability:    0.0190476

   Likelihood of the experimental result:    0.9999953
   Log-likelihood of the experimental result:   -0.0000047

   Likelihood using user-defined pbad and pgood:    0.9996471
   Log-likelihood using user-defined pbad and pgood:   -0.0003530
   Using: pgood =    0.700; pbad =    0.010

-------------------------------------------------------------------------------

It is important to understand what "observed-reactivity" means here: observed reactivity is defined by the experiment. If a residue is observed, experimentally, to participate in a crosslink or to form a dead-end, it is an "observed-reactive" residue, meaning that one is sure that it is solvent exposed and available for reaction. Type-reactivity, on the other hand, is simply the expected reactivity as a function of the chemical nature of the residue and of the linker.

The sensitivity of the crosslinking reaction is defined here as the fraction of links that were observed relative to the links that were expected to be formed given the reactivity and the structure. In the example above, 5 observed-reactive pairs of atoms were found to be within linker reach, and all of these links were experimentally observed, thus the sensitivity is 1.00.

The false-assignment probability is the ratio between the number of experimentally observed links that are NOT consistent with the structure and the total number of type-reactive pairs of atoms. That is, it is the probability of any pair of type-reactive atoms being reported, incorrectly, as forming a link. In the example above, 2/105=0.019.

The results above allow for the estimation of a likelihood for the model, which is reported. The likelihood computed using user-defined sensitivities and false-positive probabilities is also reported.
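The two ratios can be recomputed from the counts printed in the experiment block, which is a useful sanity check when post-processing logs. This is a minimal sketch: the function name `parse_experiment_block` is hypothetical, and the formula used for the sensitivity (consistent observed links divided by observed-reactive pairs within linker reach) is an assumption that reproduces the 5/5 of the example, not a statement of how TopoLink computes it internally. The reported log-likelihood is also numerically consistent with the natural logarithm of the likelihood.

```python
import math
import re

# Matches "label:   number" lines; lines such as "Experiment:  DSS" are skipped.
STAT_RE = re.compile(r"^\s*(?P<label>.+?):\s+(?P<value>-?\d+(?:\.\d+)?)\s*$")


def parse_experiment_block(text):
    """Return {label: value} for every line that ends in a single number."""
    stats = {}
    for line in text.splitlines():
        match = STAT_RE.match(line)
        if match:
            stats[match.group("label")] = float(match.group("value"))
    return stats


# Lines copied from the DSS example above.
block = """
 Experiment:  DSS
   Number of type-reactive pairs of atoms:      105
   Number of observed-reactive pairs of atoms within linker reach:        5
   Number of observed links consistent with the structure:        5
   Number of observed links NOT consistent with the structure:        2
   Likelihood of the experimental result:    0.9999953
   Log-likelihood of the experimental result:   -0.0000047
"""
stats = parse_experiment_block(block)

# Assumed definition: consistent observed links / pairs within linker reach.
sensitivity = (
    stats["Number of observed links consistent with the structure"]
    / stats["Number of observed-reactive pairs of atoms within linker reach"]
)
# Inconsistent observed links / all type-reactive pairs (2/105 in the example).
false_assignment = (
    stats["Number of observed links NOT consistent with the structure"]
    / stats["Number of type-reactive pairs of atoms"]
)
# The printed log-likelihood agrees with the natural log of the likelihood.
log_matches = math.isclose(
    math.log(stats["Likelihood of the experimental result"]),
    stats["Log-likelihood of the experimental result"],
    abs_tol=1e-6,
)
print(sensitivity, round(false_assignment, 7), log_matches)
```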

3. Overall crosslink statistics of the model

The final result of TopoLink is reported as follows:

------------------------------------------------------------------------------------

 FINAL RESULTS:

  RESULT0:    12 : Number of observations that are consistent with the structure.

  RESULT1:    12 : Number of topological distances consistent with all observations.
  RESULT2:    16 : Number of topological distances NOT consistent with observations.
  RESULT3:     0 : Number of links with missing observations.

  RESULT4:      0.00000 : Sum of scores of observed links of all experiments.

  RESULT5:      0.99980 : Likelihood of the set of experimental results.
  RESULT6:     -0.00020 : Log-likelihood of the set of experimental results.

  Using: pgood =    0.700; pbad =    0.010
  RESULT7:      0.99965 : Likelihood of the set of experimental results.
  RESULT8:     -0.00035 : Log-likelihood of the set of experimental results.

------------------------------------------------------------------------------------

The results are reported with the flags RESULT0 to RESULT8 to facilitate grepping the data when evaluating multiple models. The results are mostly self-explanatory, but some points need clarification:

The number of observations that are consistent with the structure is not the same as the number of topological distances that are consistent with observations. For a topological distance to be consistent with observations, it must be consistent with all observations at the same time. This is particularly important if different linkers that can react with the same pairs of atoms were used in different experiments.

A topological distance is said to be NOT consistent with the observations if it is inconsistent with at least one observation.
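As an alternative to grep, the RESULT flags can be collected with a short script when many models are evaluated. The sketch below assumes the "RESULTn: value : description" layout shown in the final results block. The function names `read_results` and `collect` are hypothetical, and which result to use for comparing models is left to the user.

```python
import re
from pathlib import Path

# Assumed layout: "  RESULT5:      0.99980 : description".
RESULT_RE = re.compile(r"^\s*RESULT(?P<index>\d+):\s+(?P<value>\S+)\s+:")


def read_results(log_text):
    """Return {index: value} for every RESULTn line in the text of a log."""
    results = {}
    for line in log_text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            results[int(match.group("index"))] = float(match.group("value"))
    return results


def collect(log_paths):
    """Return {log file name: results dict} for a list of TopoLink logs."""
    return {Path(p).name: read_results(Path(p).read_text()) for p in log_paths}
```

For example, `collect(sorted(Path(".").glob("*.log")))` would gather the results of every log file in the current directory, and `results[6]` would then give the log-likelihood of the set of experimental results for one model.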
