Some auxiliary functions to quickly retrieve some data

Get the protein sequence

To obtain a list of the residue names of the protein with three- and one-letter codes, use

julia> using PDBTools

julia> pdb = read_pdb(PDBTools.SMALLPDB);

julia> getseq(pdb)
3-element Vector{String}:
 "A"
 "C"
 "D"

Use getseq(atoms,code=2) to get the sequence as three-letter residue codes, or code=3 to get full natural-aminoacid names, like "Alanine", "Proline", etc:

julia> using PDBTools

julia> pdb = read_pdb(PDBTools.SMALLPDB);

julia> getseq(pdb; code=2)
3-element Vector{String}:
 "ALA"
 "CYS"
 "ASP"

julia> getseq(pdb; code=3)
3-element Vector{String}:
 "Alanine"
 "Cysteine"
 "Aspartic acid"

PDBTools.getseq — Function

getseq(AbstractVector{<:Atom} or filename; selection, code)

Returns the sequence of aminoacids from the vector of atoms or file name. Selections may be applied. Code defines if the output will be a one-letter, three-letter or full-residue name array.

Example

julia> using PDBTools

julia> protein = read_pdb(PDBTools.TESTPDB);

julia> getseq(protein, "residue < 3")
2-element Vector{String}:
 "A"
 "C"

julia> getseq(protein, "residue < 3"; code=2)
2-element Vector{String}:
 "ALA"
 "CYS"

julia> getseq(protein, "residue < 3"; code=3)
2-element Vector{String}:
 "Alanine"
 "Cysteine"

source

PDBTools.Sequence — Type

Sequence

Wrapper for strings, or vectors of chars, strings, or residue names, to dispatch on functions that operate on amino acid sequences.

Example

julia> seq = ["Alanine", "Glutamic acid", "Glycine"];

julia> mass(Sequence(seq))
257.2432

julia> seq = "AEG";

julia> mass(Sequence(seq))
257.2432

source

Note

If there is some non-standard protein residue in the sequence, inform the getseq function by adding a selection:

julia> using PDBTools

julia> atoms = read_pdb(PDBTools.SMALLPDB);

julia> for at in atoms
          if resname(at) == "ALA"
              at.resname = "NEW"
          end
       end

julia> getseq(atoms, "protein or resname NEW"; code=2)
3-element Vector{String}:
 "NEW"
 "CYS"
 "ASP"

By default the selection will only return the sequence of natural amino acids.

The getseq function can of course be used on an Atom list, accepts selections as the last argument, as well as the reading and writing functions:

julia> using PDBTools

julia> atoms = read_pdb(PDBTools.SMALLPDB);

julia> getseq(atoms, "residue > 1")
2-element Vector{String}:
 "C"
 "D"

Distance between sets of atoms

The distance between atoms, or sets of atoms, can be computed with the distance function. This function returns the minimum distance between the atoms of the sets involved. For example:

julia> using PDBTools

julia> model = wget("1BSX");

julia> protein = select(model,"protein");

julia> ligand = select(model,"resname T3");

julia> distance(protein,ligand)
2.7775834820937417

PDBTools.distance — Function

distance(x,y)

Computes the minimum distance between two sets of atoms, between an atom and a set of atoms, or simply the distance between two atoms, or from the coordinates or sets of coordinates.

The input may be an Atom vector of Atoms, or a 3D vector, or a vector of 3D vector of coordinates, for examples as output by the coor function.

Examples

julia> model = wget("1BSX");

julia> protein = select(model,"protein");

julia> ligand = select(model,"resname T3");

julia> distance(protein,ligand)
2.7775834820937417

julia> distance(protein[1],ligand[3])
36.453551075306784

julia> distance(coor(ligand),protein)
2.7775834820937417

source

Closest atoms and their distance

A function similar to the one above is closest, which returns the shortest distance between atoms but also the identity of the atom or pair of atoms that satisfy that shortest distance:

julia> using PDBTools

julia> model = wget("1BSX");

julia> protein = select(model,"protein");

julia> ligand = select(model,"resname T3");

julia> closest(ligand,protein)
(43, 3684, 2.7775834820937417)

julia> ligand[43]
    4037   O1      T3     B        2      512  -22.568   81.625    3.159  1.00 36.59     1       -      4041

julia> protein[3684]
    3684  NE2     HIS     B      435      472  -21.539   82.145    5.686  1.00 44.44     1       -      3686

julia> distance(ligand[43],protein[3684])
2.7775834820937417

PDBTools.closest — Function

closest(x,y)

Computes the minimum distance between two sets of atoms and returns the indices of the atoms and their distance. Both vector of atoms or vectors of coordinates can be used as input.

Examples

julia> model = wget("1BSX");

julia> protein = select(model,"protein");

julia> ligand = select(model,"resname T3");

julia> closest(ligand,protein)
(43, 3684, 2.7775834820937417)

julia> ligand[43]
    4037   O1      T3     B        2      512  -22.568   81.625    3.159 36.59  1.00     1       -      4041

julia> closest(ligand[43],protein)
(1, 3684, 2.7775834820937417)

julia> x = coor(protein)
3994-element Vector{SVector{3, Float64}}:
 [52.884, 24.022, 35.587]
 [52.916, 24.598, 36.993]
 ⋮
 [-46.887, 86.925, 13.235]
 [-47.164, 83.593, 15.25]

julia> closest(ligand,x)
(43, 3684, 2.7775834820937417)

source

Obtain arrays with coordinates

Use the coor function:

julia> using PDBTools

julia> atoms = read_pdb(PDBTools.SMALLPDB);

julia> coor(atoms[1])
3-element StaticArraysCore.SVector{3, Float32} with indices SOneTo(3):
  -9.229
 -14.861
  -5.481

julia> coor(atoms[1:2])
2-element Vector{StaticArraysCore.SVector{3, Float32}}:
 [-9.229, -14.861, -5.481]
 [-10.048, -15.427, -5.569]

The coor function accepts selections:

C$\alpha$ coordinates:

julia> using PDBTools

julia> atoms = read_pdb(PDBTools.SMALLPDB);

julia> coor(atoms, "name CA")
3-element Vector{StaticArraysCore.SVector{3, Float64}}:
 [-8.483, -14.912, -6.726]
 [-5.113, -13.737, -5.466]
 [-3.903, -11.262, -8.062]

The coordinates are output as arrays of static arrays (more specifically, as a Vector{SVector{3,Float64}}, from StaticArrays).

PDBTools.coor — Function

coor(atoms; selection)

Returns the coordinates of the atoms. The input may be one atom (type Atom), a vector of atoms, or a Residue. The coordinates are returned as a vector of static vectors (from StaticArrays), more specifically as a Vector{SVector{3,Float64}}.

Examples

julia> using PDBTools, StaticArrays 

julia> protein = wget("1LBD");

julia> coor(protein[1])
3-element SVector{3, Float64} with indices SOneTo(3):
 45.228
 84.358
 70.638

julia> coor(protein[1],as=SVector{3,Float32})
3-element SVector{3, Float32} with indices SOneTo(3):
 45.228
 84.358
 70.638

julia> coor(protein, "index <= 2")
2-element Vector{SVector{3, Float64}}:
 [45.228, 84.358, 70.638]
 [46.08, 83.165, 70.327]

julia> coor(protein, at -> at.resname == "ALA")
110-element Vector{SVector{3, Float64}}:
 [43.94, 81.982, 70.474]
 [43.02, 80.825, 70.455]
 [41.996, 80.878, 69.34]
 ⋮
 [-17.866, 84.088, 51.741]
 [-18.496, 83.942, 52.777]
 [-15.888, 82.583, 51.706]
  
julia> residues = collect(eachresidue(protein));

julia> coor(residues[1])
6-element Vector{SVector{3, Float64}}:
 [45.228, 84.358, 70.638]
 [46.08, 83.165, 70.327]
 [45.257, 81.872, 70.236]
 [45.823, 80.796, 69.974]
 [47.147, 82.98, 71.413]
 [46.541, 82.639, 72.662]

source

Maximum and minimum coordinates of the atoms

Use maxmin(atoms), or maxmin(atoms,"resname CA"), for example:

julia> using PDBTools

julia> atoms = read_pdb(PDBTools.SMALLPDB);

julia> maxmin(atoms, "residue > 1")
 Minimum atom coordinates: xmin = [-6.974, -16.785, -10.863]
 Maximum atom coordinates: xmax = [-1.94, -9.552, -3.844]
 Length in each direction: xlength = [5.034000000000001, 7.2330000000000005, 7.019]

m is a structure containing the three vectors with minimum and maximum coordinates, and lengths.

PDBTools.maxmin — Function

maxmin(atoms::AbstractVector{<:Atom}; selection)

Returns the maximum and minimum coordinates of an atom vector, and the length (maximum minus minimum) in each direction.

Example

julia> protein = wget("1LBD");

julia> maxmin(protein)
 
 Minimum atom coordinates: xmin = [-29.301, 57.178, 45.668]
 Maximum atom coordinates: xmax = [47.147, 99.383, 86.886]
 Length in each direction: xlength = [76.448, 42.205, 41.217999999999996]

source

Residue tick labels for plots

The residue_ticks function provides a practical way to define tick labels in plots associated to an amino-acid sequence:

residue_ticks(
    atoms (or) residues (or) residue iterator; 
    first=nothing, last=nothing, stride=1, oneletter=true, serial=false,
)

The input structure can be provided as a vector of atoms (type Vector{<:Atom}) a residue iterator (obtained by eachresidue(atoms)) or a vector of residues (obtained by collect(eachresidue(atoms))).

The function returns a tuple with residue numbers and residue names for the given atoms, to be used as tick labels in plots.

first and last optional keyword parameters are integers that refer to the residue numbers to be included. The stride option can be used to skip residues and declutter the tick labels.

If oneletter is false, three-letter residue codes are returned. Residues with unknown names will be named X or XXX.

if serial=false the positions of the ticks will be returned as a the serial residue index in the structure. If serial=true the positions of the ticks are returned as their residue numbers. This difference is important if the residue numbers do not start at 1 and depending on the indexing of the data to be plotted.

PDBTools.residue_ticks — Function

residue_ticks(
    atoms (or) residues (or) residue iterator; 
    first=nothing, last=nothing, stride=1, oneletter=true, serial=false
)

Returns a tuple with residue numbers and residue names for the given atoms, to be used as tick labels in plots.

The structure data can be provided a vector of Atoms, a vector of Residues or an eachresidue iterator.

first and last optional keyword parameters are integers that refer to the residue numbers to be included. The stride option can be used to skip residues and declutter the tick labels.

If oneletter is false, three-letter residue codes are returned. Residues with unknown names will be named X or XXX.

If serial=true the sequential residue index will be used as the index of the ticks. If instead serial=false, the positions will be set to the residue numbers.

Examples

julia> using PDBTools

julia> atoms = wget("1LBD", "protein");

julia> residue_ticks(atoms; stride=50) # Vector{<:Atom} as input
(Int32[225, 275, 325, 375, 425], ["S225", "Q275", "L325", "L375", "L425"])

julia> residue_ticks(atoms; first=235, last=240) # first=10
(Int32[235, 236, 237, 238, 239, 240], ["I235", "L236", "E237", "A238", "E239", "L240"])

julia> residue_ticks(eachresidue(atoms); stride=50) # residue iterator as input
(Int32[225, 275, 325, 375, 425], ["S225", "Q275", "L325", "L375", "L425"])

julia> residue_ticks(collect(eachresidue(atoms)); stride=50) # Vector{Residue} as input
(Int32[225, 275, 325, 375, 425], ["S225", "Q275", "L325", "L375", "L425"])

julia> residue_ticks(atoms; first=10, stride=50, serial=true) # using serial=true
(10:50:210, ["R234", "K284", "R334", "S384", "E434"])

The resulting tuple of residue numbers and labels can be used as xticks in Plots.plot, for example.

source

PDBTools.oneletter — Function

oneletter(residue::Union{AbstractString,Char})

Function to return a one-letter residue code from the three letter code or residue name. The function is case-insensitive.

Examples

julia> oneletter("ALA")
"A"

julia> oneletter("Glutamic acid")
"E"

source

PDBTools.threeletter — Function

threeletter(residue::String)

Function to return the three-letter natural-amino acid residue code from the one-letter code or residue name. The function is case-insensitive.

Examples

julia> threeletter("A")
"ALA"

julia> threeletter("Aspartic acid")
"ASP"

julia> threeletter("HSD")
"HIS"

source

Example

Here we illustrate how to plot the average temperature factor of each residue of a crystallographic model as function of the residues.

julia> using PDBTools, Plots

julia> atoms = wget("1UBQ", "protein");

julia> residue_ticks(atoms; stride=10) # example of output
([1, 11, 21, 31, 41, 51, 61, 71], ["M1", "K11", "D21", "Q31", "Q41", "E51", "I61", "L71"])

julia> plot(
           resnum.(eachresidue(atoms)), # x-axis: residue numbers
           [ mean(beta.(res)) for res in eachresidue(atoms) ], # y-axis: average b-factor per residue
           xlabel="Residue", 
           xticks=residue_ticks(atoms; stride=10), # here we define the x-tick labels
           ylabel="b-factor", 
           xrotation=60,
           label=nothing, framestyle=:box,
      )

Produces the following plot:

./assets/residue_ticks.png

Alternatively (and sometimes conveniently), the residue ticks can be obtained by providing, instead of the atoms array, the residue iterator or the residue vector, as:

julia> residue_ticks(eachresidue(atoms); stride=10)
([1, 11, 21, 31, 41, 51, 61, 71], ["M1", "K11", "D21", "Q31", "Q41", "E51", "I61", "L71"])

julia> residue_ticks(collect(eachresidue(atoms)); stride=10)
([1, 11, 21, 31, 41, 51, 61, 71], ["M1", "K11", "D21", "Q31", "Q41", "E51", "I61", "L71"])