Some auxiliary functions to quickly retrieve some data
Get the protein sequence
To obtain a list of the residue names of the protein with three- and one-letter codes, use
julia> using PDBTools
julia> pdb = read_pdb(PDBTools.SMALLPDB);
julia> getseq(pdb)
3-element Vector{String}:
 "A"
 "C"
 "D"Use getseq(atoms,code=2) to get the sequence as three-letter residue codes, or code=3 to get  full natural-aminoacid names, like "Alanine", "Proline", etc:
julia> using PDBTools
julia> pdb = read_pdb(PDBTools.SMALLPDB);
julia> getseq(pdb; code=2)
3-element Vector{String}:
 "ALA"
 "CYS"
 "ASP"
julia> getseq(pdb; code=3)
3-element Vector{String}:
 "Alanine"
 "Cysteine"
 "Aspartic acid"PDBTools.getseq — Functiongetseq(AbstractVector{<:Atom} or filename; selection, code)Returns the sequence of aminoacids from the vector of atoms or file name. Selections may be applied. Code defines if the output will be a one-letter, three-letter or full-residue name array.
Example
julia> using PDBTools
julia> protein = read_pdb(PDBTools.TESTPDB);
julia> getseq(protein, "residue < 3")
2-element Vector{String}:
 "A"
 "C"
julia> getseq(protein, "residue < 3"; code=2)
2-element Vector{String}:
 "ALA"
 "CYS"
julia> getseq(protein, "residue < 3"; code=3)
2-element Vector{String}:
 "Alanine"
 "Cysteine"
PDBTools.Sequence — TypeSequenceWrapper for strings, or vectors of chars, strings, or residue names, to dispatch on functions that operate on amino acid sequences.
Example
julia> seq = ["Alanine", "Glutamic acid", "Glycine"];
julia> mass(Sequence(seq))
257.2432
julia> seq = "AEG";
julia> mass(Sequence(seq))
257.2432If there is some non-standard protein residue in the sequence, inform the getseq function by adding a selection:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.SMALLPDB);
julia> for at in atoms
          if resname(at) == "ALA"
              at.resname = "NEW"
          end
       end
julia> getseq(atoms, "protein or resname NEW"; code=2)
3-element Vector{String}:
 "NEW"
 "CYS"
 "ASP"By default the selection will only return the sequence of natural amino acids.
The getseq function can of course be used on an Atom list, accepts selections as the last argument, as well as the reading and writing functions:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.SMALLPDB);
julia> getseq(atoms, "residue > 1")
2-element Vector{String}:
 "C"
 "D"Distance between sets of atoms
The distance between atoms, or sets of atoms, can be computed with the distance function. This function returns the minimum distance between the atoms of the sets involved. For example:
julia> using PDBTools
julia> model = wget("1BSX");
julia> protein = select(model,"protein");
julia> ligand = select(model,"resname T3");
julia> distance(protein,ligand)
2.7775834820937417PDBTools.distance — Functiondistance(x,y)Computes the minimum distance between two sets of atoms, between an atom and a set of atoms, or simply the distance between two atoms, or from the coordinates or sets of coordinates.
The input may be an Atom vector of Atoms, or a 3D vector, or a vector of 3D vector of coordinates, for examples as output by the coor function.
Examples
julia> model = wget("1BSX");
julia> protein = select(model,"protein");
julia> ligand = select(model,"resname T3");
julia> distance(protein,ligand)
2.7775834820937417
julia> distance(protein[1],ligand[3])
36.453551075306784
julia> distance(coor(ligand),protein)
2.7775834820937417
Closest atoms and their distance
A function similar to the one above is closest, which returns the shortest distance between atoms but also the identity of the atom or pair of atoms that satisfy that shortest distance:
julia> using PDBTools
julia> model = wget("1BSX");
julia> protein = select(model,"protein");
julia> ligand = select(model,"resname T3");
julia> closest(ligand,protein)
(43, 3684, 2.7775834820937417)
julia> ligand[43]
    4037   O1      T3     B        2      512  -22.568   81.625    3.159  1.00 36.59     1       -      4041
julia> protein[3684]
    3684  NE2     HIS     B      435      472  -21.539   82.145    5.686  1.00 44.44     1       -      3686
julia> distance(ligand[43],protein[3684])
2.7775834820937417PDBTools.closest — Functionclosest(x,y)Computes the minimum distance between two sets of atoms and returns the indices of the atoms and their distance. Both vector of atoms or vectors of coordinates can be used as input.
Examples
julia> model = wget("1BSX");
julia> protein = select(model,"protein");
julia> ligand = select(model,"resname T3");
julia> closest(ligand,protein)
(43, 3684, 2.7775834820937417)
julia> ligand[43]
    4037   O1      T3     B        2      512  -22.568   81.625    3.159 36.59  1.00     1       -      4041
julia> closest(ligand[43],protein)
(1, 3684, 2.7775834820937417)
julia> x = coor(protein)
3994-element Vector{SVector{3, Float64}}:
 [52.884, 24.022, 35.587]
 [52.916, 24.598, 36.993]
 ⋮
 [-46.887, 86.925, 13.235]
 [-47.164, 83.593, 15.25]
julia> closest(ligand,x)
(43, 3684, 2.7775834820937417)
Obtain arrays with coordinates
Use the coor function:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.SMALLPDB);
julia> coor(atoms[1])
3-element StaticArraysCore.SVector{3, Float32} with indices SOneTo(3):
  -9.229
 -14.861
  -5.481
julia> coor(atoms[1:2])
2-element Vector{StaticArraysCore.SVector{3, Float32}}:
 [-9.229, -14.861, -5.481]
 [-10.048, -15.427, -5.569]The coor function accepts selections:
C$\alpha$ coordinates:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.SMALLPDB);
julia> coor(atoms, "name CA")
3-element Vector{StaticArraysCore.SVector{3, Float64}}:
 [-8.483, -14.912, -6.726]
 [-5.113, -13.737, -5.466]
 [-3.903, -11.262, -8.062]The coordinates are output as arrays of static arrays (more specifically, as a Vector{SVector{3,Float64}}, from StaticArrays). 
PDBTools.coor — Functioncoor(atoms; selection)Returns the coordinates of the atoms. The input may be one atom (type Atom), a vector of atoms, or a Residue.  The coordinates are returned as a vector of static vectors (from StaticArrays), more specifically as a Vector{SVector{3,Float64}}.
Examples
julia> using PDBTools, StaticArrays 
julia> protein = wget("1LBD");
julia> coor(protein[1])
3-element SVector{3, Float64} with indices SOneTo(3):
 45.228
 84.358
 70.638
julia> coor(protein[1],as=SVector{3,Float32})
3-element SVector{3, Float32} with indices SOneTo(3):
 45.228
 84.358
 70.638
julia> coor(protein, "index <= 2")
2-element Vector{SVector{3, Float64}}:
 [45.228, 84.358, 70.638]
 [46.08, 83.165, 70.327]
julia> coor(protein, at -> at.resname == "ALA")
110-element Vector{SVector{3, Float64}}:
 [43.94, 81.982, 70.474]
 [43.02, 80.825, 70.455]
 [41.996, 80.878, 69.34]
 ⋮
 [-17.866, 84.088, 51.741]
 [-18.496, 83.942, 52.777]
 [-15.888, 82.583, 51.706]
  
julia> residues = collect(eachresidue(protein));
julia> coor(residues[1])
6-element Vector{SVector{3, Float64}}:
 [45.228, 84.358, 70.638]
 [46.08, 83.165, 70.327]
 [45.257, 81.872, 70.236]
 [45.823, 80.796, 69.974]
 [47.147, 82.98, 71.413]
 [46.541, 82.639, 72.662]
Read and convert unit cells
PDBTools.read_unitcell — Functionread_unitcell(file::AbstractString)Reads the lattice parameters of unitcell from a PDB (CRYST1 field) or mmCIF file, and converts it to a unitcell matrix with the lattice_to_matrix function. The unitcell matrix contains the lattice vectors as the columns of the matrix. 
Example
julia> using PDBTools
julia> m = read_unitcell(PDBTools.TESTPBC)
3×3 StaticArraysCore.SMatrix{3, 3, Float32, 9} with indices SOneTo(3)×SOneTo(3):
 85.0  -3.71547f-6  -3.71547f-6
  0.0  85.0         -3.71547f-6
  0.0   0.0         85.
julia> matrix_to_lattice(m)
(a = 85.0f0, b = 85.0f0, c = 85.0f0, α = 90.0f0, β = 90.0f0, γ = 90.0f0)PDBTools.lattice_to_matrix — Functionlattice_to_matrix(a, b, c, α, β, γ)Converts unit cell lattice parameters and convert them to a 3x3 unit cell matrix (orthogonalization matrix), where the lattice vectors are the columns of the matrix.
The resulting matrix has the lattice vectors as its columns, with vector 'a' aligned along the x-axis and vector 'b' in the xy-plane. This matrix can be used to transform fractional coordinates to Cartesian coordinates.
Arguments
- a::Real: Length of side a in Ångströms.
- b::Real: Length of side b in Ångströms.
- c::Real: Length of side c in Ångströms.
- α::Real: Angle alpha in degrees.
- β::Real: Angle beta in degrees.
- γ::Real: Angle gamma in degrees.
Returns
- A 3x3 static matrix representing the unit cell vectors.
PDBTools.matrix_to_lattice — Functionmatrix_to_lattice(M::AbstractMatrix)Converts a 3x3 unit cell matrix where the lattice vectors are the columns, back into the six lattice parameters (sides a, b, c and angles α, β, γ).
Arguments
- M::AbstractMatrix: A 3x3 matrix where columns are the lattice vectors.
Returns
- NamedTuple: A named tuple containing the six parameters:- (a, b, c, α, β, γ). Lengths are in the matrix's original units (typically Ångströms), and angles are in degrees.
Check the stereochemistry of protein residues
PDBTools.zeta — Functionzeta(r::Residue)Computes the Cα chirality (zeta "virtual" torsion angle - Cα-N-C-Cβ).
Returns the torsion angle or NaN if the residue is not recognized a protein residue or if its a Gly residue. Expected values are 33.9 ± 3.5 degrees (for one standard deviation). Also see the zeta_check function.
Example
julia> using PDBTools
julia> protein = select(read_pdb(PDBTools.TESTPDB), "protein");
julia> residues = collect(eachresidue(protein));
julia> zeta(residues[1])
33.67202f0PDBTools.zeta_check — Functionzeta_check(r::Residue; nsigma=2)Checks if the Cα chirality falls into expected ranges. See the zeta function for further information. The expected mean is 33.9 degrees with a standard deviation of 3.5 degrees. By default, nsigma=2, implying that the function returns true if the torsion falls  within two standard deviations from the mean.
See: https://www.ebi.ac.uk/thornton-srv/software/PROCHECK/manual/manappa.html
Example
julia> using PDBTools
julia> protein = select(read_pdb(PDBTools.TESTPDB), "protein");
julia> residues = collect(eachresidue(protein));
julia> zeta_check(residues[1])
trueMaximum and minimum coordinates of the atoms
Use maxmin(atoms), or maxmin(atoms,"resname CA"), for example:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.SMALLPDB);
julia> maxmin(atoms, "residue > 1")
 Minimum atom coordinates: xmin = [-6.974, -16.785, -10.863]
 Maximum atom coordinates: xmax = [-1.94, -9.552, -3.844]
 Length in each direction: xlength = [5.034000000000001, 7.2330000000000005, 7.019]m is a structure containing the three vectors with minimum and maximum coordinates, and lengths.
PDBTools.maxmin — Functionmaxmin(atoms::AbstractVector{<:Atom}; selection)Returns the maximum and minimum coordinates of an atom vector, and the length (maximum minus minimum) in each direction.
Example
julia> protein = wget("1LBD");
julia> maxmin(protein)
 
 Minimum atom coordinates: xmin = [-29.301, 57.178, 45.668]
 Maximum atom coordinates: xmax = [47.147, 99.383, 86.886]
 Length in each direction: xlength = [76.448, 42.205, 41.217999999999996]
Residue tick labels for plots
The residue_ticks function provides a practical way to define tick labels in plots associated to an amino-acid sequence:
residue_ticks(
    atoms (or) residues (or) residue iterator; 
    first=nothing, last=nothing, stride=1, oneletter=true, serial=false,
)The input structure can be provided as a vector of atoms (type Vector{<:Atom}) a residue iterator (obtained by eachresidue(atoms)) or a vector of residues (obtained by collect(eachresidue(atoms))). 
The function returns a tuple with residue numbers and residue names for the given atoms, to be used as tick labels in plots.
first and last optional keyword parameters are integers that refer to the residue numbers to be included.  The stride option can be used to skip residues and declutter the tick labels.
If oneletter is false, three-letter residue codes are returned. Residues with unknown names will be  named X or XXX. 
if serial=false the positions of the ticks will be returned as a the serial residue index in the structure. If serial=true the positions of the ticks are returned as their residue numbers. This difference is important if the residue numbers do not start at 1 and depending on the indexing of the data to be plotted.  
PDBTools.residue_ticks — Functionresidue_ticks(
    atoms (or) residues (or) residue iterator; 
    first=nothing, last=nothing, stride=1, oneletter=true, serial=false
)Returns a tuple with residue numbers and residue names for the given atoms, to be used as tick labels in plots.
The structure data can be provided a vector of Atoms, a vector of Residues or an eachresidue iterator. 
first and last optional keyword parameters are integers that refer to the residue numbers to be included.  The stride option can be used to skip residues and declutter the tick labels.
If oneletter is false, three-letter residue codes are returned. Residues with unknown names will be  named X or XXX. 
If serial=true the sequential residue index will be used as the index of the ticks. If instead serial=false, the positions will be set to the residue numbers.
Examples
julia> using PDBTools
julia> atoms = wget("1LBD", "protein");
julia> residue_ticks(atoms; stride=50) # Vector{<:Atom} as input
(Int32[225, 275, 325, 375, 425], ["S225", "Q275", "L325", "L375", "L425"])
julia> residue_ticks(atoms; first=235, last=240) # first=10
(Int32[235, 236, 237, 238, 239, 240], ["I235", "L236", "E237", "A238", "E239", "L240"])
julia> residue_ticks(eachresidue(atoms); stride=50) # residue iterator as input
(Int32[225, 275, 325, 375, 425], ["S225", "Q275", "L325", "L375", "L425"])
julia> residue_ticks(collect(eachresidue(atoms)); stride=50) # Vector{Residue} as input
(Int32[225, 275, 325, 375, 425], ["S225", "Q275", "L325", "L375", "L425"])
julia> residue_ticks(atoms; first=10, stride=50, serial=true) # using serial=true
(10:50:210, ["R234", "K284", "R334", "S384", "E434"])
The resulting tuple of residue numbers and labels can be used as xticks in Plots.plot, for example.
PDBTools.oneletter — Functiononeletter(residue::Union{AbstractString,Char})Function to return a one-letter residue code from the three letter code or residue name. The function is case-insensitive.
Examples
julia> oneletter("ALA")
"A"
julia> oneletter("Glutamic acid")
"E"
PDBTools.threeletter — Functionthreeletter(residue::String)Function to return the three-letter natural-amino acid residue code from the one-letter code or residue name. The function is case-insensitive.
Examples
julia> threeletter("A")
"ALA"
julia> threeletter("Aspartic acid")
"ASP"
julia> threeletter("HSD")
"HIS"Example
Here we illustrate how to plot the average temperature factor of each residue of a crystallographic model as function of the residues.
julia> using PDBTools, Plots
julia> atoms = wget("1UBQ", "protein");
julia> residue_ticks(atoms; stride=10) # example of output
([1, 11, 21, 31, 41, 51, 61, 71], ["M1", "K11", "D21", "Q31", "Q41", "E51", "I61", "L71"])
julia> plot(
           resnum.(eachresidue(atoms)), # x-axis: residue numbers
           [ mean(beta.(res)) for res in eachresidue(atoms) ], # y-axis: average b-factor per residue
           xlabel="Residue", 
           xticks=residue_ticks(atoms; stride=10), # here we define the x-tick labels
           ylabel="b-factor", 
           xrotation=60,
           label=nothing, framestyle=:box,
      )Produces the following plot:

Alternatively (and sometimes conveniently), the residue ticks can be obtained by providing,  instead of the atoms array, the residue iterator or the residue vector, as:
julia> residue_ticks(eachresidue(atoms); stride=10)
([1, 11, 21, 31, 41, 51, 61, 71], ["M1", "K11", "D21", "Q31", "Q41", "E51", "I61", "L71"])
julia> residue_ticks(collect(eachresidue(atoms)); stride=10)
([1, 11, 21, 31, 41, 51, 61, 71], ["M1", "K11", "D21", "Q31", "Q41", "E51", "I61", "L71"])