Some auxiliary functions to quickly retrieve some data
Get the protein sequence
To obtain a list of the residue names of the protein with three- and one-letter codes, use
julia> using PDBTools
julia> getseq(PDBTools.SMALLPDB)
3-element Vector{String}:
"A"
"C"
"D"
Use getseq(atoms,code=2)
to get the sequence as three-letter residue codes, or code=3
to get full natural-aminoacid names, like "Alanine", "Proline", etc:
julia> using PDBTools
julia> getseq(PDBTools.SMALLPDB, code=2)
3-element Vector{String}:
"ALA"
"CYS"
"ASP"
julia> getseq(PDBTools.SMALLPDB, code=3)
3-element Vector{String}:
"Alanine"
"Cysteine"
"Aspartic acid"
PDBTools.getseq
— Functiongetseq(AbstractVector{<:Atom} or filename; selection, code)
Returns the sequence of aminoacids from the vector of atoms or file name. Selections may be applied. Code defines if the output will be a one-letter, three-letter or full-residue name array.
Example
julia> protein = wget("1LBD");
julia> getseq(protein,"residue < 3")
2-element Vector{String}:
"S"
"A"
julia> getseq(protein,"residue < 3", code=2)
2-element Vector{String}:
"SER"
"ALA"
julia> getseq(protein,"residue < 3",code=3)
2-element Vector{String}:
"Serine"
"Alanine"
PDBTools.Sequence
— TypeSequence
Wrapper for strings, or vectors of chars, strings, or residue names, to dispatch on functions that operate on amino acid sequences.
Example
julia> seq = ["Alanine", "Glutamic acid", "Glycine"];
julia> mass(Sequence(seq))
257.2432
julia> seq = "AEG";
julia> mass(Sequence(seq))
257.2432
If there is some non-standard protein residue in the sequence, inform the getseq
function by adding a selection:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.SMALLPDB);
julia> for at in atoms
if resname(at) == "ALA"
at.resname = "NEW"
end
end
julia> getseq(atoms, "protein or resname NEW"; code=2)
3-element Vector{String}:
"NEW"
"CYS"
"ASP"
By default the selection will only return the sequence of natural amino acids.
The getseq
function can of course be used on an Atom
list, accepts selections as the last argument, as well as the reading and writing functions:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.SMALLPDB);
julia> getseq(atoms, "residue > 1")
2-element Vector{String}:
"C"
"D"
Distance between sets of atoms
The distance between atoms, or sets of atoms, can be computed with the distance
function. This function returns the minimum distance between the atoms of the sets involved. For example:
julia> using PDBTools
julia> model = wget("1BSX");
julia> protein = select(model,"protein");
julia> ligand = select(model,"resname T3");
julia> distance(protein,ligand)
2.7775834820937417
PDBTools.distance
— Functiondistance(x,y)
Computes the minimum distance between two sets of atoms, between an atom and a set of atoms, or simply the distance between two atoms. The input may be a vector of Atom
s, or the coordinates that are output of the coor
function.
Examples
julia> model = wget("1BSX");
julia> protein = select(model,"protein");
julia> ligand = select(model,"resname T3");
julia> distance(protein,ligand)
2.7775834820937417
julia> distance(protein[1],ligand[3])
36.453551075306784
julia> distance(coor(ligand),protein)
2.7775834820937417
Closest atoms and their distance
A function similar to the one above is closest
, which returns the shortest distance between atoms but also the identity of the atom or pair of atoms that satisfy that shortest distance:
julia> using PDBTools
julia> model = wget("1BSX");
julia> protein = select(model,"protein");
julia> ligand = select(model,"resname T3");
julia> closest(ligand,protein)
(43, 3684, 2.7775834820937417)
julia> ligand[43]
4037 O1 T3 B 2 512 -22.568 81.625 3.159 1.00 36.59 1 - 4041
julia> protein[3684]
3684 NE2 HIS B 435 472 -21.539 82.145 5.686 1.00 44.44 1 - 3686
julia> distance(ligand[43],protein[3684])
2.7775834820937417
PDBTools.closest
— Functionclosest(x,y)
Computes the minimum distance between two sets of atoms and returns the indices of the atoms and their distance. Both vector of atoms or vectors of coordinates can be used as input.
Examples
julia> model = wget("1BSX");
julia> protein = select(model,"protein");
julia> ligand = select(model,"resname T3");
julia> closest(ligand,protein)
(43, 3684, 2.7775834820937417)
julia> ligand[43]
4037 O1 T3 B 2 512 -22.568 81.625 3.159 36.59 1.00 1 - 4041
julia> closest(ligand[43],protein)
(1, 3684, 2.7775834820937417)
julia> x = coor(protein)
3994-element Vector{SVector{3, Float64}}:
[52.884, 24.022, 35.587]
[52.916, 24.598, 36.993]
⋮
[-46.887, 86.925, 13.235]
[-47.164, 83.593, 15.25]
julia> closest(ligand,x)
(43, 3684, 2.7775834820937417)
Obtain arrays with coordinates
Use the coor
function:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.SMALLPDB);
julia> coor(atoms[1])
3-element StaticArraysCore.SVector{3, Float32} with indices SOneTo(3):
-9.229
-14.861
-5.481
julia> coor(atoms[1:2])
2-element Vector{StaticArraysCore.SVector{3, Float32}}:
[-9.229, -14.861, -5.481]
[-10.048, -15.427, -5.569]
The coor
function accepts selections:
C$\alpha$ coordinates:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.SMALLPDB);
julia> coor(atoms, "name CA")
3-element Vector{StaticArraysCore.SVector{3, Float64}}:
[-8.483, -14.912, -6.726]
[-5.113, -13.737, -5.466]
[-3.903, -11.262, -8.062]
The coordinates are output as arrays of static arrays (more specifically, as a Vector{SVector{3,Float64}}
, from StaticArrays
).
PDBTools.coor
— Functioncoor(atoms; selection)
Returns the coordinates of the atoms. The input may be one atom (type Atom
), a vector of atoms, or a Residue
. The coordinates are returned as a vector of static vectors (from StaticArrays
), more specifically as a Vector{SVector{3,Float64}}
.
Examples
julia> using PDBTools, StaticArrays
julia> protein = wget("1LBD");
julia> coor(protein[1])
3-element SVector{3, Float64} with indices SOneTo(3):
45.228
84.358
70.638
julia> coor(protein[1],as=SVector{3,Float32})
3-element SVector{3, Float32} with indices SOneTo(3):
45.228
84.358
70.638
julia> coor(protein, "index <= 2")
2-element Vector{SVector{3, Float64}}:
[45.228, 84.358, 70.638]
[46.08, 83.165, 70.327]
julia> coor(protein, only = at -> at.resname == "ALA")
110-element Vector{SVector{3, Float64}}:
[43.94, 81.982, 70.474]
[43.02, 80.825, 70.455]
[41.996, 80.878, 69.34]
⋮
[-17.866, 84.088, 51.741]
[-18.496, 83.942, 52.777]
[-15.888, 82.583, 51.706]
julia> residues = collect(eachresidue(protein));
julia> coor(residues[1])
6-element Vector{SVector{3, Float64}}:
[45.228, 84.358, 70.638]
[46.08, 83.165, 70.327]
[45.257, 81.872, 70.236]
[45.823, 80.796, 69.974]
[47.147, 82.98, 71.413]
[46.541, 82.639, 72.662]
Maximum and minimum coordinates of the atoms
Use maxmin(atoms)
, or maxmin(atoms,"resname CA")
, for example:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.SMALLPDB);
julia> maxmin(atoms, "residue > 1")
Minimum atom coordinates: xmin = [-6.974, -16.785, -10.863]
Maximum atom coordinates: xmax = [-1.94, -9.552, -3.844]
Length in each direction: xlength = [5.034000000000001, 7.2330000000000005, 7.019]
m
is a structure containing the three vectors with minimum and maximum coordinates, and lengths.
PDBTools.maxmin
— Functionmaxmin(atoms::AbstractVector{<:Atom}; selection)
Returns the maximum and minimum coordinates of an atom vector, and the length (maximum minus minimum) in each direction.
Example
julia> protein = wget("1LBD");
julia> maxmin(protein)
Minimum atom coordinates: xmin = [-29.301, 57.178, 45.668]
Maximum atom coordinates: xmax = [47.147, 99.383, 86.886]
Length in each direction: xlength = [76.448, 42.205, 41.217999999999996]
Residue tick labels for plots
The residue_ticks
function provides a practical way to define tick labels in plots associated to an amino-acid sequence:
residue_ticks(
atoms (or) residues (or) residue iterator;
first=nothing, last=nothing, stride=1, oneletter=true, serial=false,
)
The input structure can be provided as a vector of atoms (type Vector{<:Atom}
) a residue iterator (obtained by eachresidue(atoms)
) or a vector of residues (obtained by collect(eachresidue(atoms))
).
The function returns a tuple with residue numbers and residue names for the given atoms, to be used as tick labels in plots.
first
and last
optional keyword parameters are integers that refer to the residue numbers to be included. The stride
option can be used to skip residues and declutter the tick labels.
If oneletter
is false
, three-letter residue codes are returned. Residues with unknown names will be named X
or XXX
.
if serial=false
the positions of the ticks will be returned as a the serial residue index in the structure. If serial=true
the positions of the ticks are returned as their residue numbers. This difference is important if the residue numbers do not start at 1
and depending on the indexing of the data to be plotted.
PDBTools.residue_ticks
— Functionresidue_ticks(
atoms (or) residues (or) residue iterator;
first=nothing, last=nothing, stride=1, oneletter=true, serial=true
)
Returns a tuple with residue numbers and residue names for the given atoms, to be used as tick labels in plots.
The structure data can be provided a vector of Atom
s, a vector of Residue
s or an EachResidue
iterator.
first
and last
optional keyword parameters are integers that refer to the residue numbers to be included. The stride
option can be used to skip residues and declutter the tick labels.
If oneletter
is false
, three-letter residue codes are returned. Residues with unknown names will be named X
or XXX
.
If serial=true
the sequential residue index will be used as the index of the ticks. If instead serial=false
, the positions will be set to the residue numbers.
Examples
julia> using PDBTools
julia> atoms = wget("1LBD", "protein");
julia> residue_ticks(atoms; stride=50) # Vector{<:Atom} as input
(1:50:201, ["S225", "Q275", "L325", "L375", "L425"])
julia> residue_ticks(atoms; first=235, last=240, serial=false) # first=10 and resnum indexing
(Int32[235, 236, 237, 238, 239, 240], ["I235", "L236", "E237", "A238", "E239", "L240"])
julia> residue_ticks(eachresidue(atoms); stride=50) # residue iterator as input
(1:50:201, ["S225", "Q275", "L325", "L375", "L425"])
julia> residue_ticks(collect(eachresidue(atoms)); stride=50) # Vector{Residue} as input
(1:50:201, ["S225", "Q275", "L325", "L375", "L425"])
The resulting tuple of residue numbers and labels can be used as xticks
in Plots.plot
, for example.
PDBTools.oneletter
— Functiononeletter(residue::Union{AbstractString,Char})
Function to return a one-letter residue code from the three letter code or residue name. The function is case-insensitive.
Examples
julia> oneletter("ALA")
"A"
julia> oneletter("Glutamic acid")
"E"
PDBTools.threeletter
— Functionthreeletter(residue::String)
Function to return the three-letter natural-amino acid residue code from the one-letter code or residue name. The function is case-insensitive.
Examples
julia> threeletter("A")
"ALA"
julia> threeletter("Aspartic acid")
"ASP"
julia> threeletter("HSD")
"HIS"
Example
Here we illustrate how to plot the average temperature factor of each residue of a crystallographic model as function of the residues.
julia> using PDBTools, Plots
julia> atoms = wget("1UBQ", "protein");
julia> residue_ticks(atoms; stride=10) # example of output
([1, 11, 21, 31, 41, 51, 61, 71], ["M1", "K11", "D21", "Q31", "Q41", "E51", "I61", "L71"])
julia> plot(
resnum.(eachresidue(atoms)), # x-axis: residue numbers
[ mean(beta.(res)) for res in eachresidue(atoms) ], # y-axis: average b-factor per residue
xlabel="Residue",
xticks=residue_ticks(atoms; stride=10), # here we define the x-tick labels
ylabel="b-factor",
xrotation=60,
label=nothing, framestyle=:box,
)
Produces the following plot:
Alternatively (and sometimes conveniently), the residue ticks can be obtained by providing, instead of the atoms
array, the residue iterator or the residue vector, as:
julia> residue_ticks(eachresidue(atoms); stride=10)
([1, 11, 21, 31, 41, 51, 61, 71], ["M1", "K11", "D21", "Q31", "Q41", "E51", "I61", "L71"])
julia> residue_ticks(collect(eachresidue(atoms)); stride=10)
([1, 11, 21, 31, 41, 51, 61, 71], ["M1", "K11", "D21", "Q31", "Q41", "E51", "I61", "L71"])