Iterators
PDBTools.jl provides lazy iterators over the residues, chains, segments, and models of a structure. The iterators behave similarly and can be used to compute properties of independent structural elements. The documentation below describes the features of the Residue and Chain iterators in more detail, but the same properties apply to the Segment and Model iterators.
Iterate over residues (or molecules)
The eachresidue iterator enables iteration over the residues of a structure. In PDB files, distinct molecules are often treated as separate residues, so this iterator can also be used to iterate over the molecules of a structure. For example:
julia> using PDBTools
julia> protein = read_pdb(PDBTools.SMALLPDB);
julia> count(atom -> resname(atom) == "ALA", protein)
12
julia> count(res -> resname(res) == "ALA", eachresidue(protein))
1
Here, the first count call counts the atoms whose residue name is "ALA", while the second uses eachresidue to count the residues named "ALA". This highlights the distinction between residue-level and atom-level operations.
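The distinction can be reproduced without a PDB file. The sketch below is a simplified illustration, not the PDBTools implementation: atoms are hypothetical named tuples rather than PDBTools Atom objects, and, as an assumption, a new residue is taken to start wherever the residue number changes between consecutive atoms.

```julia
# Hypothetical atoms: (name, resname, resnum) named tuples, not PDBTools.Atom objects.
atoms = [
    (name = "N",  resname = "ALA", resnum = 1),
    (name = "CA", resname = "ALA", resnum = 1),
    (name = "C",  resname = "ALA", resnum = 1),
    (name = "N",  resname = "CYS", resnum = 2),
    (name = "CA", resname = "CYS", resnum = 2),
    (name = "N",  resname = "ALA", resnum = 3),
    (name = "CA", resname = "ALA", resnum = 3),
]

# Atom-level count: every atom whose residue name is ALA.
n_ala_atoms = count(at -> at.resname == "ALA", atoms)

# Residue-level count: only the first atom of each residue is counted.
# Simplifying assumption: a new residue starts whenever resnum changes.
starts_residue(i) = i == firstindex(atoms) || atoms[i].resnum != atoms[i-1].resnum
n_ala_residues = count(i -> starts_residue(i) && atoms[i].resname == "ALA", eachindex(atoms))

println((n_ala_atoms, n_ala_residues))  # prints (5, 2)
```

Five atoms belong to ALA residues, but they form only two ALA residues, which is the same kind of difference shown by the two count calls above.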
Collecting Residues into a Vector
Residues produced by eachresidue can be collected into a vector for further processing:
julia> using PDBTools
julia> protein = read_pdb(PDBTools.SMALLPDB);
julia> residues = collect(eachresidue(protein))
3-element Vector{Residue}[
ALA1A
CYS2A
ASP3A
]
julia> residues[1]
Residue of name ALA with 12 atoms.
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N ALA A 1 1 -9.229 -14.861 -5.481 0.00 0.00 1 PROT 1
2 1HT1 ALA A 1 1 -10.048 -15.427 -5.569 0.00 0.00 1 PROT 2
⋮
11 C ALA A 1 1 -7.227 -14.047 -6.599 1.00 0.00 1 PROT 11
12 O ALA A 1 1 -7.083 -13.048 -7.303 1.00 0.00 1 PROT 12
Neither the iterators nor the collected vectors create copies of the original atom data. This means that any change made to the atoms of a residue in the collected vector directly modifies the corresponding atoms in the original atom vector.
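The no-copy behavior is the same aliasing that a view of a vector of mutable objects provides. In the sketch below, MiniAtom is a hypothetical stand-in for the PDBTools atom type, and a residue is modeled as a view over a range of the parent vector (the Residue reference documentation shows that residues expose such a range); this illustrates the idea only and is not the library's code.

```julia
# Hypothetical mutable atom type standing in for PDBTools.Atom.
mutable struct MiniAtom
    name::String
    resnum::Int
    beta::Float64
end

atoms = [MiniAtom("N", 1, 0.0), MiniAtom("CA", 1, 0.0), MiniAtom("N", 2, 0.0)]

# A "residue" as a range into the parent vector: no atom is copied.
residue1 = view(atoms, 1:2)

# Mutating an atom through the view changes the object stored in `atoms`.
for at in residue1
    at.beta = 1.0
end

println([at.beta for at in atoms])  # prints [1.0, 1.0, 0.0]
```

Only the atoms covered by the range were changed, and the change is visible through the original vector.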
Iterating Over Atoms Within Residues
You can iterate over the atoms of one or more residues using nested loops. Here, we compute the total number of atoms in ALA and CYS residues:
julia> using PDBTools
julia> protein = read_pdb(PDBTools.SMALLPDB);
julia> n_ala_cys = 0
for residue in eachresidue(protein)
if name(residue) in ("ALA", "CYS")
for atom in residue
n_ala_cys += 1
end
end
end
n_ala_cys
23
This method produces the same result as the more concise approach:
julia> using PDBTools
julia> protein = read_pdb(PDBTools.SMALLPDB);
julia> sum(length(r) for r in eachresidue(protein) if name(r) in ("ALA", "CYS"))
23
Reference documentation
PDBTools.Residue — Type
Residue
Residue data structure.
The Residue structure carries the properties of the residue (or molecule) formed by the atoms it contains, but it does not copy the original vector of atoms; it stores only the residue metadata and a reference to the original vector. Thus, changes to the residue atoms are reflected in the original vector of atoms.
Example
julia> using PDBTools
julia> pdb = wget("1LBD");
julia> residues = collect(eachresidue(pdb))
238-element Vector{Residue}[
SER225A
ALA226A
⋮
MET461A
THR462A
]
julia> resnum.(residues[1:3])
3-element Vector{Int32}:
225
226
227
julia> residues[5].chain
"A"
julia> residues[8].range
52:58
julia> mass(residues[1])
82.0385
PDBTools.eachresidue — Function
eachresidue(atoms::AbstractVector{<:Atom})
Iterator for the residues (or molecules) of a selection.
Example
julia> using PDBTools
julia> atoms = wget("1LBD");
julia> eachresidue(atoms)
Residue iterator with length = 238
julia> collect(eachresidue(atoms))
238-element Vector{Residue}[
SER225A
ALA226A
⋮
MET461A
THR462A
]
PDBTools.resname — Function
resname(residue::Union{AbstractString,Char})
Returns the residue name, given the one-letter code or the residue name. Unlike threeletter, this function returns the force-field name if it is available in the list of protein residues.
Examples
julia> resname("ALA")
"ALA"
julia> resname("GLUP")
"GLUP"
PDBTools.residuename — Function
residuename(residue::Union{AbstractString,Char})
Function to return the long residue name from other residue codes. The function is case-insensitive.
Examples
julia> residuename("A")
"Alanine"
julia> residuename("Glu")
"Glutamic Acid"
Iterate over chains
The eachchain iterator in PDBTools enables iteration over the chains of a PDB structure. A PDB file may contain multiple protein chains, and this iterator simplifies operations on individual chains.
julia> using PDBTools
julia> ats = read_pdb(PDBTools.CHAINSPDB);
julia> chain.(eachchain(ats)) # Retrieve the names of all chains in the structure
4-element Vector{InlineStrings.String3}:
"A"
"B"
"A"
"D"
julia> model.(eachchain(ats)) # Retrieve the model numbers associated with each chain
4-element Vector{Int32}:
1
1
1
2
julia> chain_A1 = first(eachchain(ats)); # Access the first chain in the iterator
julia> resname.(eachresidue(chain_A1)) # Retrieve residue names for chain A in model 1
3-element Vector{InlineStrings.String7}:
"ASP"
"GLN"
"LEU"
In the example above, chain.(eachchain(ats)) retrieves the names of all chains in the structure, while model.(eachchain(ats)) lists the model number of each chain. This PDB structure contains two models for chain A, where the third residue changes from leucine (LEU) in model 1 to valine (VAL) in model 2.
Accessing Chains by Index
As seen in the previous example, the first and last functions allow quick access to the first and last elements of the iterator. For more specific indexing, you can collect all chains into an array and then use numerical indices to access them.
julia> using PDBTools
julia> ats = read_pdb(PDBTools.CHAINSPDB);
julia> chains = collect(eachchain(ats))
4-element Vector{Chain}[
Chain(A-48 atoms)
Chain(B-48 atoms)
Chain(A-48 atoms)
Chain(D-45 atoms)
]
julia> chain_B = chains[2]
Chain B with 48 atoms.
index name resname chain resnum residue x y z occup beta model segname index_pdb
49 N ASP B 4 4 135.661 123.866 -22.311 1.00 0.00 1 ASYN 49
50 CA ASP B 4 4 136.539 123.410 -21.227 1.00 0.00 1 ASYN 50
⋮
95 HD23 LEU B 6 6 138.780 120.216 -17.864 1.00 0.00 1 ASYN 95
96 O LEU B 6 6 141.411 117.975 -21.923 1.00 0.00 1 ASYN 96
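The ability to call first and collect on a lazy iterator, and then index the collected vector, comes from Julia's generic iteration interface (Base.iterate, plus Base.length for collect). The sketch below illustrates that interface with a hypothetical ConsecutiveRuns type that yields views of consecutive elements sharing a key; it shows the pattern only and is not PDBTools' actual implementation of eachchain.

```julia
# Hypothetical lazy iterator: yields views of consecutive runs of `data`
# whose elements share the same value of `key`. Nothing is precomputed.
struct ConsecutiveRuns{T,F}
    data::Vector{T}
    key::F
end

# Last index of the run that starts at index `i`.
function run_end(it::ConsecutiveRuns, i::Int)
    j = i
    while j < lastindex(it.data) && it.key(it.data[j+1]) == it.key(it.data[i])
        j += 1
    end
    return j
end

# The iteration state is the index where the next run starts.
function Base.iterate(it::ConsecutiveRuns, i::Int = firstindex(it.data))
    i > lastindex(it.data) && return nothing
    j = run_end(it, i)
    return (view(it.data, i:j), j + 1)
end

# `collect` needs the number of elements; here it is found by walking the runs.
function Base.length(it::ConsecutiveRuns)
    n = 0
    for _ in it
        n += 1
    end
    return n
end

# Let `collect` infer the element type from the views it receives.
Base.IteratorEltype(::Type{<:ConsecutiveRuns}) = Base.EltypeUnknown()

chain_names = ["A", "A", "B", "B", "B", "A"]
runs = ConsecutiveRuns(chain_names, identity)

first_run = first(runs)   # view of the first run: "A", "A"
groups = collect(runs)    # three runs; the last "A" is a separate run
second_run = groups[2]    # view of "B", "B", "B"
```

Each element is a view, so, as with Chain objects, modifying a run modifies the parent vector. The price of laziness in this sketch is that length has to walk the whole vector.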
Modifying Atom Properties in a Chain
Any changes made to the atoms of a chain variable directly overwrite the properties of the original atoms in the structure. For example, modifying the occupancy and beta-factor columns of atoms in model 2 of chain A will update the corresponding properties in the original structure.
In the example below, the occup and beta properties of all atoms of chain A in model 2 are set to 0.00. Because a chain does not copy its atoms, these assignments act directly on the atoms of the original ats vector: the modifications propagate to the parent data structure.
julia> using PDBTools
julia> ats = read_pdb(PDBTools.CHAINSPDB);
julia> first(eachchain(ats))
Chain A with 48 atoms.
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N ASP A 1 1 133.978 119.386 -23.646 1.00 0.00 1 ASYN 1
2 CA ASP A 1 1 134.755 118.916 -22.497 1.00 0.00 1 ASYN 2
⋮
47 HD23 LEU A 3 3 130.568 111.868 -26.242 1.00 0.00 1 ASYN 47
48 O LEU A 3 3 132.066 112.711 -21.739 1.00 0.00 1 ASYN 48
julia> for chain in eachchain(ats)
if name(chain) == "A" && model(chain) == 2
for atom in chain
atom.occup = 0.00
atom.beta = 0.00
end
end
end
julia> first(eachchain(ats))
Chain A with 48 atoms.
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N ASP A 1 1 133.978 119.386 -23.646 1.00 0.00 1 ASYN 1
2 CA ASP A 1 1 134.755 118.916 -22.497 1.00 0.00 1 ASYN 2
⋮
47 HD23 LEU A 3 3 130.568 111.868 -26.242 1.00 0.00 1 ASYN 47
48 O LEU A 3 3 132.066 112.711 -21.739 1.00 0.00 1 ASYN 48
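Note that first(eachchain(ats)) returns the first chain A, which belongs to model 1 and is therefore not modified by the loop; this is why its occup values are still 1.00. This shared-data behavior enables efficient data manipulation, but it requires care to avoid unintended changes.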
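If you need to modify a chain without touching the original structure, one option is to work on an independent copy made with Base.deepcopy. The sketch below shows the difference on a hypothetical EditableAtom type; it assumes that deepcopy behaves the same way on a vector of PDBTools atoms, which is expected for a generic Base function but is not something this page documents.

```julia
# Hypothetical mutable atom type standing in for PDBTools.Atom.
mutable struct EditableAtom
    chain::String
    occup::Float64
end

atoms = [EditableAtom("A", 1.0), EditableAtom("A", 1.0), EditableAtom("B", 1.0)]

# deepcopy creates new EditableAtom objects, so edits on `work` do not reach `atoms`.
# (copy(atoms) would be shallow: a new vector holding the same atom objects.)
work = deepcopy(atoms)
for at in work
    if at.chain == "A"
        at.occup = 0.0
    end
end

println([at.occup for at in atoms])  # prints [1.0, 1.0, 1.0]
println([at.occup for at in work])   # prints [0.0, 0.0, 1.0]
```

The trade-off is memory: a deep copy duplicates every atom, which is exactly what the iterators avoid by default.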
Reference documentation
PDBTools.Chain — Type
Chain
Creates a Chain data structure. Chains must be consecutive in the atoms vector, and are identified by having the same chain, segment, and model fields.
The Chain structure carries the properties of the atoms it contains, but it does not copy the original vector of atoms. This means that any change made to the atoms of a Chain overwrites the corresponding atoms in the original vector.
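The identification rule can be sketched as splitting the atom vector wherever the (chain, segment, model) key changes between consecutive atoms. The example below uses hypothetical named tuples and a hypothetical chain_ranges helper; it mirrors the rule stated above and is not PDBTools' own code. The two blocks of chain A end up as separate chains because their model numbers differ.

```julia
# Hypothetical atoms as named tuples: only the fields used to identify chains.
atoms = [
    (chain = "A", segname = "PROT", model = 1),
    (chain = "A", segname = "PROT", model = 1),
    (chain = "B", segname = "PROT", model = 1),
    (chain = "A", segname = "PROT", model = 2),
    (chain = "A", segname = "PROT", model = 2),
]

# Key that identifies a chain: chain name, segment name, and model number.
chainkey(at) = (at.chain, at.segname, at.model)

# Index ranges of consecutive atoms sharing the same key.
function chain_ranges(atoms, key)
    ranges = UnitRange{Int}[]
    start = firstindex(atoms)
    for i in eachindex(atoms)
        # Close the current chain at the last atom or where the key changes.
        if i == lastindex(atoms) || key(atoms[i+1]) != key(atoms[i])
            push!(ranges, start:i)
            start = i + 1
        end
    end
    return ranges
end

ranges = chain_ranges(atoms, chainkey)             # 1:2, 3:3, and 4:5
println([atoms[first(r)].chain for r in ranges])   # prints ["A", "B", "A"]
```

Because the split happens only where neighbouring keys differ, two non-consecutive blocks with identical keys would also be reported as distinct chains, which is why chains must be consecutive in the atoms vector.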
Examples
julia> using PDBTools
julia> ats = read_pdb(PDBTools.CHAINSPDB);
julia> chains = collect(eachchain(ats))
4-element Vector{Chain}[
Chain(A-48 atoms)
Chain(B-48 atoms)
Chain(A-48 atoms)
Chain(D-45 atoms)
]
julia> chains[1]
Chain A with 48 atoms.
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N ASP A 1 1 133.978 119.386 -23.646 1.00 0.00 1 ASYN 1
2 CA ASP A 1 1 134.755 118.916 -22.497 1.00 0.00 1 ASYN 2
⋮
47 HD23 LEU A 3 3 130.568 111.868 -26.242 1.00 0.00 1 ASYN 47
48 O LEU A 3 3 132.066 112.711 -21.739 1.00 0.00 1 ASYN 48
julia> mass(chains[1])
353.37881000000016
julia> model(chains[4])
2
julia> segname(chains[2])
"ASYN"
PDBTools.eachchain — Function
eachchain(atoms::AbstractVector{<:Atom})
Iterator for the chains of a selection.
Example
julia> using PDBTools
julia> ats = read_pdb(PDBTools.CHAINSPDB);
julia> eachchain(ats)
Chain iterator with length = 4
julia> chains = collect(eachchain(ats))
4-element Vector{Chain}[
Chain(A-48 atoms)
Chain(B-48 atoms)
Chain(A-48 atoms)
Chain(D-45 atoms)
]
Iterate over segments
The eachsegment iterator enables iteration over the segments of a structure. For example:
julia> using PDBTools
julia> ats = read_pdb(PDBTools.DIMERPDB);
julia> eachsegment(ats)
Segment iterator with length = 2
julia> name.(eachsegment(ats))
2-element Vector{InlineStrings.String7}:
"A"
"B"
The result of the iterator can also be collected, with:
julia> using PDBTools
julia> ats = read_pdb(PDBTools.DIMERPDB);
julia> s = collect(eachsegment(ats))
2-element Vector{Segment}[
A-(1905 atoms))
B-(92 atoms))
]
julia> s[1]
Segment of name A with 1905 atoms.
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N LYS A 211 1 52.884 24.022 35.587 1.00 53.10 1 A 1
2 CA LYS A 211 1 52.916 24.598 36.993 1.00 53.10 1 A 2
⋮
1904 OD2 ASP A 461 243 17.538 51.009 45.748 1.00 97.43 1 A 1904
1905 OXT ASP A 461 243 14.506 47.082 47.528 1.00 97.43 1 A 1905
Segment structures do not copy the data from the original atom vector. Therefore, changes made to the atoms of a segment are reflected in the original data.
Iterators can be used to obtain or modify properties of the segments. Here we compute the mass of each segment and then append the segment index to the segment name of every atom:
julia> using PDBTools
julia> ats = read_pdb(PDBTools.DIMERPDB);
julia> s = collect(eachsegment(ats))
2-element Vector{Segment}[
A-(1905 atoms))
B-(92 atoms))
]
julia> mass.(s)
2-element Vector{Float64}:
25222.339099999943
1210.7300999999993
julia> for (iseg, seg) in enumerate(eachsegment(ats))
for at in seg
at.segname = "$(at.segname)$iseg"
end
end
julia> collect(eachsegment(ats))
2-element Vector{Segment}[
A1-(1905 atoms))
B2-(92 atoms))
]
Reference documentation
PDBTools.Segment — Type
Segment
Segment data structure. Segments must be consecutive in the atoms vector, and are identified by having the same segname and model fields.
The Segment structure carries the properties of the segment, but it does not copy the original vector of atoms; it stores only the segment metadata and a reference to the original vector. Thus, changes to the segment atoms are reflected in the original vector of atoms.
Example
julia> using PDBTools
julia> ats = read_pdb(PDBTools.DIMERPDB);
julia> segments = collect(eachsegment(ats))
2-element Vector{Segment}[
A-(1905 atoms))
B-(92 atoms))
]
julia> segname.(segments[1:2])
2-element Vector{InlineStrings.String7}:
"A"
"B"
julia> length(segments[2])
92
PDBTools.eachsegment — Function
eachsegment(atoms::AbstractVector{<:Atom})
Iterator for the segments of a selection.
Example
julia> using PDBTools
julia> ats = read_pdb(PDBTools.DIMERPDB);
julia> sit = eachsegment(ats)
Segment iterator with length = 2
julia> for seg in sit
@show length(seg)
end
length(seg) = 1905
length(seg) = 92
julia> collect(sit)
2-element Vector{Segment}[
A-(1905 atoms))
B-(92 atoms))
]
Iterate over models
The eachmodel iterator enables iteration over the models of a structure. For example:
julia> using PDBTools
julia> ats = wget("8S8N");
julia> eachmodel(ats)
Model iterator with length = 11
julia> model.(eachmodel(ats))
11-element Vector{Int32}:
1
2
3
⋮
10
11
The result of the iterator can also be collected, with:
julia> using PDBTools
julia> ats = wget("8S8N");
julia> m = collect(eachmodel(ats))
11-element Vector{Model}[
1-(234 atoms))
2-(234 atoms))
⋮
10-(234 atoms))
11-(234 atoms))
]
julia> m[1]
Model 1 with 234 atoms.
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N DLE A 2 1 -5.811 -0.380 -2.159 1.00 0.00 1 1
2 CA DLE A 2 1 -4.785 -0.493 -3.227 1.00 0.00 1 2
⋮
233 HT2 A1H5T B 101 13 -5.695 5.959 -3.901 1.00 0.00 1 233
234 HT1 A1H5T B 101 13 -4.693 4.974 -2.743 1.00 0.00 1 234
Model structures do not copy the data from the original atom vector. Therefore, changes made to the atoms of a model are reflected in the original data.
Iterators can be used to obtain or modify properties of the models. Here we compute the center of mass of each model:
julia> using PDBTools
julia> ats = wget("8S8N");
julia> center_of_mass.(eachmodel(ats))
11-element Vector{StaticArraysCore.SVector{3, Float64}}:
[0.633762128213737, -0.1413050285597195, -0.21796044955626692]
[0.560772763043067, -0.15154922049365185, 0.1354801245061217]
[0.506559232784597, -0.09771757024270422, 0.030405317843908077]
⋮
[0.3889973654414868, -0.2110381926238272, 0.21802466991599198]
[0.6995386823110438, -0.1537225338789714, 0.21793134264425737]
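The center of mass of a model is the mass-weighted mean of its atomic positions, r_cm = Σ m_i r_i / Σ m_i. The sketch below computes it per model in plain Julia with made-up masses and coordinates and a hypothetical com helper; it shows the formula, not the internals of PDBTools' center_of_mass.

```julia
# Hypothetical atoms: model number, mass, and position (x, y, z).
atoms = [
    (model = 1, mass = 12.0, pos = (0.0, 0.0, 0.0)),
    (model = 1, mass = 12.0, pos = (2.0, 0.0, 0.0)),
    (model = 2, mass = 1.0,  pos = (0.0, 0.0, 0.0)),
    (model = 2, mass = 3.0,  pos = (4.0, 0.0, 0.0)),
]

# Mass-weighted mean position of a group of atoms.
function com(group)
    M = sum(at.mass for at in group)   # total mass of the group
    return ntuple(k -> sum(at.mass * at.pos[k] for at in group) / M, 3)
end

# One center of mass per model, grouping atoms by their model field.
models = unique(at.model for at in atoms)
coms = [com(filter(at -> at.model == m, atoms)) for m in models]
# model 1: (1.0, 0.0, 0.0); model 2: (3.0, 0.0, 0.0)
```

In model 2 the heavier atom pulls the center of mass toward x = 4, which is the weighting that distinguishes the center of mass from a plain geometric center.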
Reference documentation
PDBTools.Model — Type
Model
Model data structure. It carries the data of a model in a PDB file. Models must be consecutive in the atoms vector, and are identified by having the same model field.
The Model structure carries the properties of the model, but it does not copy the original vector of atoms; it stores only the model metadata and a reference to the original vector. Thus, changes to the model atoms are reflected in the original vector of atoms.
Example
In the example below, 8S8N is a PDB entry with 11 models.
julia> using PDBTools
julia> ats = wget("8S8N");
julia> models = collect(eachmodel(ats))
11-element Vector{Model}[
1-(234 atoms))
2-(234 atoms))
⋮
10-(234 atoms))
11-(234 atoms))
]
julia> models[1]
Model 1 with 234 atoms.
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N DLE A 2 1 -5.811 -0.380 -2.159 1.00 0.00 1 1
2 CA DLE A 2 1 -4.785 -0.493 -3.227 1.00 0.00 1 2
⋮
233 HT2 A1H5T B 101 13 -5.695 5.959 -3.901 1.00 0.00 1 233
234 HT1 A1H5T B 101 13 -4.693 4.974 -2.743 1.00 0.00 1 234
PDBTools.eachmodel — Function
eachmodel(atoms::AbstractVector{<:Atom})
Iterator for the models of a selection.
Example
Here we show how to iterate over the models of a PDB file, record the index of the first atom of each model, and collect all models.
julia> using PDBTools
julia> ats = wget("8S8N");
julia> models = eachmodel(ats)
Model iterator with length = 11
julia> first_atom = Atom[]
for model in models
push!(first_atom, model[1])
end
@show index.(first_atom);
index.(first_atom) = Int32[1, 235, 469, 703, 937, 1171, 1405, 1639, 1873, 2107, 2341]
julia> collect(models)
11-element Vector{Model}[
1-(234 atoms))
2-(234 atoms))
⋮
10-(234 atoms))
11-(234 atoms))
]
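Because every model of 8S8N has 234 atoms, as the collected output shows, the index of the first atom of model k is 1 + (k - 1) * 234. The short check below reproduces the printed first-atom indices from that formula; it relies only on the atom counts shown above and assumes the models are stored consecutively.

```julia
natoms_per_model = 234   # every model of 8S8N has 234 atoms (from the output above)
nmodels = 11

# Models are consecutive, so each one starts right after the previous one ends.
first_indices = [1 + (k - 1) * natoms_per_model for k in 1:nmodels]
println(first_indices)  # prints [1, 235, 469, 703, 937, 1171, 1405, 1639, 1873, 2107, 2341]
```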