Iterators

PDBTools.jl provides lazy iterators over the Residues, Chains, Segments, and Models of a structure file. The iterators behave similarly and can be used to compute properties of independent structural elements. The documentation below describes the features of the Residue and Chain iterators in more detail, but the same properties are valid for the Segment and Model iterators.

Iterate over residues (or molecules)

The eachresidue iterator enables iteration over the residues of a structure. In PDB files, distinct molecules are often treated as separate residues, so this iterator can be used to iterate over the molecules within a structure. For example:

using PDBTools
protein = read_pdb(PDBTools.SMALLPDB);
count(res -> resname(res) == "ALA", eachresidue(protein))
1

Here, we use eachresidue to count the number of residues named "ALA". This highlights the distinction between residue-level and atom-level operations.
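
To make the residue-level versus atom-level distinction concrete, the sketch below counts ALA at both levels. It assumes that resname can also be applied to individual atoms of the atom vector; the variable names are illustrative only.

```julia
using PDBTools

protein = read_pdb(PDBTools.SMALLPDB)

# Residue level: each residue is counted once
n_ala_residues = count(res -> resname(res) == "ALA", eachresidue(protein))

# Atom level: every atom that belongs to an ALA residue is counted
# (assumes `resname` accepts a single atom)
n_ala_atoms = count(at -> resname(at) == "ALA", protein)
```

For this small test structure, the two counts differ because the single ALA residue contains several atoms.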

Collecting Residues into a Vector

Residues produced by eachresidue can be collected into a vector for further processing:

residues = collect(eachresidue(protein))
3-element Vector{Residue}[ 
    ALA1A
    CYS2A
    ASP3A
]

and the atoms of a specific residue can be seen by indexing the residue:

residues[1]
 Residue of name ALA with 12 atoms.
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    N     ALA     A        1        1   -9.229  -14.861   -5.481  0.00  0.00     1    PROT         1
       2 1HT1     ALA     A        1        1  -10.048  -15.427   -5.569  0.00  0.00     1    PROT         2
⋮
      11    C     ALA     A        1        1   -7.227  -14.047   -6.599  1.00  0.00     1    PROT        11
      12    O     ALA     A        1        1   -7.083  -13.048   -7.303  1.00  0.00     1    PROT        12
Note

Iterators or collected vectors do not create copies of the original atom data. This means that any changes made to the residue vector will directly modify the corresponding data in the original atom vector.
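
A minimal sketch of this behavior, using only calls that appear on this page (read_pdb, eachresidue, and the beta field of an atom). The value 1.0 is arbitrary and chosen only for illustration.

```julia
using PDBTools

protein = read_pdb(PDBTools.SMALLPDB)
residues = collect(eachresidue(protein))

# Change the beta field of every atom of the first residue
for atom in residues[1]
    atom.beta = 1.0
end

# The change is visible in the original atom vector, because the
# residue refers to the same atoms rather than to copies
beta_first = protein[1].beta
```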

Iterating Over Atoms Within Residues

You can iterate over the atoms of one or more residues using nested loops. Here, we compute the total number of atoms of the ALA and CYS residues:

let n_ala_cys = 0
    for residue in eachresidue(protein)
        if name(residue) in ("ALA", "CYS")
            for atom in residue
                n_ala_cys += 1
            end
        end
    end
    n_ala_cys
end
23

This method produces the same result as the more concise approach:

sum(length(r) for r in eachresidue(protein) if name(r) in ("ALA", "CYS"))
23
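
The same pattern extends to a per-name tally. The following is a sketch, not part of the package: it accumulates the number of atoms for each residue name in a dictionary, and the name atoms_per_resname is a hypothetical choice.

```julia
using PDBTools

protein = read_pdb(PDBTools.SMALLPDB)

# Accumulate the number of atoms per residue name
atoms_per_resname = Dict{String,Int}()
for residue in eachresidue(protein)
    key = String(resname(residue))
    atoms_per_resname[key] = get(atoms_per_resname, key, 0) + length(residue)
end

# Every atom belongs to exactly one residue, so the tallies add up
# to the total number of atoms
total_atoms = sum(values(atoms_per_resname))
```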

Alternatively, a view (not a copy!) of the atoms corresponding to a residue can be obtained with get_atoms:

r1_atoms = get_atoms(residues[1])
   SubArray{Atom{Nothing}, 1, Vector{Atom{Nothing}}, Tuple{UnitRange{Int64}}, true} with 12 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    N     ALA     A        1        1   -9.229  -14.861   -5.481  0.00  0.00     1    PROT         1
       2 1HT1     ALA     A        1        1  -10.048  -15.427   -5.569  0.00  0.00     1    PROT         2
⋮
      11    C     ALA     A        1        1   -7.227  -14.047   -6.599  1.00  0.00     1    PROT        11
      12    O     ALA     A        1        1   -7.083  -13.048   -7.303  1.00  0.00     1    PROT        12
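
Because get_atoms returns a view, its elements are the same objects as those in the original vector. The sketch below checks this and, as one possible way to obtain independent data, uses deepcopy from Base; whether the package offers a dedicated copy function is not assumed here.

```julia
using PDBTools

protein = read_pdb(PDBTools.SMALLPDB)
residues = collect(eachresidue(protein))

# View: the elements are the very same atom objects as in `protein`
r1_atoms = get_atoms(residues[1])
same_object = r1_atoms[1] === protein[1]

# Independent copy: deepcopy (from Base) duplicates the atoms,
# so modifying the copy leaves the original untouched
r1_copy = deepcopy(r1_atoms)
r1_copy[1].beta = 5.0
original_untouched = protein[1].beta != 5.0
```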

Reference documentation

PDBTools.Residue (Type)
Residue

Residue data structure.

The Residue structure carries the properties of the residue or molecule to which its atoms belong. It does not copy the original vector of atoms; it stores only the residue metadata. Thus, changes in the residue atoms will be reflected in the original vector of atoms.

Example

julia> using PDBTools

julia> pdb = wget("1LBD");

julia> residues = collect(eachresidue(pdb))
238-element Vector{Residue}[
    SER225A
    ALA226A
    ⋮
    MET461A
    THR462A
]

julia> resnum.(residues[1:3])
3-element Vector{Int32}:
 225
 226
 227

julia> residues[5].chain
"A"

julia> residues[8].range
52:58

julia> mass(residues[1])
82.0385f0
PDBTools.eachresidue (Function)
eachresidue(atoms::AbstractVector{<:Atom})

Iterator for the residues (or molecules) of a selection.

Example

julia> using PDBTools

julia> atoms = wget("1LBD");

julia> eachresidue(atoms)
 Residue iterator with length = 238

julia> collect(eachresidue(atoms))
238-element Vector{Residue}[
    SER225A
    ALA226A
    ⋮
    MET461A
    THR462A
]
PDBTools.resname (Function)
resname(residue::Union{AbstractString,Char})

Returns the residue name, given the one-letter code or residue name. Unlike threeletter, this function returns the force-field name if it is available in the list of protein residues.

Examples

julia> resname("ALA")
"ALA"

julia> resname("GLUP")
"GLUP"
PDBTools.residuename (Function)
residuename(residue::Union{AbstractString,Char})

Function to return the long residue name from other residue codes. The function is case-insensitive.

Examples

julia> residuename("A")
"Alanine"

julia> residuename("Glu")
"Glutamic Acid"

Iterate over chains

The eachchain iterator enables iteration over the chains of a PDB structure. Because a PDB file may contain multiple protein chains, this iterator simplifies operations on individual chains.

ats = read_pdb(PDBTools.CHAINSPDB);
chain.(eachchain(ats)) # Retrieve the names of all chains in the structure
4-element Vector{InlineStrings.String3}:
 "A"
 "B"
 "A"
 "D"
model.(eachchain(ats)) # Retrieve the model numbers associated with each chain
4-element Vector{Int32}:
 1
 1
 1
 2
chain_A1 = first(eachchain(ats)) # Access the first chain in the iterator
 Chain A with 48 atoms.
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    N     ASP     A        1        1  133.978  119.386  -23.646  1.00  0.00     1    ASYN         1
       2   CA     ASP     A        1        1  134.755  118.916  -22.497  1.00  0.00     1    ASYN         2
⋮
      47 HD23     LEU     A        3        3  130.568  111.868  -26.242  1.00  0.00     1    ASYN        47
      48    O     LEU     A        3        3  132.066  112.711  -21.739  1.00  0.00     1    ASYN        48
resname.(eachresidue(chain_A1)) # Retrieve residue names for chain A in model 1
3-element Vector{InlineStrings.String7}:
 "ASP"
 "GLN"
 "LEU"

In the example above, broadcasting chain over the iterator, as in chain.(eachchain(ats)), retrieves the names of all chains in the structure, while broadcasting model lists the model number of each chain. This PDB structure contains two models for chain A, where the third residue changes from leucine (LEU) in model 1 to valine (VAL) in model 2.
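
The chain and model properties can also be combined to select chains. The sketch below keeps only the chains of model 1, using the name and model getters that appear elsewhere on this page; the variable names are illustrative.

```julia
using PDBTools

ats = read_pdb(PDBTools.CHAINSPDB)

# Keep only the chains that belong to model 1
chains_model1 = [c for c in eachchain(ats) if model(c) == 1]

# Names of the selected chains, in the order they appear in the file
names_model1 = name.(chains_model1)
```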

Collect chains and indexing

As seen in the previous example, the first and last commands allow quick access to the first and last elements of the iterator. For more specific indexing, you can collect all chains into an array and then use numerical indices to access them.

chains = collect(eachchain(ats))
4-element Vector{Chain}[ 
    Chain(A-48 atoms)
    Chain(B-48 atoms)
    Chain(A-48 atoms)
    Chain(D-45 atoms)
]
chain_B = chains[2]
 Chain B with 48 atoms.
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
      49    N     ASP     B        4        4  135.661  123.866  -22.311  1.00  0.00     1    ASYN        49
      50   CA     ASP     B        4        4  136.539  123.410  -21.227  1.00  0.00     1    ASYN        50
⋮
      95 HD23     LEU     B        6        6  138.780  120.216  -17.864  1.00  0.00     1    ASYN        95
      96    O     LEU     B        6        6  141.411  117.975  -21.923  1.00  0.00     1    ASYN        96
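
Once the chains are collected, standard Julia array tools apply. As a sketch, the position of a chain can be located by name with findfirst from Base instead of being hard-coded; i_B is a hypothetical variable name.

```julia
using PDBTools

ats = read_pdb(PDBTools.CHAINSPDB)
chains = collect(eachchain(ats))

# `first` on the iterator and index 1 of the collected vector
# refer to the same chain
same_first = name(first(eachchain(ats))) == name(chains[1])

# Locate a chain by name instead of hard-coding its position
i_B = findfirst(c -> name(c) == "B", chains)
n_atoms_B = length(chains[i_B])
```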

Modifying Atom Properties in a Chain

Any changes made to the atoms of a chain variable directly overwrite the properties of the original atoms in the structure. For example, modifying the occupancy and beta-factor columns of the atoms in model 1 of chain A will update the corresponding properties in the original structure.

In the example below, the occup and beta properties of all atoms in model 1 of chain A are set to 0.00. The changes are reflected in the original ats vector, demonstrating that the modifications propagate to the parent data structure.

first(eachchain(ats))
 Chain A with 48 atoms.
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    N     ASP     A        1        1  133.978  119.386  -23.646  1.00  0.00     1    ASYN         1
       2   CA     ASP     A        1        1  134.755  118.916  -22.497  1.00  0.00     1    ASYN         2
⋮
      47 HD23     LEU     A        3        3  130.568  111.868  -26.242  1.00  0.00     1    ASYN        47
      48    O     LEU     A        3        3  132.066  112.711  -21.739  1.00  0.00     1    ASYN        48
for chain in eachchain(ats)
    if name(chain) == "A" && model(chain) == 1
        for atom in chain
            atom.occup = 0.00
            atom.beta = 0.00
        end
    end
end
first(eachchain(ats))
 Chain A with 48 atoms.
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    N     ASP     A        1        1  133.978  119.386  -23.646  0.00  0.00     1    ASYN         1
       2   CA     ASP     A        1        1  134.755  118.916  -22.497  0.00  0.00     1    ASYN         2
⋮
      47 HD23     LEU     A        3        3  130.568  111.868  -26.242  0.00  0.00     1    ASYN        47
      48    O     LEU     A        3        3  132.066  112.711  -21.739  0.00  0.00     1    ASYN        48

This behavior ensures efficient data manipulation but requires careful handling to avoid unintended changes.
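
One way to avoid unintended changes is to work on an independent copy of the atom vector. The sketch below uses deepcopy from Base for this purpose; it is a general Julia idiom rather than a PDBTools-specific feature, and the names ats_work and ch are illustrative.

```julia
using PDBTools

ats = read_pdb(PDBTools.CHAINSPDB)

# Work on an independent copy so the original stays untouched
ats_work = deepcopy(ats)
for ch in eachchain(ats_work)
    if name(ch) == "A" && model(ch) == 1
        for atom in ch
            atom.occup = 0.00
        end
    end
end

occup_original = ats[1].occup   # not affected by the loop above
occup_copy = ats_work[1].occup  # modified by the loop above
```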

Reference documentation

PDBTools.Chain (Type)
Chain

Creates a Chain data structure. Chains must be consecutive in the atoms vector, and are identified by having the same chain, segment, and model fields.

The Chain structure carries the properties of the atoms it contains, but it does not copy the original vector of atoms. This means that any changes made to the atoms of a Chain will overwrite the original vector of atoms.

Examples

julia> using PDBTools

julia> ats = read_pdb(PDBTools.CHAINSPDB);

julia> chains = collect(eachchain(ats))
4-element Vector{Chain}[
    Chain(A-48 atoms)
    Chain(B-48 atoms)
    Chain(A-48 atoms)
    Chain(D-45 atoms)
]

julia> chains[1]
 Chain A with 48 atoms.
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    N     ASP     A        1        1  133.978  119.386  -23.646  1.00  0.00     1    ASYN         1
       2   CA     ASP     A        1        1  134.755  118.916  -22.497  1.00  0.00     1    ASYN         2
⋮
      47 HD23     LEU     A        3        3  130.568  111.868  -26.242  1.00  0.00     1    ASYN        47
      48    O     LEU     A        3        3  132.066  112.711  -21.739  1.00  0.00     1    ASYN        48

julia> mass(chains[1])
353.3787f0 

julia> model(chains[4])
2

julia> segname(chains[2])
"ASYN"
PDBTools.eachchain (Function)
eachchain(atoms::AbstractVector{<:Atom})

Iterator for the chains of a selection.

Example

julia> using PDBTools

julia> ats = read_pdb(PDBTools.CHAINSPDB);

julia> eachchain(ats)
 Chain iterator with length = 4

julia> chains = collect(eachchain(ats))
4-element Vector{Chain}[
    Chain(A-48 atoms)
    Chain(B-48 atoms)
    Chain(A-48 atoms)
    Chain(D-45 atoms)
]

Iterate over segments

The eachsegment iterator enables iteration over the segments of a structure. For example:

# ats is the structure read from PDBTools.CHAINSPDB in the chains section above
eachsegment(ats)
 Segment iterator with length = 2
name.(eachsegment(ats))
2-element Vector{InlineStrings.String7}:
 "ASYN"
 "ASYN"

The result of the iterator can also be collected, with:

s = collect(eachsegment(ats))
2-element Vector{Segment}[ 
    ASYN-(144 atoms))
    ASYN-(45 atoms))
]
s[1]
 Segment of name ASYN with 144 atoms.
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    N     ASP     A        1        1  133.978  119.386  -23.646  0.00  0.00     1    ASYN         1
       2   CA     ASP     A        1        1  134.755  118.916  -22.497  0.00  0.00     1    ASYN         2
⋮
     143 HD23     LEU     A        9        9  142.275  131.839  -10.225  0.00  0.00     1    ASYN       143
     144    O     LEU     A        9        9  141.311  127.197  -12.279  0.00  0.00     1    ASYN       144

The segment structure does not copy the data from the original atom vector. Therefore, changes performed on these vectors will be reflected in the original data.

Iterators can be used to obtain or modify properties of the segments. Here we compute the mass of each segment and then append the segment index to the segment name of every atom:

s = collect(eachsegment(ats))
2-element Vector{Segment}[ 
    ASYN-(144 atoms))
    ASYN-(45 atoms))
]

Properties of each segment can then be obtained by broadcasting over the segments:

mass.(s)
2-element Vector{Float32}:
 1060.1364
  339.35178
formula.(s)
2-element Vector{PDBTools.Formula}:
 H₆₉C₄₅N₁₅O₁₅
 H₂₁C₁₄N₅O₅
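
Broadcasting combines naturally with reductions from Base. The sketch below uses the PDBTools.DIMERPDB test file, which the reference documentation further down reports as having two segments, and checks that the segments together cover every atom of the structure; the variable names are illustrative.

```julia
using PDBTools

ats = read_pdb(PDBTools.DIMERPDB)
segs = collect(eachsegment(ats))

# Number of atoms in each segment
n_atoms = length.(segs)

# Segments are consecutive blocks of the atom vector, so their
# sizes add up to the total number of atoms
covers_all = sum(n_atoms) == length(ats)

# Index of the segment with the largest mass
heaviest = argmax(mass.(segs))
```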

Iterating over the segments also allows changing properties of the atoms in a segment-specific way. For instance, here we append the segment index to the segment names:

for (iseg, seg) in enumerate(eachsegment(ats))
    for at in seg
        at.segname = "$(at.segname)$iseg"
    end
end
collect(eachsegment(ats))
2-element Vector{Segment}[ 
    ASYN1-(144 atoms))
    ASYN2-(45 atoms))
]

Reference documentation

PDBTools.Segment (Type)
Segment

Segment data structure. Segments must be consecutive in the atoms vector, and are identified by having the same segname and model fields.

The Segment structure carries the properties of the segment it contains, but it does not copy the original vector of atoms; it stores only the segment metadata and a reference to the original vector. Thus, changes in the segment atoms will be reflected in the original vector of atoms.

Example

julia> using PDBTools

julia> ats = read_pdb(PDBTools.DIMERPDB);

julia> segments = collect(eachsegment(ats))
2-element Vector{Segment}[
    A-(1905 atoms))
    B-(92 atoms))
]

julia> segname.(segments[1:2])
2-element Vector{InlineStrings.String7}:
 "A"
 "B"

julia> length(segments[2])
92
PDBTools.eachsegment (Function)
eachsegment(atoms::AbstractVector{<:Atom})

Iterator for the segments of a selection.

Example

julia> using PDBTools

julia> ats = read_pdb(PDBTools.DIMERPDB);

julia> sit = eachsegment(ats)
 Segment iterator with length = 2

julia> for seg in sit
           @show length(seg)
       end
length(seg) = 1905
length(seg) = 92

julia> collect(sit)
2-element Vector{Segment}[ 
    A-(1905 atoms))
    B-(92 atoms))
]

Iterate over models

The eachmodel iterator enables iteration over the models of a structure. For example:

ats = wget("8S8N");
eachmodel(ats)
 Model iterator with length = 11
model.(eachmodel(ats))
11-element Vector{Int32}:
  1
  2
  3
  ⋮
 10
 11

The result of the iterator can also be collected, with:

m = collect(eachmodel(ats))
11-element Vector{Model}[ 
    1-(234 atoms))
    2-(234 atoms))
    ⋮
    10-(234 atoms))
    11-(234 atoms))
]
m[1]
 Model 1 with 234 atoms.
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    N     DLE     A        2        1   -5.811   -0.380   -2.159  1.00  0.00     1                 1
       2   CA     DLE     A        2        1   -4.785   -0.493   -3.227  1.00  0.00     1                 2
⋮
     233  HT2   A1H5T     B      101       13   -5.695    5.959   -3.901  1.00  0.00     1               233
     234  HT1   A1H5T     B      101       13   -4.693    4.974   -2.743  1.00  0.00     1               234

The model structure does not copy the data from the original atom vector. Therefore, changes performed on these vectors will be reflected on the original data.

Iterators can be used to obtain or modify properties of the models. Here we compute the center of mass of each model:

center_of_mass.(eachmodel(ats))
11-element Vector{StaticArraysCore.SVector{3, Float32}}:
 [0.6337627, -0.14130484, -0.2179606]
 [0.56077266, -0.15154965, 0.1354806]
 [0.5065595, -0.0977174, 0.030405657]
 ⋮
 [0.38899764, -0.21103837, 0.2180245]
 [0.69953984, -0.15372278, 0.21793146]
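
Per-model results such as these can be post-processed with ordinary Julia code. As a sketch, the block below measures how far the center of mass of each model lies from that of the first model. To avoid a download, it uses the PDBTools.CHAINSPDB test file, which is assumed here to contain more than one model, as suggested by the model numbers shown in the chains section; norm comes from the LinearAlgebra standard library.

```julia
using PDBTools
using LinearAlgebra: norm

ats = read_pdb(PDBTools.CHAINSPDB)

# Center of mass of each model
coms = center_of_mass.(eachmodel(ats))

# Distance of each model's center of mass from that of the first model
displacements = [norm(c - coms[1]) for c in coms]
```

The first entry is zero by construction; the remaining entries quantify how much each model is displaced relative to the first.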

Reference documentation

PDBTools.Model (Type)
Model

Model data structure. It carries the data of a model in a PDB file. Models must be consecutive in the atoms vector, and are identified by having the same model field.

The Model structure carries the properties of the model it contains, but it does not copy the original vector of atoms; it stores only the model metadata and a reference to the original vector. Thus, changes in the model atoms will be reflected in the original vector of atoms.

Example

In the example below, 8S8N is a PDB entry with 11 models.

julia> using PDBTools

julia> ats = wget("8S8N");

julia> models = collect(eachmodel(ats))
11-element Vector{Model}[
    1-(234 atoms))
    2-(234 atoms))
    ⋮
    10-(234 atoms))
    11-(234 atoms))
]

julia> models[1]
 Model 1 with 234 atoms.
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    N     DLE     A        2        1   -5.811   -0.380   -2.159  1.00  0.00     1                 1
       2   CA     DLE     A        2        1   -4.785   -0.493   -3.227  1.00  0.00     1                 2
⋮
     233  HT2   A1H5T     B      101       13   -5.695    5.959   -3.901  1.00  0.00     1               233
     234  HT1   A1H5T     B      101       13   -4.693    4.974   -2.743  1.00  0.00     1               234
PDBTools.eachmodel (Function)
eachmodel(atoms::AbstractVector{<:Atom})

Iterator for the models of a selection.

Example

Here we show how to iterate over the models of a PDB file, annotate the index of the first atom of each model, and collect all models.

julia> using PDBTools

julia> ats = wget("8S8N");

julia> models = eachmodel(ats)
 Model iterator with length = 11

julia> first_atom = Atom[]
       for model in models
           push!(first_atom, model[1])
       end
       @show index.(first_atom);
index.(first_atom) = Int32[1, 235, 469, 703, 937, 1171, 1405, 1639, 1873, 2107, 2341]

julia> collect(models)
11-element Vector{Model}[
    1-(234 atoms))
    2-(234 atoms))
    ⋮
    10-(234 atoms))
    11-(234 atoms))
]