Selection functions
The select function can be used to select subsets of atoms from a vector of atoms. A simple selection syntax can be used, for example:
atoms = select(atoms, "protein and resnum < 30")
or a standard Julia function can be provided as the second argument:
atoms = select(atoms, at -> isprotein(at) && resnum(at) < 30)
General selection syntax
Accepted Boolean operators: and, or, and not.
The accepted keywords for the selection are:
Keyword | Options | Input value | Example |
---|---|---|---|
index | =, >, <, <=, >= | Integer | index <= 10 |
index_pdb | =, >, <, <=, >= | Integer | index_pdb <= 10 |
name | | String | name CA |
element | | String | element N |
resname | | String | resname ALA |
resnum | =, >, <, <=, >= | Integer | resnum = 10 |
residue | =, >, <, <=, >= | Integer | residue = 10 |
chain | | String | chain A |
model | | Integer | model 1 |
beta | =, >, <, <=, >= | Real | beta > 0.5 |
occup | =, >, <, <=, >= | Real | occup >= 0.3 |
segname | | String | segname PROT |
resnum is the residue number as written in the PDB file, while residue is the residue number counted sequentially in the file.
index_pdb is the number written in the "atom index" field of the PDB file, while index is the sequential index of the atom in the file.
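The difference between the numbers written in the file and the sequential counters can be illustrated without reading any file. The following is a toy sketch in plain Julia, not the way PDBTools computes these fields: the records vector and the sequential_residues helper are hypothetical names, and the residue numbers are made up for illustration.

```julia
# Toy data: residue numbers as they might be written in a PDB file
# (the analog of the `resnum` keyword). Values are illustrative only.
records = [
    (name = "N",  resnum = 225),
    (name = "CA", resnum = 225),
    (name = "N",  resnum = 226),
    (name = "CA", resnum = 226),
    (name = "N",  resnum = 230),
]

# Hypothetical helper: number the residues sequentially, incrementing the
# counter every time the residue number written in the file changes.
function sequential_residues(records)
    residue = Int[]
    counter = 0
    previous = nothing
    for rec in records
        if rec.resnum != previous
            counter += 1
            previous = rec.resnum
        end
        push!(residue, counter)
    end
    return residue
end

residue = sequential_residues(records)   # analog of the `residue` keyword
index = collect(eachindex(records))      # analog of the `index` keyword
```

In this toy case residue is [1, 1, 2, 2, 3] even though the numbering in the file jumps from 226 to 230, which is why a selection such as residue = 1 and a selection such as resnum = 225 can refer to the same atoms.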
Special macros: proteins, water
Just use these keywords to select the residues matching the properties desired.
Examples:
aromatic = select(atoms,"aromatic")
charged = select(atoms,"charged")
Available keywords:
Keywords | | |
---|---|---|
water | | |
protein | backbone | sidechain |
acidic | basic | |
aliphatic | aromatic | |
charged | neutral | |
polar | nonpolar | |
hydrophobic | | |
The properties refer to protein residues and will return false for every non-protein residue. Thus, be careful with the use of not with these selections, as they might retrieve non-protein atoms.
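The caveat about not can be made concrete with a short sketch. It assumes the PDBTools.TESTPDB test structure used elsewhere on this page and only the functions named here. No output is shown, because the counts depend on the contents of the file.

```julia
using PDBTools

atoms = read_pdb(PDBTools.TESTPDB)

# "not charged" is true for every atom for which `charged` is false,
# which includes any non-protein atoms present in the structure.
not_charged = select(atoms, "not charged")

# Restricting the selection to the protein avoids picking up other molecules.
protein_not_charged = select(atoms, "protein and not charged")

# The same restricted selection written with Julia functions.
protein_not_charged_fn = select(atoms, at -> isprotein(at) && !ischarged(at))
```

Comparing length(not_charged) with length(protein_not_charged) is a quick way to check whether a negated selection is picking up solvent or other non-protein atoms.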
PDBTools.select — Function
select(atoms::AbstractVector{<:Atom}, by::String)
Selects atoms from a vector of atoms using a string query, or a function.
PDBTools.Select — Type
Select
This structure acts as a function when used within typical Julia filtering functions, by converting a string selection into a query call.
Example
Using a string to select the CA atoms of the first residue:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.TESTPDB, "protein");
julia> findfirst(Select("name CA"), atoms)
5
julia> filter(Select("name CA and residue 1"), atoms)
Array{Atoms,1} with 1 atoms with fields:
index name resname chain resnum residue x y z occup beta model segname index_pdb
5 CA ALA A 1 1 -8.483 -14.912 -6.726 1.00 0.00 1 PROT 5
Retrieving indices, filtering, etc
If only the indices of the atoms are of interest, the Julia findall function can be used, by passing a Select object, or a regular function, to select the atoms:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.TESTPDB, "protein and residue <= 3");
julia> findall(Select("name CA"), atoms)
3-element Vector{Int64}:
5
15
26
julia> findall(at -> name(at) == "CA", atoms)
3-element Vector{Int64}:
5
15
26
All indexing is 1-based. Thus, the first atom of the structure is atom 1.
The Select constructor can be used to feed simple selection syntax entries to other Julia functions, such as findfirst, findlast, or filter:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.TESTPDB, "protein and residue <= 3");
julia> filter(Select("name CA"), atoms)
Array{Atoms,1} with 3 atoms with fields:
index name resname chain resnum residue x y z occup beta model segname index_pdb
5 CA ALA A 1 1 -8.483 -14.912 -6.726 1.00 0.00 1 PROT 5
15 CA CYS A 2 2 -5.113 -13.737 -5.466 1.00 0.00 1 PROT 15
26 CA ASP A 3 3 -3.903 -11.262 -8.062 1.00 0.00 1 PROT 26
julia> findfirst(Select("beta = 0.00"), atoms)
1
The sel"" literal string macro is a shortcut for Select. Thus, these syntaxes are valid:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.TESTPDB, "protein and residue <= 3");
julia> name.(filter(sel"name CA", atoms))
3-element Vector{InlineStrings.String7}:
"CA"
"CA"
"CA"
julia> findfirst(sel"name CA", atoms)
5
Use Julia functions directly
Selections can be done using Julia functions directly, providing greater control over the selection and allowing the use of user-defined selection functions. For example:
myselection(atom) = (atom.x < 10.0 && atom.resname == "GLY") || (atom.name == "CA")
atoms = select(atoms, myselection)
or, for example, using Julia anonymous functions:
select(atoms, at -> isprotein(at) && name(at) == "O" && at.x < 10.0)
The only requirement is that the function defining the selection receives a PDBTools.Atom as input, and returns true or false depending on the conditions required for the atom.
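Since a selection function is just a function from an atom to a Boolean, the same definition can be reused with select and with Base functions such as findall and count. A minimal sketch, assuming the PDBTools.TESTPDB structure used in the examples of this page; is_alpha_carbon is a hypothetical name chosen for illustration:

```julia
using PDBTools

atoms = read_pdb(PDBTools.TESTPDB, "protein and residue <= 3")

# A user-defined selection function: takes an atom, returns a Bool.
is_alpha_carbon(at) = name(at) == "CA"

ca_atoms = select(atoms, is_alpha_carbon)      # the selected atoms
ca_indices = findall(is_alpha_carbon, atoms)   # their positions in `atoms`
n_ca = count(is_alpha_carbon, atoms)           # how many atoms match
```

With the three-residue selection used here, ca_indices corresponds to the findall result shown in the "Retrieving indices, filtering, etc" section.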
The macro-keywords described in the previous section can be used within the Julia function syntax, but the function names start with is. For example:
select(atoms, at -> isprotein(at) && resnum(at) in [ 1, 5, 7 ])
Thus, the macro selection functions are: iswater, isprotein, isbackbone, issidechain, isacidic, isbasic, isaliphatic, isaromatic, ischarged, isneutral, ispolar, isnonpolar, and ishydrophobic.
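Because each macro keyword has a matching is-function, the string form and the function form are expected to pick the same atoms. The sketch below compares the two for the backbone keyword and then combines two predicates with ordinary Boolean operators. It assumes the PDBTools.TESTPDB structure and the sel"" macro described on this page.

```julia
using PDBTools

atoms = read_pdb(PDBTools.TESTPDB, "protein and residue <= 3")

# The same selection expressed as a keyword and as a function.
by_keyword = findall(sel"backbone", atoms)
by_function = findall(isbackbone, atoms)

# Predicates are plain functions, so they combine with &&, || and !.
polar_sidechain = select(atoms, at -> issidechain(at) && ispolar(at))
```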
Iterate over residues (or molecules)
The eachresidue iterator allows iteration over the residues of a structure (in PDB files distinct molecules are associated with different residues, thus this iterates similarly over the molecules of a structure). For example:
julia> using PDBTools
julia> protein = read_pdb(PDBTools.SMALLPDB);
julia> count(atom -> resname(atom) == "ALA", protein)
12
julia> count(res -> resname(res) == "ALA", eachresidue(protein))
1
The result of the iterator can also be collected, with:
julia> using PDBTools
julia> protein = read_pdb(PDBTools.SMALLPDB);
julia> residues = collect(eachresidue(protein))
Array{Residue,1} with 3 residues.
julia> residues[1]
Residue of name ALA with 12 atoms.
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N ALA A 1 1 -9.229 -14.861 -5.481 0.00 0.00 1 PROT 1
2 1HT1 ALA A 1 1 -10.048 -15.427 -5.569 0.00 0.00 1 PROT 2
3 HT2 ALA A 1 1 -9.488 -13.913 -5.295 0.00 0.00 1 PROT 3
⋮
10 HB3 ALA A 1 1 -9.164 -15.063 -8.765 1.00 0.00 1 PROT 10
11 C ALA A 1 1 -7.227 -14.047 -6.599 1.00 0.00 1 PROT 11
12 O ALA A 1 1 -7.083 -13.048 -7.303 1.00 0.00 1 PROT 12
These residue vectors do not copy the data from the original atom vector. Therefore, changes performed on these vectors will be reflected in the original data.
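What it means for a residue not to copy the data can be sketched with a toy analog in plain Julia. The ToyAtom and ToyResidue types below are hypothetical and far simpler than the PDBTools types. They only mirror the idea of keeping a reference to the full atom vector together with a range of indices.

```julia
# Hypothetical, simplified stand-ins used for illustration only.
mutable struct ToyAtom
    name::String
    beta::Float64
end

struct ToyResidue
    atoms::Vector{ToyAtom}   # reference to the full atom vector (not a copy)
    range::UnitRange{Int}    # which atoms of `atoms` belong to this residue
end

# Access the atoms of the residue through a view of the parent vector.
residue_atoms(res::ToyResidue) = view(res.atoms, res.range)

all_atoms = [ToyAtom("N", 0.0), ToyAtom("CA", 0.0), ToyAtom("N", 0.0)]
first_residue = ToyResidue(all_atoms, 1:2)

# Modifying an atom through the residue modifies the original vector.
residue_atoms(first_residue)[2].beta = 1.0
all_atoms[2].beta   # now 1.0
```

If an independent copy is needed, the data has to be copied explicitly (for example with deepcopy in this toy setting) before it is modified.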
It is also possible to iterate over the atoms of one or more residues:
julia> using PDBTools
julia> protein = read_pdb(PDBTools.SMALLPDB);
julia> m_ALA = 0.
       for residue in eachresidue(protein)
           if resname(residue) == "ALA"
               for atom in residue
                   m_ALA += mass(atom)
               end
           end
       end
       m_ALA
73.09488999999999
Which, in this simple example, results in the same as:
julia> sum(mass(at) for at in protein if resname(at) == "ALA" )
73.09488999999999
or
julia> sum(mass(res) for res in eachresidue(protein) if resname(res) == "ALA" )
73.09488999999999
PDBTools.Residue — Type
Residue(atoms::AbstractVector{<:Atom}, range::UnitRange{Int})
Residue data structure. It contains two fields: atoms, which is a vector of Atom elements, and range, which indicates which atoms of the atoms vector compose the residue.
The Residue structure carries the properties of the residue or molecule of the atoms it contains, but it does not copy the original vector of atoms, only the residue metadata for each residue.
Example
julia> pdb = wget("1LBD");
julia> residues = collect(eachresidue(pdb))
Array{Residue,1} with 238 residues.
julia> resnum.(residues[1:3])
3-element Vector{Int64}:
225
226
227
julia> residues[5].chain
"A"
julia> residues[8].range
52:58
PDBTools.eachresidue — Function
eachresidue(atoms::AbstractVector{<:Atom})
Iterator for the residues (or molecules) of a selection.
Example
julia> atoms = wget("1LBD");
julia> length(eachresidue(atoms))
238
julia> for res in eachresidue(atoms)
           println(res)
       end
Residue of name SER with 6 atoms.
index name resname chain resnum residue x y z beta occup model segname index_pdb
1 N SER A 225 1 45.228 84.358 70.638 67.05 1.00 1 - 1
2 CA SER A 225 1 46.080 83.165 70.327 68.73 1.00 1 - 2
3 C SER A 225 1 45.257 81.872 70.236 67.90 1.00 1 - 3
4 O SER A 225 1 45.823 80.796 69.974 64.85 1.00 1 - 4
5 CB SER A 225 1 47.147 82.980 71.413 70.79 1.00 1 - 5
6 OG SER A 225 1 46.541 82.639 72.662 73.55 1.00 1 - 6
Residue of name ALA with 5 atoms.
index name resname chain resnum residue x y z beta occup model segname index_pdb
7 N ALA A 226 2 43.940 81.982 70.474 67.09 1.00 1 - 7
8 CA ALA A 226 2 43.020 80.825 70.455 63.69 1.00 1 - 8
9 C ALA A 226 2 41.996 80.878 69.340 59.69 1.00 1 - 9
...
PDBTools.resname — Function
resname(residue::Union{AbstractString,Char})
Returns the residue name, given the one-letter code or residue name. Unlike threeletter, this function will return the force-field name if available in the list of protein residues.
Examples
julia> resname("ALA")
"ALA"
julia> resname("GLUP")
"GLUP"
PDBTools.residuename — Function
residuename(residue::Union{AbstractString,Char})
Returns the long residue name from other residue codes. The function is case-insensitive.
Examples
julia> residuename("A")
"Alanine"
julia> residuename("Glu")
"Glutamic Acid"
Using VMD
VMD is a very popular and powerful package for visualization of simulations. It contains a very versatile library to read topologies and trajectory files, and a powerful selection syntax. We provide here a wrapper to VMD which allows using its capabilities.
For example, the solute can be defined with:
indices, names = select_with_vmd("./system.gro","protein",vmd="/usr/bin/vmd")
The output will contain two lists, one of atom indices (one-based) and one of atom names. The indices correspond to the sequential indices in the input, not to the indices written in the PDB file, for example.
The input may also be a vector of atoms of type PDBTools.Atom:
atoms = read_pdb("mypdbfile.pdb")
indices, names = select_with_vmd(atoms,"protein",vmd="/usr/bin/vmd")
If vmd is available in your path, there is no need to pass it as a keyword parameter.
The main advantage here is that all the file types and the complete selection syntax that VMD supports are available. However, VMD needs to be installed, it is run in the background, and each call takes a few seconds.
PDBTools.select_with_vmd — Function
select_with_vmd(inputfile::String, selection::String; vmd="vmd", srcload=nothing)
select_with_vmd(atoms::AbstractVector{<:Atom}, selection::String; vmd="vmd", srcload=nothing)
Select atoms using the VMD selection syntax, with VMD running in the background. The input can be a file (topology, coordinates, etc.) or a list of atoms.
Returns a tuple with the list of indices (one-based) and the atom names of the selection.
The srcload argument can be used to load a list of scripts before loading the input file, for example with macros to define custom selection keywords.
Loading vmd scripts
The select_with_vmd function also accepts an optional keyword parameter srcload, which can be used to load custom scripts within vmd before setting the selection. This allows the definition of tcl scripts with custom selection macros, for instance. The usage would be:
sel = select_with_vmd("file.pdb", "resname MYRES"; srcload = [ "mymacros1.tcl", "mymacros2.tcl" ])
This corresponds to sourcing each of the macro files in VMD before defining the selection with the custom MYRES name.
VMD uses 0-based indexing and select_with_vmd adjusts that. However, if a selection is performed by index, as with index 1, VMD will select the second atom, and the output will be [2]. Selections by type, name, segment, residue name, etc., will be consistent with one-based indexing.
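When a selection by index is unavoidable, the one-based indices used on the Julia side can be shifted before they are written into the VMD selection string. The helper below is a hypothetical convenience written in plain Julia and is not part of PDBTools. It only builds the selection text.

```julia
# Hypothetical helper: build a VMD "index" selection string from one-based
# atom indices by shifting them to VMD's zero-based convention.
function vmd_index_selection(one_based::AbstractVector{<:Integer})
    all(>=(1), one_based) || throw(ArgumentError("indices must be >= 1"))
    return "index " * join(one_based .- 1, " ")
end

selection = vmd_index_selection([1, 2, 10])   # "index 0 1 9"
```

The resulting string could then be passed as the selection argument of select_with_vmd, whose returned indices are one-based again.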