Selection functions
The select function can be used to select subsets of atoms from a vector of atoms. A simple selection syntax can be used, for example: 
atoms = select(atoms, "protein and resnum < 30")
or a standard Julia function can be provided as the second argument:
atoms = select(atoms, at -> isprotein(at) && resnum(at) < 30)
PDBTools.select — Function
select(atoms::AbstractVector{<:Atom}, selection_string::String)
select(atoms::AbstractVector{<:Atom}, selection_function::Function)
Selects atoms from a vector of atoms using a string query, or a function.
The string query can be a simple selection like "name CA" or a more complex one like "name CA or (residue 1 2 3)". The function can be any function that takes an atom and returns a boolean value.
Example
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.TESTPDB, "protein");
julia> select(atoms, "name CA and (residue > 1 and residue < 3)")
   Vector{Atom{Nothing}} with 1 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
      15   CA     CYS     A        2        2   -5.113  -13.737   -5.466  1.00  0.00     1    PROT        15
julia> select(atoms, at -> name(at) == "CA" && 1 < residue(at) < 3)
   Vector{Atom{Nothing}} with 1 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
      15   CA     CYS     A        2        2   -5.113  -13.737   -5.466  1.00  0.00     1    PROT        15  
PDBTools.Select — Type
Select
This structure acts as a function when used within typical Julia filtering functions, by converting a string selection into a query call.
Example
Using a string to select the CA atoms of the first residue:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.TESTPDB, "protein");
julia> findfirst(Select("name CA"), atoms)
5
julia> filter(Select("name CA and residue 1"), atoms)
   Vector{Atom{Nothing}} with 1 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       5   CA     ALA     A        1        1   -8.483  -14.912   -6.726  1.00  0.00     1    PROT         5
General selection syntax
- Accepted Boolean operators: and, or, and not.
- Accepted comparison operators: <, >, <=, >=, =
- Support for parentheses.
- Support for multiple keys as a shortcut for multiple or (i.e., residue 1 3 5)
- Support for selection strings with parentheses and multiple keys was introduced in v3.1.0
- Support for selection by coordinates, x, y, and z, was introduced in v3.2.0
The accepted keywords for the selection are:
| Keyword | Options | Input value | Example |
|---|---|---|---|
| index | =,>,<,<=,>= | Integer | index <= 10 |
| index_pdb | =,>,<,<=,>= | Integer | index_pdb <= 10 |
| name | | String | name CA |
| element | | String | element N |
| resname | | String | resname ALA |
| resnum | =,>,<,<=,>= | Integer | resnum = 10 |
| residue | =,>,<,<=,>= | Integer | residue = 10 |
| chain | | String | chain A |
| model | | Integer | model 1 |
| beta | =,>,<,<=,>= | Real | beta > 0.5 |
| occup | =,>,<,<=,>= | Real | occup >= 0.3 |
| segname | | String | segname PROT |
| x | =,>,<,<=,>= | Real | x >= 3.0 |
| y | =,>,<,<=,>= | Real | y < 0.0 |
| z | =,>,<,<=,>= | Real | z >= 1.0 |
resnum is the residue number as written in the PDB file, while residue is the residue number counted sequentially in the file.
index_pdb is the number written in the "atom index" field of the PDB file, while index is the sequential index of the atom in the file. 
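The syntax rules and keywords above can be combined in a single selection string. The following is a minimal sketch, assuming PDBTools v3.2.0 or later and the bundled PDBTools.TESTPDB structure; the variable names are illustrative only.

```julia
using PDBTools

# Load only the protein atoms of the test structure bundled with PDBTools.
atoms = read_pdb(PDBTools.TESTPDB, "protein")

# Multiple keys are a shortcut for a chain of `or` conditions.
short_form = select(atoms, "residue 1 3 5")
long_form = select(atoms, "residue 1 or residue 3 or residue 5")

# Parentheses group conditions before they are combined with `and`.
ca_first_three = select(atoms, "name CA and (residue 1 2 3)")

# Coordinate keywords accept the same comparison operators as the
# other numeric keywords.
ca_negative_x = select(atoms, "name CA and x < 0.0")

println(length(short_form) == length(long_form))
println(resnum.(ca_first_three))
```

Here residue refers to the sequential residue count, so the same string selects the first, third, and fifth residues regardless of the numbering written in the PDB file.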
Special macros: proteins, water
Just use these keywords to select the residues matching the properties desired.
Examples:
aromatic = select(atoms,"aromatic")
charged = select(atoms,"charged")
Available keywords:
| Keywords | | |
|---|---|---|
| water | | |
| protein | backbone | sidechain |
| acidic | basic | |
| aliphatic | aromatic | |
| charged | neutral | |
| polar | nonpolar | |
| hydrophobic | | |
The properties refer to protein residues and will return false to every non-protein residue. Thus, be careful with the use of not with these selections, as they might retrieve non-protein atoms.
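The caveat about not can be illustrated with a short sketch, assuming the bundled PDBTools.TESTPDB structure is read in full (without a selection string) and that not is accepted at the start of a selection string. The variable names are illustrative only.

```julia
using PDBTools

# Read the complete test structure, without restricting it to the protein.
atoms = read_pdb(PDBTools.TESTPDB)

# `charged` is false for every non-protein atom, so `not charged`
# is true for all of them in addition to the uncharged protein atoms.
not_charged_any = select(atoms, "not charged")

# Restricting the selection to the protein avoids that.
not_charged_protein = select(atoms, "protein and not charged")

# The same restriction written with the Julia function syntax.
same_with_functions = select(atoms, at -> isprotein(at) && !ischarged(at))

println(length(not_charged_any), " ", length(not_charged_protein))
```

Combining the negated property with protein keeps the result limited to protein atoms.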
Retrieving indices, filtering, etc
If only the indices of the atoms are of interest, the Julia findall function can be used, by passing a Select object, or a regular  function, to select the atoms:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.TESTPDB, "protein and residue <= 3");
julia> findall(Select("name CA"), atoms)
3-element Vector{Int64}:
  5
 15
 26
julia> findall(at -> name(at) == "CA", atoms)
3-element Vector{Int64}:
  5
 15
 26
The Select constructor can be used to feed simple selection syntax entries to other Julia functions, such as findfirst, findlast, or filter:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.TESTPDB, "protein and residue <= 3");
julia> filter(Select("name CA"), atoms)
   Vector{Atom{Nothing}} with 3 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       5   CA     ALA     A        1        1   -8.483  -14.912   -6.726  1.00  0.00     1    PROT         5
      15   CA     CYS     A        2        2   -5.113  -13.737   -5.466  1.00  0.00     1    PROT        15
      26   CA     ASP     A        3        3   -3.903  -11.262   -8.062  1.00  0.00     1    PROT        26
julia> findfirst(Select("beta = 0.00"), atoms)
1
The sel"" literal string macro is a shortcut for Select. Thus, these syntaxes are valid:
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.TESTPDB, "protein and residue <= 3");
julia> name.(filter(sel"name CA", atoms))
3-element Vector{InlineStrings.String7}:
 "CA"
 "CA"
 "CA"
julia> findfirst(sel"name CA", atoms)
5
Use Julia functions directly
Selections can be done using Julia functions directly, providing greater control over the selection and, possibly, the use of user-defined selection functions. For example:
myselection(atom) = (atom.x < 10.0 && atom.resname == "GLY") || (atom.name == "CA")
atoms = select(atoms, myselection)
or, for example, using Julia anonymous functions:
select(atoms, at -> isprotein(at) && name(at) == "O" && at.x < 10.0)
The only requirement is that the function defining the selection receives a PDBTools.Atom as input, and returns true or false depending on the conditions required for the atom.
The macro-keywords described in the previous section can be used within the Julia function syntax, but the function names start with is. For example:
select(atoms, at -> isprotein(at) && resnum(at) in [ 1, 5, 7 ])
Thus, the macro selection functions are: iswater, isprotein, isbackbone, issidechain, isacidic, isbasic, isaliphatic, isaromatic, ischarged, isneutral, ispolar, isnonpolar, and ishydrophobic.
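Because a selection function is an ordinary Julia predicate, one definition can be reused with select and with Base functions such as findall and count. The sketch below assumes the bundled PDBTools.TESTPDB structure; charged_backbone is a hypothetical user-defined function, not part of PDBTools.

```julia
using PDBTools

atoms = read_pdb(PDBTools.TESTPDB, "protein and residue <= 3")

# Hypothetical user-defined selection: backbone atoms of acidic or basic
# residues. `isbackbone`, `isacidic`, and `isbasic` are the function
# counterparts of the `backbone`, `acidic`, and `basic` keywords.
charged_backbone(at) = isbackbone(at) && (isacidic(at) || isbasic(at))

selected = select(atoms, charged_backbone)   # vector of matching atoms
indices = findall(charged_backbone, atoms)   # positions within `atoms`
n = count(charged_backbone, atoms)           # number of matching atoms

println(n, " atoms at positions ", indices)
```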
Using VMD
VMD is a very popular and powerful package for visualization of simulations. It contains a very versatile library to read topologies and trajectory files, and a powerful selection syntax. We provide here a wrapper to VMD which enables using its capabilities.
Some notable differences of the PDBTools.select function relative to the selection syntax of VMD are:
- VMD uses 0-based indexing. Thus, the first atom is atom 0 for VMD, and atom 1 for PDBTools. The same applies to the sequential residue numbering. Be careful.
- VMD uses resid for the residue number as written in the PDB file, while PDBTools uses resnum.
- VMD uses residue for the sequential number of the residue in the PDB file, counted from 0, while the PDBTools residue keyword counts the residues sequentially from 1.
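The indexing difference matters when a VMD selection string is written from indices obtained in PDBTools. A minimal sketch of the conversion follows; both helper functions are hypothetical and not part of PDBTools, and they only shift the indices by one.

```julia
# Hypothetical helper (not part of PDBTools): build a VMD selection string
# from one-based atom indices by shifting them to zero-based indices.
function vmd_index_selection(indices::AbstractVector{<:Integer})
    # `none` is the VMD keyword that selects no atoms.
    isempty(indices) && return "none"
    return "index " * join(indices .- 1, " ")
end

# Hypothetical helper: convert zero-based VMD indices to one-based indices.
from_vmd_indices(indices::AbstractVector{<:Integer}) = indices .+ 1

println(vmd_index_selection([5, 15, 26]))  # prints "index 4 14 25"
println(from_vmd_indices([4, 14, 25]))     # prints "[5, 15, 26]"
```

The conversion is only needed inside the selection string itself, since the indices returned by select_with_vmd are already one-based.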
The select_with_vmd input can be a vector of PDBTools.Atoms, or a filename. If the input is a vector of Atoms, the output will be the corresponding atoms matching the selection. If the input is a filename, two lists are returned: the list of indices and the list of names of the corresponding atoms. This is because some input files supported by VMD (e.g., GRO, PSF, etc.) do not contain full atom information.
For example, here some atoms are selected from a previously loaded vector of atoms:
julia> using PDBTools
julia> pdbfile = PDBTools.SMALLPDB
julia> atoms = read_pdb(pdbfile);
julia> selected_atoms = select_with_vmd(atoms,"resname ALA and name HT2 HT3";vmd="/usr/bin/vmd")
   Vector{Atom{Nothing}} with 2 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       3  HT2     ALA     A        1        1   -9.488  -13.913   -5.295  0.00  0.00     1    PROT         3
       4  HT3     ALA     A        1        1   -8.652  -15.208   -4.741  0.00  0.00     1    PROT         4
And now we provide the filename as input:
julia> selected_atoms = select_with_vmd(pdbfile,"resname ALA and name HT2 HT3";vmd="/usr/bin/vmd")
([3, 4], ["HT2", "HT3"])
Note that the examples above use the multi-value form name HT2 HT3. Versions of the internal PDBTools select function before v3.1.0 do not support this form and require name HT2 or name HT3 instead.
Here, the output contains two lists: one of atom indices (one-based) and one of atom names. The indices correspond to sequential indices in the input, not to the indices written in the PDB file, for example.
The main advantage here is that all the file types and the complete selection syntax that VMD supports are available. However, VMD needs to be installed, it is run in the background, and each call takes a few seconds.
PDBTools.select_with_vmd — Function
select_with_vmd(atoms::AbstractVector{<:Atom}, selection::String; vmd="vmd", srcload=nothing)
select_with_vmd(inputfile::String, selection::String; vmd="vmd", srcload=nothing)
Select atoms using VMD selection syntax, with VMD running in the background. The input can be a file or a list of atoms.
Input structure and output format:
- atoms::AbstractVector{<:Atom}: A vector of PDBTools.Atom objects to select from. In this case, the output will be a vector of PDBTools.Atom objects that match the selection.
- inputfile::String: Path to the input file (e.g., PDB, PSF, GRO, etc.) or a temporary file containing atom data. In this case, two vectors will be returned: one with the indices of the selected atoms and another with their names.
The outputs are different in each case because VMD supports selections on files like PSF, GRO, etc., which do not  carry the full atom information like PDB files do.
Additional arguments:
- selection::String: A string containing the selection criteria in VMD syntax, e.g., "protein and residue 1".
- vmd::String: The command to run VMD. Default is "vmd", but can be set to the full path if VMD is not in the system PATH.
- srcload::Union{Nothing, AbstractString, Vector{AbstractString}}: A script or a list of VMD scripts to load before executing the selection, for example with macros to define custom selection keywords.
Loading vmd scripts
The select_with_vmd function also accepts an optional keyword parameter srcload, which can be used to load custom scripts within VMD before setting the selection. This enables the definition of Tcl scripts with custom selection macros, for instance. The usage would be:
sel = select_with_vmd("file.pdb", "resname MYRES"; srcload = [ "mymacros1.tcl", "mymacros2.tcl" ])
which corresponds to sourcing each of the macro files in VMD before defining the selection with the custom MYRES name.
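As an illustration of what such a script might contain, the sketch below writes a one-line Tcl file that defines a selection macro with the VMD atomselect macro command, and passes it through srcload. The macro name smallres, the file name, and the check for a vmd executable in the PATH are assumptions made for this example; the selection is only run when VMD is found.

```julia
using PDBTools

# Write a small Tcl script defining a custom VMD selection macro.
# The macro name `smallres` and the file name are illustrative only.
macro_file = joinpath(mktempdir(), "mymacros.tcl")
write(macro_file, "atomselect macro smallres {resname ALA GLY}\n")

# Run the selection only if a `vmd` executable is found in the PATH.
if Sys.which("vmd") !== nothing
    atoms = read_pdb(PDBTools.SMALLPDB)
    selected = select_with_vmd(atoms, "smallres and name CA"; srcload=macro_file)
    println(length(selected), " atoms selected")
else
    println("VMD not found; skipping the selection")
end
```

Keeping the macros in separate files allows the same definitions to be shared between interactive VMD sessions and scripted selections.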