Read and write files
PDBTools can read and write PDB and mmCIF files. The relevant functions are:
PDBTools.read_pdb
— Function
read_pdb(pdbfile::String, selection::String)
read_pdb(pdbfile::String; only::Function = all)
read_pdb(pdbdata::IOBuffer, selection::String)
read_pdb(pdbdata::IOBuffer; only::Function = all)
Reads a PDB file and stores the data in a vector of type Atom.
If a selection is provided, only the atoms matching the selection will be read. For example, resname ALA will select all the atoms of ALA residues.
If the only function keyword is provided, only the atoms for which only(atom) is true will be read.
Examples
julia> protein = read_pdb("../test/structure.pdb")
Array{Atoms,1} with 62026 atoms with fields:
index name resname chain resnum residue x y z beta occup model segname index_pdb
1 N ALA A 1 1 -9.229 -14.861 -5.481 0.00 1.00 1 PROT 1
2 HT1 ALA A 1 1 -10.048 -15.427 -5.569 0.00 0.00 1 PROT 2
⋮
62025 H1 TIP3 C 9339 19638 13.218 -3.647 -34.453 0.00 1.00 1 WAT2 62025
62026 H2 TIP3 C 9339 19638 12.618 -4.977 -34.303 0.00 1.00 1 WAT2 62026
julia> ALA = read_pdb("../test/structure.pdb","resname ALA")
Array{Atoms,1} with 72 atoms with fields:
index name resname chain resnum residue x y z beta occup model segname index_pdb
1 N ALA A 1 1 -9.229 -14.861 -5.481 0.00 1.00 1 PROT 1
2 HT1 ALA A 1 1 -10.048 -15.427 -5.569 0.00 0.00 1 PROT 2
⋮
1339 C ALA A 95 95 14.815 -3.057 -5.633 0.00 1.00 1 PROT 1339
1340 O ALA A 95 95 14.862 -2.204 -6.518 0.00 1.00 1 PROT 1340
julia> ALA = read_pdb("../test/structure.pdb", only = atom -> atom.resname == "ALA")
Array{Atoms,1} with 72 atoms with fields:
index name resname chain resnum residue x y z beta occup model segname index_pdb
1 N ALA A 1 1 -9.229 -14.861 -5.481 0.00 1.00 1 PROT 1
2 HT1 ALA A 1 1 -10.048 -15.427 -5.569 0.00 0.00 1 PROT 2
⋮
1339 C ALA A 95 95 14.815 -3.057 -5.633 0.00 1.00 1 PROT 1339
1340 O ALA A 95 95 14.862 -2.204 -6.518 0.00 1.00 1 PROT 1340
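The two filtering styles above can be compared directly. The following is a minimal sketch, assuming PDBTools is installed and using the small test structure bundled with the package (PDBTools.SMALLPDB, the same file used in the Atom example further down). It reads the alanine atoms once with a selection string and once with an only function, and checks that both calls keep the same atoms.

```julia
using PDBTools

# Small test structure distributed with PDBTools
pdbfile = PDBTools.SMALLPDB

# Filter with a selection string
ala_sel = read_pdb(pdbfile, "resname ALA")

# Filter with a Julia function passed through the `only` keyword
ala_fun = read_pdb(pdbfile; only = atom -> atom.resname == "ALA")

# Both calls should keep exactly the same atoms (compared by their index in the file)
same = [at.index for at in ala_sel] == [at.index for at in ala_fun]
println(length(ala_sel), " ALA atoms; identical results: ", same)
```

The string form is convenient for interactive use, while the function form accepts any Julia predicate.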
PDBTools.read_mmcif
— Function
read_mmcif(mmCIF_file::String, selection::String)
read_mmcif(mmCIF_file::String; only::Function = all)
read_mmcif(mmCIF_data::IOBuffer, selection::String)
read_mmcif(mmCIF_data::IOBuffer; only::Function = all)
Reads an mmCIF file and stores the data in a vector of type Atom.
If a selection is provided, only the atoms matching the selection will be read. For example, resname ALA will select all the atoms of ALA residues.
If the only function keyword is provided, only the atoms for which only(atom) is true will be returned.
Examples
julia> ats = read_mmcif(PDBTools.SMALLCIF)
Array{Atoms,1} with 7 atoms with fields:
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N VAL A 1 1 6.204 16.869 4.854 1.00 49.05 1 1
2 CA VAL A 1 1 6.913 17.759 4.607 1.00 43.14 1 2
3 C VAL A 1 1 8.504 17.378 4.797 1.00 24.80 1 3
5 CB VAL A 1 1 6.369 19.044 5.810 1.00 72.12 1 5
6 CG1 VAL A 1 1 7.009 20.127 5.418 1.00 61.79 1 6
7 CG2 VAL A 1 1 5.246 18.533 5.681 1.00 80.12 1 7
julia> ats = read_mmcif(PDBTools.SMALLCIF, "index < 3")
Array{Atoms,1} with 2 atoms with fields:
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N VAL A 1 1 6.204 16.869 4.854 1.00 49.05 1 1
2 CA VAL A 1 1 6.913 17.759 4.607 1.00 43.14 1 2
julia> ats = read_mmcif(PDBTools.SMALLCIF; only = at -> name(at) == "CA")
Array{Atoms,1} with 1 atoms with fields:
index name resname chain resnum residue x y z occup beta model segname index_pdb
2 CA VAL A 1 1 6.913 17.759 4.607 1.00 43.14 1 2
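Because only receives each Atom, any Julia predicate can be used, including ones that are awkward to express as a selection string. A minimal sketch, assuming the bundled PDBTools.SMALLCIF test file; the backbone tuple below is defined here for illustration and is not part of the PDBTools API.

```julia
using PDBTools

# Backbone atom names (defined here for illustration only)
backbone = ("N", "CA", "C", "O")

# Keep only side-chain atoms by rejecting backbone names while reading
sidechain = read_mmcif(PDBTools.SMALLCIF; only = at -> !(name(at) in backbone))

for at in sidechain
    println(name(at), " ", resname(at))
end
```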
The following examples illustrate the read_pdb function. The read_mmcif function is used in the same way to read mmCIF (PDBx) files.
Read a PDB file
To read a PDB file and return a vector of atoms of type Atom, do:
atoms = read_pdb("file.pdb")
Atom is the data structure containing the atom index, name, residue, coordinates, etc. For example, after reading a file (as shown above), a vector of atoms with the following structure is generated:
julia> printatom(atoms[1])
index name resname chain resnum residue x y z beta occup model segname index_pdb
1 N ALA A 1 1 -9.229 -14.861 -5.481 0.00 1.00 1 PROT 1
The data in the Atom structure is organized as indicated in the following documentation:
PDBTools.Atom
— Type
Atom::DataType
Structure that contains the atom properties. It is mutable, so its fields can be modified.
Fields:
mutable struct Atom{CustomType}
index::Int32 # The sequential index of the atoms in the file
index_pdb::Int32 # The index as written in the PDB file (might be anything)
name::String7 # Atom name
resname::String7 # Residue name
chain::String3 # Chain identifier
resnum::Int32 # Number of residue as written in PDB file
residue::Int32 # Sequential residue (molecule) number in file
x::Float32 # x coordinate
y::Float32 # y coordinate
z::Float32 # z coordinate
beta::Float32 # temperature factor
occup::Float32 # occupancy
model::Int32 # model number
segname::String7 # Segment name (cols 73:76)
pdb_element::String3 # Element symbol string (cols 77:78)
charge::Float32 # Charge (cols 79:80)
custom::CustomType # Custom fields
end
Example
julia> using PDBTools
julia> atoms = read_pdb(PDBTools.SMALLPDB)
Array{Atoms,1} with 35 atoms with fields:
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N ALA A 1 1 -9.229 -14.861 -5.481 0.00 0.00 1 PROT 1
2 1HT1 ALA A 1 1 -10.048 -15.427 -5.569 0.00 0.00 1 PROT 2
3 HT2 ALA A 1 1 -9.488 -13.913 -5.295 0.00 0.00 1 PROT 3
⋮
33 OD2 ASP A 3 3 -6.974 -11.289 -9.300 1.00 0.00 1 PROT 33
34 C ASP A 3 3 -2.626 -10.480 -7.749 1.00 0.00 1 PROT 34
35 O ASP A 3 3 -1.940 -10.014 -8.658 1.00 0.00 1 PROT 35
julia> resname(atoms[1])
"ALA"
julia> chain(atoms[1])
"A"
julia> element(atoms[1])
"N"
julia> mass(atoms[1])
14.0067
julia> position(atoms[1])
3-element StaticArraysCore.SVector{3, Float32} with indices SOneTo(3):
-9.229
-14.861
-5.481
The pdb_element and charge fields, which are frequently left empty in PDB files, are not printed. Direct access to the fields is considered part of the interface.
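Since the fields can be accessed directly, simple structural quantities can be computed with plain Julia. The sketch below, assuming the bundled PDBTools.SMALLPDB file, computes the geometric center (the unweighted mean of the coordinates) twice: once from the x, y and z fields, and once from the SVector returned by position.

```julia
using PDBTools

atoms = read_pdb(PDBTools.SMALLPDB)
n = length(atoms)

# Mean of each coordinate, reading the Float32 fields directly
cx = sum(at.x for at in atoms) / n
cy = sum(at.y for at in atoms) / n
cz = sum(at.z for at in atoms) / n

# Same quantity using the position accessor (SVector{3, Float32})
center = sum(position(at) for at in atoms) / n

println((cx, cy, cz))
println(center)
```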
Custom fields can be set on Atom construction with the custom keyword argument. The Atom structure will then be parameterized with the type of custom.
Example
julia> using PDBTools
julia> atom = Atom(index = 0; custom=Dict(:c => "c", :index => 1));
julia> typeof(atom)
Atom{Dict{Symbol, Any}}
julia> atom.custom
Dict{Symbol, Any} with 2 entries:
:index => 1
:c => "c"
julia> atom.custom[:c]
"c"
For all these reading and writing functions, a final argument can be provided to read or write a subset of the atoms, following the selection syntax described in the Selection section. For example:
protein = read_pdb("file.pdb","protein")
or
arginines = read_pdb("file.pdb","resname ARG")
To filter with a Julia function (for example, an anonymous function) instead of a selection string, pass it with the only keyword:
arginines = read_pdb("file.pdb", only = atom -> atom.resname == "ARG")
Selections can likewise be passed to the write_pdb and write_mmcif functions, described below.
Retrieve from Protein Data Bank
Use the wget function to retrieve the atom data directly from the PDB database, optionally filtering the atoms with a selection:
julia> atoms = wget("1LBD","name CA")
index name resname chain resnum residue x y z beta occup model segname index_pdb
2 CA SER A 225 1 46.080 83.165 70.327 68.73 1.00 1 - 2
8 CA ALA A 226 2 43.020 80.825 70.455 63.69 1.00 1 - 8
13 CA ASN A 227 3 41.052 82.178 67.504 53.45 1.00 1 - 13
⋮
1847 CA GLN A 460 236 -22.650 79.082 50.023 71.46 1.00 1 - 1847
1856 CA MET A 461 237 -25.561 77.191 51.710 78.41 1.00 1 - 1856
1864 CA THR A 462 238 -26.915 73.645 51.198 82.96 1.00 1 - 1864
PDBTools.wget
— Function
wget(PDBid, [selection]; format="mmCIF")
Retrieves a PDB file from the Protein Data Bank. Selections may be applied.
The optional format keyword can be either "mmCIF" or "PDB". The default is "mmCIF". To download the data of large structures, the "mmCIF" format is recommended, since very large entries may not be available in the legacy PDB format.
Example
julia> protein = wget("1LBD","chain A")
Array{Atoms,1} with 1870 atoms with fields:
index name resname chain resnum residue x y z beta occup model segname index_pdb
1 N SER A 225 1 45.228 84.358 70.638 67.05 1.00 1 - 1
2 CA SER A 225 1 46.080 83.165 70.327 68.73 1.00 1 - 2
3 C SER A 225 1 45.257 81.872 70.236 67.90 1.00 1 - 3
⋮
1868 OG1 THR A 462 238 -27.462 74.325 48.885 79.98 1.00 1 - 1868
1869 CG2 THR A 462 238 -27.063 71.965 49.222 78.62 1.00 1 - 1869
1870 OXT THR A 462 238 -25.379 71.816 51.613 84.35 1.00 1 - 1870
Edit a PDB file
The Atom structure is mutable, meaning that the fields can be edited. For example:
julia> atoms = read_pdb("file.pdb")
Array{PDBTools.Atom,1} with 62026 atoms with fields:
index name resname chain resnum residue x y z beta occup model segname index_pdb
1 N ALA A 1 1 -9.229 -14.861 -5.481 0.00 1.00 1 PROT 1
2 HT1 ALA A 1 1 -10.048 -15.427 -5.569 0.00 0.00 1 PROT 2
3 HT2 ALA A 1 1 -9.488 -13.913 -5.295 0.00 0.00 1 PROT 3
julia> atoms[1].segname = "ABCD"
"ABCD"
julia> printatom(atoms[1])
index name resname chain resnum residue x y z beta occup model segname index_pdb
1 N ALA A 1 1 -9.229 -14.861 -5.481 0.00 1.00 1 ABCD 1
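Field assignments can also be applied in a loop, for example to relabel a whole structure or to flag a subset of atoms before writing it out. A minimal sketch on the bundled PDBTools.SMALLPDB file; the segment name "LIG" and the use of beta as a flag are arbitrary choices made for illustration.

```julia
using PDBTools

atoms = read_pdb(PDBTools.SMALLPDB)

for at in atoms
    # String fields accept ordinary strings, as in the segname example above
    at.segname = "LIG"
    # Flag alpha carbons in the temperature-factor column (a Float32 field)
    at.beta = at.name == "CA" ? 1.0f0 : 0.0f0
end

printatom(atoms[1])
```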
Additionally, with the edit! function, you can directly view or edit the data in a vector of Atom objects in your preferred text editor.
julia> edit!(atoms)
This will open a text editor. Here, we modified the data in the resname field of the first atom to ABC. Saving and closing the file will update the atoms array:
julia> printatom(atoms[1])
index name resname chain resnum residue x y z beta occup model segname index_pdb
1 N ABC A 1 1 -9.229 -14.861 -5.481 0.00 1.00 1 PROT 1
PDBTools.edit!
— Function
edit!(atoms::AbstractVector{<:Atom})
Opens a temporary PDB file in which the fields of the vector of atoms can be edited.
Write a PDB file
To write a PDB file, use the write_pdb function:
write_pdb("file.pdb", atoms)
where atoms contains a vector of Atom structures.
PDBTools.write_pdb
— Function
write_pdb(filename::String, atoms::AbstractVector{<:Atom}, selection; header=:auto, footer=:auto)
Write a PDB file with the atoms in atoms to filename. The selection argument is a string that can be used to select a subset of the atoms in atoms. For example, write_pdb("test.pdb", atoms, "name CA").
The header and footer arguments can be used to add a header and footer to the PDB file. If header is :auto, then a header will be added with the number of atoms in atoms. If footer is :auto, then a footer will be added with the "END" keyword. Either can be set to nothing if no header or footer is desired.
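A quick way to check what write_pdb produces is to write to a temporary file and read it back. The sketch below assumes the bundled PDBTools.SMALLPDB file and uses tempname() from Julia Base for the output paths. It writes the full structure with the default header and footer, and only the alpha carbons with both disabled.

```julia
using PDBTools

atoms = read_pdb(PDBTools.SMALLPDB)

# Full structure, default header and footer (:auto)
allfile = tempname() * ".pdb"
write_pdb(allfile, atoms)

# Alpha carbons only, without header or footer
cafile = tempname() * ".pdb"
write_pdb(cafile, atoms, "name CA"; header = nothing, footer = nothing)

# Read both files back to inspect what was written
back = read_pdb(allfile)
ca = read_pdb(cafile)
println(length(back), " atoms written; ", length(ca), " alpha carbons")
```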
PDBTools.write_mmcif
— Function
write_mmcif(filename, atoms::AbstractVector{<:Atom}, [selection])
Write an mmCIF file with the atoms in atoms to filename. The optional selection argument is a string that can be used to select a subset of the atoms in atoms. For example, write_mmcif("test.cif", atoms, "name CA").
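Combining read_pdb with write_mmcif gives a simple PDB to mmCIF conversion. A minimal sketch, assuming the bundled PDBTools.SMALLPDB file and a temporary output path; the result is read back with read_mmcif to confirm its content.

```julia
using PDBTools

atoms = read_pdb(PDBTools.SMALLPDB)

# Convert PDB -> mmCIF, keeping only the alpha carbons
ciffile = tempname() * ".cif"
write_mmcif(ciffile, atoms, "name CA")

# Read the mmCIF file back
ca = read_mmcif(ciffile)
println(length(ca), " CA atoms written to ", ciffile)
```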
Read from string buffer
In some cases, PDB data may be available as a string rather than as a regular file, for example when reading the decompressed contents of a zipped file. In these cases, the array of atoms can be obtained by reading the string directly through an IOBuffer, for example:
julia> pdbdata = read(pdb_file, String); # returns a string with the PDB data, to exemplify
julia> atoms = read_pdb(IOBuffer(pdbdata), "protein and name CA")
Array{Atoms,1} with 104 atoms with fields:
index name resname chain resnum residue x y z occup beta model segname index_pdb
5 CA ALA A 1 1 -8.483 -14.912 -6.726 1.00 0.00 1 PROT 5
15 CA CYS A 2 2 -5.113 -13.737 -5.466 1.00 0.00 1 PROT 15
26 CA ASP A 3 3 -3.903 -11.262 -8.062 1.00 0.00 1 PROT 26
⋮
1425 CA GLU A 102 102 4.414 -4.302 -7.734 1.00 0.00 1 PROT 1425
1440 CA CYS A 103 103 4.134 -7.811 -6.344 1.00 0.00 1 PROT 1440
1454 CA THR A 104 104 3.244 -10.715 -8.603 1.00 0.00 1 PROT 1454
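For a self-contained version of the example above, the string can be produced from the bundled PDBTools.SMALLPDB file. In practice the string would come from whatever source provides the data; here, reading the file into a String merely stands in for that source.

```julia
using PDBTools

# Raw PDB text as a String (stand-in for data obtained from another source)
pdbdata = read(PDBTools.SMALLPDB, String)

# Parse the in-memory data; a selection can be given just as with files
atoms = read_pdb(IOBuffer(pdbdata))
ca = read_pdb(IOBuffer(pdbdata), "name CA")

println(length(atoms), " atoms, ", length(ca), " alpha carbons")
```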