J. Mol. Biol. (1997) 272, 106-120
Modelling Protein Docking using Shape
Complementarity, Electrostatics and
Biochemical Information
Henry A. Gabb, Richard M. Jackson and Michael J. E. Sternberg*
Biomolecular Modelling
Laboratory, Imperial Cancer
Research Fund, Lincoln's Inn
Fields, P.O. Box 123, London
WC2A 3PX, UK
A protein docking study was performed for two classes of biomolecular
complexes: six enzyme/inhibitor and four antibody/antigen. Biomolecular
complexes for which crystal structures of both the complexed and
uncomplexed proteins are available were used for eight of the ten test
systems. Our docking experiments consist of a global search of
translational and rotational space followed by refinement of the best
predictions. Potential complexes are scored on the basis of shape
complementarity and favourable electrostatic interactions using Fourier
correlation theory. Since proteins undergo conformational changes upon
binding, the scoring function must be sufficiently soft to dock unbound
structures successfully. Some degree of surface overlap is tolerated to
account for side-chain flexibility. Similarly for electrostatics, the
interaction of the dispersed point charges of one protein with the
Coulombic field of the other is measured rather than precise atomic
interactions. We tested our docking protocol using the native rather
than the complexed forms of the proteins to address the more
scientifically interesting problem of predictive docking. In all but one
of our test cases, correctly docked geometries (interface Cα RMS
deviation ≤2 Å from the experimental structure) are found during a
global search of translational and rotational space in a list that was
always less than 250 complexes and often less than 30. Varying degrees
of biochemical information are still necessary to remove most of the
incorrectly docked complexes.
© 1997 Academic Press Limited
*Corresponding author
Keywords: molecular recognition; protein-protein docking;
protein-protein complex; predictive docking algorithm; fast Fourier
transform
Introduction
Knowledge of the three-dimensional (3D) structure of protein-protein
complexes provides a valuable understanding of the function of molecular
systems. The rate of protein structure determination is increasing
rapidly. Today there are some 5,000 entries in the Brookhaven Protein
Data Bank (PDB) (Bernstein et al., 1977). However, determination of the
structure of protein-protein complexes still remains a difficult problem
with only about 100 deposited co-ordinates. Thus, computer algorithms
capable of predicting protein-protein interactions are becoming
increasingly important. Such algorithms can also provide insight into
the process of protein-protein recognition.
Abbreviations used: PDB, Brookhaven Protein Data Bank; 3D,
three-dimensional; FFT, fast Fourier transform; DFT, discrete Fourier
transform; IFT, inverse Fourier transform; CDR, complementarity
determining region; RMS, root-mean square; CPU, central processing unit.
0022-2836/97/360106-15 $25.00/0/mb971203
The general strategy for simulating protein-protein docking involves
matching shape complementarity (for recent reviews, see Janin, 1995a;
Shoichet & Kuntz, 1996). Some approaches focus specifically on matching
surfaces (e.g. see Jiang & Kim, 1991; Katchalski-Katzir et al., 1992;
Walls & Sternberg, 1992; Helmer-Citterich & Tramontano, 1994). Others
enhance the search for geometric complementarity by matching the
position of surface spheres and surface normals (e.g. see Kuntz et al.,
1982; Shoichet & Kuntz, 1991; Norel et al., 1995).
Shape complementarity is measured by a variety of scoring functions,
some of which aim to model the hydrophobic effect during association
from the change in solvent-accessible surface area or molecular surface
area (e.g. see Cherfils et al., 1991). Several algorithms employ a
simplified scheme to estimate electrostatic interactions (e.g. see Jiang
& Kim, 1991; Walls & Sternberg, 1992). In general the algorithms yield a
limited set of favourable complexes, one or a few of which are close
(typically 1 to 3 Å RMS) to the native structure. Recognizing this,
several groups have additionally focused on screening the correct
solution from the false positives by modeling the hydrophobic effect,
electrostatic interactions and desolvation (Gilson & Honig, 1988; Vakser
& Aflalo, 1994; Jackson & Sternberg, 1995; Weng et al., 1996). Most of
the above studies have focused on rigid body docking. Recently, however,
Monte Carlo simulations have been used to refine flexible side-chain
positions after rigid body docking (Totrov & Abagyan, 1994).
In the development of these algorithms many workers demonstrate that the
bound complex can be generated starting from the component bound
molecules. Simulations on the pertinent biological problem of docking
starting from unbound complexes have also been performed on a limited
number of test cases. However, these studies tended to be applications
of algorithms developed on bound complexes. As a consequence, selective
pruning of some interface side-chains was required to demonstrate that
the algorithms could be applied to unbound complexes. The next step,
therefore, is to establish an automated procedure that is successful on
a series of molecules without manual intervention guided by knowledge of
the answer.
An important step in the identification of reliable docking approaches
was the blind trial organized by James and co-workers (Strynadka et al.,
1996). Five groups submitted entries to predict the docking of
β-lactamase and its inhibitor starting with the unbound structures prior
to knowledge of the complex. All entrants were able to identify a
solution close to the correct answer (i.e. better than 2.7 Å RMS for
superposition of the predicted on the X-ray complex). The closest
agreement was by the Fourier correlation algorithm (Katchalski-Katzir
et al., 1992), which found a 1.1 Å solution. This approach performs a
complete translational and rotational search in Fourier space and
selects binding geometries with high surface correlation scores.
However, general conclusions cannot be drawn from one trial,
particularly as the β-lactamase/inhibitor complex is highly amenable to
docking methods that rely on shape complementarity alone as the surface
area buried in the complex is particularly large (Janin, 1995a). It is
not clear whether the Fourier approach or the other strategies involving
measures of surface complementarity will reliably predict unbound
systems without consideration of other properties, particularly
electrostatics.
This paper reports extensive trials of docking unbound protein-protein
molecules to obtain structural models for their complexes. Our approach
recognises that a major problem in rigid body docking, starting with
unbound complexes, is that the algorithm must be sufficiently soft to
manage conformational changes, yet specific enough to identify the
correct solution. At present, the alternative approach of an initial
non-rigid body search is computationally intractable. A variation of the
Fourier correlation algorithm (Katchalski-Katzir et al., 1992) was
chosen as it incorporates soft docking, is computationally fast and
mathematically elegant. We introduce a soft treatment of electrostatic
interactions into the Fourier correlation approach. In addition, we
evaluate the selectivity provided by specific biological constraints of
the type likely to be available in genuine docking applications.
Algorithm
Measuring shape complementarity by
Fourier correlation
The docking protocol used in this study follows closely the shape
recognition algorithm of Katchalski-Katzir et al. (1992) (shown
schematically in Figure 1). The geometric surface recognition method
takes advantage of the fast Fourier transform (FFT) and Fourier
correlation theory to scan rapidly the translational space of two
rigidly rotating molecules. The algorithm begins by discretizing two
molecules, A and B, in 3D grids with every node (l, m, n = {1 ... N})
assigned a value:
$$
f_A(l,m,n) =
\begin{cases}
1 & \text{surface of molecule} \\
\rho & \text{core of molecule} \\
0 & \text{open space}
\end{cases}
$$
and
$$
f_B(l,m,n) =
\begin{cases}
1 & \text{inside molecule} \\
0 & \text{open space}
\end{cases}
$$
By convention, A and B denote the larger and smaller proteins,
respectively. Any grid node within 1.8 Å of an atom is considered to be
inside a protein. Grid nodes at the surface of molecule A are scored
differently. A 1.5 Å surface layer is used. We set ρ = −15 in all of our
docking experiments. Contrary to the finding of Katchalski-Katzir et al.
(1992) we find that the choice of ρ does affect the performance of the
algorithm. Specifically, the degree of surface overlap tolerated during
docking is directly related to the value of ρ.
In all our calculations, 128 × 128 × 128 grids are used. Grid spacing is
determined by the size of the respective molecules. The approximate
radius of a protein is the distance between the geometric centre and the
most distal atom. The grid size is the sum of the protein diameters plus
1 Å. Grid spacings ranged from 0.74 Å/node to 0.84 Å/node for the
enzyme/inhibitor complexes and from 0.91 Å/node to 0.94 Å/node for the
antibody/antigen complexes.
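The discretization described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `grid_spacing` and `discretize` are hypothetical, atoms are treated as bare points, and the rule used to mark the surface layer (an occupied node counts as surface if any open-space node lies within 1.5 Å of it) is an assumption, since the text does not spell out how the layer is built.

```python
import numpy as np

def grid_spacing(coords_a, coords_b, n_nodes=128):
    """Å per node: grid edge = sum of both protein diameters + 1 Å."""
    def radius(coords):
        # distance from the geometric centre to the most distal atom
        centre = coords.mean(axis=0)
        return np.linalg.norm(coords - centre, axis=1).max()
    edge = 2.0 * radius(coords_a) + 2.0 * radius(coords_b) + 1.0
    return edge / n_nodes

def discretize(coords, spacing, n_nodes, rho=None, inside=1.8, layer=1.5):
    """Return the grid f(l,m,n) for one molecule centred in the grid.

    rho=None  -> molecule B: 1 inside the molecule, 0 in open space.
    rho=value -> molecule A: 1 in the surface layer, rho in the core.
    """
    axis = (np.arange(n_nodes) - n_nodes // 2) * spacing
    nodes = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    occupied = np.zeros((n_nodes,) * 3, dtype=bool)
    for atom in coords - coords.mean(axis=0):
        # a node within `inside` Å of any atom is inside the protein
        occupied |= np.linalg.norm(nodes - atom, axis=-1) <= inside
    grid = occupied.astype(float)
    if rho is None:
        return grid
    # Assumed surface rule: a node is core only if every node within
    # `layer` Å of it is also occupied; the remaining occupied nodes
    # form the surface layer and keep the value 1.
    reach = int(np.ceil(layer / spacing))
    padded = np.pad(occupied, reach, constant_values=False)
    core = occupied.copy()
    offsets = range(-reach, reach + 1)
    for dx in offsets:
        for dy in offsets:
            for dz in offsets:
                if (dx * dx + dy * dy + dz * dz) * spacing ** 2 > layer ** 2:
                    continue
                core &= padded[reach + dx:reach + dx + n_nodes,
                               reach + dy:reach + dy + n_nodes,
                               reach + dz:reach + dz + n_nodes]
    grid[core] = rho
    return grid
```

A call such as `discretize(coords_a, spacing, 128, rho=-15.0)` would give the grid for molecule A and `discretize(coords_b, spacing, 128)` the grid for molecule B. The per-atom loop is written for clarity and would be slow for a full protein on a 128 × 128 × 128 grid.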
Figure 1. The Fourier correlation docking algorithm used in this study,
based on the method of Katchalski-Katzir et al. (1992). Molecules A and
B are discretized differently. Molecule A has a negative core and a
positive surface layer (the dark band) whereas no surface-core
distinction is made for molecule B. It is only necessary to discretize
and Fourier transform molecule A one time. Electrostatic complementarity
is calculated concurrently with shape complementarity. Similarly, the
transform of the electric field of molecule A need only be calculated
once. The cross-section of a sample 3D Fourier correlation function
illustrates a search of translational space. The geometric centres of
the two molecules are superposed at the origin. Molecule A is fixed in
the centre of the grid. As molecule B moves through the grid, a
"signal" describing shape complementarity emerges. A zero correlation
score indicates that the proteins are not in contact while negative
scores (the empty region in the centre) indicate significant surface
penetration. The highest peak indicates the translation vector giving
the best surface complementarity.
The correlation function of $f_A$ and $f_B$ is:
$$
f_C(\alpha,\beta,\gamma) = \sum_{l=1}^{N} \sum_{m=1}^{N} \sum_{n=1}^{N}
f_A(l,m,n) \times f_B(l+\alpha,\, m+\beta,\, n+\gamma)
$$
where $N^3$ is the number of grid points and α, β, γ is the translation
vector of molecule B relative to A. A high correlation score denotes a
complex with good surface complementarity. If molecule B significantly
overlaps molecule A the correlation is negative. Zero correlation most
likely means that the molecules are not in contact.
Calculating $f_C$ as shown above is very inefficient, requiring $N^3$
multiplications and additions for each of the $N^3$ shifts α, β, γ.
Since $f_A$ and $f_B$ are discrete functions (i.e. representing the
discretized molecules) it is possible to calculate $f_C$ much more
rapidly by FFT. The FFT requires of the order of $N^3 \ln(N^3)$
calculations, which is significantly less than $N^6$. (See Press et al.
(1986) and Bracewell (1990) for thorough reviews of fast transforms and
Fourier correlation.) The discrete functions $f_A$ and $f_B$ are
transformed (denoted DFT for discrete Fourier transform) and the complex
conjugate $F_A^*$ is multiplied by $F_B$:
$$
F_A = \mathrm{DFT}(f_A)
$$
$$
F_B = \mathrm{DFT}(f_B)
$$
$$
F_C = (F_A^{*})(F_B)
$$
The correlation function describing the shape complementarity of
molecules A and B along each translational vector is the inverse Fourier
transform (IFT) of the transform product:
$$
f_C = \mathrm{IFT}(F_C)
$$
A cross-section through a sample correlation function is shown in
Figure 1. The highest peak denotes the translation vector producing the
best shape complementarity for the current orientation.
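The transform, multiply and inverse-transform sequence maps directly onto NumPy's FFT routines. The sketch below is illustrative rather than the authors' implementation: `correlate_fft`, `correlate_direct` and `best_translation` are hypothetical names, and the direct triple sum is included only as a reference for checking the FFT result on a small grid. The DFT treats the grid as periodic, so translations wrap around the grid edges.

```python
import numpy as np

def correlate_fft(f_a, f_b):
    """f_C = IFT( conj(DFT(f_A)) * DFT(f_B) ) for every translation at once."""
    F_a = np.fft.fftn(f_a)
    F_b = np.fft.fftn(f_b)
    F_c = np.conj(F_a) * F_b
    return np.fft.ifftn(F_c).real  # imaginary part is round-off for real grids

def correlate_direct(f_a, f_b):
    """Reference O(N^6) evaluation of the correlation sum (periodic shifts)."""
    n = f_a.shape[0]
    f_c = np.zeros((n, n, n))
    for a in range(n):
        for b in range(n):
            for g in range(n):
                # shifted[l, m, n] == f_b[l + a, m + b, n + g], indices modulo n
                shifted = np.roll(f_b, shift=(-a, -b, -g), axis=(0, 1, 2))
                f_c[a, b, g] = np.sum(f_a * shifted)
    return f_c

def best_translation(f_c):
    """Grid indices (alpha, beta, gamma) of the highest correlation peak."""
    return np.unravel_index(np.argmax(f_c), f_c.shape)
```

In a docking run, `correlate_fft` would be called once per orientation of molecule B, with the transform of molecule A computed a single time and reused.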
After each translational scan, molecule B is rotated about one of its
Euler angles until rotational space has been completely scanned. For an
angular deviation of α = 15°, this yields 360 × 360 × 180/α³ = 6912
orientations. However, many of these orientations are degenerate and
must be removed using the following relationship (Lattman, 1972):
$$
\alpha = \cos^{-1}\left[\frac{\mathrm{tr}(R_1 \times R_2^{T}) - 1}{2}\right]
$$
where $R_1$ is the rotation matrix of the first orientation, $R_2^{T}$
is the transpose of the rotation matrix of the second orientation, and
tr is the matrix trace. If α ≤ 1° then the two orientations are
degenerate. Removing degeneracies in this fashion yields 6385 unique
orientations. A finer angular rotation is computationally demanding for
extensive trials. For example, there are 22,105 non-degenerate
orientations for α = 10°.
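The orientation count and the degeneracy test can be checked numerically. The sketch below is an illustration under stated assumptions: the text does not give its Euler-angle convention, so a z-x-z convention is assumed, and the helper names (`rot_z`, `rot_x`, `euler_zxz`, `angular_distance`, `unique_orientations`) are hypothetical. The greedy filter is unoptimized and is not claimed to reproduce the 6385 orientations reported above.

```python
import numpy as np

def rot_z(deg):
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(deg):
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def euler_zxz(phi, theta, psi):
    """Rotation matrix from Euler angles in degrees (assumed z-x-z convention)."""
    return rot_z(phi) @ rot_x(theta) @ rot_z(psi)

def angular_distance(r1, r2):
    """Angle in degrees between two orientations: acos((tr(R1 R2^T) - 1) / 2)."""
    cos_a = (np.trace(r1 @ r2.T) - 1.0) / 2.0
    # clip guards against round-off pushing the value just outside [-1, 1]
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

def unique_orientations(step, tol=1.0):
    """Greedy removal of orientations within `tol` degrees of one already kept."""
    kept = []
    for phi in range(0, 360, step):
        for theta in range(0, 180, step):
            for psi in range(0, 360, step):
                r = euler_zxz(phi, theta, psi)
                if all(angular_distance(r, k) > tol for k in kept):
                    kept.append(r)
    return kept

raw_count = 360 * 360 * 180 // 15 ** 3   # 6912 raw orientations at 15 degrees
```

The degeneracy arises because, for example, the Euler triples (30°, 0°, 45°) and (45°, 0°, 30°) describe the same rotation, so `angular_distance` between them is effectively zero.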
Measuring electrostatic complementarity by
Fourier correlation
Shape complementarity is not the only factor involved in molecular
binding. Electrostatic attraction, particularly the specific
charge-charge interactions in the binding interface, also plays an
important role. For speed and consistency, electrostatic complementarity
is calculated by Fourier correlation using a simple Coulombic model.
Since charged amino acid side-chains are usually on the protein surface,
they are often involved in binding and tend to be highly flexible
(Figure 2). Therefore, calculating individual point charge interactions
when attempting to dock the uncomplexed structures is not feasible and
can produce misleading results. So rather than try to measure specific
charge-charge interactions, we measure the point charges of one protein
interacting with the electric field of the other as grid points. In this
way, point charges are dispersed to simulate side-chain movement. The
electrostatic calculations proceed in a manner very similar to those of
shape complementarity. Charges are assigned to the atoms of protein A
(Table 1) and the molecule is placed in a grid.
Figure 2. Examples of docking to illustrate induced binding in the
interface. (a) Selected binding site residues of ovomucoid when free
(2ovo, light grey) and when bound to α-chymotrypsin (1cho, dark grey).
(b) Selected binding site residues of BPTI when free (4pti, light grey)
and when bound to trypsin (2ptc, dark grey). Scoring functions used for
predictive docking must be sufficiently "soft" to account for
conformational changes of this magnitude. The whole complexes for CHO
and PTC are shown in Figures 4(b) and (d) respectively.
The electric field at each grid node (excluding those of the protein
core) is calculated:
$$
\phi_i = \sum_j \frac{q_j}{\varepsilon(r_{ij})\, r_{ij}}
$$
where $\phi_i$ is the field strength at node i (position l, m, n),
$q_j$ is the charge on atom j, $r_{ij}$ is the distance between i and j
(a minimum cutoff distance of 2 Å is imposed to avoid artificially large
values of φ), and $\varepsilon(r_{ij})$ is a distance-dependent
dielectric function. In this case, a pseudo-sigmoidal function, based on
the sigmoidal function of Hingerty et al. (1985), is used:
$$
\varepsilon(r_{ij}) =
\begin{cases}
4 & r_{ij} \le 6\ \text{Å} \\
38\, r_{ij} - 224 & 6\ \text{Å} < r_{ij} < 8\ \text{Å} \\
80 & r_{ij} \ge 8\ \text{Å}
\end{cases}
$$
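As a worked illustration of the field and dielectric formulas above, the following sketch evaluates the piecewise dielectric and the resulting field value at a single grid node. It is not the authors' code: `dielectric` and `field_at_node` are hypothetical names, distances are assumed to be in Å and charges in units of the elementary charge, and no physical unit conversion is applied.

```python
import numpy as np

def dielectric(r):
    """Piecewise distance-dependent dielectric: 4, then 38 r - 224, then 80."""
    r = np.asarray(r, dtype=float)
    return np.where(r <= 6.0, 4.0,
                    np.where(r < 8.0, 38.0 * r - 224.0, 80.0))

def field_at_node(node, atom_xyz, atom_q, r_min=2.0):
    """phi_i = sum_j q_j / (eps(r_ij) * r_ij), with r_ij clamped to r_min Å."""
    r = np.linalg.norm(atom_xyz - node, axis=1)
    r = np.maximum(r, r_min)          # minimum cutoff distance of 2 Å
    return float(np.sum(atom_q / (dielectric(r) * r)))
```

The linear segment joins the two plateaus continuously: 38 × 6 − 224 = 4 and 38 × 8 − 224 = 80. A unit charge 10 Å from a node therefore contributes 1/(80 × 10) = 0.00125, while one closer than the 2 Å cutoff contributes 1/(4 × 2) = 0.125.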
Several distance-dependent dielectric functions were tested. This
pseudo-sigmoidal function was chosen because it effectively damps
long-range electrostatic effects that are not relevant to the binding
interface. In fact, dielectric functions that do not damp long-range
effects give inconsistent results, sometimes showing poor electrostatics
for experimentally determined complexes. The treatment of protein B is
much simpler. Charges are assigned to its atoms and then discretized in
a grid ($q(l,m,n)$) by trilinear weighting (Rogers & Sternberg, 1984;
Edmonds et al., 1984). Calculation of the electrostatic interactions
proceeds as outlined in Figure 1 and as described for surface
correlation except that the discrete functions are:
$$
f_A(l,m,n) =
\begin{cases}
\phi(l,m,n) & \text{entire grid excluding core} \\
0 & \text{core of molecule}
\end{cases}
$$
$$
f_B(l,m,n) = q(l,m,n)
$$
and the correlation function becomes:
$$
f^{\mathrm{elec}}(\alpha,\beta,\gamma) = \sum_{l=1}^{N} \sum_{m=1}^{N} \sum_{n=1}^{N}
f_A(l,m,n) \times q_B(l+\alpha,\, m+\beta,\, n+\gamma)
$$
So both grids are Fourier transformed and correlated such that the
static charges of molecule B move through the electric field of molecule
A. The electrostatic correlation score is used as a binary filter.
Specifically, false positive geometries that give high shape correlation
scores can be excluded if their electrostatic correlation is
unfavourable (i.e. positive).
Table 1. Charges used in Coulombic electrostatic fields
Peptide backbone   Charge     Side-chain atoms   Charge
Terminal-N          1.0       Arg-Nη              0.5
Terminal-O         −1.0       Glu-Oε             −0.5
Cα                  0.0       Asp-Oδ             −0.5
C                   0.0       Lys-Nζ              1.0
O                  −0.5       Pro-N              −0.1
N                   0.5
Results
Predictive docking of native protein structures
The docking protocol, shown schematically in Figure 1, was applied to
the unbound coordinates of six enzyme/inhibitor and two antibody/antigen
systems. In addition, two bound antibodies were docked to unbound
antigens. These ten systems are referred to as "unbound" docking, for
simplicity. Docked solutions are ranked by surface correlation score. A
correct solution is defined as any geometry with an interface Cα RMS
less than or equal to 2.5 Å from the crystal structure (see Methods).
The results of the global search before filtering and without
consideration of electrostatics show that geometric complementarity
alone (as measured by Katchalski-Katzir et al., 1992) cannot reliably
dock unbound complexes (Tables 2 and 3). For three out of the ten test
systems no correctly docked complex is ranked in the top 4000
predictions. For the other seven systems, the highest ranked correct
answer was in a list of hundreds of alternatives. This shows that a high
surface correlation score does not necessarily indicate a correctly
docked complex. For example, even a limited rotational scan near the
correct geometry produces a broad range of correlation scores
(Figure 3). In a complete rotational search it is possible to find
incorrectly docked complexes that score higher than the actual crystal
structure. This does not pose a problem because our aim during the
global search is to place at least one near-correct prediction in the
final output; not necessarily at the highest scoring position. Correctly
docked complexes can be screened later using experimental constraints
and advanced refinement techniques.
The additional constraint of removing predictions with unfavourable
electrostatic interactions markedly improves the ranking of correctly
docked structures in the global search (Table 2). With electrostatics, a
good solution is found in the top 4000 structures in nine out of ten
systems. In general, inclusion of electrostatics reduces the number of
geometries to be evaluated by approximately 50%.
Global searching and the dependence on filtering
Knowledge of the location of the binding site on one or both proteins
drastically reduces the number of possible allowed conformations.
Knowing specific binding site residues reduces the search space even
further. It is possible to utilize this information in the form of
distance constraints (see Methods). Generally, information about the
binding site is available from experimental data (e.g. site-directed
mutagenesis, chemical cross-linking, phylogenetic data). In the absence
of experimental data, it is often possible to predict the correct
binding site by examining potential hydrogen bonding groups, clefts
and/or charged sites on a protein surface (Gilson & Honig, 1987;
DesJarlais et al., 1988; Nicholls & Honig, 1991; Laskowski, 1995;
Laskowski et al., 1996; Meyer et al., 1996). Serine proteases and
immunoglobulins represent systems where the binding sites are known in
advance. The catalytic triad of the serine proteases and the
complementarity determining region (CDR) of immunoglobulins are both
well characterized. We take advantage of this information to varying
degrees in our docking experiments (Table 2). In the serine
protease/inhibitor docking attempts, filters are defined as: loose, any
residue of the inhibitor in contact with any residue of the catalytic
triad (i.e. His, Asp, Ser); medium, an inhibitor residue in contact with
both the His and the Ser of the catalytic triad; tight, a specific
binding site residue of the inhibitor in contact with both the His and
the Ser of the catalytic triad. In the antibody/lysozyme docking
attempts, filters are defined as: loose, any part of the lysozyme in
contact with either the L3 or the H3 CDR; medium, lysozyme in contact
with both the L3 and H3 CDRs; tight, the medium filter together with one
residue of the epitope in contact with any part of the CDR. The L3/H3
CDR filters are based on the study of MacCallum et al. (1996), which