J. Mol. Biol. (1997) 272, 106–120

Modelling Protein Docking using Shape Complementarity, Electrostatics and Biochemical Information

Henry A. Gabb, Richard M. Jackson and Michael J. E. Sternberg*

Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, Lincoln's Inn Fields, P.O. Box 123, London WC2A 3PX, UK

A protein docking study was performed for two classes of biomolecular complexes: six enzyme/inhibitor and four antibody/antigen. Biomolecular complexes for which crystal structures of both the complexed and uncomplexed proteins are available were used for eight of the ten test systems. Our docking experiments consist of a global search of translational and rotational space followed by refinement of the best predictions. Potential complexes are scored on the basis of shape complementarity and favourable electrostatic interactions using Fourier correlation theory. Since proteins undergo conformational changes upon binding, the scoring function must be sufficiently soft to dock unbound structures successfully. Some degree of surface overlap is tolerated to account for side-chain flexibility. Similarly for electrostatics, the interaction of the dispersed point charges of one protein with the Coulombic field of the other is measured rather than precise atomic interactions. We tested our docking protocol using the native rather than the complexed forms of the proteins to address the more scientifically interesting problem of predictive docking. In all but one of our test cases, correctly docked geometries (judged by interface Cα RMS deviation from the experimental structure) are found during a global search of translational and rotational space in a list that was always less than 250 complexes and often less than 30. Varying degrees of biochemical information are still necessary to remove most of the incorrectly docked complexes.

1997 Academic Press Limited

*Corresponding author

Keywords: molecular recognition; protein–protein docking; protein–protein complex; predictive docking algorithm; fast Fourier transform

Introduction

Knowledge of the three-dimensional (3D) structure of protein–protein complexes provides a valuable understanding of the function of molecular systems. The rate of protein structure determination is increasing rapidly. Today there are some 5,000 entries in the Brookhaven Protein Data Bank (PDB) (Bernstein et al., 1977). However, determination of the structure of protein–protein complexes still remains a difficult problem, with only about 100 sets of deposited co-ordinates. Thus, computer algorithms capable of predicting protein–protein interactions are becoming increasingly important. Such algorithms can also provide insight into the process of protein–protein recognition.

Abbreviations used: PDB, Brookhaven Protein Data Bank; 3D, three-dimensional; FFT, fast Fourier transform; DFT, discrete Fourier transform; IFT, inverse Fourier transform; CDR, complementarity determining region; RMS, root-mean square; CPU, central processing unit.

0022–2836/97/360106–15 $25.00/0/mb971203

The general strategy for simulating protein–protein docking involves matching shape complementarity (for recent reviews, see Janin, 1995a; Shoichet & Kuntz, 1996). Some approaches focus specifically on matching surfaces (e.g. see Jiang & Kim, 1991; Katchalski-Katzir et al., 1992; Walls & Sternberg, 1992; Helmer-Citterich & Tramontano, 1994). Others enhance the search for geometric complementarity by matching the position of surface spheres and surface normals (e.g. see Kuntz et al., 1982; Shoichet & Kuntz, 1991; Norel et al., 1995).

Shape complementarity is measured by a variety of scoring functions, some of which aim to model the hydrophobic effect during association from the change in solvent-accessible surface area or molecular surface area (e.g. see Cherfils et al., 1991). Several algorithms employ a simplified scheme to estimate electrostatic interactions (e.g. see Jiang & Kim, 1991; Walls & Sternberg, 1992). In general the algorithms yield a limited set of favourable complexes, one or a few of which are close (typically 1 to 3 Å RMS) to the native structure. Recognizing this, several groups have additionally focused on screening the correct solution from the false positives by modelling the hydrophobic effect, electrostatic interactions and desolvation (Gilson & Honig, 1988; Vakser & Aflalo, 1994; Jackson & Sternberg, 1995; Weng et al., 1996). Most of the above studies have focused on rigid body docking. Recently, however, Monte Carlo simulations have been used to refine flexible side-chain positions after rigid body docking (Totrov & Abagyan, 1994).

In the development of these algorithms many workers demonstrate that the bound complex can be generated starting from the component bound molecules. Simulations on the pertinent biological problem of docking starting from unbound complexes have also been performed on a limited number of test cases. However, these studies tended to be applications of algorithms developed on bound complexes. As a consequence, selective pruning of some interface side-chains was required to demonstrate that the algorithms could be applied to unbound complexes. The next step, therefore, is to establish an automated procedure that is successful on a series of molecules without manual intervention guided by knowledge of the answer.

An important step in the identification of reliable docking approaches was the blind trial organized by James and co-workers (Strynadka et al., 1996). Five groups submitted entries to predict the docking of β-lactamase and its inhibitor starting with the unbound structures prior to knowledge of the complex. All entrants were able to identify a solution close to the correct answer (i.e. better than 2.7 Å RMS for superposition of the predicted on the X-ray complex). The closest agreement was by the Fourier correlation algorithm (Katchalski-Katzir et al., 1992), which found a 1.1 Å solution. This approach performs a complete translational and rotational search in Fourier space and selects binding geometries with high surface correlation scores. However, general conclusions cannot be drawn from one trial, particularly as the β-lactamase/inhibitor complex is highly amenable to docking methods that rely on shape complementarity alone, since the surface area buried in the complex is particularly large (Janin, 1995a). It is not clear whether the Fourier approach or the other strategies involving measures of surface complementarity will reliably predict unbound systems without consideration of other properties, particularly electrostatics.

This paper reports extensive trials of docking unbound protein–protein molecules to obtain structural models for their complexes. Our approach recognises that a major problem in rigid body docking, starting with unbound complexes, is that the algorithm must be sufficiently soft to manage conformational changes, yet specific enough to identify the correct solution. At present, the alternative approach of an initial non-rigid body search is computationally intractable. A variation of the Fourier correlation algorithm (Katchalski-Katzir et al., 1992) was chosen as it incorporates soft docking, is computationally fast and mathematically elegant. We introduce a soft treatment of electrostatic interactions into the Fourier correlation approach. In addition, we evaluate the selectivity provided by specific biological constraints of the type likely to be available in genuine docking applications.

Algorithm

Measuring shape complementarity by Fourier correlation

The docking protocol used in this study follows closely the shape recognition algorithm of Katchalski-Katzir et al. (1992) (shown schematically in Figure 1). The geometric surface recognition method takes advantage of the fast Fourier transform (FFT) and Fourier correlation theory to scan rapidly the translational space of two rigidly rotating molecules. The algorithm begins by discretizing two molecules, A and B, in 3D grids with every node ($l, m, n \in \{1 \ldots N\}$) assigned a value:

$$f_A(l,m,n) = \begin{cases} 1 & \text{surface of molecule} \\ \rho & \text{core of molecule} \\ 0 & \text{open space} \end{cases}$$

and

$$f_B(l,m,n) = \begin{cases} 1 & \text{inside molecule} \\ 0 & \text{open space} \end{cases}$$

By convention, $f_A$ and $f_B$ denote the larger and smaller proteins, respectively. Any grid node within 1.8 Å of an atom is considered to be inside a protein. Grid nodes at the surface of molecule A are scored differently. A 1.5 Å surface layer is used. We set $\rho = -15$ in all of our docking experiments. Contrary to the finding of Katchalski-Katzir et al. (1992), we find that the choice of $\rho$ does affect the performance of the algorithm. Specifically, the degree of surface overlap tolerated during docking is directly related to the value of $\rho$.

In all our calculations, 128 × 128 × 128 grids are used. Grid spacing is determined by the size of the respective molecules. The approximate radius of a protein is the distance between the geometric centre and the most distal atom. The grid size is the sum of the protein diameters plus 1 Å. Grid spacings ranged from 0.74 Å/node to 0.84 Å/node for the enzyme/inhibitor complexes and from 0.91 Å/node to 0.94 Å/node for the antibody/antigen complexes.
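The discretization rules above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, the surface layer is assumed to consist of inside nodes lying within 1.5 Å of open space, the core value of −15 follows the negative core described for Figure 1, and the brute-force distance test is only suitable for small demonstration grids.

```python
import numpy as np

def grid_spacing(radius_a, radius_b, n=128):
    """Spacing in Å/node: (sum of both protein diameters + 1 Å) / n nodes."""
    return (2.0 * radius_a + 2.0 * radius_b + 1.0) / n

def inside_mask(atoms, n, spacing, cutoff=1.8):
    """True for nodes within `cutoff` Å of any atom (grid centred on the origin)."""
    axis = (np.arange(n) - (n - 1) / 2.0) * spacing
    nodes = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    d2 = ((nodes.reshape(-1, 1, 3) - atoms[None, :, :]) ** 2).sum(axis=-1)
    return (d2.min(axis=1) <= cutoff ** 2).reshape(n, n, n)

def discretize_a(atoms, n, spacing, rho=-15.0, layer=1.5):
    """Molecule A: surface nodes = 1, core nodes = rho, open space = 0."""
    inside = inside_mask(atoms, n, spacing)
    outside = ~inside
    near_open = np.zeros_like(inside)
    reach = int(np.ceil(layer / spacing))
    offsets = range(-reach, reach + 1)
    # Assumption: a surface node is an inside node within `layer` Å of open space.
    for dx in offsets:
        for dy in offsets:
            for dz in offsets:
                if (dx * dx + dy * dy + dz * dz) * spacing ** 2 <= layer ** 2:
                    near_open |= np.roll(outside, (dx, dy, dz), axis=(0, 1, 2))
    f_a = np.where(inside, rho, 0.0)
    f_a[inside & near_open] = 1.0
    return f_a

def discretize_b(atoms, n, spacing):
    """Molecule B: inside nodes = 1, open space = 0 (no surface/core distinction)."""
    return inside_mask(atoms, n, spacing).astype(float)

# Toy example: a single atom at the origin on a 17^3 grid with 0.5 Å spacing.
atom = np.array([[0.0, 0.0, 0.0]])
f_a = discretize_a(atom, n=17, spacing=0.5)
f_b = discretize_b(atom, n=17, spacing=0.5)
```

In this toy case the centre node is core, the node 1.5 Å along the first axis is surface, and the grid corner is open space. A real protein on a 128³ grid would need a neighbour search rather than the all-pairs distance array used here.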


Figure 1. The Fourier correlation docking algorithm used in this study, based on the method of Katchalski-Katzir et al. (1992). Molecules A and B are discretized differently. Molecule A has a negative core and a positive surface layer (the dark band) whereas no surface/core distinction is made for molecule B. It is only necessary to discretize and Fourier transform molecule A one time. Electrostatic complementarity is calculated concurrently with shape complementarity. Similarly, the transform of the electric field of molecule A need only be calculated once. The cross-section of a sample 3D Fourier correlation function illustrates a search of translational space. The geometric centres of the two molecules are superposed at the origin. Molecule A is fixed in the centre of the grid. As molecule B moves through the grid, a "signal" describing shape complementarity emerges. A zero correlation score indicates that the proteins are not in contact while negative scores (the empty region in the centre) indicate significant surface penetration. The highest peak indicates the translation vector giving the best surface complementarity.

The correlation function of $f_A$ and $f_B$ is:

$$f_C(\alpha,\beta,\gamma) = \sum_{l=1}^{N} \sum_{m=1}^{N} \sum_{n=1}^{N} f_A(l,m,n) \times f_B(l+\alpha,\, m+\beta,\, n+\gamma)$$

where $N$ is the number of grid points along each axis and $\alpha, \beta, \gamma$ is the translation vector of molecule B relative to A. A high correlation score denotes a complex with good surface complementarity. If molecule B significantly overlaps molecule A the correlation is negative. Zero correlation most likely means that the molecules are not in contact.

Calculating $f_C$ as shown above is very inefficient, requiring $N^3$ multiplications and additions for each of the $N^3$ shifts $\alpha, \beta, \gamma$. Since $f_A$ and $f_B$ are discrete functions (i.e. representing the discretized molecules), it is possible to calculate $f_C$ much more rapidly by FFT. The FFT requires of the order of $N^3 \ln(N^3)$ calculations, which is significantly less than $N^6$. (See Press et al. (1986) and Bracewell (1990) for thorough reviews of fast transforms and Fourier correlation.) The discrete functions $f_A$ and $f_B$ are transformed (denoted DFT for discrete Fourier transform) and the complex conjugate $F_A^*$ is multiplied by $F_B$:

$$F_A = \mathrm{DFT}(f_A), \qquad F_B = \mathrm{DFT}(f_B), \qquad F_C = F_A^{*}\, F_B$$

The correlation function describing the shape complementarity of molecules A and B along each translational vector is the inverse Fourier transform (IFT) of the transform product:

$$f_C = \mathrm{IFT}(F_C)$$

A cross-section through a sample correlation function is shown in Figure 1. The highest peak denotes the translation vector producing the best shape complementarity for the current orientation.
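As an illustration of the Fourier correlation step, the following sketch compares the direct triple sum with the FFT route (conjugate of the transform of A times the transform of B, then inverse transform). It is a minimal example, not the authors' code: the function names are hypothetical, and both versions treat the grid as periodic, so translations beyond half the grid length correspond to negative shifts.

```python
import numpy as np

def correlate_fft(f_a, f_b):
    """f_C = IFT(conj(DFT(f_A)) * DFT(f_B)), evaluated for every translation at once."""
    big_f_a = np.fft.fftn(f_a)
    big_f_b = np.fft.fftn(f_b)
    return np.fft.ifftn(np.conj(big_f_a) * big_f_b).real

def correlate_direct(f_a, f_b):
    """Direct sum over l, m, n for each shift (alpha, beta, gamma); O(N^6)."""
    n = f_a.shape[0]
    f_c = np.zeros((n, n, n))
    for a in range(n):
        for b in range(n):
            for g in range(n):
                # shifted[l, m, n] == f_b[l + a, m + b, n + g] with cyclic wrap-around
                shifted = np.roll(f_b, (-a, -b, -g), axis=(0, 1, 2))
                f_c[a, b, g] = np.sum(f_a * shifted)
    return f_c

def best_translation(f_c):
    """Grid indices (alpha, beta, gamma) of the highest correlation peak."""
    return tuple(int(i) for i in np.unravel_index(np.argmax(f_c), f_c.shape))

# Toy grids: one marked node in A at (1, 1, 1) and one in B at (3, 2, 1).
f_a = np.zeros((6, 6, 6))
f_a[1, 1, 1] = 1.0
f_b = np.zeros((6, 6, 6))
f_b[3, 2, 1] = 1.0
peak = best_translation(correlate_fft(f_a, f_b))
```

For the toy grids the peak sits at the shift that brings the marked node of B onto the marked node of A. Because of the cyclic wrap-around, a grid large enough to hold both molecules side by side keeps wrapped contributions from producing spurious contacts.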

After each translational scan, molecule B is rotated about one of its Euler angles until rotational space has been completely scanned. For an angular deviation of $\Delta = 15^\circ$, this yields $360 \times 360 \times 180/\Delta^3 = 6912$ orientations. However, many of these orientations are degenerate and must be removed using the following relationship (Lattman, 1972):

$$\theta = \cos^{-1}\left[\frac{\mathrm{tr}(R_1 R_2^{T}) - 1}{2}\right]$$

where $R_1$ is the rotation matrix of the first orientation, $R_2^{T}$ is the transpose of the rotation matrix of the second orientation, and tr is the matrix trace. If $\theta$ is effectively zero then the two orientations are degenerate. Removing degeneracies in this fashion yields 6385 unique orientations. A finer angular rotation is computationally demanding for extensive trials: a finer step, for example, gives 22,105 non-degenerate orientations.
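The rotational scan and the trace-based degeneracy test can be sketched as follows. This is an illustrative example under stated assumptions: a z-x-z Euler convention and a 1° degeneracy tolerance are chosen here for demonstration and are not taken from the paper, so the number of unique orientations it returns at 15° need not match the published figure. The function names are hypothetical.

```python
import numpy as np
from itertools import product

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def euler_matrix(phi, theta, psi):
    """Rotation matrix from Euler angles in degrees (z-x-z convention, an assumption)."""
    p, t, s = np.radians([phi, theta, psi])
    return rot_z(p) @ rot_x(t) @ rot_z(s)

def rotation_angle(r1, r2):
    """Angle in degrees between two orientations: acos((tr(R1 R2^T) - 1) / 2)."""
    c = (np.trace(r1 @ r2.T) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def n_candidates(step):
    """Number of Euler-angle triples before degenerate ones are removed."""
    return (360 // step) * (360 // step) * (180 // step)

def unique_orientations(step, tol=1.0):
    """Keep an orientation only if it differs by more than `tol` degrees from all kept ones."""
    kept = []
    for phi, theta, psi in product(range(0, 360, step), range(0, 180, step), range(0, 360, step)):
        r = euler_matrix(phi, theta, psi)
        if all(rotation_angle(r, k) > tol for k in kept):
            kept.append(r)
    return kept

# Coarse demonstration: a 90-degree step gives 32 candidates, some of them degenerate.
coarse = unique_orientations(90)
```

With a 90° step the 16 triples whose middle angle is zero collapse to 4 distinct rotations, which is the kind of degeneracy the trace relationship detects. The pairwise comparison is quadratic in the number of orientations, so a 15° scan is noticeably slower than this toy case.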

Measuring electrostatic complementarity by Fourier correlation

Shape complementarity is not the only factor involved in molecular binding. Electrostatic attraction, particularly the specific charge–charge interactions in the binding interface, also plays an important role. For speed and consistency, electrostatic complementarity is calculated by Fourier correlation using a simple Coulombic model. Since charged amino acid side-chains are usually on the protein surface, they are often involved in binding and tend to be highly flexible (Figure 2). Therefore, calculating individual point charge interactions when attempting to dock the uncomplexed structures is not feasible and can produce misleading results. So rather than try to measure specific charge–charge interactions, we measure the point charges of one protein interacting with the electric field of the other as grid points. In this way, point charges are dispersed to simulate side-chain movement. The electrostatic calculations proceed in a manner very similar to those of shape complementarity. Charges are assigned to the atoms of protein A (Table 1) and the molecule is placed in a grid. The electric field at each grid node (excluding those of the protein core) is calculated:

$$\phi_{l,m,n} = \sum_{j} \frac{q_j}{\varepsilon(r_{ij})\, r_{ij}}$$

where $\phi_{l,m,n}$ is the field strength at node $i$ (position $l,m,n$), $q_j$ is the charge on atom $j$, $r_{ij}$ is the distance between $i$ and $j$ (a minimum cutoff distance of 2 Å is imposed to avoid artificially large values of $\phi$), and $\varepsilon(r_{ij})$ is a distance-dependent dielectric function. In this case, a pseudo-sigmoidal function, based on the sigmoidal function of Hingerty et al. (1985), is used:

$$\varepsilon(r_{ij}) = \begin{cases} 4 & r_{ij} \le 6\ \text{Å} \\ 38\, r_{ij} - 224 & 6\ \text{Å} < r_{ij} < 8\ \text{Å} \\ 80 & r_{ij} \ge 8\ \text{Å} \end{cases}$$

Several distance-dependent dielectric functions were tested. This one was chosen because it effectively damps long-range electrostatic effects that are not relevant to the binding interface. In fact, dielectric functions that do not damp long-range effects give inconsistent results, sometimes showing poor electrostatics for experimentally determined complexes.

Figure 2. Examples of docking to illustrate induced binding in the interface. (a) Selected binding site residues of ovomucoid when free (2ovo, light grey) and when bound to α-chymotrypsin (1cho, dark grey). (b) Selected binding site residues of BPTI when free (4pti, light grey) and when bound to trypsin (2ptc, dark grey). Scoring functions used for predictive docking must be sufficiently "soft" to account for conformational changes of this magnitude. The whole complexes for CHO and PTC are shown in Figures 4(b) and (d) respectively.

Table 1. Charges used in Coulombic electrostatic fields

Peptide backbone      Charge
Terminal-N             1.0
Terminal-O            −1.0
(further backbone charges listed: 0.0, −0.5, 0.5)

Side-chain atoms      Charge
Arg-N                  0.5
Glu-O                 −0.5
Asp-O                 −0.5
Lys-N                  1.0
Pro-N                 −0.1

The treatment of protein B is much simpler. Charges are assigned to its atoms and then discretized in a grid ($q_{l,m,n}$) by trilinear weighting (Rogers & Sternberg, 1984; Edmonds et al., 1984).
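The two electrostatic grids can be sketched in NumPy as follows. This is a hedged illustration rather than the published implementation: the function names are hypothetical, the dielectric is assumed to be the linear term 38r − 224 clamped to constant values of 4 and 80 (the limits that make it continuous at 6 Å and 8 Å), and the trilinear weighting is the textbook scheme that shares each charge among the eight surrounding nodes.

```python
import numpy as np
from itertools import product

def dielectric(r):
    """Assumed pseudo-sigmoidal dielectric: 4 up to 6 Å, 38r - 224 between, 80 from 8 Å."""
    r = np.asarray(r, dtype=float)
    return np.where(r <= 6.0, 4.0, np.where(r >= 8.0, 80.0, 38.0 * r - 224.0))

def field_at_nodes(nodes, atoms, charges, r_min=2.0):
    """phi_i = sum_j q_j / (eps(r_ij) * r_ij), with distances clamped to r_min Å."""
    d = np.linalg.norm(nodes[:, None, :] - atoms[None, :, :], axis=-1)
    d = np.maximum(d, r_min)
    return (charges[None, :] / (dielectric(d) * d)).sum(axis=1)

def spread_charges(atoms, charges, n, spacing):
    """Trilinear weighting of point charges onto an n^3 grid whose node (0,0,0) is the origin.

    Assumes every atom lies strictly inside the grid, so all eight neighbours exist.
    """
    grid = np.zeros((n, n, n))
    for pos, q in zip(atoms, charges):
        g = pos / spacing                  # fractional grid coordinates
        i0 = np.floor(g).astype(int)       # lower corner of the enclosing cell
        frac = g - i0
        for dx, dy, dz in product((0, 1), repeat=3):
            w = ((frac[0] if dx else 1.0 - frac[0])
                 * (frac[1] if dy else 1.0 - frac[1])
                 * (frac[2] if dz else 1.0 - frac[2]))
            grid[i0[0] + dx, i0[1] + dy, i0[2] + dz] += q * w
    return grid

# Field of a single +1 charge at the origin, sampled 1 Å and 10 Å away.
phi = field_at_nodes(np.array([[1.0, 0.0, 0.0], [10.0, 0.0, 0.0]]),
                     np.array([[0.0, 0.0, 0.0]]), np.array([1.0]))

# A -1 charge halfway between two nodes is split equally between them.
q_grid = spread_charges(np.array([[1.25, 1.0, 1.0]]), np.array([-1.0]), n=6, spacing=0.5)
```

The 2 Å clamp means the node 1 Å from the charge is treated as if it were 2 Å away, which is what keeps the field from blowing up near atoms. Spreading each charge over neighbouring nodes conserves the total charge while softening its exact position.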

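Calculation of the electrostatic interactions proceeds as outlined in Figure 1 and as described for surface correlation except that the discrete functions are:

$$f_A(l,m,n) = \begin{cases} \phi_{l,m,n} & \text{entire grid excluding core} \\ 0 & \text{core of molecule} \end{cases}$$

$$f_B(l,m,n) = q_{l,m,n}$$

and the correlation function becomes:

$$f_C^{\mathrm{elec}}(\alpha,\beta,\gamma) = \sum_{l=1}^{N} \sum_{m=1}^{N} \sum_{n=1}^{N} f_A(l,m,n) \times f_B(l+\alpha,\, m+\beta,\, n+\gamma)$$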
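A minimal sketch of the electrostatic correlation, and of its use as a binary filter on shape-ranked translations, is given here. The names are hypothetical and the small tolerance `tol` is an assumption added to absorb floating-point noise from the FFT; the only rule taken from the text is that geometries with a positive (unfavourable) electrostatic correlation are excluded.

```python
import numpy as np

def electrostatic_grid_a(phi, core):
    """f_A for electrostatics: the field phi everywhere except the core, which is zeroed."""
    return np.where(core, 0.0, phi)

def correlate(f_a, f_b):
    """Correlation over all translations via IFT(conj(DFT(f_A)) * DFT(f_B))."""
    return np.fft.ifftn(np.conj(np.fft.fftn(f_a)) * np.fft.fftn(f_b)).real

def filtered_best(shape_scores, elec_scores, tol=1e-9):
    """Best shape-scoring translation among those whose electrostatics are not positive."""
    allowed = elec_scores <= tol
    masked = np.where(allowed, shape_scores, -np.inf)
    return tuple(int(i) for i in np.unravel_index(np.argmax(masked), masked.shape))

# Toy field of A: a single positive-field node at (1, 1, 1), no core nodes.
phi_a = np.zeros((4, 4, 4))
phi_a[1, 1, 1] = 2.0
f_a = electrostatic_grid_a(phi_a, np.zeros((4, 4, 4), dtype=bool))

# Toy charges of B: +1 at (1, 1, 1) and -1 at (2, 1, 1).
q_b = np.zeros((4, 4, 4))
q_b[1, 1, 1] = 1.0
q_b[2, 1, 1] = -1.0
elec = correlate(f_a, q_b)

# Made-up shape scores: the top-scoring shift (0, 0, 0) has unfavourable electrostatics.
shape = np.zeros((4, 4, 4))
shape[0, 0, 0] = 10.0
shape[1, 0, 0] = 5.0
choice = filtered_best(shape, elec)
```

In the toy case the unshifted geometry places the positive charge of B on the positive field of A and is rejected, so the lower-scoring but electrostatically favourable shift is selected instead. Because the filter is binary, the electrostatic score removes candidates without changing the shape-based ranking of those that remain.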

So both grids are Fourier transformed and correlated such that the static charges of molecule B move through the electric field of molecule A. The electrostatic correlation score is used as a binary filter. Specifically, false positive geometries that give high shape correlation scores can be excluded if their electrostatic correlation is unfavourable (i.e. positive).

Results

Predictive docking of native protein structures

The docking protocol, shown schematically in Figure 1, was applied to the unbound coordinates of six enzyme/inhibitor and two antibody/antigen systems. In addition, two bound antibodies were docked to unbound antigens. These ten systems are referred to as "unbound" docking, for simplicity. Docked solutions are ranked by surface correlation score. A correct solution is defined as any geometry with an interface Cα RMS less than or equal to 2.5 Å from the crystal structure (see Methods).

The results of the global search before filtering and without consideration of electrostatics show that geometric complementarity alone (as measured by Katchalski-Katzir et al., 1992) cannot reliably dock unbound complexes (Tables 2 and 3). For three out of the ten test systems no correctly docked complex is ranked in the top 4000 predictions. For the other seven systems, the highest ranked correct answer was in a list of hundreds of alternatives. This shows that a high surface correlation score does not necessarily indicate a correctly docked complex. For example, even a limited rotational scan near the correct geometry produces a broad range of correlation scores (Figure 3). In a complete rotational search it is possible to find incorrectly docked complexes that score higher than the actual crystal structure. This does not pose a problem because our aim during the global search is to place at least one near-correct prediction in the final output, not necessarily at the highest scoring position. Correctly docked complexes can be screened later using experimental constraints and advanced refinement techniques.

The additional constraint of removing predictions with unfavourable electrostatic interactions markedly improves the ranking of correctly docked structures in the global search (Table 2). With electrostatics, a good solution is found in the top 4000 structures in nine out of ten systems. In general, inclusion of electrostatics reduces the number of geometries to be evaluated by approximately 50%.

Global searching and the dependence on filtering

Knowledge of the location of the binding site on one or both proteins drastically reduces the number of possible allowed conformations. Knowing specific binding site residues reduces the search space even further. It is possible to utilize this information in the form of distance constraints (see Methods). Generally, information about the binding site is available from experimental data (e.g. site-directed mutagenesis, chemical cross-linking, phylogenetic data). In the absence of experimental data, it is often possible to predict the correct binding site by examining potential hydrogen bonding groups, clefts and/or charged sites on a protein surface (Gilson & Honig, 1987; DesJarlais et al., 1988; Nicholls & Honig, 1991; Laskowski, 1995; Laskowski et al., 1996; Meyer et al., 1996). Serine proteases and immunoglobulins represent systems where the binding sites are known in advance. The catalytic triad of the serine proteases and the complementarity determining region (CDR) of immunoglobulins are both well characterized. We take advantage of this information to varying degrees in our docking experiments (Table 2). In the serine protease/inhibitor docking attempts, filters are defined as: loose, any residue of the inhibitor in contact with any residue of the catalytic triad (i.e. His, Asp, Ser); medium, an inhibitor residue in contact with both the His and the Ser of the catalytic triad; tight, a specific binding site residue of the inhibitor in contact with both the His and the Ser of the catalytic triad. In the antibody/lysozyme docking attempts, filters are defined as: loose, any part of the lysozyme in contact with either the L3 or the H3 CDR; medium, lysozyme in contact with both the L3 and H3 CDRs; tight, the medium filter together with one residue of the epitope in contact with any part of the CDR. The L3/H3 CDR filters are based on the study of MacCallum et al. (1996), which
