Tamar Schlick.

Development of a new computational approach to the prediction of nucleic acid structure by potential energy methods: I. Deoxyribose online

. (page 1 of 6)
Online LibraryTamar SchlickDevelopment of a new computational approach to the prediction of nucleic acid structure by potential energy methods: I. Deoxyribose → online text (page 1 of 6)
Font size
QR-code for this ebook

Computer Science Department


"Development of a New Computational

Approach to the Prediction of Nucleic Acid

Structure by Potential Energy Methods:

I. Deoxyribose'

T. Schlick

C. Peskin

S. Broyde

M. Overton

Technical Report #264

December 1986





o o

^ -H

o o

(0 3

;? o c

(U v-i

C 04^"

Cu o
(0 nj
(^ u ^-i i-i O
ph nj O nJ •'-'

M (0 -P O O
CO o -c

o uch as X-Ray crystallography
and spectroscopic methods regarding structural features of a molecule, and can make reliable
predictions [1 - *]. The fundamental idea behind potential energy calculations is to con^truct a
function that represents the free energy associated with a specific molecular conformation; the
conformation corresponding to the minimum free energy can thenjtte. appiroximated by potential
energy minimization. The challenge and power of this approach lies-in» its-ability to incorporate all
experimental data known from crystal structures of small molecules with the goal o( predicting
structures of large biological systems. Understanding the correlation between conformation and
biological function is our ultimate goal.

Two basic ingredients of potential energy studies make these investigations difficult:

construction of the energy function and energy minimization. With increasing accuracy of

experimental data and better computing resources, these difficulties, however, may be overcome.

Nonetheless, understanding the basic computational problems is important for improved
theoretical treatments.

[n this paper, we discuss the three fundamental issues of a computational approach — choice
of coordinate system to represent molecular conformation, constructum ot the potential energy
function, and choice of minimization technique. We then present the particular combination that
we have devised and applied to a deoxyribose model. The preferred sugar conformations of
C2' — endo and C3" — endo (see Fisures 1.2) are venerated with aeometries as observed for manv

sugar t'ragmLMits of crystallographicall\-determined nucleoNides and nucleotides [5-9j.

Deo\>rib attempts to
reproduce the observed puckering preferences of nucleic acid sugars wiih associated endocyclic
bond angle values by potential energ\ methods uere imperfect [9]. These discrepencies were
attributed to the transfer of energy parameters from general chemical sequences in small
molecules to analogous chemical groups in the furanose ring^ Olson has reconciled these
differences by incorporating directly experimental values of the bond angles that acct^mpany the
puckering and by improving the energy function with a 'gauche potential [I7j. While more recent
minimization or dynamics studies [18-20] have been more successful at reproducing qtialirative
aspects of pseudorotation (regions of energy minima and maxima, values of energy barriers i.
there are still some discrepancies from experiment in the exact location of the minima and
maxima, and. when reported, in the bond angle deviations that occur with puckering. .Accurate
generation of structures and energies for the furanose ring by full cartesian energy minimization is
important in order to compute nucleic acid structures by minimization in conformational space
with all degrees of freedom rather than in dihedral angle space [see. for example. 21-23].

Our computational approach consists of the following three components. First, the potential
energy is represented in cartesian coordinate space with the full set of degrees o[ freedom
associated u ith the molecular conformation. Thus, all bond lengths and bond angles as well as
dihedral angles are allowed to vary. When properly parameten/.ed. this approach provides a
realistic view of the relaxed molecular conformation. .Second, the potential energy function is
cimstructed so that all the interactions other than the non-bonded terms are represented by
polynnmials of the coordinate variables. This formulation allov^s direct differentiation vMth respect
to the cartesian variables and facilitates manipulation and differentiation of the energy terms.

Third, two powerful Newton methods that are globally and quadratically convergent [2-1 are
unpleinented: Gill and Murray's Modified Newton method [2>.26j and a Truncated Neuton
method specifically developed for potential energy minimization. Full details of the minimization
algorithms are provided in the accompanying paper [68].

In the ne.xt section, we describe the fundamental issues of computational approaches while
explaining the directions we have chosen. Section 3 discusses the methods: geometric construction,
energy formulation and parameterization, and organization for minimization. In Section 4. ue
examine in detail the influence of different potential energy sets on the structures obtained and on
the C3' — endo'C2"-endo energy difference. In particular, ue can study the contribution of each
energy potential from Energy vs. P (the pseudorotation parameter) profiles. These curves also
provide an opportunity to investigate the approximation made in the pseudorotation description.
In Section 5 we summarize our overall conclusions. S-

•rf'i fr ;.•

- 4 -

In anv computational approach to the determination of molecular >tructure by energy
minimization, three basic decisions must be made:

1) The degrees of freedom used to describe the molecular structure.

2) The analytic form of the energy function, and

3) The choice of a minimization algorithm.

The structural outcome and hence the biological implication of any calculation is highly dependent
on the combination of choices taken. Given the ^ame starting point, different minimization
algorithms may predict different local minima, and different structures may result froin different
potential energy models. We will describe these three issues in turn.

2.1 Degrees of Freedom

The conformation of a molecule is described by a list of numbers that specifies the relative
positions of the atoms m space. By definition, the conformation is unchanged uhen the
molecule as a whole is subjected to rigid-body motion (translation or rotation). If the molecule
contains .V atoms. 3iV-6 numbers are required to specify the conformation. These numbers
may be chosen in different uays. For example, one might use the 3.V cartesian coordinates o\ the
atoms and force 6 of these numbers to be zero by putting a particular atom at the origin, a second
atom along the v a.xis. and a third atom in the v . \ plane. .Alternatively . the contormation may be
.specified as some combinatii)n of bond lengths, bond angles, and dihedral angles |27].

The internal energy of a molecule is some function of its conformation i the energy is
unchanged by translation or by rotation oi the molecule as a v\hiile). Thus the task of energy
minimization must be carried out in conformation space, which has 3A'-6 dimensions (degrees of

freedom). Since this number of dimensions can be very large for biological macromolecules. it is
temptmg to reduce the number of degrees of freedom by makmg use ot certain a priori
information concerning the molecule in question. One might, for example. as>ume that the bond
lengths and bond angles were rigid and known in advance. Then the conformation of the molecule
(and hence the energy) would be a function of the dihedral angles alone. .Another possibility
would be to assume that the molecule could be deformed only along >ome particular path in
conformation space. The use of the pseudorotation model is an example of the latter approach.

The concept of pseudorotation [5. 1 1 . 12. 14..28.29] restricts the energetic pathway that the
five-membered sugar ring follows to a wave-like motion from a chosen mean plane detined by the
five ring atoms. The conformation of these skeletal atoms is described by only two instead k)\ the
full 9 degrees of freedom (3.V-6 , ,V = 5) a^soclated u ith the molecular conformation. These
coordinates can be calculated in the Cremer and Pople formalism [U] by expressing the
r — coordinates of the atoms as periodic displacements from a chosen mean plane in terms of phase
amplitude and phase shift (t/.'^}:

.-^ = (2/5)'-c/ cos ( ^ -r -^ (;-2) ) . ; = 0.1.2,3.4. (1)

.Alternatively, the coordinates can be constructed in the .Altona and Sundaralingam description [5]
from the ring's dihedral angles, expressed as periodic variations in terms of amplitude and shift
("ma.x- P) '^ee Figure 3):

T; = T^^, cos( P + ^(;-2) ) . 7 = (2)

.Although application of this concept to nucleic acids is conceptually Mmple and can simplify
formulation of ring geometry, it is necessarily approximate. First, it ma\ produce anomalies in the
overall nucleic acid structure. Second, the pseudorotation formulation introduces mathematical
difficulties in energy representation fas a function of the 2 pseudorotation parameters) and energy
differentiation (with respect to the conformational variables).

Thus, while it is generally an advantage to work u iih t'ewer degrees of freedom, there are
several problems associated with the use of constraints:

1) The constraints may be unrealistic. In a real molecule, bond lengths and particularly bond
ansles can fluctuate in response to molecular forces that vary v\ith conformation. Such >mall
changes may produce large deformations of the molecule as a \\hole [9].

2) The constraints may be inconsistent! This is particularly true in the case of five-atom rings,
for which the usual constraints on bond lengths and bond angles are inconsistent with ring
closure [10].

3) When constraints are used to reduce the number of independent \ariables, the energy may be
a very complicated function of the variables that remain. In particular, the non-bonded
interactions (Coulomb and Van der Waalsl are most easily expressed as a function of the
cartesian coordinates of the atoms. If. for example, the independent variables are taken as a
collection of dihedral angles, then the cartesian coordinates must be computed from the
dihedral angles at every stage of the minimization process. Worse, if the minimization
method uses derivatives, the corresponding derivatives of the cartesian cocudinatcs uith
respect to the dihedral angles must be evaluated repeatedK . The chain-rule expressions for
the derivatives (especially second-derivatives) are tedious to derive, easy to get wrong, and
expensive to evaluate. Symbolic computation (e.g. Macsyma [30]) avoids the tedium and the
potential errors, but not the expense of repeatedly evaluating the resulting expressions.

For all ot the above reasons, we avoid constraints and work uith the cartesian coordinates of
the atoms as the independent variables. Our energy function contains terms which keep the bond
lengths and bond angles close to their observed values, but the stiffness o{' these "soft constraints"
is not infinite, so we avoid the pitfalls outlined above.

- 7 -

2.2 The Potential Energy Function

In principle, a potential energy function could be obtained by a quantum-mechanical
description of the ground-state energy of the molecule. Since >uch calculations are not yet feasible
for molecules as large as proteins and nucleic acids, ue resort to a molecular mechanics treatment.
We consider the molecule as a system of .V masses (atoms) which is deformed by molecular
forces, thereby producing an energy change. The basic form of this energy consists of a sum of
non-bonded interactions, torsional potentials, and strain energy:




{-^n B,, ]


+ V

332 Q, Qj ]

„.;,-: 5., l^*'-';) '•'; J


V'"* V'3

^TORSION = 1 ^ (l+cos :t,) + 2 -r l^+^os 3t,) +


E STRAIN = 1 S\,^ [r,j - r,j] - + V S2, (o, - 9,1 -

ii.j} i 5g ,


^sos -BOSDED represents the pairvMse interactions and consists of two contributions: Van der
Waals interactions, repulsive at short atomic separations and attractive at large distances: and
electrostatic interactions between charged groups obeying Coulomb's law. The Lennard-Jones
6-12 empirical potential was formed by combining the leading term ( -.4 / r^ ) o( the London
attraction potential, knov\.n from quantum mechanical theory, with a steep repulsion term that is
computationally convenient. Interactions are generally considered for atom pairs in .S'vg. the set oi
all (/.;) pairs for which ; < j and atoms / and j are separated by 3 bonds or more. This avoids
double consideration (in terms (3a) and (3c)) of bonded atoms and atoms involved in a bond
angle in some calculations, to avoid double consideration of atoms involved in a dihedral angle as
well. .')yfl ci^nsists of the atom pairs separated by 4 bonds or more. Thus. v\hcn atoms separated
by 3 bonds e.xactly are considered non — bonded . the contribution to the torsional potential must be

- 8 -

recoenized: this can be done by adjusting the torsional parameters or scaling the 1 - interactions.
The electrostatic potential energy between the partial atomic charges Q, is given b>
Coulomb's law modified by a dielectric function Dir,.] to account for a weaker interaction in a
polarizable medium than in a vacuum. For convenience, summation generally extends over atoms
defined as non-bonded for the Lennard-Jones potential. This involves an approximation, hecau.se
although we attempt to avoid the contribution of atom pairs separated by 1 or 2 chemical bonds in
more than one energy term, this convention assumes neutral interactions between these excluded
atom pairs. ■

The torsional potential ^torsio^ accounts for interaction between atoms invoKed in

internal rotation . a rotation about a bond connecting two chemical groups in a molecule and

described by a dihedral angle 7 (see Figure -i-)- In ethane, for example, the C — C bond provides a

rotation a.xis for the 2 methyl groups. .Although the origin of the barrier to internal rotation has

not been entirely resolved, the principal interactions that give rise to rotational barriers are

currently thought to be repulsive interactions, caused by overlapping of bond orbitals of the two

rotating groups [31]. In ethane, the torsional energy is highest when the two ineth\l groups are

nearest, as in the eclipsed state, and lowest when the two groups are optimall> separated, as m the

staggered state. The empirical form of the torsional potential is given by

E-TOR ('^^ — ~( 1 +COS/ITJ, where the integer n denotes the periodicity ot the rotational

barrier and V^ is the associated barrier height. Tv\ofold and threefold potentials are most
commonly used.

^STRM.\ represents the bond stretching and angle bending energy as bond lengths and bond
angles ( 8, ) deviate from equilibrium values (denoted in the energy formulas with bars). .V^
denotes the set of all ii.j) pairs for which i small
(< 3") unless electron lone pairs t\)rm bond-like orbitals (e.2. water. B ( H - () - H ) ~ 105") or

- 9 -

ring molecules impose closure constramrs (e.g.. cycloalkanes. deoxyribose) . For these larger
deviations, suitable potentials are still bemg investigated, but the most commonly used form of the
bending potential is also harmonic.

.Additional terms or variations to the basic energy form in (3) may be used to account for
hydrogen bonding, solvent effects or helical parameters where appropriate [ 18. 32. 33. for

The parameterization process for potential energy functions is a difficult task Several
important decisions must be made regarding choices for the functional form and numerical values
for the parameters. Even if one is given a specific energy form and a set of structural and
energetic data to reproduce, the combination of parameters that can be used are endless.
Unrealistic choices for one group ot parameters can be compensated for by adjustment of another.
Ironically, the more specific the force-field becomes, that is, the more individualized the treatment
of geometric sequences (bonds, bond angles and dihedral angles), the greater becomes the
possibility of divergence from physical reality.

[n theory, the energy terms ^hould have clear physical significance with parameters calibrated
by empirical fitting of crystal data and rotational barriers of analogous small molecules However,
an approximation is inherent in the extension of data from small to large ^ystems. .Moreover,
interaction with solvent and countenons, reflected in the experimental data, must be interpreted
and incorporated in the energy model. In summary, much freedom and manipulation are possible
in constructing semi-empirical energy surfaces. Only if constructed and parameterized correctly,
will the energy model generate reliable structural predictions.

In the case of nucleic acid sugars, the importance of parameter choices in the energy function
has already been realized. Different potential energy models have produced revults that are
cjiialitatively different - pseudorotation is "free" [34|. or pseudorotation is hindered [18-20]. This
is particularly possible when chosing equilibrium values for endocyclic bond angles that are
inappropriate [17.34-]. Since the unusual puckering geometry and ring closure constraints produce
significant deviations from tetrahedral bond angle arrangements, it is not clear uhat equilibrium

- 1 -

values should be used in the harmonic bending terms, more appropriate for small fluctuations.
Indeed. Olson's studies [17] were >uccesstul at producmg energy minnna at the ideal
pseudorotation phase angles of 18'^ and 162". partl_\ because they mvolved no mmmiizaiion;
experimentally-determined valence angles were directly mcorporated mto the coordinate
generation oi the ring atoms to give the energy as a function of a fixed puckered form.

In addition to the sensitivity of the results to the parameter choices, important issues m the
modeling procedure have emerged in deoxyribose investigations. For example, the drbitrar\
selections possible in modeling dihedral angles have been noted by Harvey and Prabhakaran [l'^].
Consider an ethane molecule; nine rotations are associated uith the C-C bond. Which dihedral
angles should we associate with a particular torsional term.' .And how should we assure proper
positioning of the remaining atoms'.' Since the torsional potentials have the greatest contribution to
the total energy 'of deoxyribose. Harvey and Prabhakaran suggest that a ver\ careful search for
appropriate torsion parameters and modeled dihedral angle sets are required to obtain energy
minima at the angles corresponding to the ideal C2" — endo and C3'-endt) structures. Selection of
the torsion parameters is particularly important for deoxyribose. since a combination ot both
threefold and twofold (gauche) torsional energies is required to produce relative energies ot the
preferred C2'-endo and C3" — endo structures in accordance with the observed ratio of these
forms for sugar fragments [5-9]. We will discuss the selection of modeled dihedral angles and
associated constants in Section 3. .Although choices in the torsion parameters and modeled
dihedral angles are important for the furanose ring, the discrepancies noted in [19] regarding
location of the minima, are undoubdetly attributed in part to the united atom approach, Rxplicit
inclusion ot the hydrogen atoms is necessary in order to account correctly for the eclipsing strain
of the two substituents of C2' and C3" in the East and the \'an der Waals repulsions between the
equatorial base (in parallel orientatum to the furanose ring plane) and the exoc\clic substituents of
C5' in the West.

1 3 4 5 6

Online LibraryTamar SchlickDevelopment of a new computational approach to the prediction of nucleic acid structure by potential energy methods: I. Deoxyribose → online text (page 1 of 6)