Chaya Bleich Gurwitz.

# Sequential quadratic programming methods based on approximating a projected Hessian matrix online

. (page 1 of 8)
Font size

Computer Science Department

TECHNICAL REPORT

"Sequential Quadratic Programming Methods Based on

Approximating a Projected Hessian Matrix"

by

Chaya Bleich Gunvitz '

Technical Report #219
May, 1986

Ui

NEW YORK UNIVERSITY

u

o o w

•H -H fO

!a> 0) -p o •

I "u J = l,...,m;

Cj(x) =0 j = mi+\, ■ ■ ■ ,m
where the objective and constraint functions are twice continuously differentiable.

We are concerned with developing algorithms to find a local minimum of (NP) by

using second derivative information in the tangent space to the constraints.

Sequential quadratic programming methods are particularly effective for
solving problems of this nature. Algorithms of this type compute the minimizer of
(NP) by solving a sequence of quadratic programming subproblems. It is assumed
that first derivatives of/ and {c,} are available, but that second derivatives may be
too expensive to compute. Instead, the methods typically update a suitable n^n
matrix which approximates second derivative information at every iteration.

We are interested m developing sequential quadratic programming methods
which maintain an approximation to second derivative information projected onto
the tangent space of the constraints. The main motivation for our work is that only
the projected matrix enters into the optimality conditions for the nonlinear problem.
One advantage of our methods is that updating projected second derivative

information reduces the dimension of the matrix to be recurred. Moreover, we
avoid the necessity of mtroducing an augmenting term which can lead to ill-
.'onditioned matrices Since the reduced matrix of second derivatives is known to be
positive definite at the solution, we are able to make use of standard quasi-New ton

Fou: possible formulations of the quadratic programming subproblem will be
presented The difficulties arising from incomplete knowledge of the full second
derivative matrix will be discussed. Proofs of global convergence of some of these
methods for a restricted class of problems are presented. We will present numerical
result? w'hi»_ii indicate that our algorithms may be useful in practice.

1.2 Notation

Scalarh will usually be denoted by lower-case Greek letters, e.g. a, p.

Vectors will be represented by lower-case Roman letters, such as x, y, and are
assumed to be column vectors. A superscript T will be used to denote transpose, so
x^ will represent a row vector. Individual components of a vector may be
referenced by the subscripted name of the vector, so that y^, would denote the third
component of the vector y.

I^/Iatrices will be represented by capital Roman letters A particular element
wiil be referenced by the name of the matri.x, m either upper or lower case, with
tv. ,ubscripts, referring to the row and column in which the element appears, e.g.
0 and all k^kQ\
2 -step superlinearly, if

fc- k'~ -v

-0:

■ \\-
or some constant fx>0 and all k'^k^.

Given any function h{x) . we use the notation h (x)^ or Vh(x) to denote the

The following notation will be used in reference to problem (NP):
Let g{x) =Vf(x), the gradient of the objective function,
c(x) = [ci(i), ■ ■ ,c„ix)]^. A(.v) = [Vci(x) • ■ ,Vc„(x)]^, the matrix whose rows
consist of ihe transposed constraint gradients, G(x) = V^/(.vi, G,(.v) = V^c,(.r), the
Hessians of the objecti\e and constraint functions, respectively. We will assume that
the unctions / and c, are twice differentiable and that the Hessians of the objective
an>. constraint functions are Lipschitz continuous matrix functions of c.

The index set A// = {1,2, ■■ ,mi} refers to the indices of the inequality
,:onstraints, while ME = {mi^\, . .m} is used to denote the equality constraints.

._et A be partitioned into

-4/

A.

, where A/ contains the transposed gradients of the

inequality constraints and Aa- denotes the transposed gradients of the equality
constraints. Similarly, let c; and c^ refer to the values of the inequality and equality
constraints, respectively.

A point x is feasible for problem < NP) if

c/ > 0.
Ce = 0.

-5-

At the solution to (NP), some subset of the inequahty constraints, as well as all
of the equality constraints are zero. These constraints, denoted by c(x ) or c , are
called the active constraints at the solution. Any algorithm for solving (NP) is
concerned with identifying the active set at the solution. Since the other inequality
constraints have a positive value, they remain satisfied within some neighborhood of
x' . The problem then essentially reduces to the problem of minimizing the
objective function with respect to the active constraints. The index set J{x ), known
as the working set, is an approximation to the active set at the solution which is used
by the algorithm at the point a:*. Any constraint which is in the working set is called
active, although it may not actually be included in the active set at the solution. The
constraints in the working set are referred to by c(x ) or c .

Let A{x) denote the matrix whose rows consist of the transposed gradients of
the active constraints and assume that A{x) has full row rank. If we partition Aj into

, where Aj refers to the inactive constraint gradients and Aj denotes the

A similar partitioning of cj

gradients of the active constraints, then A =

allows us to express c as {cj c^) .

Let Z{.v) be a matrix whose columns span the null space of A(x), and let Y{x)
be a matrix whose columns form a basis for the range space of A{x)^ . Such
matrices can be obtained, for example, by forming the LQ factorization which
produces

A(x)Qix) = [L(x) 0]. (1.1)

Let r be the number of active constraints and let t = n-r. The first r columns of

Q{x) can be used to define the columns of Y{x), while the columns of Z(x) can be

formed from the last t columns of Q{x).

The vector gi(x) and the matrix W(j:,X) will be used to denote the gradient and

-6-

Hessian of the Lagrangian function, which will be introduced in the next section.
The approximation to the Hessian of the Lagrangian function at r* will be denoted
by H'' . We will use B'' to refer to the projected Hessian approximation.

1.3 Optimality Conditions

The following presentation of the optimality conditions refers to the constraint
qualifications and the necessary and sufficient conditions for a point x to be a local
minimum of (NPi as presented in Fletcher (1981) We will assume that the first
and second order constraint qualifications hold at x . A sufficient, but not
necessary, condition for the constraint qualifications to hold at v is that the
gradients of the active constraints are linearly independent, i.e., A(x ) is of full
rank.

A point -v* is optimal for iNP) if

I) it is feasible

2' locally there is no other feasible point with a lower objective function value.

Given a feasible point .v. consider a step along a first-order feasible direction p.
First, consider such a direction which maintains to first-order a zero value for the
constraints in the working set. This direction must then satisfy

A(x)p = 0. (1.2)

The matrix Z{x) forms a basis for all p that satisfy (1.2). Such a direction p is then

of the form p = Z(x)p^, for some vector p.. Consider the Taylor series expansion

of/ at x along such a direction p:

fix' + ap) = fix' -h aZ(.Y*)p,)

= fix') + ap]Zix* f gix' ) ^ Oia-p-)-
In order for x' to be optimal,

plZix')^gix*) 2: 0, for all vectors p..

-7-
This condition reduces to

Zix'fgix') = (1.3)

which implies that g(x') is in the range of A{x')^ . Therefore, there exists a vector

X such that

six') = A(x'f\' . (1.4)

The quantity Zix)^g{x) is called the projected gradient of / at x. Note that

requiring the projected gradient to be zero is a special case of the optimality

condition ^(.v*) = for unconstrained optimization. A feasible point satisfying

(1.3) is called a constrained stationary point.

However, it may be that the current working set is not the active set at the
solution, and that a feasible point with a lower function value could be found by
allowing one of the inequalities in the current working set to take on a positive
value. At the solution, using (1.4) we have

XX,*a,(.v*)p ^ 0. (1.5)

A first-order feasible direction/) must satisfy

ai(x')p = idME,
ai(x')p > z€A//.
Condition (1.5) thus implies

X, ^ 0, for the active inequality constraints.
Therefore, the first-order necessary conditions for a point x to be a local
minimum of NP are the existence of a vector of Lagrange multipliers, X , such that

gix') = A(x'fK' (1.6a)

X,* > 0, for the active inequality constraints (1.6b)

c,(x') > 0, (1.6c)

Ce(x*) = 0. (1.6d)

A pair {x , X ), satisfying (1.6) is called a Kuhn — Tucker pair of (NP).

Strict complementarity is said to hold if there is no ; such that X, = c,(.r ) = 0.

Condition (1.6a) is equivalent to requiring x to be a stationary point of the

Lagrangian function

L(x,k) ^ f(x)-X^iCi(x) (1.7)

when \ = X. .

Second-order optimality conditions are expressed in terms of the Hessian with
respect to x of the Lagrangian function,

W{xM = V^f(x)-2K^^c,{x).
It can be shown that a necessary condition for optimality is that p^W(x' ,x')p >

for any feasible p (see Fletcher (198))). Since p = Z(x')p.. this implies that

pJZix )^W(x' ,k')Z{x*)p^ S: for all p,, i.e., the projected Hessian,

Z(x'')^W(x' ,k')Z{x'), is positive semi-definite. Strict positive definiteness of the

projected Hessian together with strict inequality in (1.6b) and the satisfaction of the

first order conditions is a second-order sufficient condition.

In addition to the assumption that A has full rank (stated earher), we shall
assume throughout that the the second-order sufficient conditions hold at x .

1.4 Historical Background

The earliest approache'; to solving problem (NP) were based on solving a
sequence of unconstrainc •; minimization problems whose solution in the limit
approached the solution of (NP). Penalty function and barrier function methods
belong to this class (see Fiacco and McCormick (1968j) Penalty function methods
minimize a sequence of subproblems where a "penalty" term is added to the
objective function for constraint violation. These methods generate a sequence of
infeasible points. The penalty term grows steadily, forcing convergence to a feasible
point in the limit. These methods are not appropriate for problems in which
feasiblity must be maintained Barrier function methods, on the other hand,
generate a sequence of feasible points by creating a "barrier" around the feasible

-9-

region. Such methods involve a sequence of subproblems where the function to be
minimized is found by adding a weighted sum of continuous functions with a
positive singularity at the boundary of the feasible region to the objective function.
As the weight assigned to the barrier term is decreased, the solutions to the
unconstrained subproblems approach the constrained minimum.

The availability of powerful methods for unconstrained optimization and the
well-developed theoretical background and comparative simplicity of these methods
is attractive. However, in practice they suffer from severe numerical difficulties,
since the unconstrained problems become increasingly more ill-conditioned as the
solution is approached (Murray (1971), Lootsma (1972)).

Algorithms based on sequential quadratic programming (SQP) (also known as
recursive quadratic programming (RQP) ) have been found particularly effective.
Methods of this type approximate the minimizer of (NP) by solving a sequence of
quadratic programming (QP) subproblems. In these subproblems a local model of
the nonlinear problem is posed in which the curvature of the Lagrangian function is
approximated by a suitable matrix and linear approximations are made to the
constraints. The solution of the QP is used to provide a search direction, p*. The
algorithm then determines a step length, a*, such that .v* + a*p* is a sufficiently
better approximation to x , in a sense to be defined later.

At the solution to (NP) we are concerned only with the constraints that are
active. Inactive constraints do not affect the solution to (NP), since they remain
satisfied in the neighborhood of the solution and thus can be ignored. Therefore,
any method used to solve (NP) must incorporate some means of determining the
active set. In some cases, an approximation is made to the active set and the QP
subproblem is posed with only equality constraints. Such subproblems are known as
EQPs - equality constrained quadratic programs. On the other hand, the QP can be
posed as an IQP - an inequality constrained quadratic program. Subproblems of this

- 10-

nature include linear approxraations to all the equality and inequality constraints of
the original problem. Although the terms EQP and IQP refer to the formulation of
the QP subprobiem, we also use these terms in a looser sense to mean SQP methods
which >o!ve successive QP subproblems of either type.

The EQP .'■-ubproblem for (NP) is given by

mm -p'^Hp + p'^g (EQP)

s.t. A^p = -c.

and the TQP formulation of the j:ubproblemi is given by

min \p^Hp + p'^g (IQP)

s.t. Alp ^ -ci
aIp = -ce
where H is some approximation to the Hessian of the Lagrangian function.

Smce the Lagrangian function is defined in terms of both ,v and \, these
methods must incorporate some means of approximating the Lagrange multipliers.
At the solution, the Lagrange multipliers are defined by (1.6a), where c(x*) = 0.
This suggests that a first-order estimate, X^. can be found at .v* by solving the least
squares problem

mxn\\A'\i- g'Wl. (1.8)

The least squares Lagrange multiplier estimates, X.£, are only an estimate of the
mulnplii-rs defined by the current working set.

A second order m jltiplier estimate, •ti'^, is given by the multipliers defined by
the solu'ion p of the QP ~ubprob)em,

A'\'' = g' + Hp (1.9)

where 4 is defined by the working set at the solution to the QP. These multiplier

estimates are referred to as the QP multipliers .

- 11 -

The advantage of posing the subprobiem as an EQP is that the solution can be
easily computed by solving two systems of linear equations. The solution, p , can
be written as

p* = Ypy* + Zp*
where Y and Z are defined as in (1.1). The vectors py and p^ can be found by

solving

Mp* = -c (1.10a)

Z^HZp' = -Z'^g - Z^HYp* (1.10b)

The solutions are unique when A has full row rank and Z^HZ is positive definite.

This method of computing p* is discussed by Murray and Wright (1982). The

solution of the EQP subprobiem can also be viewed as the result of applying one

step of Newton's method to the nonlinear system of equations

g{x)-k{x)\

c{x)
(see Nocedal and Overton (1985)).

= (1.11)

In general, the computation of the solution to an IQP is itself a finite iterative
process. Essentially, the IQP generates a sequence of working sets and solves the
EQP associated with each working set. At any given point, a search direction is
generated along which the QP objective function decreases and the active constraints
remain satisfied. The full step along this search direction may cause one of the
currently inactive constraints to be violated. If such is the case, a step is taken to
the closest such constraint. The newly-active constraint is added to the working set
and the process is repeated. Eventually, a point is reached which is the minimum
on the subspace defined by the current working set (i.e., the gradient projected onto
the tangent space to the constraints in the working set is zero). To determine
whether this point is optimal for the QP, the QP Lagrange multiplier estimates are
checked. If some multiplier is negative, that is an indication that the QP objective

- 12-

function can be further decreased by deleting the corresponding constraint from the
working set This process of revising the working set and minimizing along the
sub? pace defined by the working set is continued until a point is reached at which
the jirojected gradient is zero and the Lagrange multipliers are all non-negative.

£QP methods are attractive because the formulation of the subproblem
parallels NewtonV method applied to an unconstramed minimization problem.
Moreover, when the conect working set is identified, Z(x' )'^W{x' .k')Zt,x') must be
at leaM positive semi-definite. In that case one can extend quasi-Newton updating
methods to the case of maintaining a positive definite approximation to Z^WZ.
However, the difficulties of determining the correct working set should not be
minimi/.ed. Algorithms which update the working set at each iteration can suffer
from repeated insertion and deletion of the same constraint, known as zigzagging
(see Wolfe (1972)^

Methods based upon solving IQP subproblems are not dependent on a strategy
for updating the working set at each iteration. Such algorithms allow for fast
recovery from a bad initial approximation to the active set. IQP methods typically
involve approximating the full matrix W, since the constraints defining Z^WZ change
within the subprobleni. However, there is no a priori reason to assume that \V is
positive definite. Moreover, when W is indefinite, the solution may not be
bounded, and the direction produced by the QP ma\ not be a descent direction.
Therefore, many methods which update the tuU second derivative matrix of the
Lagrangian function add a term of the form pA^A lo the definition of the
Lagrangian function so as to force W to be positive definite. This augmenting term
can lead to very ill-conditioned matrices, although in practice this does not appear to
impede the performance of such methods (Giil, ei. al., private communication).

A discussion of the relative merits of EQP and IQP methods can be found in
Vfv)rray and Wright (1982) and Gil! et. al. (1985cj. In the neighborhood of a

- 13 -

solution, one would expect similar performance from both the IQP and EQP
methods. Close to x' it is expected that the working set would correspond to the
correct active set. Therefore, the solution to an IQP subproblem would be found by
taking one step along the subspace defined by the current working set which is
equivalent to the solution of the EQP defined by those constraints.

It appears that SQP methods were first suggested by Wilson (1963). Wilson
proposed solving an IQP at each iteration, where the quadratic approximation to the
Lagrangian function was formed using the exact Hessians of the objective and
constraint functions. Murray (1969), Biggs (1972) and Wright (1976) presented
successive EQP methods. Garcia and Mangasarian (1976) used quasi-Newton