Computer Science Department

TECHNICAL REPORT

"Sequential Quadratic Programming Methods Based on

Approximating a Projected Hessian Matrix"

by

Chaya Bleich Gurwitz

Technical Report #219

May, 1986


NEW YORK UNIVERSITY


We consider the nonlinear programming problem

    minimize    f(x)                                        (NP)
    subject to  c_j(x) ≥ 0,   j = 1, ..., m_1;
                c_j(x) = 0,   j = m_1+1, ..., m,

where the objective and constraint functions are twice continuously differentiable.

We are concerned with developing algorithms to find a local minimum of (NP) by

using second derivative information in the tangent space to the constraints.

Sequential quadratic programming methods are particularly effective for

solving problems of this nature. Algorithms of this type compute the minimizer of


(NP) by solving a sequence of quadratic programming subproblems. It is assumed that first derivatives of f and {c_j} are available, but that second derivatives may be too expensive to compute. Instead, the methods typically update a suitable n×n matrix which approximates second derivative information at every iteration.

We are interested in developing sequential quadratic programming methods which maintain an approximation to second derivative information projected onto the tangent space of the constraints. The main motivation for our work is that only the projected matrix enters into the optimality conditions for the nonlinear problem. One advantage of our methods is that updating projected second derivative information reduces the dimension of the matrix to be recurred. Moreover, we avoid the necessity of introducing an augmenting term which can lead to ill-conditioned matrices. Since the reduced matrix of second derivatives is known to be positive definite at the solution, we are able to make use of standard quasi-Newton updates.

Four possible formulations of the quadratic programming subproblem will be presented. The difficulties arising from incomplete knowledge of the full second derivative matrix will be discussed. Proofs of global convergence of some of these methods for a restricted class of problems are presented. We will present numerical results which indicate that our algorithms may be useful in practice.

1.2 Notation

Scalars will usually be denoted by lower-case Greek letters, e.g. α, ρ. Vectors will be represented by lower-case Roman letters, such as x, y, and are assumed to be column vectors. A superscript T will be used to denote transpose, so x^T will represent a row vector. Individual components of a vector may be referenced by the subscripted name of the vector, so that y_3 would denote the third component of the vector y.

Matrices will be represented by capital Roman letters. A particular element will be referenced by the name of the matrix, in either upper or lower case, with two subscripts referring to the row and column in which the element appears, e.g. a_ij.

... for some constant μ > 0 and all k ≥ k_0;

2-step superlinearly, if

    lim_{k→∞} ||x^(k+2) − x*|| / ||x^(k) − x*|| = 0;

2-step quadratically, if

    ||x^(k+2) − x*|| ≤ μ ||x^(k) − x*||²

for some constant μ > 0 and all k ≥ k_0.

Given any function h(x), we use the notation h_x(x) or ∇h(x) to denote the gradient of h at x.

The following notation will be used in reference to problem (NP):

Let g(x) = ∇f(x), the gradient of the objective function; c(x) = [c_1(x), ···, c_m(x)]^T; A(x) = [∇c_1(x), ···, ∇c_m(x)]^T, the matrix whose rows consist of the transposed constraint gradients; G(x) = ∇²f(x) and G_i(x) = ∇²c_i(x), the Hessians of the objective and constraint functions, respectively. We will assume that the functions f and c_i are twice differentiable and that the Hessians of the objective and constraint functions are Lipschitz continuous matrix functions of x.

The index set MI = {1, 2, ···, m_1} refers to the indices of the inequality constraints, while ME = {m_1+1, ···, m} is used to denote the equality constraints.

Let A be partitioned into

    A = [ A_I ]
        [ A_E ],

where A_I contains the transposed gradients of the inequality constraints and A_E denotes the transposed gradients of the equality constraints. Similarly, let c_I and c_E refer to the values of the inequality and equality constraints, respectively.

A point x is feasible for problem (NP) if

    c_I ≥ 0,
    c_E = 0.


At the solution to (NP), some subset of the inequality constraints, as well as all of the equality constraints, are zero. These constraints, denoted by ĉ(x*) or ĉ, are called the active constraints at the solution. Any algorithm for solving (NP) is concerned with identifying the active set at the solution. Since the other inequality constraints have a positive value, they remain satisfied within some neighborhood of x*. The problem then essentially reduces to the problem of minimizing the objective function with respect to the active constraints. The index set J(x^k), known as the working set, is an approximation to the active set at the solution which is used by the algorithm at the point x^k. Any constraint which is in the working set is called active, although it may not actually be included in the active set at the solution. The constraints in the working set are referred to by ĉ(x) or ĉ.

Let Â(x) denote the matrix whose rows consist of the transposed gradients of the active constraints and assume that Â(x) has full row rank. If we partition A_I into

    A_I = [ Ā_I ]
          [ Â_I ],

where Ā_I refers to the inactive constraint gradients and Â_I denotes the gradients of the active inequality constraints, then

    Â = [ Â_I ]
        [ A_E ].

A similar partitioning of c_I allows us to express ĉ as (ĉ_I^T  c_E^T)^T.

Let Z(x) be a matrix whose columns span the null space of A(x), and let Y(x) be a matrix whose columns form a basis for the range space of A(x)^T. Such matrices can be obtained, for example, by forming the LQ factorization which produces

    A(x)Q(x) = [L(x)  0].                                   (1.1)

Let r be the number of active constraints and let t = n − r. The first r columns of Q(x) can be used to define the columns of Y(x), while the columns of Z(x) can be formed from the last t columns of Q(x).
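The construction in (1.1) can be illustrated with a small NumPy sketch; an LQ factorization of A is just the transpose of a QR factorization of A^T, so NumPy's QR routine suffices. The matrix A below is illustrative, not taken from the text:

```python
import numpy as np

# Sketch of obtaining Y and Z from the factorization A(x)Q(x) = [L 0] in (1.1).
# An LQ factorization of A is the transposed QR factorization of A^T.
# A is r x n with full row rank (hypothetical data).
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])              # r = 2 active constraints, n = 3
r, n = A.shape

Q, R = np.linalg.qr(A.T, mode="complete")    # A^T = Q [R1; 0], R1 upper triangular
L = R[:r, :].T                               # so A Q = [L 0], L lower triangular
Y = Q[:, :r]                                 # basis for the range space of A^T
Z = Q[:, r:]                                 # columns span the null space of A

# Verify the defining properties.
assert np.allclose(A @ Z, 0.0)               # A Z = 0
assert np.allclose(A @ Y, L)                 # A Y = L
```

Any orthogonal factorization with this block structure would serve; the QR route is simply the most convenient one in NumPy.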

The vector g_L(x) and the matrix W(x,λ) will be used to denote the gradient and Hessian of the Lagrangian function, which will be introduced in the next section. The approximation to the Hessian of the Lagrangian function at x^k will be denoted by H^k. We will use B^k to refer to the projected Hessian approximation.

1.3 Optimality Conditions

The following presentation of the optimality conditions refers to the constraint qualifications and the necessary and sufficient conditions for a point x to be a local minimum of (NP) as presented in Fletcher (1981). We will assume that the first- and second-order constraint qualifications hold at x*. A sufficient, but not necessary, condition for the constraint qualifications to hold at x* is that the gradients of the active constraints are linearly independent, i.e., Â(x*) is of full rank.

A point x* is optimal for (NP) if

1) it is feasible;

2) locally there is no other feasible point with a lower objective function value.

Given a feasible point x, consider a step along a first-order feasible direction p. First, consider such a direction which maintains to first order a zero value for the constraints in the working set. This direction must then satisfy

A(x)p = 0. (1.2)

The columns of Z(x) form a basis for all p that satisfy (1.2). Such a direction p is then of the form p = Z(x)p_Z, for some vector p_Z. Consider the Taylor series expansion of f at x* along such a direction p:

    f(x* + αp) = f(x* + αZ(x*)p_Z)
               = f(x*) + α p_Z^T Z(x*)^T g(x*) + O(α² ||p_Z||²).

In order for x* to be optimal,

    p_Z^T Z(x*)^T g(x*) ≥ 0,  for all vectors p_Z.


This condition reduces to

    Z(x*)^T g(x*) = 0,                                      (1.3)

which implies that g(x*) is in the range of Â(x*)^T. Therefore, there exists a vector λ* such that

    g(x*) = Â(x*)^T λ*.                                     (1.4)

The quantity Z(x)^T g(x) is called the projected gradient of f at x. Note that requiring the projected gradient to be zero is a special case of the optimality condition g(x*) = 0 for unconstrained optimization. A feasible point satisfying (1.3) is called a constrained stationary point.
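As a concrete illustration of (1.3) and (1.4), consider the hypothetical problem of minimizing x1² + x2² subject to x1 + x2 = 1, whose solution is x* = (0.5, 0.5). The sketch below verifies numerically that the projected gradient vanishes there and that g(x*) lies in the range of the transposed constraint gradient:

```python
import numpy as np

# Hypothetical equality-constrained problem (not from the text):
#   minimize f(x) = x1^2 + x2^2   subject to  c(x) = x1 + x2 - 1 = 0,
# whose solution is x* = (0.5, 0.5).
x_star = np.array([0.5, 0.5])
g = 2.0 * x_star                 # g(x*) = grad f(x*) = (1, 1)
A = np.array([[1.0, 1.0]])       # A(x*) = transposed constraint gradient

# Null-space basis Z(x*) via the complete QR factorization of A^T.
Q, _ = np.linalg.qr(A.T, mode="complete")
Z = Q[:, 1:]

# (1.3): the projected gradient vanishes at a constrained stationary point.
assert np.allclose(Z.T @ g, 0.0)

# (1.4): g(x*) lies in the range of A(x*)^T, here with lambda* = 1.
lam = np.linalg.lstsq(A.T, g, rcond=None)[0]
assert np.allclose(A.T @ lam, g)
```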

However, it may be that the current working set is not the active set at the

solution, and that a feasible point with a lower function value could be found by

allowing one of the inequalities in the current working set to take on a positive

value. At the solution, the first-order change in f along any feasible direction p must be non-negative; using (1.4) we have

    g(x*)^T p = Σ_i λ_i* a_i(x*)^T p ≥ 0.                   (1.5)

A first-order feasible direction p must satisfy

    a_i(x*)^T p = 0,  i ∈ ME;
    a_i(x*)^T p ≥ 0,  i ∈ MI.

Condition (1.5) thus implies

    λ_i* ≥ 0,  for the active inequality constraints.

Therefore, the first-order necessary conditions for a point x* to be a local minimum of (NP) are the existence of a vector of Lagrange multipliers, λ*, such that

    g(x*) = Â(x*)^T λ*,                                     (1.6a)
    λ_i* ≥ 0, for the active inequality constraints,        (1.6b)
    c_I(x*) ≥ 0,                                            (1.6c)
    c_E(x*) = 0.                                            (1.6d)

A pair (x*, λ*) satisfying (1.6) is called a Kuhn–Tucker pair of (NP).

Strict complementarity is said to hold if there is no i such that λ_i* = c_i(x*) = 0.

Condition (1.6a) is equivalent to requiring x* to be a stationary point of the Lagrangian function

    L(x, λ) ≡ f(x) − Σ_i λ_i c_i(x)                         (1.7)

when λ = λ*.

Second-order optimality conditions are expressed in terms of the Hessian with respect to x of the Lagrangian function,

    W(x, λ) = ∇²f(x) − Σ_i λ_i ∇²c_i(x).

It can be shown that a necessary condition for optimality is that p^T W(x*, λ*) p ≥ 0 for any feasible p (see Fletcher (1981)). Since p = Z(x*)p_Z, this implies that p_Z^T Z(x*)^T W(x*, λ*) Z(x*) p_Z ≥ 0 for all p_Z, i.e., the projected Hessian, Z(x*)^T W(x*, λ*) Z(x*), is positive semi-definite. Strict positive definiteness of the projected Hessian, together with strict inequality in (1.6b) and the satisfaction of the first-order conditions, is a second-order sufficient condition.

In addition to the assumption that Â has full rank (stated earlier), we shall assume throughout that the second-order sufficient conditions hold at x*.
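The distinction between W itself and the projected Hessian can be seen on a small hypothetical example: minimizing −x1·x2 subject to x1 + x2 = 2 has an indefinite Lagrangian Hessian at the solution, yet the projected Hessian Z^T W Z is positive definite, as the sufficient conditions require. A NumPy sketch:

```python
import numpy as np

# Hypothetical problem (not from the text):
#   minimize f(x) = -x1*x2   subject to  c(x) = x1 + x2 - 2 = 0,
# with solution x* = (1, 1) and multiplier lambda* = -1, since
# g(x*) = (-1, -1) = A(x*)^T lambda* with A(x*) = [1 1].
A = np.array([[1.0, 1.0]])
lam = -1.0

# Hessian of the Lagrangian: W = Hess f - lambda * Hess c; Hess c = 0 here.
W = np.array([[0.0, -1.0],
              [-1.0, 0.0]])

Q, _ = np.linalg.qr(A.T, mode="complete")
Z = Q[:, 1:]                                  # null-space basis of A

# W itself is indefinite (eigenvalues -1 and +1) ...
assert np.linalg.eigvalsh(W).min() < 0
# ... but the projected Hessian Z^T W Z is positive definite,
# as the second-order sufficient condition requires.
assert np.linalg.eigvalsh(Z.T @ W @ Z).min() > 0
```

This is precisely the situation that motivates recurring only the projected matrix: positive definiteness can be expected of Z^T W Z at the solution, but not of W.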

1.4 Historical Background

The earliest approaches to solving problem (NP) were based on solving a sequence of unconstrained minimization problems whose solutions in the limit approached the solution of (NP). Penalty function and barrier function methods belong to this class (see Fiacco and McCormick (1968)). Penalty function methods minimize a sequence of subproblems where a "penalty" term is added to the objective function for constraint violation. These methods generate a sequence of infeasible points. The penalty term grows steadily, forcing convergence to a feasible point in the limit. These methods are not appropriate for problems in which feasibility must be maintained. Barrier function methods, on the other hand, generate a sequence of feasible points by creating a "barrier" around the feasible


region. Such methods involve a sequence of subproblems where the function to be

minimized is found by adding a weighted sum of continuous functions with a

positive singularity at the boundary of the feasible region to the objective function.

As the weight assigned to the barrier term is decreased, the solutions to the

unconstrained subproblems approach the constrained minimum.

The availability of powerful methods for unconstrained optimization and the well-developed theoretical background and comparative simplicity of these methods are attractive. However, in practice they suffer from severe numerical difficulties,

since the unconstrained problems become increasingly more ill-conditioned as the

solution is approached (Murray (1971), Lootsma (1972)).

Algorithms based on sequential quadratic programming (SQP) (also known as recursive quadratic programming (RQP)) have been found particularly effective.

Methods of this type approximate the minimizer of (NP) by solving a sequence of

quadratic programming (QP) subproblems. In these subproblems a local model of

the nonlinear problem is posed in which the curvature of the Lagrangian function is

approximated by a suitable matrix and linear approximations are made to the

constraints. The solution of the QP is used to provide a search direction, p^k. The algorithm then determines a step length, α^k, such that x^k + α^k p^k is a sufficiently better approximation to x*, in a sense to be defined later.

At the solution to (NP) we are concerned only with the constraints that are

active. Inactive constraints do not affect the solution to (NP), since they remain

satisfied in the neighborhood of the solution and thus can be ignored. Therefore,

any method used to solve (NP) must incorporate some means of determining the

active set. In some cases, an approximation is made to the active set and the QP

subproblem is posed with only equality constraints. Such subproblems are known as

EQPs - equality constrained quadratic programs. On the other hand, the QP can be

posed as an IQP - an inequality constrained quadratic program. Subproblems of this


nature include linear approximations to all the equality and inequality constraints of the original problem. Although the terms EQP and IQP refer to the formulation of the QP subproblem, we also use these terms in a looser sense to mean SQP methods which solve successive QP subproblems of either type.

The EQP subproblem for (NP) is given by

    min_p   ½ p^T H p + p^T g                               (EQP)
    s.t.    Â p = −ĉ,

and the IQP formulation of the subproblem is given by

    min_p   ½ p^T H p + p^T g                               (IQP)
    s.t.    A_I p ≥ −c_I,
            A_E p = −c_E,

where H is some approximation to the Hessian of the Lagrangian function.

Since the Lagrangian function is defined in terms of both x and λ, these methods must incorporate some means of approximating the Lagrange multipliers. At the solution, the Lagrange multipliers are defined by (1.6a), where ĉ(x*) = 0. This suggests that a first-order estimate, λ_L, can be found at x^k by solving the least squares problem

    min_λ  ||(Â^k)^T λ − g^k||.                             (1.8)

The least squares Lagrange multiplier estimates, λ_L, are only an estimate of the multipliers defined by the current working set.
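A sketch of computing the least-squares estimate (1.8) with NumPy; the working-set matrix and gradient below are hypothetical data:

```python
import numpy as np

# First-order multiplier estimate (1.8): minimize ||A^T lam - g|| at the
# current iterate. The data are illustrative, not from the text.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])            # working-set constraint gradients
g = np.array([1.0, 2.0, 1.0])              # objective gradient at x^k

lam_ls, *_ = np.linalg.lstsq(A.T, g, rcond=None)

# lam_ls minimizes the residual ||A^T lam - g||: the residual is
# orthogonal to the columns of A^T (the normal equations).
assert np.allclose(A @ (g - A.T @ lam_ls), 0.0)
```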

A second-order multiplier estimate, η^k, is given by the multipliers defined by the solution p of the QP subproblem,

    (Â^k)^T η^k = g^k + H p,                                (1.9)

where Â is defined by the working set at the solution to the QP. These multiplier estimates are referred to as the QP multipliers.


The advantage of posing the subproblem as an EQP is that the solution can be easily computed by solving two systems of linear equations. The solution, p*, can be written as

    p* = Y p_Y* + Z p_Z*

where Y and Z are defined as in (1.1). The vectors p_Y* and p_Z* can be found by solving

    Â Y p_Y* = −ĉ,                                          (1.10a)
    Z^T H Z p_Z* = −Z^T g − Z^T H Y p_Y*.                   (1.10b)

The solutions are unique when Â has full row rank and Z^T H Z is positive definite.

This method of computing p* is discussed by Murray and Wright (1982). The solution of the EQP subproblem can also be viewed as the result of applying one step of Newton's method to the nonlinear system of equations

    [ g(x) − A(x)^T λ ]
    [      c(x)       ]  =  0                               (1.11)

(see Nocedal and Overton (1985)).
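The two systems (1.10a)-(1.10b) can be sketched in NumPy as follows; H, g, A and c are illustrative data, with H chosen positive definite:

```python
import numpy as np

# Sketch of solving the EQP subproblem via (1.10a)-(1.10b). All data are
# hypothetical. H approximates the Hessian of the Lagrangian; A holds the
# working-set constraint gradients; c holds the constraint values.
H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
g = np.array([1.0, -2.0, 0.5])
A = np.array([[1.0, 1.0, 0.0]])            # one active constraint
c = np.array([0.3])
r, n = A.shape

Q, _ = np.linalg.qr(A.T, mode="complete")
Y, Z = Q[:, :r], Q[:, r:]

# (1.10a): the range-space step restores feasibility: A Y p_Y = -c.
p_Y = np.linalg.solve(A @ Y, -c)
# (1.10b): the null-space step minimizes along the tangent space:
#          Z^T H Z p_Z = -Z^T g - Z^T H Y p_Y.
p_Z = np.linalg.solve(Z.T @ H @ Z, -Z.T @ g - Z.T @ H @ Y @ p_Y)

p = Y @ p_Y + Z @ p_Z

# p satisfies the linearized constraints A p = -c ...
assert np.allclose(A @ p, -c)
# ... and the projected gradient of the QP vanishes: Z^T (H p + g) = 0.
assert np.allclose(Z.T @ (H @ p + g), 0.0)
```

Note that with the factorization (1.1), A Y is exactly the triangular factor L, so (1.10a) is a triangular solve.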

In general, the computation of the solution to an IQP is itself a finite iterative

process. Essentially, the IQP generates a sequence of working sets and solves the

EQP associated with each working set. At any given point, a search direction is

generated along which the QP objective function decreases and the active constraints

remain satisfied. The full step along this search direction may cause one of the

currently inactive constraints to be violated. If such is the case, a step is taken to

the closest such constraint. The newly-active constraint is added to the working set

and the process is repeated. Eventually, a point is reached which is the minimum

on the subspace defined by the current working set (i.e., the gradient projected onto

the tangent space to the constraints in the working set is zero). To determine

whether this point is optimal for the QP, the QP Lagrange multiplier estimates are

checked. If some multiplier is negative, that is an indication that the QP objective


function can be further decreased by deleting the corresponding constraint from the working set. This process of revising the working set and minimizing along the subspace defined by the working set is continued until a point is reached at which the projected gradient is zero and the Lagrange multipliers are all non-negative.
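The working-set loop just described can be sketched as follows. This is a bare-bones illustration for a strictly convex, inequality-constrained QP, not the method developed in this report; the toy data describe minimizing (x1 − 1)² + (x2 − 2.5)² over a small polygon, with solution (1.4, 1.7):

```python
import numpy as np

# Minimal active-set sketch for  min 0.5 x^T H x + g^T x  s.t.  A x >= b,
# with H positive definite. The loop structure, not the data, is the point.

def solve_eqp(H, grad, A_W):
    """Step p and multipliers lam with the working-set rows held as equalities."""
    n = H.shape[1]
    if A_W.shape[0] == 0:
        return np.linalg.solve(H, -grad), np.empty(0)
    m = A_W.shape[0]
    K = np.block([[H, -A_W.T],
                  [A_W, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-grad, np.zeros(m)]))
    return sol[:n], sol[n:]

def active_set_qp(H, g, A, b, x, W):
    """W is a list of row indices of A treated as equality constraints."""
    for _ in range(50):
        grad = H @ x + g
        p, lam = solve_eqp(H, grad, A[W])
        if np.allclose(p, 0.0):
            if len(W) == 0 or lam.min() >= 0:
                return x, W                      # KKT point of the QP
            W.pop(int(np.argmin(lam)))           # delete a constraint
        else:
            alpha, block = 1.0, None
            for i in range(A.shape[0]):          # ratio test over inactive rows
                if i not in W and A[i] @ p < 0:
                    step = (b[i] - A[i] @ x) / (A[i] @ p)
                    if step < alpha:
                        alpha, block = step, i
            x = x + alpha * p
            if block is not None:
                W.append(block)                  # add the blocking constraint
    raise RuntimeError("did not converge")

# Toy problem (hypothetical): min (x1-1)^2 + (x2-2.5)^2 over five inequalities.
H = 2.0 * np.eye(2)
g = np.array([-2.0, -5.0])
A = np.array([[1.0, -2.0], [-1.0, -2.0], [-1.0, 2.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([-2.0, -6.0, -2.0, 0.0, 0.0])
x, W = active_set_qp(H, g, A, b, x=np.array([2.0, 0.0]), W=[2, 4])
assert np.allclose(x, [1.4, 1.7])
```

Deleting the most negative multiplier and adding the nearest blocking constraint are the simplest rules; practical QP solvers refine both choices, and the anti-cycling safeguards mentioned below are needed in general.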

EQP methods are attractive because the formulation of the subproblem parallels Newton's method applied to an unconstrained minimization problem. Moreover, when the correct working set is identified, Z(x*)^T W(x*, λ*) Z(x*) must be at least positive semi-definite. In that case one can extend quasi-Newton updating methods to the case of maintaining a positive definite approximation to Z^T W Z. However, the difficulties of determining the correct working set should not be minimized. Algorithms which update the working set at each iteration can suffer from repeated insertion and deletion of the same constraint, known as zigzagging (see Wolfe (1972)).

Methods based upon solving IQP subproblems are not dependent on a strategy

for updating the working set at each iteration. Such algorithms allow for fast

recovery from a bad initial approximation to the active set. IQP methods typically

involve approximating the full matrix W, since the constraints defining Z^T W Z change within the subproblem. However, there is no a priori reason to assume that W is positive definite. Moreover, when W is indefinite, the solution may not be bounded, and the direction produced by the QP may not be a descent direction. Therefore, many methods which update the full second derivative matrix of the Lagrangian function add a term of the form ρA^T A to the definition of the Lagrangian function so as to force W to be positive definite. This augmenting term can lead to very ill-conditioned matrices, although in practice this does not appear to impede the performance of such methods (Gill, et al., private communication).

A discussion of the relative merits of EQP and IQP methods can be found in Murray and Wright (1982) and Gill et al. (1985c). In the neighborhood of a


solution, one would expect similar performance from both the IQP and EQP

methods. Close to x* it is expected that the working set would correspond to the

correct active set. Therefore, the solution to an IQP subproblem would be found by

taking one step along the subspace defined by the current working set which is

equivalent to the solution of the EQP defined by those constraints.

It appears that SQP methods were first suggested by Wilson (1963). Wilson

proposed solving an IQP at each iteration, where the quadratic approximation to the

Lagrangian function was formed using the exact Hessians of the objective and

constraint functions. Murray (1969), Biggs (1972) and Wright (1976) presented

successive EQP methods. Garcia and Mangasarian (1976) used quasi-Newton