Computer Science Department

TECHNICAL REPORT

Optimality Conditions and Duality Theory for

Minimizing Sums of the Largest Eigenvalues of

Symmetric Matrices

Michael L. Overton

Robert S. Womersley

Technical Report 566

June 1991

;Trm NEW YORK UNIVERSITY

c

(d

');
}
elseif (getClientWidth() > 430)
{
document.write('');
}
else
{
document.write('');
}
//-->

■u

\o '-'

c

o

U U-l

o o

,:- •'yttsT'T!

^-^

^ 4-1

c^ (0 c >-; 3

£ ^ o o u)

£ C rH -^ -^

o o 2 'll .5

(1) 4-> 3 -^

C3

zoo

-^ Department of Computer Science

Courant Institute of Mathematical Sciences

SfivJ

S*s''^?i

251 MERCER STREET, NEW YORK, N.Y. 10012

Optimality Conditions and Duality Theory for

Minimizing Sums of the Largest Eigenvalues of

Symmetric Matrices

Michael L. Overton

Robert S. Womersley

Technical Report 566

June 1991

Optimality conditions and duality theory for

minimizing sums of the largest eigenvalues

of symmetric matrices

M. L. Overton^ and R. S. Womersley

June 1991

Submitted to Math Programming

Abstract: This paper gives max characterizations for the sum of the largest eigen-

values of a symmetric matrix. The elements which achieve the maximum provide a

concise characterization of the generalized gradient of the eigenvalue sum in terms

of a dual matrix. The dual matrix provides the information required to either verify

first-order optimality conditions at a point or to generate a descent direction for

the eigenvalue sum from that point, splitting a multiple eigenvalue if necessary. A

model minimization algorithm is outlined, and connections with the classical lit-

erature on sums of eigenvalues are explained. Sums of the largest eigenvalues in

absolute value are also addressed.

Key words: symmetric matrix, maximum eigenvalues, spectral radius, minimax

problem, max characterization, generalized gradient.

Short title: Minimizing sums of eigenvalues

1. Courant Institute of Mathematical Sciences, New York University, New York,

New York, 10012. The work of this author was supported by the National

Science Foundation under grant CCR-88-02408.

2. School of Mathematics, University of New South Wales, P.O. Box 1, Kensing-

ton, N.S.W. 2033, Australia. The work of this author was supported in part

during a visit to Argonne National Laboratory by the Apphed Mathematical

Sciences subprogram of the Office of Energy Research of the U.S. Depart-

ment of Energy under contract W-31-109-Eng-38, and in part during a visit

to the Courant Institute by the U.S. Department of Energy under Contract

DEFG0288ER25053.

1 Introduction

Let ^ be an n by n real symmetric matrix, and let k G {l,...,n}. Denote the

eigenvalues of yi by Ai, . . . , A„ and also by //i, . . . , /x„, the difference being that the

former axe ordered by

Ai> - ->A„, (1.1)

while the latter are ordered by

l/^l|> - ->l/in|. (1.2)

Define

UA) = J2\, (1.3)

«=i

and

gM) - X^l/^.l- (1-4)

1=1

Note that fi(A) is the largest eigenvalue of A, while gi{A) is the spectral radius of

A (its largest eigenvalue in absolute value).

After establishing some further notation in Section 2, the main results of the

paper are given in Sections 3 and 4 for the functions fniA) and g^iA) respectively.

Max characterizations of these functions are established in terms of the Frobenius

inner product {A,B) = tr(^B^) for A,B e R"***" (where tr{A) denotes the trace

of A), and sets of matrices defined by positive semi-definite inequalities (see Section

2). Let

1=1 j=i

This inner product is the natural extension for reed matrix variables of the stEin-

dard inner product x^y — X!"=i 2r,j/, on R". It is widely used in problems with ma-

trix variables, for example in the work of Bellman and Fan (1963), Arnold (1971),

Craven and Mond (1981). Fletcher (1985), Overton (1988), and Overton and Wom-

ersley (1988) used the notation A : B for {A,B).

Some useful properties of the Frobenius inner product are summarized below.

1. {A,A) = \\A\\l.

•2. {A.I) = tv(A).

3. If A e 5„ and A' G ICn then {A,K) = 0.

4. As tr(A5^) = tv{BA^) = tT{A^ B)

{A,B) = {B,A) = {A^,B^) = (S^A^).

In particular

(a) For any nonsingular matrices 5 € R'"'"" and T G R"''"

{A,B) = {S-'A,S^B) = {AT-\BT^).

(b) If A E R"''".5 e R""*"* and Z e R"**"* then

{Z^AZ,B) = {A,ZBZ^).

3 Sum of the largest eigenvalues

Let A G 5„ have eigenvalues

Ai >••• > A„,

and let Q G C„,„ be a matrix whose columns are normalized eigenvectors of .4, so

Q'^AQ = A (3.1)

where A = diag(Ai, . . . , A„). The matrix Q will be regarded as fixed, although if A

has multiple eigenvalues the choice of Q is not unique.

Let K G {l,...,n}. Section 3.1 establishes a max characterization of

UA) = X:A.- (3.2)

1=1

Thereafter we concentrate on the case when A : R"* — > Sn is a smooth matrix- valued

function and

Mx) = fMi^))- (3.3)

Section 3.2 considers the differential properties of /k(j'), Section 3.3 gives necessary

conditions for a local minimizer, Section 3.4 establishes a saddle point result, Section

3.5 gives a formula for the directional derivative, Section 3.6 discusses the generation

of descent directions by splitting multiple eigenvalues and Section 3.7 considers a

model algorithm for minimizing /^(x).

3.1 A max characterization

Let

$n,« = {U eSn:0n,K o^e related by

*n.« = {U eSn:U = ZDZ'^ where Z G 0„,„,

D = diag(ui,. . . ,u„) and u G ^„,k }, (3.6)

and

(f>n.^ = { ?/ G R" : «, = U„ for i = l,...,n where U G $„.« }. (3.7)

Proof: That $„,« and (f>n_K are compact convex sets is immediate. A spectral

decomposition of U yields (3.6) and the orthogonal invariance of $„,«. The set (f)n,K

is contained in the right-hand side of (3.7) since for any u G n,K, diag(ui, . . . , t/„) G

$„,«. To obtain the reverse inclusion, let U G $n,K and define u by u, = U„. The

facts that the trace is the sum of the diagonal elements and that a positive semi-

definite matrix cannot have a negative diagonal element then show that u G n.K-

I

To characterize the elements that achieve the maximum in the following results,

information about the multiplicity of the eigenvalues of A is needed. Let

Ai > ••• > A, >

Ar + l = • • • = Ak =

Ar + ( + l ^ ■ • • ^ A„,

Ar + ( >

(3.8)

where t > 1 and r > are integers. The multiplicity of the /cth eigenvalue is t. The

number of eigenvalues Izirger than A« is r. Here r may be zero; in particular this

must be the case if k = 1. Note that by definition

r + l «S„ be a smooth (at least once continuously differentiable) function

whose partial derivative with respect to Xk is

Mil f I 1

Ak(x) =z -^ for fc = l,...,m.

dxk

This section is concerned with finding a computationally useful characterization of

the generalized gradient of the function

Although the eigenvalues and eigenvectors of A{x) are functions oi x E R"", the

explicit dependence on x will usually be omitted. Thus the eigenvalues of ^(x) are

denoted by as before (3.8), with r and t now dependent on x, and with corresponding

eigenvectors satisfying (3.18).

13

Theorem 3.9 The function fK{x) is locally Lipschiiz, subdifferentially regular, and

its generalized gradient is the nonempty compact convex set

dMx) = { u 6 R"" : 3 r e 5t with < T < 7. tr{L') = k - r. and

u, = tT{P^Ak{x)P^) + {QlAk{x)Qr,U), A- = l....,m}.(3.29)

Proof: Since /k(.4) is convex and A(x) is smooth the chain rule (Clarke (1983),

Theorem 2.3.10) impHes that /^(i) is locally Lipschitz. subdifferentially regular,

zind that

dUx) = {tiGR'" : Uk = (.4fc(x),t^),fc = l,...,m

where U € dMA{x)) }. (3.30)

Corollarj- 3.5 and the properties of the inner product complete the proof. I

Remark: Since by Theorem 3.4

f^{x) = max {A(x),U).

the result also follows from the chEiracterization of generalized gradients of functions

defined by a pointwise maximmn in Clarke (1983. Theorem 2.8.6). The Clarke

characterization shows that

df^(x) = convliiGR"* : Uk = {Ak{x),U),k = l,...,m

where (-4(i), ?7 ) = /«(x) }. (3.31)

Equation (3.29) follows from Theorem 3.4 since the maximizing set is already con-

vex.

Remark: Bellman and Fan (1963) gave an example where the set

{ u G R'" : Ufc = ( Ak. U ), where T > }

is not closed, cind gave sufiicient conditions for it to be closed. This is not a

difficulty in our case because the trace condition ensiires that $„,« is compact.

As /«(x) : R"* ^ R is a locally Lipschitz function its generalized gradient is a

nonempty compact convex set in R"*.

Corollary 3,10 7/ A« > A«+i (i.e. k = r + t) the function /« is differentiable at x