

Optimality Conditions and Duality Theory for

Minimizing Sums of the Largest Eigenvalues of

Symmetric Matrices

Michael L. Overton
Robert S. Womersley

Technical Report 566

June 1991





Department of Computer Science
Courant Institute of Mathematical Sciences





Optimality conditions and duality theory for

minimizing sums of the largest eigenvalues

of symmetric matrices

M. L. Overton¹ and R. S. Womersley²

June 1991

Submitted to Mathematical Programming

Abstract: This paper gives max characterizations for the sum of the largest eigenvalues
of a symmetric matrix. The elements which achieve the maximum provide a
concise characterization of the generalized gradient of the eigenvalue sum in terms
of a dual matrix. The dual matrix provides the information required to either verify
first-order optimality conditions at a point or to generate a descent direction for
the eigenvalue sum from that point, splitting a multiple eigenvalue if necessary. A
model minimization algorithm is outlined, and connections with the classical
literature on sums of eigenvalues are explained. Sums of the largest eigenvalues in
absolute value are also addressed.

Key words: symmetric matrix, maximum eigenvalues, spectral radius, minimax
problem, max characterization, generalized gradient.

Short title: Minimizing sums of eigenvalues

1. Courant Institute of Mathematical Sciences, New York University, New York,
New York, 10012. The work of this author was supported by the National
Science Foundation under grant CCR-88-02408.

2. School of Mathematics, University of New South Wales, P.O. Box 1, Kensington,
N.S.W. 2033, Australia. The work of this author was supported in part
during a visit to Argonne National Laboratory by the Applied Mathematical
Sciences subprogram of the Office of Energy Research of the U.S. Department
of Energy under contract W-31-109-Eng-38, and in part during a visit
to the Courant Institute by the U.S. Department of Energy under Contract

1 Introduction

Let A be an n by n real symmetric matrix, and let κ ∈ {1, ..., n}. Denote the
eigenvalues of A by λ_1, ..., λ_n and also by μ_1, ..., μ_n, the difference being that the
former are ordered by

    λ_1 ≥ ··· ≥ λ_n,                                        (1.1)

while the latter are ordered by

    |μ_1| ≥ ··· ≥ |μ_n|.                                    (1.2)

Define

    f_κ(A) = Σ_{i=1}^{κ} λ_i,                               (1.3)

and

    g_κ(A) = Σ_{i=1}^{κ} |μ_i|.                             (1.4)

Note that f_1(A) is the largest eigenvalue of A, while g_1(A) is the spectral radius of
A (its largest eigenvalue in absolute value).
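The quantities in (1.3) and (1.4) are straightforward to compute numerically. The following NumPy sketch is our illustration, not part of the report; the helper names f_sum and g_sum are ours.

```python
# Illustration (not from the paper): evaluate f_kappa(A), the sum of the
# kappa largest eigenvalues, and g_kappa(A), the sum of the kappa largest
# eigenvalues in absolute value, for a symmetric matrix A.
import numpy as np

def f_sum(A, kappa):
    """Sum of the kappa largest eigenvalues of the symmetric matrix A."""
    lam = np.sort(np.linalg.eigvalsh(A))[::-1]   # lambda_1 >= ... >= lambda_n
    return lam[:kappa].sum()

def g_sum(A, kappa):
    """Sum of the kappa largest eigenvalues in absolute value."""
    mu = np.linalg.eigvalsh(A)
    mu = mu[np.argsort(-np.abs(mu))]             # |mu_1| >= ... >= |mu_n|
    return np.abs(mu[:kappa]).sum()

A = np.array([[2.0, 1.0, 0.0],
              [1.0, -3.0, 1.0],
              [0.0, 1.0, 1.0]])
# f_1(A) is the largest eigenvalue, g_1(A) the spectral radius.
print(f_sum(A, 1), g_sum(A, 1))
```

Note that f_n(A) is the trace of A, while g_1(A) ≥ f_1(A) always, since the spectral radius dominates the largest eigenvalue.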

After establishing some further notation in Section 2, the main results of the
paper are given in Sections 3 and 4 for the functions f_κ(A) and g_κ(A) respectively.
Max characterizations of these functions are established in terms of the Frobenius
inner product ⟨A, B⟩ = tr(AB^T) for A, B ∈ R^{n×n} (where tr(A) denotes the trace
of A), and sets of matrices defined by positive semi-definite inequalities (see Section
2). Let

    ⟨A, B⟩ = tr(AB^T) = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij b_ij.

This inner product is the natural extension for real matrix variables of the standard
inner product x^T y = Σ_{i=1}^{n} x_i y_i on R^n. It is widely used in problems with matrix
variables, for example in the work of Bellman and Fan (1963), Arnold (1971), and
Craven and Mond (1981). Fletcher (1985), Overton (1988), and Overton and Womersley
(1988) used the notation A : B for ⟨A, B⟩.

Some useful properties of the Frobenius inner product are summarized below.

1. ⟨A, A⟩ = ‖A‖_F².

2. ⟨A, I⟩ = tr(A).

3. If A ∈ S_n and K ∈ K_n then ⟨A, K⟩ = 0.

4. As tr(AB^T) = tr(BA^T) = tr(A^T B),

    ⟨A, B⟩ = ⟨B, A⟩ = ⟨A^T, B^T⟩ = ⟨B^T, A^T⟩.

In particular

(a) For any nonsingular matrices S ∈ R^{n×n} and T ∈ R^{m×m},

    ⟨A, B⟩ = ⟨S^{-1}A, S^T B⟩ = ⟨AT^{-1}, BT^T⟩.

(b) If A ∈ R^{n×n}, B ∈ R^{m×m} and Z ∈ R^{n×m} then

    ⟨Z^T A Z, B⟩ = ⟨A, Z B Z^T⟩.
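These properties are easy to confirm numerically. The sketch below is ours (all variable names are hypothetical); it checks properties 1, 3, 4(a) and 4(b) on random matrices.

```python
# Numerical check (our illustration) of the Frobenius inner product
# properties listed above, using <A,B> = tr(A B^T).
import numpy as np

rng = np.random.default_rng(0)
inner = lambda A, B: np.trace(A @ B.T)           # <A,B> = tr(A B^T)

n, m = 4, 3
A = rng.standard_normal((n, m))
B = rng.standard_normal((n, m))
S = rng.standard_normal((n, n))                   # nonsingular with prob. 1
T = rng.standard_normal((m, m))

# Property 1: <A,A> equals the squared Frobenius norm.
assert np.isclose(inner(A, A), np.linalg.norm(A, 'fro')**2)

# Property 3: symmetric and skew-symmetric matrices are orthogonal.
M = rng.standard_normal((n, n))
Asym, K = (M + M.T) / 2, (M - M.T) / 2
assert np.isclose(inner(Asym, K), 0)

# Property 4(a): invariance under S^{-1}/S^T and T^{-1}/T^T pairings.
assert np.isclose(inner(A, B), inner(np.linalg.inv(S) @ A, S.T @ B))
assert np.isclose(inner(A, B), inner(A @ np.linalg.inv(T), B @ T.T))

# Property 4(b): <Z^T A Z, B> = <A, Z B Z^T>.
Z = rng.standard_normal((n, m))
An, Bm = rng.standard_normal((n, n)), rng.standard_normal((m, m))
assert np.isclose(inner(Z.T @ An @ Z, Bm), inner(An, Z @ Bm @ Z.T))
```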

3 Sum of the largest eigenvalues

Let A ∈ S_n have eigenvalues

    λ_1 ≥ ··· ≥ λ_n,

and let Q ∈ O_{n,n} be a matrix whose columns are normalized eigenvectors of A, so

    Q^T A Q = Λ                                             (3.1)

where Λ = diag(λ_1, ..., λ_n). The matrix Q will be regarded as fixed, although if A
has multiple eigenvalues the choice of Q is not unique.

Let κ ∈ {1, ..., n}. Section 3.1 establishes a max characterization of

    f_κ(A) = Σ_{i=1}^{κ} λ_i.                               (3.2)

Thereafter we concentrate on the case when A : R^m → S_n is a smooth matrix-valued
function and

    f_κ(x) = f_κ(A(x)).                                     (3.3)

Section 3.2 considers the differential properties of f_κ(x), Section 3.3 gives necessary
conditions for a local minimizer, Section 3.4 establishes a saddle point result, Section
3.5 gives a formula for the directional derivative, Section 3.6 discusses the generation
of descent directions by splitting multiple eigenvalues, and Section 3.7 considers a
model algorithm for minimizing f_κ(x).

3.1 A max characterization


The compact convex sets

    Φ_{n,κ} = { U ∈ S_n : tr(U) = κ, 0 ⪯ U ⪯ I }

and

    φ_{n,κ} = { u ∈ R^n : Σ_{i=1}^{n} u_i = κ, 0 ≤ u_i ≤ 1, i = 1, ..., n }

are related by

    Φ_{n,κ} = { U ∈ S_n : U = Z D Z^T where Z ∈ O_{n,n},
                D = diag(u_1, ..., u_n) and u ∈ φ_{n,κ} },  (3.6)

    φ_{n,κ} = { u ∈ R^n : u_i = U_ii for i = 1, ..., n where U ∈ Φ_{n,κ} }.   (3.7)

Proof: That Φ_{n,κ} and φ_{n,κ} are compact convex sets is immediate. A spectral
decomposition of U yields (3.6) and the orthogonal invariance of Φ_{n,κ}. The set φ_{n,κ}
is contained in the right-hand side of (3.7) since for any u ∈ φ_{n,κ}, diag(u_1, ..., u_n) ∈
Φ_{n,κ}. To obtain the reverse inclusion, let U ∈ Φ_{n,κ} and define u by u_i = U_ii. The
facts that the trace is the sum of the diagonal elements and that a positive semi-definite
matrix cannot have a negative diagonal element then show that u ∈ φ_{n,κ}.
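The role of Φ_{n,κ} can be illustrated numerically. Assuming the max characterization f_κ(A) = max{⟨A, U⟩ : U ∈ Φ_{n,κ}} quoted later from Theorem 3.4, the sketch below (ours, not the paper's) checks that the maximum is attained at U = ZZ^T with Z holding κ orthonormal eigenvectors for the largest eigenvalues, and that random elements of Φ_{n,κ} built via (3.6) never exceed it.

```python
# Our sketch of the max characterization: f_kappa(A) = max <A,U> over
# U in Phi_{n,kappa} = { U symmetric : tr U = kappa, 0 <= U <= I }.
import numpy as np

rng = np.random.default_rng(1)
n, kappa = 5, 2
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

lam, Q = np.linalg.eigh(A)              # ascending eigenvalues
lam, Q = lam[::-1], Q[:, ::-1]          # reorder: lambda_1 >= ... >= lambda_n
f_kappa = lam[:kappa].sum()

# Maximizer: orthogonal projector onto the top-kappa eigenspace.
U_star = Q[:, :kappa] @ Q[:, :kappa].T  # tr U = kappa, eigenvalues in {0,1}
assert np.isclose(np.trace(A @ U_star), f_kappa)

# Random elements of Phi_{n,kappa}, built as U = Z diag(u) Z^T per (3.6).
for _ in range(100):
    Z, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal Z
    u = rng.uniform(0, 1, n)
    u *= kappa / u.sum()                # enforce sum u_i = kappa
    if u.max() > 1:                     # keep only 0 <= u_i <= 1
        continue
    U = Z @ np.diag(u) @ Z.T
    assert np.trace(A @ U) <= f_kappa + 1e-9
```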

To characterize the elements that achieve the maximum in the following results,
information about the multiplicity of the eigenvalues of A is needed. Let

    λ_1 ≥ ··· ≥ λ_r > λ_{r+1} = ··· = λ_κ = ··· = λ_{r+t} > λ_{r+t+1} ≥ ··· ≥ λ_n,

where t ≥ 1 and r ≥ 0 are integers. The multiplicity of the κth eigenvalue is t. The
number of eigenvalues larger than λ_κ is r. Here r may be zero; in particular this
must be the case if κ = 1. Note that by definition

    r + 1 ≤ κ ≤ r + t.

3.2 Differential properties

Let A : R^m → S_n be a smooth (at least once continuously differentiable) function
whose partial derivative with respect to x_k is

    A_k(x) = ∂A(x)/∂x_k, for k = 1, ..., m.
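A concrete instance of such a smooth function (our hypothetical example, not from the paper) is an affine family A(x) = A_0 + Σ_k x_k B_k with symmetric A_0 and B_k, for which A_k(x) = B_k is constant:

```python
# Hypothetical example of a smooth matrix-valued function: the affine
# family A(x) = A0 + sum_k x_k B_k, whose partial derivatives A_k(x) = B_k.
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3
sym = lambda M: (M + M.T) / 2
A0 = sym(rng.standard_normal((n, n)))
Bs = [sym(rng.standard_normal((n, n))) for _ in range(m)]   # the A_k(x)

def A_of_x(x):
    """A(x) = A0 + sum_k x_k B_k, a symmetric matrix for each x in R^m."""
    return A0 + sum(xk * Bk for xk, Bk in zip(x, Bs))

# Central finite differences confirm A_k(x) = dA/dx_k for k = 0.
x, h = rng.standard_normal(m), 1e-6
e0 = np.zeros(m); e0[0] = 1.0
fd = (A_of_x(x + h * e0) - A_of_x(x - h * e0)) / (2 * h)
assert np.allclose(fd, Bs[0], atol=1e-5)
```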


This section is concerned with finding a computationally useful characterization of
the generalized gradient of the function f_κ(x) = f_κ(A(x)).

Although the eigenvalues and eigenvectors of A(x) are functions of x ∈ R^m, the
explicit dependence on x will usually be omitted. Thus the eigenvalues of A(x) are
denoted as before by (3.8), with r and t now dependent on x, and with corresponding
eigenvectors satisfying (3.18).


Theorem 3.9 The function f_κ(x) is locally Lipschitz, subdifferentially regular, and
its generalized gradient is the nonempty compact convex set

    ∂f_κ(x) = { u ∈ R^m : ∃ U' ∈ S_t with 0 ⪯ U' ⪯ I, tr(U') = κ − r, and
                u_k = tr(Q_1^T A_k(x) Q_1) + ⟨Q_2^T A_k(x) Q_2, U'⟩, k = 1, ..., m }.   (3.29)

Proof: Since f_κ(A) is convex and A(x) is smooth, the chain rule (Clarke (1983),
Theorem 2.3.10) implies that f_κ(x) is locally Lipschitz, subdifferentially regular,
and that

    ∂f_κ(x) = { u ∈ R^m : u_k = ⟨A_k(x), U⟩, k = 1, ..., m,
                where U ∈ ∂f_κ(A(x)) }.                     (3.30)

Corollary 3.5 and the properties of the inner product complete the proof.

Remark: Since by Theorem 3.4

    f_κ(x) = max_{U ∈ Φ_{n,κ}} ⟨A(x), U⟩,

the result also follows from the characterization of generalized gradients of functions
defined by a pointwise maximum in Clarke (1983, Theorem 2.8.6). The Clarke
characterization shows that

    ∂f_κ(x) = conv { u ∈ R^m : u_k = ⟨A_k(x), U⟩, k = 1, ..., m,
                where ⟨A(x), U⟩ = f_κ(x) }.                 (3.31)

Equation (3.29) follows from Theorem 3.4 since the maximizing set is already convex.

Remark: Bellman and Fan (1963) gave an example where the set

    { u ∈ R^m : u_k = ⟨A_k, U⟩, where U ⪰ 0 }

is not closed, and gave sufficient conditions for it to be closed. This is not a
difficulty in our case because the trace condition ensures that Φ_{n,κ} is compact.
As f_κ(x) : R^m → R is a locally Lipschitz function, its generalized gradient is a
nonempty compact convex set in R^m.
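In the differentiable case described by Corollary 3.10 below, the trace condition forces U' = I in (3.29), so the generalized gradient reduces to a single point with components tr(Q̂^T A_k(x) Q̂), where Q̂ holds eigenvectors for the κ largest eigenvalues. The following sketch is ours; it assumes a hypothetical affine family A(x) = A_0 + Σ_k x_k B_k (so A_k(x) = B_k) and checks the formula against central finite differences.

```python
# Our numerical sketch: when lambda_kappa > lambda_{kappa+1}, the gradient of
# f_kappa(x) = f_kappa(A(x)) has components tr(Qhat^T A_k(x) Qhat), where
# Qhat holds eigenvectors of the kappa largest eigenvalues of A(x).
import numpy as np

rng = np.random.default_rng(3)
n, m, kappa = 5, 3, 2
sym = lambda M: (M + M.T) / 2
A0 = sym(rng.standard_normal((n, n)))
Bs = [sym(rng.standard_normal((n, n))) for _ in range(m)]   # A_k(x) = B_k
A_of_x = lambda x: A0 + sum(xk * Bk for xk, Bk in zip(x, Bs))

def f(x):
    """f_kappa(A(x)): sum of the kappa largest eigenvalues at x."""
    lam = np.sort(np.linalg.eigvalsh(A_of_x(x)))[::-1]
    return lam[:kappa].sum()

x = rng.standard_normal(m)
lam, Q = np.linalg.eigh(A_of_x(x))
Qhat = Q[:, ::-1][:, :kappa]            # eigenvectors of the kappa largest
grad = np.array([np.trace(Qhat.T @ Bk @ Qhat) for Bk in Bs])

# Central finite differences agree when the eigenvalue gap holds at x.
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(m)])
assert np.allclose(grad, fd, atol=1e-4)
```

For a randomly generated family the gap λ_κ > λ_{κ+1} holds with probability one, so the finite-difference check is well posed at a generic point.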

Corollary 3.10 If λ_κ > λ_{κ+1} (i.e. κ = r + t), the function f_κ is differentiable at x
