Computer Science Department

TECHNICAL REPORT

The Probability that a Numerical

Analysis Problem is Difficult

by

James W. Demmel

Technical Report #294

April, 1987


NEW YORK UNIVERSITY

Department of Computer Science

Courant Institute of Mathematical Sciences

251 MERCER STREET, NEW YORK, N.Y. 10012



The Probability that a Numerical Analysis Problem is Difficult

James W. Demmel

Courant Institute

New York, NY 10012

Abstract

Numerous problems in numerical analysis, including matrix inversion, eigenvalue calculations

and polynomial zero finding, share the following property: the difficulty of solving a given

problem is large when the distance from that problem to the nearest "ill-posed" one is small.

For example, the closer a matrix is to the set of noninvertible matrices, the larger its condi-

tion number with respect to inversion. We show that the sets of ill-posed problems for matrix

inversion, eigenproblems, and polynomial zero finding all have a common algebraic and

geometric structure which lets us compute the probability distribution of the distance from a

"random" problem to the set. From this probability distribution we derive, for example, the

distribution of the condition number of a random matrix. We examine the relevance of this

theory to the analysis and construction of numerical algorithms destined to be run in finite

precision arithmetic.

AMS classifications: 15A12, 53C65, 60D05

1. Introduction

To investigate the probability that a numerical analysis problem is difficult, we need to

do three things:

1) Choose a measure of difficulty,

2) Choose a probability distribution on the set of problems,

3) Compute the distribution of the measure of difficulty induced by the distribution on the

set of problems.

The measure of difficulty we shall use in this paper is the condition number, which meas-

ures the sensitivity of the solution to small changes in the problem. For the problems we con-

sider in this paper (matrix inversion, polynomial zero finding and eigenvalue calculation),

there are well known condition numbers in the literature of which we shall use slightly modi-

fied versions to be discussed more fully later. The condition number is an appropriate meas-

ure of difficulty because it can be used to measure the expected loss of accuracy in the com-

puted solution, or even the number of iterations required for an iterative algorithm to con-

verge to a solution.
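The rule of thumb behind this choice can be seen in a small numerical experiment. The sketch below (our illustration, not from the paper) solves a linear system with the notoriously ill-conditioned Hilbert matrix and compares the observed loss of accuracy with the condition number:

```python
import numpy as np

# Illustrative sketch (not from the paper): the condition number predicts
# roughly how many digits of accuracy are lost when solving Ax = b in
# floating point arithmetic.
n = 10
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])  # Hilbert matrix
x_true = np.ones(n)
b = A @ x_true

x_hat = np.linalg.solve(A, b)
kappa = np.linalg.cond(A)          # ||A|| * ||A^-1|| in the 2-norm
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

# The error is roughly machine epsilon (~1e-16) amplified by kappa.
print(f"kappa   ~ 10^{np.log10(kappa):.0f}")
print(f"rel err ~ 10^{np.log10(rel_err):.0f}")
```

The relative error is, to within a modest factor, machine precision multiplied by the condition number, which is the sense in which the condition number measures expected loss of accuracy.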

The probability distribution on the set of problems for which we will attain most of our

results will be the "uniform distribution" which we define as follows. We will identify each

problem as a point in either R^N (if it is real) or C^N (if it is complex). For example, a real n by n matrix A will be considered to be a point in R^(n^2), where each entry of A forms a coordinate in R^(n^2) in the natural way. Similarly, a complex nth degree polynomial can be identified with a point in C^(n+1) by using its coefficients as coordinates. On the space R^N (or C^N) we will take any spherically symmetric distribution, i.e. the induced distribution of the normalized problem x/||x|| (||·|| is the Euclidean norm) must be uniform on the unit sphere in R^N. For example, we could take a uniform distribution on the interior of the unit ball in R^N, or let each component be an independent Gaussian random variable with mean 0 and standard deviation 1. Our answers will hold for this entire class of distributions because our condition

numbers are homogeneous (multiplying a problem by a nonzero scalar does not change its

condition number).
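Both claims, homogeneity of the condition number and the equivalence of spherically symmetric distributions, are easy to check numerically. A minimal sketch (ours, not from the paper):

```python
import numpy as np

# Sketch (not from the paper): the condition number is homogeneous, so only
# the normalized problem A/||A|| matters, and any spherically symmetric
# distribution induces the same distribution of condition numbers.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

for c in (1e-3, 2.0, 1e6):
    # scaling the problem leaves the condition number unchanged
    assert np.isclose(np.linalg.cond(c * A), np.linalg.cond(A))

# A Gaussian vector scaled to unit length is uniform on the unit sphere,
# which is why i.i.d. Gaussian entries are one valid choice of distribution.
g = rng.standard_normal(25)
assert np.isclose(np.linalg.norm(g / np.linalg.norm(g)), 1.0)
```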

The main justification for using a uniform distribution is that it appears to be fair: each

problem is as likely as any other. However, it does not appear to apply in many practical

cases for a variety of reasons, including the fact that any set of problems which can be

represented in a computer is necessarily discrete rather than continuous. We will discuss the

validity of our choice of uniform distribution as well as alternatives at length in section 6

below.

Finally, given this distribution, we must compute the induced probability distribution of

the condition number. It turns out that all the problems we consider here have a common

geometric structure which lets us compute the distributions of their condition numbers with a

single analysis, which goes as follows:

(i) Certain problems of each kind are ill-posed, i.e. their condition number is infinite.

These ill-posed problems form an algebraic variety within the space of all problems. For

example, the singular matrices are ill-posed with respect to the problem of inversion,

and they lie on the variety where the determinant, a polynomial in the matrix entries, is

zero. Geometrically, varieties are possibly self-intersecting surfaces in the space of

problems.

(ii) The condition number of a problem has a simple geometric interpretation: it is propor-

tional to (or bounded by a multiple of) the reciprocal of the distance to the set of ill-

posed problems. Thus, as a problem gets closer to the set of ill-posed ones, its condition

number approaches infinity. In the case of matrix inversion, for example, the traditional

condition number is exactly inversely proportional to the distance to the nearest singular

matrix.

(iii) The last observation implies that the set of problems of condition number at least x is

(approximately) the set of problems within distance c/x (c a constant) of the variety of

ill-posed problems. Sets of this sort, sometimes called tubular neighborhoods, have been stu-

died extensively by geometers. We will present upper bounds, lower bounds, and

asymptotic values for the volumes of such sets. The asymptotic results, lower bounds,

and some of the upper bounds are new. The formulae are very simple, depending only

on x, the dimension N of the ambient space, the dimension of the variety, and the degree of

the variety. These volume bounds in turn bound the volume of the set of problems with

condition number at least x. Since we are assuming the problems are uniformly distri-

buted, volume is proportional to probability.

Thus, for example, we will prove that a scaled version κ(A) = ||A||_F · ||A^(-1)|| of the usual

condition number of a complex matrix with respect to inversion satisfies

    1 - (1 - 1/x^2)^(n^2 - 1) ≤ Prob(κ(A) ≥ x) ≤ e^2 n^3 (n^2 - 1) / x^2

and that asymptotically

    Prob(κ(A) ≥ x) = n(n^2 - 1)/x^2 + o(1/x^2) .

In other words, the probability that the condition number exceeds x decreases as the square

of the reciprocal of x. Even for moderate x the upper bound exceeds the asymptotic limit by

a ratio of only about e^2 n^2. If A is real we will show

    C/x ≤ Prob(κ(A) ≥ x) ≤ C'/x

where C is a constant proportional to the (n^2 - 1)-dimensional volume of the set of singular matrices inside the unit ball and C' is an explicit constant depending only on n. Thus, for real matrices the probability that the condition number exceeds x decreases as x^(-1).
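These tail predictions can be probed by simulation. The Monte Carlo sketch below (ours, not from the paper) samples complex matrices with independent Gaussian entries and estimates the tail of κ(A) = ||A||_F · ||A^(-1)||, the scaled condition number used above:

```python
import numpy as np

# Monte Carlo sketch (not from the paper): estimate the tail Prob(kappa(A) >= x)
# for complex matrices with i.i.d. Gaussian entries, where
# kappa(A) = ||A||_F * ||A^-1||.  The theory predicts decay like 1/x^2.
rng = np.random.default_rng(1)
n, trials = 3, 20000

kappas = np.empty(trials)
for t in range(trials):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    kappas[t] = np.linalg.norm(A, "fro") * np.linalg.norm(np.linalg.inv(A), 2)

for x in (20.0, 40.0, 80.0):
    tail = np.mean(kappas >= x)
    print(f"x = {x:4.0f}   Prob(kappa >= x) = {tail:.4f}   x^2 * Prob = {x * x * tail:.1f}")
# x^2 * Prob stays roughly constant, consistent with a 1/x^2 tail.
```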

There are a number of open questions and conjectures concerning these volume bounds,

in particular to how general a class of real varieties they apply (the case of complex varieties

is simpler). We will discuss the history of this work and open problems in detail in section 4

below.

It turns out that the reciprocal relationship between condition number and distance to

the nearest ill-posed problem holds for a much wider class of problems than just matrix inver-

sion, polynomial zero finding and eigenvalue calculations: it is shared, at least asymptotically,

by any problem whose solution is an algebraic function. For simplicity we shall restrict our-

selves to the three aforementioned problems, but our results do apply more widely, as dis-

cussed in section 3 below and in [4].

This work was inspired by earlier work in a number of fields. Demmel [4], Gastinel [7],

Hough [14], Kahan [15], Ruhe [24], Stewart [28], Wilkinson [35, 36, 37] and others have

analyzed the relationship between the condition number and the distance to the nearest ill-

posed problem mentioned above in (ii). Gray [8, 9], Griffiths [11], Hotelling [13], Lelong

[20], Ocneau [21], Renegar [23], Santalo [25], Smale [26], and Weyl [32] have worked on

bounds of volumes of tubular neighborhoods. These volume bounds have been used by Smale

[26, 27], Renegar [23] and others to analyze the efficiency of Newton's method for finding

zeros of polynomials. This latter work inspired the author [3] to apply these bounds to condi-

tioning. Ocneanu [22] and Kostlan [19] have also analyzed the statistical properties of the

condition number for matrix inversion.

The rest of this paper is organized as follows. Section 2 defines notation. Section 3

discusses the relationship between conditioning and the distance to the nearest ill-posed prob-

lem. Section 4 presents the bounds on the volumes of tubular neighborhoods we shall use,

and states some related open problems. Section 5 computes the distributions of the condition

numbers of our three problems. Section 6 discusses the limitations of assuming a uniform dis-

tribution and suggests alternatives and open problems. Section 7 contains the proofs of the

theorems in section 4.

2. Notation

We introduce several ideas we will need from numerical analysis, algebra, and

geometry. ||x|| will denote the Euclidean norm of the vector x as well as the induced matrix norm

    ||A|| = sup_(x ≠ 0) ||Ax|| / ||x|| .

||A||_F will denote the Frobenius norm

    ||A||_F = ( Σ_(i,j) |a_(ij)|^2 )^(1/2) .

If P is a set and x is a point, we will let dist(x,P) denote the Euclidean distance from x to the

nearest point in P.

A subset M of R^N is called an n-dimensional manifold if it is locally homeomorphic to R^n. We also write n = dim(M). The codimension of M, written codim(M), is N - n. In this paper dimension will always refer to the real dimension rather than the complex dimension, which is half the real dimension.

A variety is the set of solutions of a system of polynomial equations. A variety is homogeneous if it is cone-shaped, i.e. if x is in the variety so is every scalar multiple ax. A variety is not generally a manifold since it can have singularities in the neighborhood of which it is not homeomorphic to Euclidean space. However, for a variety P, points q with relatively open neighborhoods U_q ⊂ P that are manifolds are dense in P [17, Theorem 4.2.4], so that the following definition makes sense: the dimension of P at p, written dim_p(P), is

    dim_p(P) ≡ limsup dim(U_q) ,

the limsup taken over points q → p with manifold neighborhoods U_q ⊂ P.

We in turn define the dimension of the variety P as the maximum over all p ∈ P of dim_p(P). If dim_p(P) is constant for all p, we call P pure dimensional. A complex variety defined by a single nonconstant polynomial is called a complex hypersurface. Complex hypersurfaces are pure dimensional with codimension 2. A real hypersurface has codimension 1 everywhere. The real variety defined by the polynomials f_1, ..., f_p is called a complete intersection if it is pure dimensional of codimension p.

Now we define the degree of a purely n-dimensional variety P in R^N. Let L^(N-n) be an (N-n)-dimensional linear manifold (plane) in R^N. Since dim(L^(N-n)) + dim(P) = dim(R^N) = N, we say that L^(N-n) and P are of complementary dimension. Generically, L^(N-n) and P will intersect in a surface of codimension equal to the sum of their codimensions, that is N. In other words, their intersection will be of dimension 0 (a finite collection of points). If P is a complex homogeneous variety, then for almost all planes L^(N-n) this collection will contain the same number of points, and this common number is called the degree of P, written deg(P) (see [17, Theorem 4.6.2]). Intuitively, deg(P) gives the number of "leaves" of the variety P that a typical plane L^(N-n) will intersect. In the case of a nonhomogeneous or real variety P, deg(P) is defined analogously as the maximum (finite) intersection number of a plane L^(N-n) and the n-dimensional variety P in R^N, although the intersection number will not generally be constant for almost all L^(N-n).

This concept of degree is a generalization of the degree of a polynomial. Indeed, if P is

complex and defined as the solution set of a single irreducible polynomial, then the degree of

the polynomial equals the degree of P as defined above [17].
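For example, the variety of singular n by n complex matrices, defined by det = 0, has degree n. The sketch below (our illustration, not from the paper) checks this numerically: a generic line A + tB in matrix space meets the variety in exactly n points, since det(A + tB) is a degree-n polynomial in t whose roots can be computed as eigenvalues of a related matrix:

```python
import numpy as np

# Sketch (not from the paper): the variety of singular n x n complex matrices,
# {det = 0}, has degree n.  A generic line A + t*B meets it in exactly n
# points: det(A + t*B) = 0 exactly when t is an eigenvalue of -inv(B) @ A
# (for invertible B).
rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

ts = np.linalg.eigvals(-np.linalg.inv(B) @ A)    # the n intersection points
assert len(ts) == n
for t in ts:
    assert abs(np.linalg.det(A + t * B)) < 1e-8  # each point is singular
```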

By l-volume of an n-dimensional manifold M (l ≤ n) we mean the l-dimensional Lebesgue measure of M, if it exists. Note that if l > n this volume is zero. The notations vol(M) and vol_n(M) denote the n-volume of the n-dimensional manifold M.

3. Condition Numbers and the Distance to the Nearest Ill-Posed Problem

We claim that many classes of numerical analysis problems permit the following

geometric characterization of their condition numbers:

(i) Certain problems of each class are ill-posed, i.e. their condition numbers are infinite.

These problems form a variety within the space of all problems.

(ii) The condition number of a problem has a simple geometric interpretation: it is propor-

tional to (or bounded by a multiple of) the reciprocal of the distance to the set of ill-

posed problems. Thus, as a problem gets closer to the set of ill-posed ones, its condition

number approaches infinity.

In this section we will cite results from the literature to prove these claims for the fol-

lowing three classes of problems: matrix inversion, polynomial zero finding, and eigenvalue

calculation. Afterwards we will outline why this characterization applies to many other prob-

lems as well [4].

First we need to define condition number more precisely. If X is our space of problems equipped with norm ||·||_X, Y our space of solutions equipped with norm ||·||_Y, and f : X → Y is the solution map for our problem, the usual definition of the relative condition number is

    κ_rel(f,x) ≡ limsup_(δx → 0) ( ||f(x + δx) - f(x)||_Y / ||f(x)||_Y ) · ( ||x||_X / ||δx||_X )
               = ||Df(x)||_XY · ||x||_X / ||f(x)||_Y , (3.1)

if the Jacobian Df exists (||·||_XY is the induced norm). In many cases the essential information about the conditioning is contained in the ||Df||_XY factor. We may therefore use a multiple of ||Df||_XY instead of κ_rel without losing essential information.

All three of our problems are homogeneous: multiplying the problem by a scalar does

not change the condition number. Therefore, the set of ill-posed problems will also be

homogeneous, or cone-shaped. This permits us to normalize all our problems to have unit

norm (lie on the unit sphere in either R^N or C^N), and implies that any results on the distribution of the condition number will hold for any distribution of problems inducing the same distribution of x/||x|| on the unit sphere.

Matrix Inversion: The usual relative condition number as defined in (3.1) with the ||·|| norm on both the problem and solution spaces is [10]:

    κ_rel(A) = ||A|| · ||A^(-1)|| .

We shall use the nearly equivalent condition number

    κ(A) ≡ ||A||_F · ||A^(-1)|| .

These condition numbers are both homogeneous, and infinite when A is singular, so the set

of ill-posed problems is a variety defined by the single n-th degree homogeneous irreducible

polynomial det(A) = 0, where n = dim(A) [31]. Denote the set of ill-posed problems by IP.

From the last section, we see that if A is complex, IP is a complex hypersurface. If A is real,

it is easy to verify that IP is still a real hypersurface by using the explicit parameterization

provided by Gaussian elimination [3].

A theorem of Eckart and Young [5] gives the distance from a nonsingular matrix to IP:

Theorem 3.1: [5] dist(A,IP) = ||A^(-1)||^(-1) .

(Gastinel [7] proved this result for an arbitrary operator norm.)

Therefore, we see that in terms of k we may write

    if ||A||_F = 1 , then dist(A,IP) = 1/κ(A) , (3.2)

i.e. that the distance from a normalized problem A to the nearest ill-posed problem is the

reciprocal of its condition number.
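The Eckart-Young theorem cited above is easy to verify numerically: the distance to the nearest singular matrix is the smallest singular value of A, and the minimizing perturbation is explicit. A small check (ours, not from the paper):

```python
import numpy as np

# Numerical check (not from the paper) of the Eckart-Young theorem: the
# distance from A to the nearest singular matrix is 1/||A^-1|| = sigma_min(A).
rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))

U, s, Vt = np.linalg.svd(A)
sigma_min = s[-1]
assert np.isclose(sigma_min, 1.0 / np.linalg.norm(np.linalg.inv(A), 2))

# The minimizing perturbation subtracts the smallest singular triple:
dA = sigma_min * np.outer(U[:, -1], Vt[-1, :])
assert np.isclose(np.linalg.norm(dA, 2), sigma_min)       # ||dA|| = sigma_min
assert np.linalg.svd(A - dA, compute_uv=False)[-1] < 1e-12  # A - dA is singular
```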

Polynomial Zero Finding: In this case we are interested in the sensitivity of the zeros of a

polynomial to small perturbations in the coefficients. If p(x) is an nth degree polynomial, let ||p|| denote the Euclidean norm of the vector of its coefficients. If p(z) = 0 and δp is a small perturbation of p, it is easy to verify that to first order the perturbed polynomial p + δp has a zero at z + δz where

    δz = -δp(z) / p'(z) ,

implying that the relative condition number is

    κ_rel(p,z) = n(z) · ||p|| / |p'(z)|

where

    n(z) = |z|^(-1) · ( Σ_i |z|^(2i) )^(1/2) .

Note that the condition number depends both on the polynomial p and the choice of zero z.

For simplicity we will use the similar condition number

    κ(p,z) ≡ ||p|| / |p'(z)| .

Both condition numbers are infinite when p'(z) = 0, i.e. when z is a multiple zero. Thus we

will take the set IP of ill-posed problems to be those polynomials with multiple zeros. A

necessary and sufficient condition for a polynomial to have a multiple zero is that its discrim-

inant, an irreducible homogeneous polynomial of degree 2n - 2 in the coefficients of p [31],

be zero. If p is complex, this implies the set of polynomials with zero discriminant is a hyper-

surface. If p is real, this set of polynomials is still a hypersurface, as may be verified using

the parameterization provided by the leading coefficient p_n and the zeros. The discriminant may also be zero if the two leading coefficients of p equal zero (corresponding to a double zero at infinity), but this set is a subvariety of double the codimension of the hypersurface in

which it lies, and so forms a set of measure zero we may neglect.

Now we need to estimate the distance from a given polynomial to one with a multiple

zero. The estimate we shall use is due to Hough [14] and says

Theorem 3.2: [14, 4] The distance dist(p,IP) from the polynomial p of degree at least 2 to one with a multiple zero is bounded by

    dist(p,IP) ≤ √2 · |p'(z)|

where p(z) = 0.

In fact, this is quite a weak result gotten by estimating the smallest change in p needed

to make a double zero at z, which turns out to be a linear least squares problem. Thus we

may write

    if ||p|| = 1 , then dist(p,IP) ≤ √2 / κ(p,z) , (3.3)

i.e. that the distance from a normalized problem p to the nearest ill-posed problem is

bounded by a multiple of the reciprocal of its condition number.
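The least-squares construction behind this bound can be carried out explicitly. The sketch below (ours, not from the paper; the example polynomial is an arbitrary choice) finds the minimal-norm coefficient perturbation making a chosen zero into a double zero, and checks it against the √2·|p'(z)| bound:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Sketch (not from the paper): for p with a simple zero z, the smallest
# coefficient perturbation dp making z a double zero solves a linear
# least-squares problem, exhibiting an explicit nearby ill-posed polynomial.
a = P.polyfromroots([1.0, 1.001, -2.0, 0.5])  # two close zeros: nearly ill-posed
a = a / np.linalg.norm(a)                     # normalize so ||p|| = 1
z = 1.0
n = len(a) - 1

pprime = P.polyval(z, P.polyder(a))
kappa = 1.0 / abs(pprime)                     # kappa(p,z) = ||p|| / |p'(z)|

# linear constraints on dp: (p+dp)(z) = 0 and (p+dp)'(z) = 0
row_p = z ** np.arange(n + 1)                                 # coefficients of p(z)
row_dp = np.arange(n + 1) * np.concatenate(([0.0], z ** np.arange(n)))  # of p'(z)
C = np.vstack([row_p, row_dp])
rhs = -np.array([P.polyval(z, a), pprime])
da, *_ = np.linalg.lstsq(C, rhs, rcond=None)  # minimal-norm solution

# double zero achieved, and ||dp|| respects the sqrt(2)|p'(z)| bound
assert abs(P.polyval(z, a + da)) < 1e-10
assert abs(P.polyval(z, P.polyder(a + da))) < 1e-10
assert np.linalg.norm(da) <= np.sqrt(2) * abs(pprime)
```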

To see how much (3.3) may overestimate dist(p,IP), we present a lower bound. Note that by changing the argument of p from x to ax (a a scalar) we may make the leading coefficient p_n larger than the other coefficients.

Theorem 3.3: [4] Assume that p is an n-th degree polynomial satisfying ||p|| = 1 and |p_i| < |p_n|/n for i < n. Then dist(p,IP) is bounded below by the minimum of three explicit quantities, the smallest of which is proportional to 1/κ^2(p,z). (3.4)

Thus we see that the distance to the nearest ill-posed problem is bounded below essentially by a multiple of the reciprocal of the square of the condition number. This is a general phenomenon among algebraic problems to which we return below.

Eigenvalue calculations: We will be interested both in the sensitivity of eigenvalues and eigen-

vectors. More precisely, we will consider the sensitivity of the projection associated with an

eigenvalue [16]. If T is a matrix with simple eigenvalue λ, right eigenvector x and left eigenvector y, the projection P associated with λ is the matrix P = xy^T / (y^T x). The reduced resolvent associated with λ is the matrix

    S = lim_(z → λ) (I - P)(T - z)^(-1) .

If T has n distinct eigenvalues λ_i with projections P_i, one can write

    S = Σ_(λ_i ≠ λ) (λ_i - λ)^(-1) P_i .

If δT is a small perturbation of T, one can show that to first order λ changes to λ + δλ and P changes to P + δP [16] where

    δλ = tr(P δT)  and  δP = -S δT P - P δT S .

It is easy to verify that |δλ| can be as large as ||P||·||δT|| and ||δP|| can be at least as large as ||S||·||P||·||δT|| (and no more than twice as large as this). Therefore we may take as condition numbers

    κ(T,λ) ≡ ||P||  and  κ(T,P) ≡ ||S|| · ||P|| · ||T||_F .

Both are infinite at a multiple eigenvalue, so we take IP to be the set of matrices with a multiple eigenvalue. A theorem of Wilkinson bounds the distance to IP:

Theorem 3.4: [35] dist(T,IP) ≤ √2 · ||T||_F / ||P|| .

Therefore, in terms of κ we may write

    if ||T||_F = 1 , then dist(T,IP) ≤ √2 / κ(T,λ) . (3.5)

Wilkinson's theorem provides a somewhat weak upper bound on dist(T,IP). The condition number for P on the other hand provides a lower bound on dist(T,IP):

Theorem 3.5: [4] dist(T,IP) ≥ ||T||_F / (7 · κ(T,P)) .

This result lets us write

    if ||T||_F = 1 , then dist(T,IP) ≥ 1 / (7 · κ(T,P)) . (3.6)

For somewhat stronger results and discussion, see [4].
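The first-order perturbation formula δλ = tr(P δT) underlying these bounds can be checked by finite differences. A small sketch (ours, not from the paper):

```python
import numpy as np

# Finite-difference check (not from the paper) of the first-order formula
# delta_lambda = tr(P dT) for the spectral projection P = x y^T / (y^T x).
rng = np.random.default_rng(5)
n = 5
T = rng.standard_normal((n, n))

w, V = np.linalg.eig(T)
lam, x = w[0], V[:, 0]                 # a (generically simple) eigenvalue
wl, U = np.linalg.eig(T.T)             # left eigenvectors are eigenvectors of T^T
y = U[:, np.argmin(abs(wl - lam))]     # left eigenvector matching lam
P = np.outer(x, y) / (y @ x)           # spectral projection for lam

dT = 1e-7 * rng.standard_normal((n, n))
w2 = np.linalg.eig(T + dT)[0]
lam2 = w2[np.argmin(abs(w2 - lam))]    # perturbed eigenvalue nearest lam

predicted = np.trace(P @ dT)           # first-order prediction of the shift
assert abs((lam2 - lam) - predicted) < 1e-8
```

The discrepancy is second order in ||δT||, as the perturbation expansion predicts.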

The phenomenon described above for matrix inversion, polynomial zero finding and

eigenvalue calculation is actually quite common in numerical analysis. It turns out all the

above results can be derived from the same underlying principle, that the condition number κ satisfies one or both of the following differential inequalities:

    m · κ^2 ≤ ||Dκ|| ≤ M · κ^2 ,

where Dκ is the gradient of κ. The lower bound on ||Dκ|| yields the upper bound on dist(T,IP)

    dist(T,IP) ≤ 1 / (m · κ(T))

and the upper bound on ||Dκ|| yields the lower bound

    dist(T,IP) ≥ 1 / (M · κ(T)) .
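The step from differential inequalities on κ to distance bounds can be sketched as follows (our reconstruction of the standard argument, writing u = 1/κ):

```latex
% Write u = 1/\kappa, so u vanishes exactly on the set IP of ill-posed problems.
% The chain rule gives \|Du\| = \|D\kappa\|/\kappa^2, so the inequalities become
\[
  m \;\le\; \|Du\| \;\le\; M .
\]
% Along a shortest path from T to IP, u must decay from 1/\kappa(T) to 0 at a
% rate between m and M, so the path length is pinched between two bounds:
\[
  \frac{1}{M\,\kappa(T)} \;\le\; \operatorname{dist}(T,\mathit{IP})
  \;\le\; \frac{1}{m\,\kappa(T)} .
\]
```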

This phenomenon also appears in pole placement in linear control theory, Newton's method,

and elsewhere [4]. In the case of algebraic functions, one can show that at least for asymptotically large condition numbers, differential inequalities of the form

    m · κ^2 ≤ ||Dκ|| ≤ M · κ^3

hold, the new upper bound on ||Dκ|| yielding the lower bound on dist(T,IP)

    dist(T,IP) ≥ 1 / (2M · κ^2(T)) ,

which is the source of inequality (3.4) above.

Note also that the sets of ill-posed problems are hypersurfaces in our three examples

above. Other kinds of varieties are possible as well. For example, polynomials with at most

m distinct zeros form a subvariety of the variety of polynomials with at least one multiple

zero, and have codimension 2(n - m) (if complex) or n - m (if real) [3]. Since j-tuple zeros are more sensitive than (j - 1)-tuple zeros [34], there is a natural hierarchy of sets of ever more ill-posed problems, each one forming a subvariety of the previous set. Similar comments apply to eigenvalue calculations (j-tuple eigenvalues are more sensitive than (j - 1)-tuple

eigenvalues) and rank-deficient linear least-squares problems (problems with higher rank

deficiency are more sensitive than ones with lower rank deficiency). The results of the next

section apply to these situations as well.

4. On Volumes of Tubes

In this section we state our main volume estimates. Proofs appear in section 7 below.

First we consider complex varieties. The upper bounds in the following theorem are

obtained by generalizing an argument of Renegar [23], who obtained the upper bound for

hypersurfaces:

Theorem 4.1: Suppose M is a complex, purely 2d-dimensional variety in C^N. Let f(ε) be the fraction of the volume of the unit ball in C^N which is within distance ε (ε ≤ 1) of M. Then

    f(ε) ≤ ( Γ(N + 1/2) / ( Γ(N - d + 1/2) Γ(d + 1/2) ) ) · e^2 N 2^(2N - 2d - 2) · deg(M) · ε^(2(N - d)) · (1 + Nε)^(2d) . (4.1)

If M is a hypersurface (d = N - 1), then this upper bound may be improved to

    f(ε) ≤ e^2 N^3 · deg(M) · ε^2 · (1 + Nε)^(2(N - 1)) . (4.2)

If M passes through the origin, it is also true that

    ( (1 - ε)^(2N) ε^(2(N - d)) / deg(M) ) · ( Γ(N - d + 1/2) Γ(d + 1/2) / ( √π Γ(N + 1/2) ) ) ≤ f(ε) . (4.3)

If M is a hypersurface passing through the origin this lower bound may be improved to

Computer Science Department

TECHNICAL REPORT

The Probability that a Numerical

Analysis Problem is Difficult

by

James W. Demmel

Technical Report #294

April, 1987

o

XI

o

&â– *

>1

[got*

NEW YORK UNIVERSITY

epartment of Computer Science

Courant Institute of Mathematical Sciences

251 MERCER STREET, NEW YORK, N.Y 10012

/

The Probability that a Numerical

Analysis Problem is Difficult

by

James W. Demmel

Technical Report #294

April, 1987

The Probability that a Numerical Analysis Problem is Difficult

James W. Demmel

Courant Institute

New York, NY 10012

Abstract

Numerous problems in numerical analysis, including matrix inversion, eigenvalue calculations

');
}
else if (getClientWidth() > 430)
{
document.write('');
}
else
{
document.write('');
}
//-->

and polynomial zero finding, share the following property: the difficulty of solving a given problem is large when the distance from that problem to the nearest "ill-posed" one is small.

For example, the closer a matrix is to the set of noninvertible matrices, the larger its condi-

tion number with respect to inversion. We show that the sets of ill-posed problems for matrix

inversion, eigenproblems, and polynomial zero finding all have a common algebraic and

geometric structure which lets us compute the probability distribution of the distance from a

"random" problem to the set. From this probability distribution we derive, for example, the

distribution of the condition number of a random matrix. We examine the relevance of this

theory to the analysis and construction of numerical algorithms destined to be run in finite

precision arithmetic.

AMS classifications: 15A12, 53C65, 60D05

1. Introduction

To investigate the probability that a numerical analysis problem is difficult, we need to

do three things:

1) Choose a measure of difficulty,

2) Choose a probability distribution on the set of problems,

3) Compute the distribution of the measure of difficulty induced by the distribution on the

set of problems.

The measure of difficulty we shall use in this paper is the condition number, which meas-

ures the sensitivity of the solution to small changes in the problem. For the problems we con-

sider in this paper (matrix inversion, polynomial zero finding and eigenvalue calculation),

there are well known condition numbers in the literature of which we shall use slightly modi-

fied versions to be discussed more fully later. The condition number is an appropriate meas-

ure of difficulty because it can be used to measure the expected loss of accuracy in the com-

puted solution, or even the number of iterations required for an iterative algorithm to con-

verge to a solution.

The probability distribution on the set of problems for which we will attain most of our

results will be the "uniform distribution" which we define as follows. We will identify each

problem as a point in either IR N (if it is real) or C N (if it is complex). For example, a real n

by n matrix A will be considered to be a point in IR""", where each entry of A forms a coordi-

nate in IR"' in the natural way. Similarly, a complex nth degree polynomial can be identified

with a point in C"" 1 " 1 by using its coefficients as coordinates. On the space JR N (or C N ) we

will take any spherically symmetric distribution, i.e. the induced distribution of the normal-

ized problem jc/|[jc|| (||-|| is the Euclidean norm) must be uniform on the unit sphere in IR .

For example, we could take a uniform distribution on the interior of the unit ball in IR , or

let each component be an independent Gaussian random variable with mean and standard

deviation 1. Our answers will hold for this entire class of distributions because our condition

numbers are homogeneous (multiplying a problem by a nonzero scalar does not change its

condition number).

The main justification for using a uniform distribution is that it appears to be fair: each

problem is as likely as any other. However, it does not appear to apply in many practical

cases for a variety of reasons, including the fact that any set of problems which can be

represented in a computer is necessarily discrete rather than continuous. We will discuss the

validity of our choice of uniform distribution as well as alternatives at length in section 6

below.

Finally, given this distribution, we must compute the induced probability distribution of

the condition number. It turns out that all the problems we consider here have a common

geometric structure which lets us compute the distributions of their condition numbers with a

single analysis, which goes as follows:

(i) Certain problems of each kind are ill-posed, i.e. their condition number is infinite.

These ill-posed problems form an algebraic variety within the space of all problems. For

example, the singular matrices are ill-posed with respect to the problem of inversion,

and they lie on the variety where the determinant, a polynomial in the matrix entries, is

zero. Geometrically, varieties are possibly self-intersecting surfaces in the space of

problems.

(/'/') The condition number of a problem has a simple geometric interpretation: it is propor-

tional to (or bounded by a multiple of) the reciprocal of the distance to the set of ill-

posed problems. Thus, as a problem gets closer to the set of ill-posed ones, its condition

number approaches infinity. In the case of matrix inversion, for example, the traditional

condition number is exactly inversely proportional to the distance to the nearest singular

matrix.

(Hi) The last observation implies that the set of problems of condition number at least x is

(approximately) the set of problems within distance clx (c a constant) of the variety of

ill-posed sets. Sets of this sort, sometimes called tubular neighborhoods, have been stu-

died extensively by geometers. We will present upper bounds, lower bounds, and

asymptotic values for the volumes of such sets. The asymptotic results, lower bounds,

and some of the upper bounds are new. The formulae are very simple, depending only

on x, the degree N of the ambient space, the dimension of the variety, and the degree of

the variety. These volume bounds in turn bound the volume of the set of problems with

condition number at least x. Since we are assuming the problems are uniformly distri-

buted, volume is proportional to probability.

Thus, for example, we will prove that a scaled version k(A) = ||A||r-||A _1 || of the usual

condition number of a complex matrix with respect to inversion satisfies

1 â€” -\ =s Prob( K (A) & x) < * â€”

and that asymptotically

Prob(K(A) * x) = W( " 2 ~ 1) + o(\)

In other words, the probability that the condition number exceeds x decreases as the square

of the reciprocal of x. Even for moderate x the upper bound exceeds the asymptotic limit by

a ratio of only about e 2 n 2 . If A is real we will show

C V~ ^)" 2 "' * Prob(K(A) * *) * 2 2- ["/I- N*

where C is a constant proportional to the n 2 - 1-dimensional volume of the set of singular

matrices inside the unit ball. Thus, for real matrices the probability that the condition number

exceeds x decreases as x _1 .

There are a number of open questions and conjectures concerning these volume bounds,

in particular for how general a class of real varieties they apply (the case of complex varieties

is simpler). We will discuss the history of this work and open problems in detail in section 4

below.

It turns out that the reciprocal relationship between condition number and distance to

the nearest ill-posed problem holds for a much wider class of problem than just matrix inver-

sion, polynomial zero finding and eigenvalue calculations: it is shared, at least asymptotically,

by any problem whose solution is an algebraic function. For simplicity we shall restrict our-

selves to the three aforementioned problems, but our results do apply more widely, as dis-

cussed in section 3 below and in [4].

This work was inspired by earlier work in a number of fields. Demmel [4], Gastinel [7],

Hough [14], Kahan [15], Ruhe [24], Stewart [28], Wilkinson [35, 36, 37] and others have

analyzed the relationship between the condition number and the distance to the nearest ill-

posed problem mentioned above in (/'/). Gray [8, 9], Griffiths [11], Hotelling [13], Lelong

[20], Ocneau [21], Renegar [23], Santalo [25], Smale [26], and Weyl [32] have worked on

bounds of volumes of tubular neighborhoods. These volume bounds have been used by Smale

[26, 27], Renegar [23] and others to analyze the efficiency of Newton's method for finding

zeros of polynomials. This latter work inspired the author [3] to apply these bounds to condi-

tioning. Ocneanu [22] and Kostlan [19] have also analyzed the statistical properties of the

condition number for matrix inversion.

The rest of this paper is organized as follows. Section 2 defines notation. Section 3

discusses the relationship between conditioning and the distance to the nearest ill-posed prob-

lem. Section 4 presents the bounds on the volumes of tubular neighborhoods we shall use,

and states some related open problems. Section 5 computes the distributions of the condition

numbers of our three problems. Section 6 discusses the limitations of assuming a uniform dis-

tribution and suggests alternatives and open problems. Section 7 contains the proofs of the

theorems in section 4.

2. Notation

We introduce several ideas we will need from numerical analysis, algebra, and

geometry. ||x|| will denote the Euclidean norm of the vector x as well as the induced matrix

norm

$$ \|A\| \;=\; \max_{x \neq 0} \frac{\|Ax\|}{\|x\|} \,. $$

‖A‖_F will denote the Frobenius norm

$$ \|A\|_F \;=\; \Bigl( \sum_{i,j} |a_{ij}|^2 \Bigr)^{1/2} . $$

If P is a set and x is a point, we will let dist(x,P) denote the Euclidean distance from x to the

nearest point in P.
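As a concrete aside (ours, not the paper's), both norms are directly available in NumPy: the induced norm ‖A‖ is the largest singular value of A, and ‖A‖_F sums the squares of all entries, so ‖A‖ ≤ ‖A‖_F always holds.

```python
# Small numerical illustration (not from the paper) of the two matrix norms
# defined above: the induced 2-norm and the Frobenius norm.
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
spectral  = np.linalg.norm(A, 2)       # ||A|| = max_{x != 0} ||Ax|| / ||x||
frobenius = np.linalg.norm(A, 'fro')   # ||A||_F = (sum |a_ij|^2)^(1/2)
assert spectral <= frobenius           # always true
print(spectral, frobenius)
```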

A subset M of ℝ^N is called an n-dimensional manifold if it is locally homeomorphic to
ℝ^n. We also write n = dim(M). The codimension of M, written codim(M), is N − n. In this

paper dimension will always refer to the real dimension rather than the complex dimension,

which is half the real dimension.

A variety is the set of solutions of a system of polynomial equations. A variety is homo-

geneous if it is cone-shaped, i.e. if x is in the variety so is every scalar multiple ax. A variety

is not generally a manifold since it can have singularities in the neighborhood of which it is

not homeomorphic to Euclidean space. However, points q with relatively open neighborhoods U_q ⊂ P that are manifolds are dense in P [17, Theorem 4.2.4], so that the following
definition makes sense: the dimension of P at p, written dim_p(P), is

$$ \dim_p(P) \;=\; \limsup_{\substack{q \to p,\; q \in U_q \subset P \\ U_q \text{ a manifold}}} \dim(U_q) \,. $$

We in turn define the dimension of the variety P as the maximum over all p ∈ P of dim_p(P). If
dim_p(P) is constant for all p, we call P pure dimensional. A complex variety defined by a
single nonconstant polynomial is called a complex hypersurface. Complex hypersurfaces are
pure dimensional with codimension 2. A real hypersurface has codimension 1 everywhere.
The real variety defined by the polynomials f_1, …, f_p is called a complete intersection if it is
pure dimensional of codimension p.

Now we define the degree of a purely n-dimensional variety P in ℝ^N. Let L^{N−n} be an
(N − n)-dimensional linear manifold (plane) in ℝ^N. Since dim(L^{N−n}) + dim(P) = dim(ℝ^N) = N,
we say that L^{N−n} and P are of complementary dimension. Generically, L^{N−n} and P will
intersect in a surface of codimension equal to the sum of their codimensions, that is N. In
other words, their intersection will be of dimension 0 (a finite collection of points). If P is a
complex homogeneous variety, then for almost all planes L^{N−n} this collection will contain the
same number of points, and this common number is called the degree of P, written deg(P)
(see [17, Theorem 4.6.2]). Intuitively, deg(P) gives the number of "leaves" of the
variety P that a typical plane L^{N−n} will intersect. In the case of a nonhomogeneous or real
variety P, deg(P) is defined analogously as the maximum (finite) intersection number of a
plane L^{N−n} and the n-dimensional variety P in ℝ^N, although the intersection number will not
generally be constant for almost all L^{N−n}.

This concept of degree is a generalization of the degree of a polynomial. Indeed, if P is

complex and defined as the solution set of a single irreducible polynomial, then the degree of

the polynomial equals the degree of P as defined above [17].
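To make the intersection-count definition concrete, here is a small numerical sketch (our example, not the paper's; the curve and the random line are arbitrary choices): the complex plane curve defined by y² − x³ − 1 = 0 has degree 3, and substituting a generic complex line x = a + bt, y = c + dt yields a cubic in the line parameter t, hence three intersection points.

```python
# Illustrative check (our example) that the geometric degree of a complex
# plane curve equals the degree of its defining polynomial: a generic line
# meets the degree-3 curve y^2 = x^3 + 1 in exactly 3 points.
import numpy as np

rng = np.random.default_rng(1)
a, b, c, d = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# substitute the line into f(x, y) = y^2 - x^3 - 1 and collect powers of t:
# (c + d t)^2 - (a + b t)^3 - 1, coefficients stored low-to-high degree
coeffs = np.polynomial.polynomial.polyadd(
    np.polynomial.polynomial.polypow([c, d], 2),
    -np.polynomial.polynomial.polypow([a, b], 3))
coeffs[0] -= 1.0
roots = np.polynomial.polynomial.polyroots(coeffs)
print(len(roots))   # 3 intersection points for a generic line
```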

By l-volume of an n-dimensional manifold M (l ≤ n) we mean the l-dimensional Lebesgue
measure of M, if it exists. Note that if l > n this volume is zero. The notations vol(M) and
vol_n(M) denote the n-volume of the n-dimensional manifold M.

3. Condition Numbers and the Distance to the Nearest Ill-Posed Problem

We claim that many classes of numerical analysis problems permit the following

geometric characterization of their condition numbers:

(i) Certain problems of each class are ill-posed, i.e. their condition numbers are infinite.

These problems form a variety within the space of all problems.

(ii) The condition number of a problem has a simple geometric interpretation: it is propor-

tional to (or bounded by a multiple of) the reciprocal of the distance to the set of ill-

posed problems. Thus, as a problem gets closer to the set of ill-posed ones, its condition

number approaches infinity.

In this section we will cite results from the literature to prove these claims for the fol-

lowing three classes of problems: matrix inversion, polynomial zero finding, and eigenvalue

calculation. Afterwards we will outline why this characterization applies to many other prob-

lems as well [4].

First we need to define condition number more precisely. If X is our space of problems
equipped with norm ‖·‖_X, Y our space of solutions equipped with norm ‖·‖_Y, and f : X → Y is the
solution map for our problem, the usual definition of the relative condition number is

$$ \kappa_{\mathrm{rel}}(f,x) \;=\; \limsup_{\delta x \to 0} \; \frac{\|\delta f(x)\|_Y / \|f(x)\|_Y}{\|\delta x\|_X / \|x\|_X} \;=\; \frac{\|Df(x)\|_{XY}\,\|x\|_X}{\|f(x)\|_Y} \qquad (3.1) $$

if the Jacobian Df exists (‖·‖_{XY} is the induced norm). In many cases the essential information
about the conditioning is contained in the ‖Df‖_{XY} factor. We may therefore use a multiple of
‖Df‖_{XY} instead of κ_rel without losing essential information.

All three of our problems are homogeneous: multiplying the problem by a scalar does

not change the condition number. Therefore, the set of ill-posed problems will also be

homogeneous, or cone-shaped. This permits us to normalize all our problems to have unit

norm (lie on the unit sphere in either ℝ^N or C^N), and implies that any results on the distribution of the condition number will hold for any distribution of problems inducing the same distribution of x/‖x‖ on the unit sphere.

Matrix Inversion: The usual relative condition number as defined in (3.1) with the ||-|| norm

on both the problem and solution spaces is [10]:

$$ \kappa_{\mathrm{rel}}(A) \;=\; \|A\|\;\|A^{-1}\| \,. $$

We shall use the nearly equivalent condition number

$$ \kappa(A) \;\equiv\; \|A\|_F\;\|A^{-1}\| \,. $$

These condition numbers are both homogeneous, and infinite when A is singular, so the set

of ill-posed problems is a variety defined by the single n-th degree homogeneous irreducible

polynomial det(A) = 0, where n = dim(A) [31]. Denote the set of ill-posed problems by IP.

From the last section, we see that if A is complex, IP is a complex hypersurface. If A is real,

it is easy to verify that IP is still a real hypersurface by using the explicit parameterization

provided by Gaussian elimination [3].

A theorem of Eckart and Young [5] gives the distance from a nonsingular matrix to IP:

Theorem 3.1: [5] dist(A, IP) = ‖A⁻¹‖⁻¹.

(Gastinel [7] proved this result for an arbitrary operator norm.)

Therefore, we see that in terms of k we may write

$$ \text{if } \|A\|_F = 1 \text{, then } \mathrm{dist}(A, IP) = 1/\kappa(A) \,, \qquad (3.2) $$

i.e. that the distance from a normalized problem A to the nearest ill-posed problem is the

reciprocal of its condition number.
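Theorem 3.1 and (3.2) are easy to check numerically. The sketch below (our illustration; the matrix is random and the size arbitrary) normalizes ‖A‖_F = 1, builds the nearest singular matrix by zeroing the smallest singular value (the Eckart–Young construction), and confirms that its distance from A equals 1/κ(A).

```python
# Numerical check (ours) of Theorem 3.1 and (3.2): after normalizing
# ||A||_F = 1, the distance from A to the nearest singular matrix is
# sigma_min(A) = ||A^{-1}||^{-1} = 1/kappa(A).
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
A /= np.linalg.norm(A, 'fro')                      # now ||A||_F = 1

U, s, Vt = np.linalg.svd(A)
kappa = np.linalg.norm(A, 'fro') * np.linalg.norm(np.linalg.inv(A), 2)

# Eckart-Young: dropping the smallest singular value gives the nearest
# singular matrix B; its distance from A is sigma_min = 1/kappa.
B = U @ np.diag(np.concatenate([s[:-1], [0.0]])) @ Vt
dist = np.linalg.norm(A - B, 2)
print(dist, 1.0 / kappa)     # these agree
```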

Polynomial Zero Finding: In this case we are interested in the sensitivity of the zeros of a

polynomial to small perturbations in the coefficients. If p(x) is an nth degree polynomial, let
‖p‖ denote the Euclidean norm of the vector of its coefficients. If p(z) = 0 and δp is a small
perturbation of p, it is easy to verify that to first order the perturbed polynomial p + δp has a
zero at z + δz where

$$ \delta z \;=\; \frac{-\,\delta p(z)}{p'(z)} \,, $$

implying that the relative condition number is

$$ \kappa_{\mathrm{rel}}(p,z) \;=\; \frac{n(z)\,\|p\|}{|p'(z)|} $$

where

$$ n(z) \;=\; |z|^{-1} \Bigl( \sum_i |z^{2i}| \Bigr)^{1/2} . $$

Note that the condition number depends both on the polynomial p and the choice of zero z.

For simplicity we will use the similar condition number

$$ \kappa(p,z) \;\equiv\; \frac{\|p\|}{|p'(z)|} \,. $$
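The first-order formula δz = −δp(z)/p′(z) and the condition number κ(p,z) = ‖p‖/|p′(z)| can be checked directly. In this sketch (ours; the cubic and the perturbation are arbitrary choices) we perturb the coefficients slightly and compare the predicted zero shift with the actual one.

```python
# First-order perturbation check (our sketch): for p(z) = 0 and a small
# coefficient perturbation dp, the zero moves by dz = -dp(z)/p'(z), so
# zeros with small |p'(z)| (nearly multiple zeros) are ill conditioned.
import numpy as np

p = np.polynomial.Polynomial.fromroots([1.0, 2.0, 4.0])
z = 2.0
dp = np.polynomial.Polynomial([1e-8, 1e-8, -1e-8])   # small perturbation

predicted_dz = -dp(z) / p.deriv()(z)
new_zeros = (p + dp).roots()
actual_dz = new_zeros[np.argmin(abs(new_zeros - z))] - z
print(predicted_dz, actual_dz)    # agree to first order

norm_p = np.linalg.norm(p.coef)     # Euclidean norm of the coefficients
kappa = norm_p / abs(p.deriv()(z))  # the condition number kappa(p, z) above
```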

Both condition numbers are infinite when p′(z) = 0, i.e. when z is a multiple zero. Thus we

will take the set IP of ill-posed problems to be those polynomials with multiple zeros. A

necessary and sufficient condition for a polynomial to have a multiple zero is that its
discriminant, an irreducible homogeneous polynomial of degree 2n − 2 in the coefficients of p [31],

be zero. If p is complex, this implies the set of polynomials with zero discriminant is a hyper-

surface. If p is real, this set of polynomials is still a hypersurface, as may be verified using

the parameterization provided by the leading coefficient p_n and the zeros. The discriminant
may also be zero if the two leading coefficients of p equal zero (corresponding to a double
zero at ∞), but this set is a subvariety of double the codimension of the hypersurface in
which it lies, and so forms a set of measure zero we may neglect.

Now we need to estimate the distance from a given polynomial to one with a multiple

zero. The estimate we shall use is due to Hough [14] and says

Theorem 3.2: [14, 4] The distance dist(p, IP) from the polynomial p of degree at least 2 to

one with a multiple zero is bounded by

$$ \mathrm{dist}(p, IP) \;\le\; \sqrt{2}\,|p'(z)| $$

where p (z) = 0.

In fact, this is quite a weak result gotten by estimating the smallest change in p needed

to make a double zero at z, which turns out to be a linear least squares problem. Thus we

may write

$$ \text{if } \|p\| = 1 \text{, then } \mathrm{dist}(p, IP) \;\le\; \frac{\sqrt{2}}{\kappa(p,z)} \,, \qquad (3.3) $$

i.e. that the distance from a normalized problem p to the nearest ill-posed problem is

bounded by a multiple of the reciprocal of its condition number.
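The bound of Theorem 3.2 can be illustrated with an explicit construction (ours, not Hough's proof): for a polynomial with two close zeros, replacing them by a double zero at their midpoint produces a nearby polynomial in IP, and its distance from p falls well under √2·|p′(z)|.

```python
# Sketch illustrating Theorem 3.2 (our construction, not Hough's argument):
# merging two close zeros of p into a double zero at their midpoint gives
# an explicit nearby ill-posed polynomial q, and ||p - q|| is well within
# the bound sqrt(2) * |p'(z)|.
import numpy as np

r1, r2 = 1.0, 1.01                         # two close zeros of p
p = np.polynomial.Polynomial.fromroots([r1, r2, 2.0])
q = np.polynomial.Polynomial.fromroots([(r1 + r2) / 2] * 2 + [2.0])

dist_upper = np.linalg.norm((p - q).coef)      # an actual distance to IP
bound = np.sqrt(2.0) * abs(p.deriv()(r1))      # Theorem 3.2's bound at z = r1
print(dist_upper, bound)    # dist_upper is far below the bound here
```

Here ‖p − q‖ is O(δ²) in the zero separation δ while the bound is O(δ), which is one way to see that Theorem 3.2 is a weak (but convenient) estimate.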

To see how much (3.3) may overestimate dist(p, IP), we present a lower bound. Note
that by changing the argument of p from x to ax (a scalar) we may make the leading coefficient p_n larger than the other coefficients.

Theorem 3.3: [4] Assume that p is an n-th degree polynomial satisfying ‖p‖ = 1 and
|p_i| < |p_n|/n for i < n. Then

$$ \mathrm{dist}(p, IP) \;\ge\; \min\left( \frac{c_1}{\kappa^2(p,z)} \,,\; c_2 \right) \qquad (3.4) $$

for positive constants c_1 and c_2. Thus we see that the distance to the nearest ill-posed
problem is bounded below essentially by a multiple of the reciprocal of the square of the
condition number. This is a general phenomenon among algebraic problems, to which we return below.

Eigenvalue calculations: We will be interested both in the sensitivity of eigenvalues and eigenvectors. More precisely, we will consider the sensitivity of the projection associated with an
eigenvalue [16]. If T is a matrix with simple eigenvalue λ, right eigenvector x and left eigenvector y, the projection P associated with λ is the matrix P = xy^T / (y^T x). The reduced resolvent
associated with λ is the matrix

$$ S \;=\; \lim_{z \to \lambda} \,(I - P)\,(T - z)^{-1} \,. $$

If T has n distinct eigenvalues λ_i with projections P_i, one can write

$$ S \;=\; \sum_{\lambda_i \neq \lambda} (\lambda_i - \lambda)^{-1} P_i \,. $$

If δT is a small perturbation of T, one can show that to first order λ changes to λ + δλ and P
changes to P + δP [16], where

$$ \delta\lambda \;=\; \mathrm{tr}(P\,\delta T) \qquad \text{and} \qquad \delta P \;=\; -S\,\delta T\,P \;-\; P\,\delta T\,S \,. $$

It is easy to verify that |δλ| can be as large as ‖P‖·‖δT‖ and ‖δP‖ can be at least as large as
‖S‖·‖P‖·‖δT‖ (and no more than twice as large as this). Therefore we may take as condition
numbers

$$ \kappa(T,\lambda) \;\equiv\; \|P\|\,\|T\|_F \qquad \text{and} \qquad \kappa(T,P) \;\equiv\; \|S\|\,\|P\|\,\|T\|_F \,. $$

A theorem of Wilkinson bounds the distance from T to the nearest matrix with a multiple eigenvalue:

Theorem 3.4: [4] dist(T, IP) ≤ √2·‖T‖_F / ‖P‖.

Therefore, in terms of κ we may write

$$ \text{if } \|T\|_F = 1 \text{, then } \mathrm{dist}(T, IP) \;\le\; \frac{\sqrt{2}}{\kappa(T,\lambda)} \,. \qquad (3.5) $$

Wilkinson's theorem provides a somewhat weak upper bound on dist(T, IP). The condition
number for P, on the other hand, provides a lower bound on dist(T, IP):

Theorem 3.5: [4] dist(T, IP) ≥ ‖T‖_F / (7·κ(T, P)).

This result lets us write

$$ \text{if } \|T\|_F = 1 \text{, then } \mathrm{dist}(T, IP) \;\ge\; \frac{1}{7\,\kappa(T,P)} \,. \qquad (3.6) $$

For somewhat stronger results and discussion, see [4].
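The perturbation formulas above can also be checked numerically. This sketch (our illustration; the matrix, its size, and the perturbation scale are arbitrary choices) forms the spectral projection P = xy^T/(y^T x) for a simple eigenvalue and verifies that a small perturbation δT shifts the eigenvalue by approximately tr(P δT).

```python
# Numerical sketch (ours, not from the paper): verify to first order that a
# perturbation dT moves a simple eigenvalue lambda of T by tr(P dT), where
# P = x y^T / (y^T x) is the spectral projection for lambda.
import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((4, 4))

w, V = np.linalg.eig(T)
k = np.argmax(w.real)                  # pick one eigenvalue (generically simple)
x = V[:, k]                            # right eigenvector: T x = lambda x
wl, U = np.linalg.eig(T.T)             # eigenvectors of T^T are left eigenvectors
y = U[:, np.argmin(abs(wl - w[k]))]    # match the same eigenvalue
P = np.outer(x, y) / (y @ x)           # projection: satisfies P @ P == P

dT = 1e-6 * rng.standard_normal((4, 4))
w2 = np.linalg.eig(T + dT)[0]
actual = w2[np.argmin(abs(w2 - w[k]))] - w[k]
predicted = np.trace(P @ dT)
print(abs(predicted - actual))         # only second-order small error remains
```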

The phenomenon described above for matrix inversion, polynomial zero finding and

eigenvalue calculation is actually quite common in numerical analysis. It turns out all the

above results can be derived from the same underlying principle, that the condition number k

satisfies one or both of the following differential inequalities:

$$ m\,\kappa^2 \;\le\; \|D\kappa\| \;\le\; M\,\kappa^2 \,, $$

where Dκ is the gradient of κ. The lower bound on ‖Dκ‖ yields the upper bound on dist(T, IP),

$$ \mathrm{dist}(T, IP) \;\le\; \frac{1}{m\,\kappa(T)} \,, $$

and the upper bound on ‖Dκ‖ yields the lower bound

$$ \mathrm{dist}(T, IP) \;\ge\; \frac{1}{M\,\kappa(T)} \,. $$

This phenomenon also appears in pole placement in linear control theory, Newton's method,

and elsewhere [4]. In the case of algebraic functions, one can show that, at least for asymptotically large condition numbers, differential inequalities of the form

$$ m\,\kappa^2 \;\le\; \|D\kappa\| \;\le\; M\,\kappa^3 $$

hold, the new upper bound on ‖Dκ‖ yielding the lower bound on dist(T, IP)

$$ \mathrm{dist}(T, IP) \;\ge\; \frac{1}{2M\,\kappa^2(T)} \,, $$

which is the source of inequality (3.4) above.

Note also that the sets of ill-posed problems are hypersurfaces in our three examples

above. Other kinds of varieties are possible as well. For example, polynomials with at most

m distinct zeros form a subvariety of the variety of polynomials with at least one multiple

zero, and have codimension 2(n − m) (if complex) or n − m (if real) [3]. Since j-tuple zeros
are more sensitive than (j − 1)-tuple zeros [34], there is a natural hierarchy of sets of ever
more ill-posed problems, each one forming a subvariety of the previous set. Similar comments apply to eigenvalue calculations (j-tuple eigenvalues are more sensitive than (j − 1)-tuple

eigenvalues) and rank-deficient linear least-squares problems (problems with higher rank

deficiency are more sensitive than ones with lower rank deficiency). The results of the next

section apply to these situations as well.

4. On Volumes of Tubes

In this section we state our main volume estimates. Proofs appear in section 7 below.

First we consider complex varieties. The upper bounds in the following theorem are

obtained by generalizing an argument of Renegar [23], who obtained the upper bound for

hypersurfaces:

Theorem 4.1: Suppose M is a complex, purely 2d-dimensional variety in C^N. Let f(ε) be the
fraction of the volume of the unit ball in C^N which is within distance ε ≤ 1 of M. Then

$$ f(\varepsilon) \;\le\; \frac{\Gamma(N + 1/2)}{\Gamma(N - d + 1/2)\,\Gamma(d + 1/2)} \cdot e^2 N\, 2^{2N - 2d - 2} \cdot \deg(M) \cdot \varepsilon^{2(N-d)} \cdot (1 + N\varepsilon)^{2d} \,. \qquad (4.1) $$

If M is a hypersurface (d = N − 1), then this upper bound may be improved to

$$ f(\varepsilon) \;\le\; e^2 N^3 \cdot \deg(M) \cdot \varepsilon^2 \cdot (1 + N\varepsilon)^{2(N-1)} \,. \qquad (4.2) $$

If M passes through the origin, it is also true that

$$ \frac{(1-\varepsilon)^{2d}\,\varepsilon^{2(N-d)}}{\deg(M)} \cdot \frac{\Gamma(N - d + 1/2)\,\Gamma(d + 1/2)}{\sqrt{\pi}\;\Gamma(N + 1/2)} \;\le\; f(\varepsilon) \,. \qquad (4.3) $$

If M is a hypersurface passing through the origin this lower bound may be improved to

and for
