James W Demmel.

# The probability that a numerical analysis problem is difficult online

. (page 1 of 4)
Online LibraryJames W DemmelThe probability that a numerical analysis problem is difficult → online text (page 1 of 4)
Font size â–

Computer Science Department

TECHNICAL REPORT

The Probability that a Numerical
Analysis Problem is Difficult

by

James W. Demmel

Technical Report #294
April, 1987

o

XI

o

&â– *
>1

[got*

NEW YORK UNIVERSITY

epartment of Computer Science
Courant Institute of Mathematical Sciences

251 MERCER STREET, NEW YORK, N.Y 10012

/

The Probability that a Numerical
Analysis Problem is Difficult

by

James W. Demmel

Technical Report #294
April, 1987

The Probability that a Numerical Analysis Problem is Difficult

James W. Demmel

Courant Institute

New York, NY 10012

Abstract

Numerous problems in numerical analysis, including matrix inversion, eigenvalue calculations
and polynomial zero finding, share the following property: the difficulty of solving a given
problem is large when the distance from that problem to the nearest "ill-posed" one is small.
For example, the closer a matrix is to the set of noninvertible matrices, the larger its condi-
tion number with respect to inversion. We show that the sets of ill-posed problems for matrix
inversion, eigenproblems, and polynomial zero finding all have a common algebraic and
geometric structure which lets us compute the probability distribution of the distance from a
"random" problem to the set. From this probability distribution we derive, for example, the
distribution of the condition number of a random matrix. We examine the relevance of this
theory to the analysis and construction of numerical algorithms destined to be run in finite
precision arithmetic.

AMS classifications: 15A12, 53C65, 60D05

1. Introduction

To investigate the probability that a numerical analysis problem is difficult, we need to
do three things:

1) Choose a measure of difficulty,

2) Choose a probability distribution on the set of problems,

3) Compute the distribution of the measure of difficulty induced by the distribution on the
set of problems.

The measure of difficulty we shall use in this paper is the condition number, which meas-
ures the sensitivity of the solution to small changes in the problem. For the problems we con-
sider in this paper (matrix inversion, polynomial zero finding and eigenvalue calculation),
there are well known condition numbers in the literature of which we shall use slightly modi-
fied versions to be discussed more fully later. The condition number is an appropriate meas-
ure of difficulty because it can be used to measure the expected loss of accuracy in the com-
puted solution, or even the number of iterations required for an iterative algorithm to con-
verge to a solution.

The probability distribution on the set of problems for which we will attain most of our
results will be the "uniform distribution" which we define as follows. We will identify each
problem as a point in either IR N (if it is real) or C N (if it is complex). For example, a real n
by n matrix A will be considered to be a point in IR""", where each entry of A forms a coordi-
nate in IR"' in the natural way. Similarly, a complex nth degree polynomial can be identified
with a point in C"" 1 " 1 by using its coefficients as coordinates. On the space JR N (or C N ) we
will take any spherically symmetric distribution, i.e. the induced distribution of the normal-
ized problem jc/|[jc|| (||-|| is the Euclidean norm) must be uniform on the unit sphere in IR .
For example, we could take a uniform distribution on the interior of the unit ball in IR , or
let each component be an independent Gaussian random variable with mean and standard

deviation 1. Our answers will hold for this entire class of distributions because our condition
numbers are homogeneous (multiplying a problem by a nonzero scalar does not change its
condition number).

The main justification for using a uniform distribution is that it appears to be fair: each
problem is as likely as any other. However, it does not appear to apply in many practical
cases for a variety of reasons, including the fact that any set of problems which can be
represented in a computer is necessarily discrete rather than continuous. We will discuss the
validity of our choice of uniform distribution as well as alternatives at length in section 6
below.

Finally, given this distribution, we must compute the induced probability distribution of
the condition number. It turns out that all the problems we consider here have a common
geometric structure which lets us compute the distributions of their condition numbers with a
single analysis, which goes as follows:

(i) Certain problems of each kind are ill-posed, i.e. their condition number is infinite.
These ill-posed problems form an algebraic variety within the space of all problems. For
example, the singular matrices are ill-posed with respect to the problem of inversion,
and they lie on the variety where the determinant, a polynomial in the matrix entries, is
zero. Geometrically, varieties are possibly self-intersecting surfaces in the space of
problems.

(/'/') The condition number of a problem has a simple geometric interpretation: it is propor-
tional to (or bounded by a multiple of) the reciprocal of the distance to the set of ill-
posed problems. Thus, as a problem gets closer to the set of ill-posed ones, its condition
number approaches infinity. In the case of matrix inversion, for example, the traditional
condition number is exactly inversely proportional to the distance to the nearest singular
matrix.

(Hi) The last observation implies that the set of problems of condition number at least x is
(approximately) the set of problems within distance clx (c a constant) of the variety of
ill-posed sets. Sets of this sort, sometimes called tubular neighborhoods, have been stu-
died extensively by geometers. We will present upper bounds, lower bounds, and
asymptotic values for the volumes of such sets. The asymptotic results, lower bounds,
and some of the upper bounds are new. The formulae are very simple, depending only
on x, the degree N of the ambient space, the dimension of the variety, and the degree of
the variety. These volume bounds in turn bound the volume of the set of problems with
condition number at least x. Since we are assuming the problems are uniformly distri-
buted, volume is proportional to probability.

Thus, for example, we will prove that a scaled version k(A) = ||A||r-||A _1 || of the usual
condition number of a complex matrix with respect to inversion satisfies

1 â€” -\ =s Prob( K (A) & x) < * â€”

and that asymptotically

Prob(K(A) * x) = W( " 2 ~ 1) + o(\)

In other words, the probability that the condition number exceeds x decreases as the square
of the reciprocal of x. Even for moderate x the upper bound exceeds the asymptotic limit by
a ratio of only about e 2 n 2 . If A is real we will show

C V~ ^)" 2 "' * Prob(K(A) * *) * 2 2- ["/I- N*

where C is a constant proportional to the n 2 - 1-dimensional volume of the set of singular

matrices inside the unit ball. Thus, for real matrices the probability that the condition number
exceeds x decreases as x _1 .

There are a number of open questions and conjectures concerning these volume bounds,
in particular for how general a class of real varieties they apply (the case of complex varieties
is simpler). We will discuss the history of this work and open problems in detail in section 4
below.

It turns out that the reciprocal relationship between condition number and distance to
the nearest ill-posed problem holds for a much wider class of problem than just matrix inver-
sion, polynomial zero finding and eigenvalue calculations: it is shared, at least asymptotically,
by any problem whose solution is an algebraic function. For simplicity we shall restrict our-
selves to the three aforementioned problems, but our results do apply more widely, as dis-
cussed in section 3 below and in .

This work was inspired by earlier work in a number of fields. Demmel , Gastinel ,
Hough , Kahan , Ruhe , Stewart , Wilkinson [35, 36, 37] and others have
analyzed the relationship between the condition number and the distance to the nearest ill-
posed problem mentioned above in (/'/). Gray [8, 9], Griffiths , Hotelling , Lelong
, Ocneau , Renegar , Santalo , Smale , and Weyl  have worked on
bounds of volumes of tubular neighborhoods. These volume bounds have been used by Smale
[26, 27], Renegar  and others to analyze the efficiency of Newton's method for finding
zeros of polynomials. This latter work inspired the author  to apply these bounds to condi-
tioning. Ocneanu  and Kostlan  have also analyzed the statistical properties of the
condition number for matrix inversion.

The rest of this paper is organized as follows. Section 2 defines notation. Section 3
discusses the relationship between conditioning and the distance to the nearest ill-posed prob-
lem. Section 4 presents the bounds on the volumes of tubular neighborhoods we shall use,
and states some related open problems. Section 5 computes the distributions of the condition
numbers of our three problems. Section 6 discusses the limitations of assuming a uniform dis-
tribution and suggests alternatives and open problems. Section 7 contains the proofs of the
theorems in section 4.

2. Notation

We introduce several ideas we will need from numerical analysis, algebra, and
geometry. ||x|| will denote the Euclidean norm of the vector x as well as the induced matrix
norm

" " 58 *m â–

\\A\\f will denote the Frobenius norm

I|a||f-(2 M 2 ) m â–

ij
If P is a set and x is a point, we will let d'ist(x,P) denote the Euclidean distance from x to the
nearest point in P.

A subset M of IR N is called an n- dimensional manifold if it is locally homeomorphic to
IR". We also write n = dim(M). The codimension of M, written codim(M), is N-n. In this
paper dimension will always refer to the real dimension rather than the complex dimension,
which is half the real dimension.

A variety is the set of solutions of a system of polynomial equations. A variety is homo-
geneous if it is cone-shaped, i.e. if x is in the variety so is every scalar multiple ax. A variety
is not generally a manifold since it can have singularities in the neighborhood of which it is

not homeomorphic to Euclidean space. However, points q with relatively open neighbor-
hoods U q CP that are manifolds are dense in P [17, Theorem 4.2.4] so that the following
definition makes sense: the dimension of P at p , written d\m p {P), is

dim p (F) ^ limsup dim(f/ 9 ) .

qtVqCP

Uq a manifold

We in turn define the dimension of the variety P as the maximum over all p â‚¬P of dim p (P). If
dim / ,(/ > ) is constant for all p, we call P pure dimensional. A complex variety defined by a
single nonconstant polynomial is called a complex hypersurface. Complex hypersurfaces are
pure dimensional with codimension 2. A real hypersurface has codimension 1 everywhere.
The real variety defined by the polynomials/!, . . . ,f p is called a complete intersection if it is
pure dimensional of codimension p.

Now we define the degree of a purely /i-dimensional variety P in IR A . Let L N ~" be a
N â€” n dimensional linear manifold (plane) in IR A . Since dim(L A _,1 )-+-dim(/ > )=dim(lR A ) = iV
we say that L A ~" and IR^ are of complementary dimension. Generically, L h ~" and P will
intersect in a surface of codimension equal to the sum of their codimensions, that is N. In
other words, their intersection will be of dimension (a finite collection of points). If P is a
complex homogeneous variety, then for almost all planes L A_ " this collection will contain the
same number of points, and this common number is called the degree of P, and is written
deg(P) (see [17, Theorem 4.6.2]). Intuitively, deg(P) gives the number of "leaves" of the
variety P that a typical plane L A_n will intersect. In the case of a nonhomogeneous or real
variety P, deg(P) is defined analogously as the maximum (finite) intersection number of a
plane L N ~" and the n-dimensional variety P in 1R N , although the intersection number will not
generally be constant for almost all L N ~".

This concept of degree is a generalization of the degree of a polynomial. Indeed, if P is
complex and defined as the solution set of a single irreducible polynomial, then the degree of
the polynomial equals the degree of P as defined above .

By l-volume of an n-dimensional manifold M (l^n) we mean the /-dimensional Lebesgue
measure of M, if it exists. Note that if l>n this volume is zero. The notations vol(Af) and
volâ€ž(M) denote the n-volume of the n-dimensional manifold M.

3. Condition Numbers and the Distance to the Nearest Ill-Posed Problem

We claim that many classes of numerical analysis problems permit the following
geometric characterization of their condition numbers:

(i) Certain problems of each class are ill-posed, i.e. their condition numbers are infinite.
These problems form a variety within the space of all problems.

(ii) The condition number of a problem has a simple geometric interpretation: it is propor-
tional to (or bounded by a multiple of) the reciprocal of the distance to the set of ill-
posed problems. Thus, as a problem gets closer to the set of ill-posed ones, its condition
number approaches infinity.

In this section we will cite results from the literature to prove these claims for the fol-
lowing three classes of problems: matrix inversion, polynomial zero finding, and eigenvalue
calculation. Afterwards we will outline why this characterization applies to many other prob-
lems as well .

First we need to define condition number more precisely. If X is our space of problems
equipped with norm ||-||x, Y our space of solution equipped with norm ||-||y, and / -JC^Y is the
solution map for our problem, the usual definition of the relative condition number is

*rel(f,x) â–  limsup (3.1)

6*-0 ll 6 *!!* ' ||*||x

||P/(*)ib-||*||x

llrt*)l|y

if the Jacobian Df exists (||-|[at is the induced norm). In many cases the essential information
about the conditioning is contained in the ||D/]|at factor. We may therefore use a multiple of
||DF||xy instead of kâ€ži without losing essential information.

All three of our problems are homogeneous: multiplying the problem by a scalar does
not change the condition number. Therefore, the set of ill-posed problems will also be
homogeneous, or cone-shaped. This permits us to normalize all our problems to have unit
norm (lie on the unit sphere in either IR/ 1 "' or C A ), and implies that any results on the distribu-
tion of the condition number will hold for any distribution of problems inducing the same dis-
tribution of jc/||jc|[ on the unit sphere.

Matrix Inversion: The usual relative condition number as defined in (3.1) with the ||-|| norm
on both the problem and solution spaces is :

Kâ€ž,(A) = HAIHIA-'II .

We shall use the nearly equivalent condition number

k(a)Â» iiaiihia-'h â€˘

These condition numbers are both homogeneous, and infinite when A is singular, so the set
of ill-posed problems is a variety defined by the single n-th degree homogeneous irreducible
polynomial det(A) = 0, where n = dim(A) . Denote the set of ill-posed problems by IP.
From the last section, we see that if A is complex, IP is a complex hypersurface. If A is real,
it is easy to verify that IP is still a real hypersurface by using the explicit parameterization
provided by Gaussian elimination .

A theorem of Eckart and Young  gives the distance from a nonsingular matrix to IP:
Theorem 3.1:  dist(A,/P) = HA -1 !!" 1 .
(Gastinel  proved this result for an arbitrary operator norm.)

Therefore, we see that in terms of k we may write

if \\A\\ F = 1 , then dist(A,//>) = 1/k(A) , (3.2)

i.e. that the distance from a normalized problem A to the nearest ill-posed problem is the
reciprocal of its condition number.

Polynomial Zero Finding: In this case we are interested in the sensitivity of the zeros of a
polynomial to small perturbations in the coefficients. If p (*) is an nth degree polynomial, let
\\p II denote the Euclidean norm of the vector of its coefficients. If p (z) = and 8p is a small
perturbation of p, it is easy to verify that to first order the perturbed polynomial p +hp has a
zero at z + 8z where

Bz =

â– 8p(z)

P'(*)

implying that the relative condition number is

K rf ;0,z)

n(z)-\\p\

P'MI

where

Â»Â«- k _1 l(2 l* 2 'l)

1/2

Note that the condition number depends both on the polynomial p and the choice of zero z.
For simplicity we will use the similar condition number

k( z) s Up II ,
[p ' ] !/>'(*) I

Both condition numbers are infinite when p'(z) = Q, i.e. when z is a multiple zero. Thus we
will take the set IP of ill-posed problems to be those polynomials with multiple zeros. A
necessary and sufficient condition for a polynomial to have a multiple zero is that its discrim-
inant, an irreducible homogeneous polynomial of degree In -2 in the coefficients of p ,
be zero. If p is complex, this implies the set of polynomials with zero discriminant is a hyper-
surface. If p is real, this set of polynomials is still a hypersurface, as may be verified using
the parameterization provided by the leading coefficient pâ€ž and the zeros. The discriminant
may also be zero if the two leading coefficients of p equal zero (corresponding to a double
eigenvalue at Â°Â°), but this set is a subvariety of double the codimension of the hypersurface in
which it lies, and so forms a set of measure zero we may neglect.

Now we need to estimate the distance from a given polynomial to one with a multiple
zero. The estimate we shall use is due to Hough  and says

Theorem 3.2: [14, 4] The distance dist(p,/P) from the polynomial p of degree at least 2 to
one with a multiple zero is bounded by

dist(p,//>)< y/2-\p'{z)\

where p (z) = 0.

In fact, this is quite a weak result gotten by estimating the smallest change in p needed
to make a double zero at z, which turns out to be a linear least squares problem. Thus we
may write

V7

if ||p || = 1, then dist(p, IP) ÂŁ â€” ; r (3.3)

i.e. that the distance from a normalized problem p to the nearest ill-posed problem is
bounded by a multiple of the reciprocal of its condition number.

To see how much (3.3) may overestimate dist(p,// > ), we present a lower bound. Note
that by changing the argument of p from x to ax (a scalar) we may make the leading coeffi-
cient pâ€ž larger than the other coefficients.

Theorem 3.3:  Assume that p is an n-th degree polynomial satisfying ||p||=l and
|Pi l< \Pn \ln for i min (-r- , 2 2 , r) (3.4)

Thus we see that the distance to the nearest ill-posed problem is bounded below essentially
by a multiple of the square of the condition number. This is a general phenomenon among
algebraic problems to which we return below.

Eigenvalue calculations: We will be interested both in the sensitivity of eigenvalues and eigen-
vectors. More precisely, we will consider the sensitivity of the projection associated with an
eigenvalue . If 7" is a matrix with simple eigenvalue X., right eigenvector x and left eigen-
vector y, the projection P associated with A. is the matrix P = xy T ly T x. The reduced resolvent
associated with X is the matrix

S = lim (I-P)(T-z)

- i

If T has n distinct eigenvalues X, with projections P, one can write

s = 2 (x,-x)- 1 /\ â€˘

If 8T is a small perturbation of T, one can show that to first order X changes to X + 5X and P
changes to P + hP  where

8X = trP87" and hP = -SbTP - PbTS

It is easy to verify that |5X | can be as large as ||P||-||8r|| and ||8P|| can be at least as large as
||S||-||P||-||87"|| (and no more than twice as large as this). Therefore we may take as condition
numbers

) < V2-||r|| f / ||P||.

Therefore, in terms of k we may write

if ||r|| f = 1, then dist(r,//>) < -â€”Â±- . (3.5)

k(T,\)

Wilkinson's theorem provides a somewhat weak upper bound on dist(7",// ) ). The condition
for P on the other hand provides a lower bound on dist(T,IP):

Theorem 3.5:  dist(7\/P) > ||r|| f /(7-K(7",/>)).
This result lets us write

if ||r|| F = 1, then dist(r,/P) s - -Â± â€” . (3.6)

For somewhat stronger results and discussion, see .

The phenomenon described above for matrix inversion, polynomial zero finding and
eigenvalue calculation is actually quite common in numerical analysis. It turns out all the
above results can be derived from the same underlying principle, that the condition number k
satisfies one or both of the following differential inequalities:

m-K 2 < ||Dk|| < M-k 2 .

where Dk is the gradient of k. The lower bound on ||Dk|| yields the upper bound on

dist(r,/P)

dist(7\//>) s 1 â€”-

m-K{T)

and the upper bound on ||Dk|| yields the lower bound

1

dht(T,IP)

M-k(T)

This phenomenon also appears in pole placement in linear control theory, Newton's method,
and elsewhere . In the case of algebraic functions, one can show that at least for asymp-
totically large condition numbers, differential inequalities of the form

m-K 2 < ||Dk|| ÂŁ Af-K 3

hold, the new upper bound on ||Z?k|| yielding the lower bound on dist(7\/P)

which is the source of inequality (3.4) above.

Note also that the sets of ill-posed problems are hypersurfaces in our three examples
above. Other kinds of varieties are possible as well. For example, polynomials with at most
m distinct zeros form a subvariety of the variety of polynomials with at least one multiple
zero, and have codimension 2(n - m) (if complex) orn-m (if real) . Since _/-tuple zeros
are more sensitive than j - 1-tuple zeros , there is a natural hierarchy of of sets of ever
more ill-posed problems, each one forming a subvariety of the previous set. Similar com-
ments apply to eigenvalue calculations (_/'-tuple eigenvalues are more sensitive than j â€” 1-tuple
eigenvalues) and rank-deficient linear least-squares problems (problems with higher rank
deficiency are more sensitive than ones with lower rank deficiency). The results of the next
section apply to these situations as well.

4. On Volumes of Tubes

In this section we state our main volume estimates. Proofs appear in section 7 below.

First we consider complex varieties. The upper bounds in the following theorem are
obtained by generalizing an argument of Renegar , who obtained the upper bound for
hypersurfaces:

Theorem 4.1: Suppose M is a complex, purely 2d-dimensional variety in C^. Let /(e) be the
fraction of the volume of the unit ball in C A which is within distance eÂŁl of M. Then

/( â‚¬ )< ^T{N+ 1/2) e 2 N 2^ N -l)2N-2d-2. d (M) . e 2(N- t f). (l + Ne) 2d (41)

J ^ ' r(N-d+ V2)T(d+ 1/2) v ' ev '

If M is a hypersurface (d =N - 1), then this upper bound may be improved to

/(e) s; c 2 ^ 3 -deg(A/)-e 2 -(l + A^â‚¬) 2(A '- 1) . (4.2)

If M passes through the origin, it is also true that

(1-e)" , ,fV -^ T(N ~ d+ l'^rO/+ 1/2) < /(e)

x v^7 t

(4.3)

deg(M) V-iT T(N+ 1/2)
If M is a hypersurface passing through the origin this lower bound may be improved to

anc i f or

1 3 4

Online LibraryJames W DemmelThe probability that a numerical analysis problem is difficult → online text (page 1 of 4)