
NEW YORK UNIVERSITY
INSTITUTE OF
MATHEMATICAL SCIENCES

A Theoretical Comparison of the Efficiencies
of Two Classical Methods and a
Monte Carlo Method for Computing
One Component of the Solution
of a Set of Linear Algebraic Equations

J. H. CURTISS

Research in the Field of Probability,
Statistics, and Numerical Analysis
(Monte Carlo Method)

TECHNICAL REPORT NO. 3

IMM-NYU 211
MAY 1954

CONTRACT NO. DA-30-069-ORD-1257

ARMY OFFICE OF ORDNANCE RESEARCH


IMM-NYU 211
May 1954

A THEORETICAL COMPARISON OF THE EFFICIENCIES
OF TWO CLASSICAL METHODS AND A MONTE CARLO METHOD
FOR COMPUTING ONE COMPONENT OF THE SOLUTION
OF A SET OF LINEAR ALGEBRAIC EQUATIONS

by
J. H. Curtiss

Based on an invited address given on March 17, 1954
at a Symposium on Monte Carlo Methods sponsored by
the Aeronautical Research Laboratory, Wright Air
Development Center, and conducted by the Statistical
Laboratory, University of Florida, at Gainesville,
Florida.

Contract No. DA-30-069-ORD-1257

Research in the field of Probability,
Statistics, and Numerical Analysis
(Monte Carlo Method)

Technical Report No. 3

This report represents results obtained at the
Institute of Mathematical Sciences, New York University,
under the sponsorship of the Army Office of Ordnance
Research.

New York, 1954


ERRATA FOR IMM-NYU 211

p. 5, line 11. For "Σ_1 a_k," read "Σ_1^s a_k".

p. 11, line 10. For "it cannot be singular," read "then I - H cannot be singular".

p. 13, last line before footnote. For "vector," read "vectors".

p. 14, formula (2.7). In first line, delete numeral 1 below first Σ. In second line, delete comma and "N = 1,2,...".

p. 26, line 2. Number this formula as (3.4).

p. 27, formula (3.5). Close parentheses after ||H|| but before exponent 2 in denominator.

p. 27, line 13. Insert "P = [|h_ij|]" before "therefore".

p. 28, line 1. Delete "P = [|h_ij|]".

p. 28, formula (3.7). For "||ρ||^n", read "||ρ||^s" in last two members of the inequality.

p. 35, line 13. For "number", read "numbers".

p. 35, line 5 from bottom. Delete "of".

p. 38, formula (4.6). For "2||ρ||^2", read "2||p||^2".

p. 38, line 2 and line 3 from bottom. Delete "with high probability (specifically, a probability of 95%),".

p. 41, line 10. Insert "i = " before "i_0".

p. 41, line 11. For "1-th", read "i_0-th".

p. 41, line 12. For "i-th", read "i_0-th".

p. 41, line 17. For "element", read "component".


ERRATA FOR IMM-NYU 211 (cont.)

p. 42, line 10 ff. Delete paragraph beginning "Now the key...".

p. 42, line 15. Insert "from formulas (5.1), (5.2), and (5.3)" before "that".

p. 44, Note (c). Change "1C^-3" to "10^-3" in second line.

p. 48, formula (6.4). Insert "^" after "H".

p. 54, line 3. In third member of inequality, change "||H^(N-1)||" to "||H||^(N-1)".

A THEORETICAL COMPARISON OF THE EFFICIENCIES
OF TWO CLASSICAL METHODS AND A MONTE CARLO METHOD
FOR COMPUTING ONE COMPONENT OF THE SOLUTION
OF A SET OF LINEAR ALGEBRAIC EQUATIONS

J. H. Curtiss

Summary. In this paper a basis is established for a
theoretical comparison of the amount of work required by
three widely different methods of calculating (to a given
accuracy) one component of the solution of a set of
simultaneous linear algebraic equations. The equations are
assumed to be in the form ξ = Hξ + γ, where γ is a given
n-dimensional vector and H is a given n × n real matrix.
The amount of work is measured by the number of
multiplications required under the most unfavorable
conditions. The three methods are (a) the Gauss elimination
method, (b) the particular stationary linear iterative
method defined by the recursion formula ξ_(N+1) = Hξ_N + γ, ...

... z_k p_k = a_k, p_k > 0, Σ_k p_k = 1. Then a random
variable J is set up with the probability distribution
given by Pr(J = k) = p_k, k = 1,2,... . The random
variable z_J will obviously have a theoretical mean value
equal to the sum of the series a_1 + a_2 + ... . The
statistic

    Z̄ = (1/ν) Σ_(t=1)^ν z_(J_t),

where J_1, J_2, ... are independent replicas of J, furnishes
an estimator of the mean value of z_J (that is, of the
solution of the problem) which has various well-known
optimum statistical properties.
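The estimator just described can be sketched in a few lines of Python. This is a modern illustration, not part of the original report; the series, the sampling probabilities, and the trial count are invented example data.

```python
import random

def mc_series_sum(a, p, trials, seed=0):
    """Estimate a_1 + a_2 + ... by drawing an index J with Pr(J = k) = p[k]
    and averaging z_J = a[J] / p[J], so that z_k * p_k = a_k."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        j = rng.choices(range(len(a)), weights=p, k=1)[0]
        total += a[j] / p[j]
    return total / trials

# Example: the first ten terms of the geometric series 1/2 + 1/4 + ...,
# sampled with equal probabilities p_k = 1/10.
a = [0.5 ** k for k in range(1, 11)]
p = [0.1] * 10
estimate = mc_series_sum(a, p, 20000)
exact = sum(a)
```

With the index drawn uniformly, each observation z_J = 10·a_J estimates a single term, exactly as the sceptical argument discussed in the text describes, and many observations are needed before the average settles near the true sum.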

It is hard for some classically trained numerical
analysts to see how Monte Carlo methods can ever be
advantageous in such a problem. A somewhat over-simplified
version of their reasoning might go as follows. Let s
be chosen so that Σ_1^s a_k gives a satisfactory approximation
to the sum of the series. (If the series is finite, let
s be the number of terms.) Since each observed value of
z_J conditionally estimates just one of the numbers a_k,
there will have to be at least as many terms in the
summation for Z̄ as in Σ_1^s a_k. Indeed, because of statistical
fluctuations it will probably be necessary to make very
many more observations on z_J than there are terms in the
finite approximation Σ_1^s a_k, and even then, Z̄ will probably
not be as good as Σ_1^s a_k. This means that the count of
additions alone will be very much greater for the stochastic
approximation than for the straightforward method of
solution; and then too, there is the trouble of setting up the
z_k's and p_k's in advance, and of determining stochastically,
over and over again, the value of J to use in the
observations.

The case against this argument can very conveniently
be stated in terms of a specific problem which is
fundamental to this paper. It is the problem of calculating
the i-th component of the vector K^(N+1) θ, where K denotes
the 2n × 2n matrix [k_ij] and θ is the 2n-dimensional vector
(t_1, ..., t_2n). The i-th element of K^(N+1) θ, which we write
as K^(N+1) θ]_i, is the sum of (2n)^(N+1) terms of the type

    k_(i i_1) k_(i_1 i_2) ... k_(i_N i_(N+1)) t_(i_(N+1)).

The stochastic estimation*

*See Curtiss [3]. (The square brackets refer to the
references at the end of the paper.)


is accomplished by selecting numbers z_ij and p_ij,
i,j = 1,...,2n, such that z_ij p_ij = k_ij, p_ij ≥ 0,
Σ_j p_ij = 1 for all i. Then a family of random variables
J_0, J_1, J_2, ..., J_(N+1) is set up in such a way that it
represents a Markov process (or "random walk") with states
(resting places) designated by 1,2,...,2n and with stationary
transition probabilities. The specification of the
joint probability distribution is accomplished by assigning
an arbitrary distribution to J_0 (but with Pr(J_0 = i) ≠ 0),
and thereafter using the equations* Pr(J_(m+1) = j | J_m = i) = p_ij,
i,j = 1,...,2n. Finally, a random variable

    Z = z_(J_0 J_1) z_(J_1 J_2) ... z_(J_N J_(N+1)) t_(J_(N+1))

is set up. It is now easily seen from the definition of
mean value in probability theory, that
E(Z | J_0 = i) = K^(N+1) θ]_i. We use (Z_1 + ... + Z_ν)/ν as the
estimator of K^(N+1) θ]_i, where Z_1, ..., Z_ν denote ν independent
determinations of Z.
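As an illustration (not from the report), the walk can be simulated directly. Here the transition probabilities are taken proportional to |k_ij| on each row, one natural way of meeting the conditions z_ij p_ij = k_ij, p_ij ≥ 0, Σ_j p_ij = 1; the matrix, the vector θ, and the trial count are made-up example data.

```python
import random

def walk_estimate(K, theta, N, i, trials, seed=1):
    """Estimate K^(N+1) theta]_i by averaging independent determinations of
    Z = z_(J0 J1) z_(J1 J2) ... z_(JN JN+1) t_(JN+1), starting at J0 = i."""
    rng = random.Random(seed)
    n = len(K)
    # Row-wise transition probabilities p_ij, here proportional to |k_ij|.
    p = [[abs(x) / sum(abs(y) for y in row) for x in row] for row in K]
    total = 0.0
    for _ in range(trials):
        state, prod = i, 1.0
        for _ in range(N + 1):                      # N+1 steps of the walk
            nxt = rng.choices(range(n), weights=p[state], k=1)[0]
            prod *= K[state][nxt] / p[state][nxt]   # z_(J_m J_(m+1))
            state = nxt
        total += prod * theta[state]                # payoff t at final state
    return total / trials

def matpow_component(K, theta, N, i):
    """Direct value of K^(N+1) theta]_i, for comparison."""
    v = list(theta)
    for _ in range(N + 1):
        v = [sum(K[r][c] * v[c] for c in range(len(K))) for r in range(len(K))]
    return v[i]

K = [[0.3, 0.2], [0.1, 0.4]]
theta = [1.0, 2.0]
mc = walk_estimate(K, theta, 2, 0, 20000)
direct = matpow_component(K, theta, 2, 0)
```

Note how the continued product k_(i i_1) ... t_(i_(N+1)) is never formed term by term; only one path through it is multiplied out per trial, which is the saving described under factor (a) below.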

The various possible values of
k_(i i_1) k_(i_1 i_2) ... k_(i_N i_(N+1)) t_(i_(N+1))
correspond to the values of a_k in the previous more general
formulation. Let us think of these possible values as
renumbered in a linear and serial order, using a single
index k. There will be s = (2n)^(N+1) such numbers. (They
may not be all distinct.) Then the (2n)^(N+1) correspondingly
renumbered products p_(i i_1) p_(i_1 i_2) ... p_(i_N i_(N+1)) play the role of
p_1, p_2, ..., p_s in the previous formulation, and the various
possible values of the vector random variable
J = (J_0, J_1, ..., J_(N+1)) correspond to the values of J in the
previous formulation.

*By Pr(a|b) we mean the conditional probability of the
event a, given that the event b has occurred.

By E(X|a), we mean the mean value of the conditional
distribution of the random variable X, given that the
event a has occurred.

It now begins to be evident that our formulation of
the general summation problem at the beginning of the
section was deceptively over-simplified. In a multi-
dimensional summation problem, the following two factors
may come into play on the side of a Monte Carlo method of
solution:

(a) If the calculation of each a_k is a complicated
one, it may be possible to arrange things so that the
calculation of each observation on z_k is very much simpler
than the calculation of the corresponding term a_k. This
will in particular be the case if the calculation of each
a_k involves the formation of a continued product, because
then part of the work of calculating the product can be
sidestepped in the stochastic process by using the
multiplicative law of probabilities.

(b) Some of the numbers a_k in the finite approximation
to Σ a_k may be very unimportant and need not be
represented in the statistical estimator at all. The
stochastic estimation process, if properly set up, will
automatically take care of this by making the appearance
of the representative of a non-essential term a very rare
event.

We add here, more or less as an aside, the remark
that when the calculation of each particular z_k is much
simpler than that of the corresponding a_k, then the
problem of the accumulation of round-off errors may not
be nearly as serious for the stochastic method as it is
in the direct computation.

All of these factors favoring the Monte Carlo method
are present in the case of the problem K^(N+1) θ]_i and in the
related problem of solving systems of linear algebraic
equations. The factor (a) is usually particularly in
evidence in numerical problems of stochastic origin, and
indeed it is in such problems that the Monte Carlo method
has had its chief successes. Matrix problems can always
be thrown into a form in which they are numerically
equivalent to problems with a stochastic pedigree. We shall
exploit that fact in the present paper to obtain a
favorable environment for the comparisons to be made.

But we cannot conclude this introduction without
making a remark which is on the negative side as far as
Monte Carlo methods are concerned. It certainly would seem
that whenever Monte Carlo methods appear to advantage in
summation problems, factor (b) above must be playing an
important role, because otherwise the criticism regarding
the necessary number of addition operations required for
any reasonable degree of accuracy in the Monte Carlo
approach would be valid. But if that is so, why cannot
a deterministic method be devised which will ignore the
unimportant terms and be even more efficient? The author
suspects that what we now need is a more highly developed
deterministic theory of quadrature and of linear computation
in many dimensions. When this becomes available, the Monte
Carlo method may, at least for matrix problems, lose the
very modest advantages which will be claimed for it in
this paper.

2. The numerical problem and its non-stochastic solution.
Throughout the paper we shall continue to denote matrices
by capital letters and their elements by the corresponding
lower case letters; thus, for example, H = [h_ij]. We
represent vectors by lower case Greek letters and their
components by the corresponding Roman letters; thus for
example, ξ = (x_1, x_2, ..., x_n). We shall also find it
convenient occasionally to designate the elements of a matrix
by double subscripts affixed to the symbol for the matrix
(thus, H_ij or [(I-H)^(-1) H^2]_ij). Furthermore we shall frequently
designate the components of a vector by a similar subscript
notation (thus ξ = (ξ]_1, ξ]_2, ..., ξ]_n)). All vectors will
be real and n-dimensional and all matrices will have only
real elements.

By ||H|| we shall mean max_i Σ_j |h_ij|, and by ||ξ|| we
shall mean max_i |x_i|. It is obvious that ||Hξ|| ≤ ||H|| ||ξ||
and that ||A+B|| ≤ ||A|| + ||B||. It is well known* that
||AB|| ≤ ||A|| ||B||; therefore, by induction, ||H^N|| ≤ ||H||^N.

The numerical problem with which we shall be mainly
concerned is that of solving the linear system

(2.1)    Aξ = η,

for ξ, where A is a given non-singular n × n matrix and η
is a given vector. We assume that this system has been
thrown into the form

(2.2)    ξ = Hξ + γ,

where H = [h_ij] is an n × n matrix with the property that

(2.3)    ||H|| = max_i Σ_j |h_ij| < 1.

*See for example Courant and Hilbert [1], p. 16, footnote.


It is beyond the scope of this paper to give an
extended discussion of the methods of passing from (2.1)
to (2.2), but a few general remarks on the subject are in
order at this point. In the first place, it is always
theoretically possible to transform the problem
represented by (2.1) in this manner. Indeed, let H be any
matrix whatsoever with the property (2.3). (For instance,
let H = dI, where I is the unit matrix and d is a scalar
lying between 0 and 1.) It is known that if H satisfies
(2.3), then I - H cannot be singular*. Let M be defined by the
equation I - MA = H. This says that M = (I-H)A^(-1). Therefore
M cannot be singular. The system

    ξ = Hξ + Mη = (I-MA)ξ + Mη,

is just the same as the system

    MAξ = Mη,

and since M is non-singular, this system is precisely
equivalent to (2.1).

*O. Taussky-Todd, [9].


above. The reason is that the formula M = (I-H)A^(-1)
presupposes a knowledge of A^(-1), and in the presence of this
the original problem becomes trivial. Thus it is natural
to think of M as being chosen first, the choice being made
in such a way that I - MA = H has suitable properties.
There are a number of different procedures in the literature
of linear computation for arriving at an appropriate choice
of M. For example, if A has dominant diagonal, that is,
if |a_ii| > Σ_(j≠i) |a_ij|, i = 1,...,n, then M can be chosen
as the inverse of the principal diagonal matrix whose
principal diagonal is that of A. This will obviously insure
that ||H|| < 1*.
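The dominant-diagonal recipe just described can be carried out mechanically. The sketch below, with a made-up 3 × 3 system, forms H = I - MA and γ = Mη, M being the inverse of the diagonal of A, and verifies that ||H|| < 1.

```python
def to_iterative_form(A, eta):
    """Return H = I - M*A and gamma = M*eta, where M is the inverse of the
    principal diagonal matrix whose diagonal is that of A."""
    n = len(A)
    H = [[(1.0 if i == j else 0.0) - A[i][j] / A[i][i] for j in range(n)]
         for i in range(n)]
    gamma = [eta[i] / A[i][i] for i in range(n)]
    return H, gamma

def norm(H):
    """The row-sum norm ||H|| = max_i sum_j |h_ij| used throughout the paper."""
    return max(sum(abs(x) for x in row) for row in H)

# A has dominant diagonal, so ||H|| < 1 is guaranteed.
A = [[4.0, 1.0, 1.0], [1.0, 5.0, 2.0], [0.0, 1.0, 3.0]]
eta = [6.0, 8.0, 4.0]
H, gamma = to_iterative_form(A, eta)
```

The diagonal of H is zero by construction, and each row sum of |h_ij| is Σ_(j≠i) |a_ij| / |a_ii|, which falls below 1 exactly when the diagonal dominates.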

There are numerous non-stochastic methods of solving
(2.1)*. We shall here restrict our considerations mainly
to a class of methods known as linear iterative processes.
An effective Monte Carlo method for the problem (2.1) can
be based on this type of process, as we presently shall
see.

The general stationary linear iterative process for
solving (2.1) is arrived at by throwing (2.1) into an
equivalent form which has the appearance of (2.2), but with
H restricted only by the requirement that its eigenvalues
all lie in the unit circle. An initial estimate ξ_0 of the

*Further discussion will be found in any good treatise on
numerical analysis which deals with iterative methods of
solving linear equations. See for example Householder [7]
and Milne [8]. See Forsythe [6] for further references.

*See previous footnote for references.


solution is made, and successive approximations ξ_1, ξ_2, ...
are then defined by

(2.4)    ξ_(N+1) = Hξ_N + γ,    N = 0,1,2,... .

If ξ_∞ denotes the solution of the equations (2.2), then
clearly, since ξ_∞ = Hξ_∞ + γ,

(2.5)    ξ_∞ - ξ_N = H(ξ_∞ - ξ_(N-1))
                   = H^2(ξ_∞ - ξ_(N-2))
                   = ...
                   = H^N(ξ_∞ - ξ_0).

Thus the condition for convergence for any starting vector
ξ_0 is that lim_(N→∞) H^N = 0. The well-known necessary and
sufficient condition [7] for lim H^N = 0 is that all the
eigenvalues of H should lie in the unit circle, which
explains the requirement on H imposed earlier in this
paragraph.

But in the present discussion, we go one step beyond
this requirement and insist that ||H|| < 1. Since
||H^N|| ≤ ||H||^N, this condition will certainly insure that
H^N → 0.
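A minimal sketch of the recursion (2.4) in Python, with invented example data: here ||H|| = 0.5 < 1, so the iterates converge geometrically to the solution of ξ = Hξ + γ.

```python
def iterate(H, gamma, xi0, steps):
    """Successive approximations xi_(N+1) = H*xi_N + gamma of formula (2.4)."""
    n = len(gamma)
    xi = list(xi0)
    for _ in range(steps):
        xi = [sum(H[i][j] * xi[j] for j in range(n)) + gamma[i]
              for i in range(n)]
    return xi

H = [[0.25, 0.25], [0.0, 0.5]]    # ||H|| = 0.5 < 1
gamma = [1.0, 1.0]
xi = iterate(H, gamma, [0.0, 0.0], 60)   # converges to (2, 2)
```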

For purposes of error analysis it is advantageous
now to introduce the residual vector*

*By error in this paper, we shall always mean truncation
error or statistical error or both at once. There will be
no study of round-off error, nor of the effect of
miscellaneous arithmetical mistakes.


(2.6)    ρ_N = γ - (I-H)ξ_N = Hξ_N + γ - ξ_N = ξ_(N+1) - ξ_N,    N = 0,1,2,... .

The vectors ρ_N are of course always computable at any stage
in the iterative solution. If ξ_N → ξ_∞ = (I-H)^(-1) γ, then
obviously ρ_N → 0. The converse is also true, because
(I-H)^(-1) ρ_N = ξ_∞ - ξ_N. Thus ρ_N, or ||ρ_N||, are logical measures
of the error in the N-th approximation to the solution. It
is to be noted from (2.4) and (2.6) that

(2.7)    ρ_N = ξ_(N+1) - ξ_N = Hξ_N + γ - (Hξ_(N-1) + γ)
             = H(ξ_N - ξ_(N-1)) = Hρ_(N-1)
             = ... = H^N ρ_0,    N = 1,2,... .

From (2.6) and (2.7) it follows that the successive
approximations ξ_N generated by (2.4) can theoretically also
be generated in the following manner: Select ξ_0 as before, and
compute ρ_0 from the definition of residual vector,
ρ_0 = γ - (I-H)ξ_0. Then conduct the iterations by means of the
pair of formulas

(2.8)    ξ_(N+1) = ξ_N + ρ_N,    N = 0,1,...,

(2.9)    ρ_(N+1) = Hρ_N,    N = 0,1,... .


We note that by back substitution, we find that

(2.10)    ξ_(N+1) = ξ_0 + ρ_0 + ρ_1 + ... + ρ_N = ξ_0 + (I + H + ... + H^N)ρ_0.

In actual practice, the customary running check on the
accuracy of the solution consists in computing ||ρ_N|| from time
to time to see how small it is. The iterations are stopped
when ||ρ_N|| reaches a predetermined order of smallness. If
(2.4) were used for the iterations, the computation of ρ_N would
be done by using the formula ρ_N = ξ_(N+1) - ξ_N. If (2.8) and
(2.9) were to be used, it would be advisable to compute test
values of ||ρ_N|| from the definition of ρ_N (that is, from the
formula ρ_N = γ - (I-H)ξ_N), rather than to accept the values of
||ρ_N|| given by (2.9). We shall come back to this point in a
moment.
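The point can be seen in a short sketch (illustrative Python, with the same example data as before, not from the report): the pair (2.8)-(2.9) reproduces the iterates of (2.4), while the check value of ρ_N is recomputed from its definition rather than taken from (2.9).

```python
def residual_iterate(H, gamma, xi0, steps):
    """Iterate xi_(N+1) = xi_N + rho_N (2.8) and rho_(N+1) = H*rho_N (2.9),
    starting from rho_0 = gamma - (I - H)*xi_0."""
    n = len(gamma)
    xi = list(xi0)
    rho = [gamma[i] - xi[i] + sum(H[i][j] * xi[j] for j in range(n))
           for i in range(n)]
    for _ in range(steps):
        xi = [xi[i] + rho[i] for i in range(n)]                            # (2.8)
        rho = [sum(H[i][j] * rho[j] for j in range(n)) for i in range(n)]  # (2.9)
    return xi

def check_residual(H, gamma, xi):
    """Test value of rho_N from its definition, gamma - (I - H)*xi_N."""
    n = len(gamma)
    return [gamma[i] - xi[i] + sum(H[i][j] * xi[j] for j in range(n))
            for i in range(n)]

H = [[0.25, 0.25], [0.0, 0.5]]
gamma = [1.0, 1.0]
xi = residual_iterate(H, gamma, [0.0, 0.0], 60)
rho_check = check_residual(H, gamma, xi)
```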

An a priori truncation error analysis, made with the
purpose of estimating the number of iterations which will be
required to achieve a given accuracy, can be conducted either
in terms of ρ_N or in terms of the error vector ξ_∞ - ξ_N. If
the size of ||ρ_N|| is to be the criterion, then we use (2.7)
to obtain the very simple estimate

(2.11)    ||ρ_N|| ≤ ||H||^N ||ρ_0||.

But if the deviation of ξ_N from ξ_∞ seems to be a more
appropriate or convenient measure of the truncation error, then we


use the fact that (I-H)^(-1) ρ_N = ξ_∞ - ξ_N. Substituting
(2.7) into this, we find that

(2.12)    ξ_∞ - ξ_N = H^N (I-H)^(-1) ρ_0.

Since (I-H)^(-1) = I + H + H^2 + ..., it follows that
||(I-H)^(-1)|| ≤ ||I|| + ||H|| + ||H||^2 + ... = (1 - ||H||)^(-1). Therefore

(2.13)    ||ξ_∞ - ξ_N|| ≤ (||H||^N / (1 - ||H||)) ||ρ_0||.

It is perhaps worthwhile to point out here, by way of
an aside, that the inequalities (2.11) and (2.13) hold for
any one of the various matrix and vector norms in common use*,
and not just for the norm ||H|| = max_i Σ_j |h_ij|. We are using this
particular norm here because of a special application it has
to the Monte Carlo method, which will be brought out in the
next section.
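Inequality (2.13) converts directly into an a priori count of iterations. The little routine below (an illustration with arbitrary numbers, not part of the report) finds the smallest N for which the bound ||H||^N (1 - ||H||)^(-1) ||ρ_0|| falls under a prescribed tolerance.

```python
import math

def iterations_needed(norm_H, norm_rho0, eps):
    """Smallest N with norm_H**N / (1 - norm_H) * norm_rho0 <= eps,
    read off from the a priori bound (2.13). Requires 0 < norm_H < 1."""
    target = eps * (1.0 - norm_H) / norm_rho0
    if target >= 1.0:
        return 0
    return math.ceil(math.log(target) / math.log(norm_H))

# With ||H|| = 1/2 and ||rho_0|| = 1, a tolerance of 10^-6 calls for
N = iterations_needed(0.5, 1.0, 1e-6)   # N = 21
```

Since the bound decays like ||H||^N, halving the tolerance always costs the same fixed number of further iterations, about 1 / log2(1/||H||).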

The iterative formulas (2.8) and (2.9) are (so to speak)
homogeneous, and therefore look easier to use than (2.4). But
(2.8) and (2.9) have the great disadvantage of being not self-
correcting in case a mistake is made at one stage, whereas
(2.4) does have this property. Suppose for example that for
some N, ρ_(N+1) is mistakenly computed as a zero vector in using
(2.9). Then since all subsequent vectors ρ_N will equal zero,
it is obvious from (2.10) that ξ_∞ will be irretrievably wrong.
But if a mistake is made in computing ξ_(N+1) by (2.4), the
subsequent effect is like starting over again with another ξ_0.

*See Householder [7], pp. 38-41.

Therefore we are not proposing (2.8) and (2.9) as a
practical substitute for (2.4) for a non-stochastic numerical
solution of the problem ξ = Hξ + γ. Our real purpose in
introducing the alternative iteration formulas was to develop the
representation of ξ_N given by formula (2.10). This
representation seems to be an advantageous one upon which to construct a
Monte Carlo solution, as we shall see in the next section.

It is not without interest, however, to point out that
if the amount of work involved in a computing job is measured
in any sensible way, it theoretically requires no more work to
use (2.8) and (2.9) up to a predetermined value of N than to
use (2.4). This assumes that check values of ρ_N will not have
to be computed from the definition of residual vector from
time to time in the use of (2.8) and (2.9). In this paper, we
shall measure amount of work by merely counting up the number
of multiplications required under the worst conditions,
assuming no zero or unit elements. Additions and subtractions will
be ignored. A division or reciprocation will count as a single
multiplication. To arrive at ξ_N by (2.4), starting with ξ_0,
requires n^2 multiplications per iteration, or Nn^2 in all. To
arrive at ξ_N by (2.8) and (2.9) without computing any intermediate or
final check values of ρ_N takes n^2 multiplications for ρ_0, and
thereafter (N-1)n^2 multiplications for the iterative
determinations of ρ_N. So once again the count is Nn^2.

Actually, in later sections we are going to set up an
error analysis which implies that ρ_N will always have to be
computed. The use of (2.4) will be tacitly assumed, so n^2
extra multiplications will have to be added to the total count.

3. Stochastic solution of the problems. It should be apparent
from the foregoing section that even if one restricts oneself
to the class of stationary linear iterative processes, there