COPY NO.

25 Waverly Place, New York 3, N. Y.

NEW YORK UNIVERSITY

INSTITUTE OF

MATHEMATICAL SCIENCES

A Theoretical Comparison of the Efficiencies

of Two Classical Methods and a

Monte Carlo Method for Computing

One Component of the Solution

of a Set of Linear Algebraic Equations

J. H. CURTISS

Research in the Field of Probability,

Statistics, and Numerical Analysis

(Monte Carlo Method)

TECHNICAL REPORT NO. III

IMM-NYU 211

MAY 1954

CONTRACT NO. DA-30-069-ORD-1257

RAD PROJECT NO. TB2-0001 (1089)

PREPARED UNDER THE SPONSORSHIP OF

ARMY OFFICE OF ORDNANCE RESEARCH

INSTITUTE OF MATHEMATICAL SCIENCES

LIBRARY

25 Waverly Place, New York 3, N. Y.

IMM-NYU 211

May 1954

A THEORETICAL COMPARISON OF THE EFFICIENCIES

OF TWO CLASSICAL METHODS AND A MONTE CARLO METHOD

FOR COMPUTING ONE COMPONENT OF THE SOLUTION

OF A SET OF LINEAR ALGEBRAIC EQUATIONS

by

J. H. Curtiss

Based on an invited address given on March 17, 1954

at a Symposium on Monte Carlo Methods sponsored by

the Aeronautical Research Laboratory, Wright Air

Development Center, and conducted by the Statistical

Laboratory, University of Florida, at Gainesville,

Florida.

Contract No. DA-30-069-ORD-1257

RAD Project No. TB2-0001 (1089)

Research in the field of Probability,

Statistics, and Numerical Analysis

(Monte Carlo Method)

Technical Report No. 3

This report represents results obtained at the

Institute of Mathematical Sciences, New York University,

under the sponsorship of the Army Office of Ordnance

Research.

New York, 1954


ERRATA FOR IMM-NYU 211

p. 5, line 11. Supply the limits 1 and s on both summation signs,
reading "Σ_1^s a_k".

p. 11, line 10. For "it cannot be singular," read "then I - H
cannot be singular".

p. 13, last line before footnote. For "vector," read "vectors".

p. 14, formula (2.7). In first line, delete numeral 1 below first
plus sign in third member of the equation.
In second line, delete comma and "N = 1,2,...".

p. 26, line 2. Number this formula as (3.4).

p. 27, formula (3.5). Close parentheses after ||H|| but before
exponent 2 in denominator.

p. 27, line 13. Insert "P = [|h_ij|]" before "therefore".

p. 28, line 1. Delete "P = [|h_ij|]".

p. 28, formula (3.7). For "||p||", read "||ρ||" in last two members
of the inequality.

p. 35, line 13. For "number", read "numbers".

p. 35, line 5 from bottom. Delete "of".

p. 38, formula (4.6). For "2||o||^2", read "2||ρ||^2".

p. 38, line 2 and line 3 from bottom. Delete "with high probability
(specifically, a probability of 95%),".

p. 41, line 10. Insert "i = " before "i_1".

p. 41, line 11. For "i-th", read "i_1-th".

p. 41, line 12. For "i-th", read "i_1-th".

p. 41, line 17. For "element", read "component".


ERRATA FOR IMM-NYU 211 (cont.)

p. 42, line 10 ff. Delete paragraph beginning "Now the key...".

p. 42, line 15. Insert "from formulas (5.1), (5.2), and
(5.3)" before "that".

p. 44, Note (c). Change "ic" to "10" in second line.

p. 48, formula (6.4). Insert "^N" after "H".

p. 54, line 3. In third member of inequality, change
"||H||^(N-1)" to "||H||^(s-1)".

A THEORETICAL COMPARISON OF THE EFFICIENCIES

OF TWO CLASSICAL METHODS AND A MONTE CARLO METHOD

FOR COMPUTING ONE COMPONENT OF THE SOLUTION

OF A SET OF LINEAR ALGEBRAIC EQUATIONS

J. H. Curtiss

Summary. In this paper a basis is established for a theoretical
comparison of the amount of work required by three widely different
methods of calculating (to a given accuracy) one component of the
solution of a set of simultaneous linear algebraic equations. The
equations are assumed to be in the form ξ = Hξ + γ, where γ is a
given n-dimensional vector and H is a given n x n real matrix. The
amount of work is measured by the number of multiplications required
under the most unfavorable conditions. The three methods are (a) the
Gauss elimination method, (b) the particular stationary linear
iterative method defined by the recursion formula
ξ_(N+1) = Hξ_N + γ, and (c) a Monte Carlo method.

Suppose that the solution of the problem is the sum of a series
a_1 + a_2 + ... . Numbers z_k and p_k are selected such that
z_k p_k = a_k, p_k > 0, Σ_k p_k = 1. Then a random variable J is set
up with the probability distribution given by Pr(J = k) = p_k,
k = 1,2,... . The random variable z_J will obviously have a
theoretical mean value equal to the sum of the series
a_1 + a_2 + ... . The statistic

        Z̄ = (1/ν) (z_(J_1) + z_(J_2) + ... + z_(J_ν)),

where J_1, J_2, ... are independent replicas of J, furnishes an
estimator of the mean value of z_J (that is, of the solution of the
problem) which has various well-known optimum statistical properties.
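The estimation scheme just described can be sketched in a few lines of Python. The series, the uniform choice of the p_k, and the sample size ν below are illustrative assumptions, not taken from the paper:

```python
import random

def mc_series_sum(a, nu, seed=0):
    """Estimate sum(a) by sampling an index J with Pr(J = k) = p_k
    and averaging z_J = a_J / p_J over nu independent replicas of J."""
    rng = random.Random(seed)
    s = len(a)
    p = [1.0 / s] * s                             # one admissible choice: uniform p_k
    z = [a_k / p_k for a_k, p_k in zip(a, p)]     # z_k chosen so that z_k p_k = a_k
    total = 0.0
    for _ in range(nu):
        J = rng.choices(range(s), weights=p)[0]   # Pr(J = k) = p_k
        total += z[J]
    return total / nu        # unbiased: E[z_J] = sum_k z_k p_k = sum_k a_k

a = [0.5 ** k for k in range(1, 11)]              # finite series with known sum
est = mc_series_sum(a, nu=20000)
```

With the uniform p_k every term is sampled equally often; the point of the later sections is precisely that a better choice of the p_k can concentrate the sampling on the important terms.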

It is hard for some classically trained numerical analysts to see
how Monte Carlo methods can ever be advantageous in such a problem.
A somewhat over-simplified

5.

version of their reasoning might go as follows. Let s be chosen so
that Σ_1^s a_k gives a satisfactory approximation to the sum of the
series. (If the series is finite, let s be the number of terms.)
Since each observed value of z_J conditionally estimates just one of
the numbers a_k, there will have to be at least as many terms in the
summation for Z̄ as in Σ_1^s a_k. Indeed, because of statistical
fluctuations it will probably be necessary to make very many more
observations on z_J than there are terms in the finite approximation
Σ_1^s a_k, and even then, Z̄ will probably not be as good as
Σ_1^s a_k. This means that the count of additions alone will be
very much greater for the stochastic approximation than for the
straightforward method of solution; and then too, there is the
trouble of setting up the z_k's and p_k's in advance, and of
determining stochastically, over and over again, the value of J to
use in the observations.

The case against this argument can very conveniently be stated in
terms of a specific problem which is fundamental to this paper. It
is the problem of calculating the i-th component of the vector
K^(N+1)θ, where K denotes the 2n x 2n matrix [k_ij] and θ is the
2n-dimensional vector (t_1, ..., t_(2n)). The i-th element of
K^(N+1)θ, which we write as [K^(N+1)θ]_i, is the sum of (2n)^(N+1)
terms of the type
k_(i i_1) k_(i_1 i_2) ... k_(i_N i_(N+1)) t_(i_(N+1)). The
stochastic estimation*

*See Curtiss [3]. (The square brackets refer to the
references at the end of the paper.)

6.

is accomplished by selecting numbers z_ij and p_ij, i,j = 1,...,2n,
such that z_ij p_ij = k_ij, p_ij ≥ 0, Σ_j p_ij = 1 for all i. Then
a family of random variables J_0, J_1, J_2, ..., J_(N+1) is set up
in such a way that it represents a Markov process (or "random walk")
with states (resting places) designated by 1,2,...,2n and with
stationary transition probabilities. The specification of the joint
probability distribution is accomplished by assigning an arbitrary
distribution to J_0 (but with Pr(J_0 = i) ≠ 0), and thereafter using
the equations* Pr(J_(M+1) = j | J_M = i) = p_ij, i,j = 1,...,2n.
Finally, a random variable

        Z = z_(J_0 J_1) z_(J_1 J_2) ... z_(J_N J_(N+1)) t_(J_(N+1))

is set up. It is now easily seen from the definition of mean value
in probability theory, that E(Z | J_0 = i) = [K^(N+1)θ]_i. We use
(Z_1 + ... + Z_ν)/ν as the estimator of [K^(N+1)θ]_i, where
Z_1, ..., Z_ν denote ν independent determinations of Z.
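This random-walk estimator can be sketched directly. The small matrix K, the vector θ, and the particular choice p_ij proportional to |k_ij| (one admissible way of satisfying z_ij p_ij = k_ij) are illustrative assumptions:

```python
import random

def mc_power_component(K, theta, i, N, nu, seed=0):
    """Estimate [K^(N+1) theta]_i by averaging nu independent walks.
    Transition probabilities p_ij proportional to |k_ij| are one
    admissible choice; then z_ij = k_ij / p_ij, so z_ij p_ij = k_ij."""
    rng = random.Random(seed)
    m = len(K)
    p = [[abs(k) / sum(abs(x) for x in row) for k in row] for row in K]
    total = 0.0
    for _ in range(nu):
        state, prod = i, 1.0
        for _ in range(N + 1):                        # N+1 transitions J_0 ... J_(N+1)
            nxt = rng.choices(range(m), weights=p[state])[0]
            prod *= K[state][nxt] / p[state][nxt]     # multiply in z_(J_M J_(M+1))
            state = nxt
        total += prod * theta[state]                  # Z = z ... z * t_(J_(N+1))
    return total / nu

def power_component(K, theta, i, N):
    """Direct computation of [K^(N+1) theta]_i, for comparison."""
    v = theta[:]
    for _ in range(N + 1):
        v = [sum(K[r][c] * v[c] for c in range(len(K))) for r in range(len(K))]
    return v[i]

K = [[0.2, 0.3], [0.1, 0.4]]
theta = [1.0, 2.0]
est = mc_power_component(K, theta, i=0, N=2, nu=50000)
exact = power_component(K, theta, i=0, N=2)
```

Note how each walk multiplies together only N+1 factors, while the direct sum has (2n)^(N+1) terms; this is factor (a) of the discussion that follows.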

The various possible values of
k_(i i_1) k_(i_1 i_2) ... k_(i_N i_(N+1)) t_(i_(N+1)) correspond to
the values of a_k in the previous more general

*By Pr(a|b) we mean the conditional probability of the event a,
given that the event b has occurred.

By E(X|a), we mean the mean value of the conditional distribution
of the random variable X, given that the event a has occurred.

formulation. Let us think of these possible values as renumbered in
a linear and serial order, using a single index k. There will be
s = (2n)^(N+1) such numbers. (They may not be all distinct.) Then
the (2n)^(N+1) correspondingly renumbered products
p_(i i_1) p_(i_1 i_2) ... p_(i_N i_(N+1)) play the role of
p_1, p_2, ..., p_s in the previous formulation, and the various
possible values of the vector random variable
J = (J_0, J_1, ..., J_(N+1)) correspond to the values of J in the
previous formulation.

It now begins to be evident that our formulation of the general
summation problem at the beginning of the section was deceptively
over-simplified. In a multi-dimensional summation problem, the
following two factors may come into play on the side of a Monte
Carlo method of solution:

(a) If the calculation of each a_k is a complicated one, it may be
possible to arrange things so that the calculation of each
observation on z_J is very much simpler than the calculation of the
corresponding term a_k. This will in particular be the case if the
calculation of each a_k involves the formation of a continued
product, because then part of the work of calculating the product
can be sidestepped in the stochastic process by using the
multiplicative law of probabilities.

(b) Some of the numbers a_k in the finite approximation to
Σ_1^s a_k may be very unimportant and need not be represented in the
statistical estimator at all. The stochastic estimation process, if
properly set up, will automatically take care of this by making the
appearance of the representative of a non-essential term a very rare
event.

We add here, more or less as an aside, the remark that when the
calculation of each particular z_J is much simpler than that of the
corresponding a_J, then the problem of the accumulation of round-off
errors may not be nearly as serious for the stochastic method as it
is in the direct computation.

All of these factors favoring the Monte Carlo method are present in
the case of the problem of [K^(N+1)θ]_i and in the related problem
of solving systems of linear algebraic equations. The factor (a) is
usually particularly in evidence in numerical problems of stochastic
origin, and indeed it is in such problems that the Monte Carlo
method has had its chief successes. Matrix problems can always be
thrown into a form in which they are numerically equivalent to
problems with a stochastic pedigree. We shall exploit that fact in
the present paper to obtain a favorable environment for the
comparisons to be made.

But we cannot conclude this introduction without making a remark
which is on the negative side as far as

9.

Monte Carlo methods are concerned. It certainly would seem that
whenever Monte Carlo methods appear to advantage in summation
problems, factor (b) above must be playing an important role,
because otherwise the criticism regarding the necessary number of
addition operations required for any reasonable degree of accuracy
in the Monte Carlo approach would be valid. But if that is so, why
cannot a deterministic method be devised which will ignore the
unimportant terms and be even more efficient? The author suspects
that what we now need is a more highly developed deterministic
theory of quadrature and of linear computation in many dimensions.
When this becomes available, the Monte Carlo method may, at least
for matrix problems, lose the very modest advantages which will be
claimed for it in this paper.

2. The numerical problem and its non-stochastic solution.
Throughout the paper we shall continue to denote matrices by capital
letters and their elements by the corresponding lower case letters;
thus, for example, H = [h_ij]. We represent vectors by lower case
Greek letters and their components by the corresponding Roman
letters; thus for example, ξ = (x_1, x_2, ..., x_n). We shall also
find it convenient occasionally to designate the elements of a
matrix by double subscripts affixed to the symbol for the matrix
(thus, H_ij or [(I-H)^(-1) H^2]_ij). Furthermore we shall frequently

10.

designate the components of a vector by a similar subscript notation
(thus ξ = (ξ]_1, ξ]_2, ..., ξ]_n)). All vectors will be real and
n-dimensional and all matrices will have only real elements.

By ||H|| we shall mean max_i Σ_j |h_ij|, and by ||ξ||, we shall mean
max_i |x_i|. It is obvious that ||Hξ|| ≤ ||H|| ||ξ|| and that
||A + B|| ≤ ||A|| + ||B||. It is well-known* that
||AB|| ≤ ||A|| ||B||; therefore, by induction, ||H^N|| ≤ ||H||^N.
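These definitions and inequalities are easy to check numerically; a minimal sketch, with an illustrative 2 x 2 matrix and vector:

```python
def norm_inf(H):
    """||H|| = max_i sum_j |h_ij|, the maximum absolute row sum."""
    return max(sum(abs(h) for h in row) for row in H)

def vec_norm(x):
    """||xi|| = max_i |x_i|."""
    return max(abs(v) for v in x)

def matmul(A, B):
    """Plain triple-loop matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(H, x):
    return [sum(H[i][j] * x[j] for j in range(len(x))) for i in range(len(H))]

H = [[0.2, -0.3], [0.4, 0.1]]
xi = [1.0, -2.0]
# ||H xi|| <= ||H|| ||xi||, and ||H^2|| <= ||H||^2 by submultiplicativity
ok_vec = vec_norm(matvec(H, xi)) <= norm_inf(H) * vec_norm(xi)
ok_mat = norm_inf(matmul(H, H)) <= norm_inf(H) ** 2
```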

The numerical problem with which we shall be mainly concerned is
that of solving the linear system

(2.1)        Aξ = η,

for ξ, where A is a given non-singular n x n matrix and η is a
given vector. We assume that this system has been thrown into the
form

(2.2)        ξ = Hξ + γ,

where H = [h_ij] is an n x n matrix with the property that

(2.3)        ||H|| = max_i Σ_j |h_ij| < 1.

*See for example Courant and Hilbert [1], p. 16, footnote.

11.

It is beyond the scope of this paper to give an extended discussion
of the methods of passing from (2.1) to (2.2), but a few general
remarks on the subject are in order at this point. In the first
place, it is always theoretically possible to transform the problem
represented by (2.1) in this manner. Indeed, let H be any matrix
whatsoever with the property (2.3). (For instance, let H = dI,
where I is the unit matrix and d is a scalar lying between 0 and 1.)
It is known that if H satisfies (2.3), then I - H cannot be
singular*. Let M be defined by the equation I - MA = H. This says
that M = (I-H)A^(-1). Therefore M cannot be singular. The system

        ξ = Hξ + Mη = (I-MA)ξ + Mη,

is just the same as the system

        MAξ = Mη,

and since M is non-singular, this system is precisely equivalent to
(2.1).

But in practice it is not feasible to set up an arbitrary H
satisfying (2.3), and then to determine M as

*O. Taussky-Todd, [9].

12.

above. The reason is that the formula M = (I-H)A^(-1) presupposes
a knowledge of A^(-1), and in the presence of this the original
problem becomes trivial. Thus it is natural to think of M as being
chosen first, the choice being made in such a way that I - MA = H
has suitable properties. There are a number of different procedures
in the literature of linear computation for arriving at an
appropriate choice of M. For example, if A has dominant diagonal -
that is, if |a_ii| > Σ_(j≠i) |a_ij|, i = 1,...,n - then M can be
chosen as the inverse of the principal diagonal matrix whose
principal diagonal is that of A. This will obviously insure that
||H|| < 1*.
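The dominant-diagonal recipe just described (M the inverse of the diagonal of A, so that H = I - MA has zero diagonal) can be sketched as follows; the matrix A is an illustrative assumption:

```python
def diagonal_splitting(A):
    """For A with dominant diagonal, take M = diag(A)^(-1);
    then H = I - M A has h_ii = 0 and h_ij = -a_ij / a_ii,
    so each row of H sums (in absolute value) to less than 1."""
    n = len(A)
    return [[0.0 if i == j else -A[i][j] / A[i][i] for j in range(n)]
            for i in range(n)]

A = [[4.0, 1.0, -1.0],   # |4| > 1 + 1
     [2.0, 5.0, 1.0],    # |5| > 2 + 1
     [0.0, 1.0, 3.0]]    # |3| > 0 + 1: dominant diagonal
H = diagonal_splitting(A)
norm_H = max(sum(abs(h) for h in row) for row in H)
```

The corresponding right-hand side of (2.2) would be γ = Mη, i.e. each η_i divided by a_ii.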

There are numerous non-stochastic methods of solving (2.1)*. We
shall here restrict our considerations mainly to a class of methods
known as linear iterative processes. An effective Monte Carlo
method for the problem (2.1) can be based on this type of process,
as we presently shall see.

The general stationary linear iterative process for solving (2.1)
is arrived at by throwing (2.1) into an equivalent form which has
the appearance of (2.2), but with H restricted only by the
requirement that its eigenvalues all lie in the unit circle. An
initial estimate ξ_0 of the

*Further discussion will be found in any good treatise on numerical
analysis which deals with iterative methods of solving linear
equations. See for example Householder [7] and Milne [8]. See
Forsythe [6] for further references.

**See previous footnote for references.

13.

solution is made, and successive approximations ξ_1, ξ_2, ... are
then defined by

(2.4)        ξ_(N+1) = Hξ_N + γ,    N = 0,1,2,... .

If ξ_∞ denotes the solution of the equations (2.2), then clearly,
since ξ_∞ = Hξ_∞ + γ,

(2.5)        ξ_∞ - ξ_N = H(ξ_∞ - ξ_(N-1))
                       = H^2(ξ_∞ - ξ_(N-2))
                       = ...
                       = H^N(ξ_∞ - ξ_0).

Thus the condition for convergence for any starting vector ξ_0 is
that lim_(N→∞) H^N = 0. The well-known necessary and sufficient
condition [7] for lim H^N = 0 is that all the eigenvalues of H
should lie in the unit circle, which explains the requirement on H
imposed earlier in this paragraph.

But in the present discussion, we go one step beyond this
requirement and insist that ||H|| < 1. Since ||H^N|| ≤ ||H||^N,
this condition will certainly insure that H^N → 0.

For purposes of error* analysis it is advantageous now to introduce
the residual vector

*By error in this paper, we shall always mean truncation error or
statistical error or both at once. There will be no study of
round-off error, nor of the effect of miscellaneous arithmetical
mistakes.

14.

(2.6)        ρ_N = γ - (I-H)ξ_N = Hξ_N + γ - ξ_N = ξ_(N+1) - ξ_N ,
                                              N = 0,1,2,... .

The vectors ρ_N are of course always computable at any stage in the
iterative solution. If ξ_N → ξ_∞ = (I-H)^(-1)γ, then obviously
ρ_N → 0. The converse is also true, because
(I-H)^(-1)ρ_N = ξ_∞ - ξ_N. Thus ρ_N, or ||ρ_N||, are logical
measures of the error in the N-th approximation to the solution.
It is to be noted from (2.4) and (2.6) that

(2.7)        ρ_N = ξ_(N+1) - ξ_N = Hξ_N + γ - (Hξ_(N-1) + γ)
                 = H(ξ_N - ξ_(N-1)) = Hρ_(N-1)
                 = ... = H^N ρ_0 .

From (2.6) and (2.7) it follows that the successive approximations
ξ_N generated by (2.4) can theoretically also be generated in the
following manner: Select ξ_0 as before, and compute ρ_0 from the
definition of residual vector, ρ_0 = γ - (I-H)ξ_0. Then conduct the
iterations by means of the pair of formulas

(2.8)        ξ_(N+1) = ξ_N + ρ_N ,    N = 0,1,... ,

(2.9)        ρ_(N+1) = Hρ_N ,    N = 0,1,... .
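In exact arithmetic the pair (2.8)-(2.9) generates the same iterates as (2.4); a quick Python check, with an illustrative H (satisfying ||H|| < 1) and γ:

```python
def iterate_24(H, gamma, xi0, N):
    """(2.4): xi_(M+1) = H xi_M + gamma, run for N steps."""
    n = len(H)
    xi = xi0[:]
    for _ in range(N):
        xi = [sum(H[i][j] * xi[j] for j in range(n)) + gamma[i]
              for i in range(n)]
    return xi

def iterate_2829(H, gamma, xi0, N):
    """(2.8)/(2.9): rho_0 = gamma - (I-H) xi_0, then
    xi_(M+1) = xi_M + rho_M and rho_(M+1) = H rho_M."""
    n = len(H)
    Hx0 = [sum(H[i][j] * xi0[j] for j in range(n)) for i in range(n)]
    rho = [gamma[i] - (xi0[i] - Hx0[i]) for i in range(n)]
    xi = xi0[:]
    for _ in range(N):
        xi = [xi[i] + rho[i] for i in range(n)]                      # (2.8)
        rho = [sum(H[i][j] * rho[j] for j in range(n)) for i in range(n)]  # (2.9)
    return xi

H = [[0.2, 0.3], [0.1, 0.4]]
gamma = [1.0, 2.0]
xi_a = iterate_24(H, gamma, [0.0, 0.0], 6)
xi_b = iterate_2829(H, gamma, [0.0, 0.0], 6)
```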

15.

We note that by back substitution, we find that

(2.10)       ξ_(N+1) = ξ_0 + ρ_0 + ρ_1 + ... + ρ_N
                     = ξ_0 + (I + H + ... + H^N)ρ_0 .

In actual practice, the customary running check on the accuracy of
the solution consists in computing ||ρ_N|| from time to time to see
how small it is. The iterations are stopped when ||ρ_N|| reaches a
predetermined order of smallness. If (2.4) were used for the
iterations, the computation of ρ_N would be done by using the
formula ρ_N = ξ_(N+1) - ξ_N. If (2.8) and (2.9) were to be used, it
would be advisable to compute test values of ||ρ_N|| from the
definition of ρ_N (that is, from the formula ρ_N = γ - (I-H)ξ_N),
rather than to accept the values of ||ρ_N|| given by (2.9). We
shall come back to this point in a moment.

An a priori truncation error analysis, made with the purpose of
estimating the number of iterations which will be required to
achieve a given accuracy, can be conducted either in terms of ρ_N or
in terms of the error vector ξ_∞ - ξ_N. If the size of ||ρ_N|| is
to be the criterion, then we use (2.7) to obtain the very simple
estimate

(2.11)       ||ρ_N|| ≤ ||H||^N ||ρ_0|| .

But if the deviation of ξ_N from ξ_∞ seems to be a more appropriate
or convenient measure of the truncation error, then we

16.

use the fact that (I-H)^(-1)ρ_N = ξ_∞ - ξ_N. Substituting (2.7)
into this, we find that

(2.12)       ξ_∞ - ξ_N = H^N (I-H)^(-1) ρ_0 .

Since (I-H)^(-1) = I + H + H^2 + ... , it follows that
||(I-H)^(-1)|| ≤ ||I|| + ||H|| + ||H||^2 + ... = (1 - ||H||)^(-1).
Therefore

(2.13)       ||ξ_∞ - ξ_N|| ≤ ||H||^N (1 - ||H||)^(-1) ||ρ_0|| .
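Bound (2.13) is what makes an a priori iteration count possible: the smallest N with ||H||^N ||ρ_0|| / (1 - ||H||) ≤ ε can be read off with logarithms. A sketch, with illustrative numbers:

```python
import math

def iterations_needed(norm_H, norm_rho0, eps):
    """Smallest N with norm_H**N * norm_rho0 / (1 - norm_H) <= eps,
    per the a priori bound (2.13); assumes 0 < norm_H < 1."""
    # Require ||H||^N <= eps (1 - ||H||) / ||rho_0||, then take logs.
    target = eps * (1.0 - norm_H) / norm_rho0
    return max(0, math.ceil(math.log(target) / math.log(norm_H)))

N = iterations_needed(norm_H=0.5, norm_rho0=1.0, eps=1e-6)
```

The count grows like log(1/ε) / log(1/||H||), so a norm close to 1 is expensive; this trade-off reappears in the work comparisons of the later sections.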

It is perhaps worthwhile to point out here, by way of an aside, that
the inequalities (2.11) and (2.13) hold for any one of the various
matrix and vector norms in common use*, and not just for the norm
||H|| = max_i Σ_j |h_ij|. We are using this particular norm here
because of a special application it has to the Monte Carlo method,
which will be brought out in the next section.

The iterative formulas (2.8) and (2.9) are (so to speak)
homogeneous, and therefore look easier to use than (2.4). But (2.8)
and (2.9) have the great disadvantage of being not self-correcting
in case a mistake is made at one stage, whereas (2.4) does have this
property. Suppose for example that for

*See Householder [7], pp. 38-41.

17.

some N, ρ_(N+1) is mistakenly computed as a zero vector in using
(2.9). Then since all subsequent vectors ρ_M will equal zero, it is
obvious from (2.10) that ξ_M will be irretrievably wrong. But if a
mistake is made in computing ξ_(N+1) by (2.4), the subsequent effect
is like starting over again with another ξ_0.

Therefore we are not proposing (2.8) and (2.9) as a practical
substitute for (2.4) for a non-stochastic numerical solution of the
problem ξ = Hξ + γ. Our real purpose in introducing the alternative
iteration formulas was to develop the representation of ξ_N given by
formula (2.10). This representation seems to be an advantageous one
upon which to construct a Monte Carlo solution, as we shall see in
the next section.

It is not without interest, however, to point out that if the amount
of work involved in a computing job is measured in any sensible way,
it theoretically requires no more work to use (2.8) and (2.9) up to
a predetermined value of N than to use (2.4). This assumes that
check values of ρ_N will not have to be computed from the definition
of residual vector from time to time in the use of (2.8) and (2.9).
In this paper, we shall measure amount of work by merely counting up
the number of multiplications required under the worst conditions,
assuming no zero or unit elements. Additions and subtractions will
be ignored. A division or reciprocation will count as a single
multiplication. To arrive at ξ_N by (2.4), starting with ξ_0,
requires n^2 multiplications per iteration, or Nn^2 in all. To
arrive

18.

at ξ_N by (2.8) and (2.9) without computing any intermediate or
final check values of ρ takes n^2 multiplications for ρ_0, and
thereafter (N-1)n^2 multiplications for the iterative determinations
of ρ_M. So once again the count is Nn^2.

Actually, in later sections we are going to set up an error analysis
which implies that ρ_N will always have to be computed. The use of
(2.4) will be tacitly assumed, so n^2 extra multiplications will
have to be added to the total count.
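The two tallies above agree, as a trivial check confirms; the n^2 cost per step assumes a full matrix with no zero or unit elements, as in the text:

```python
def mults_24(n, N):
    """N iterations of (2.4): forming H xi costs n^2 multiplications
    each; the addition of gamma is free under the counting rule."""
    return N * n * n

def mults_2829(n, N):
    """rho_0 = gamma - (I-H) xi_0 costs n^2; each of the N-1 residual
    updates (2.9) costs n^2; the vector additions (2.8) are free."""
    return n * n + (N - 1) * n * n
```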

3. Stochastic solution of the problems. It should be apparent from
the foregoing section that even if one restricts oneself to the
class of stationary linear iterative processes, there

iÂ§ WaVWly P1*SÂ«/ Nw York 3, N. Y.

NEW YORK UNIVERSITY

INSTITUTE OF

MATHEMATICAL SCIENCES

A Theoretical Comparison of the Efficiencies

of Two Classical Methods and a

Monte Carlo Method for Computing

One Component of the Solution

of a Set of Linear Algebraic Equations

J. H. CURTISS

Research in the Field of Probability,

Statistics, and Numerical Analysis

(Monte Carlo Method)

TECHNICAL REPORT NO. Ill

IMM-NYU 211

MAY 1954

CONTRACT NO. D A- 3 0- 069-ORD- 1 2 5 7

RAD PROJECT NO. TB2-0001 (1089)

PREPARED UNDER THE SPONSORSHIP OF

ARMY OFFICE OF ORDNANCE RESEARCH

.â€¢;av/ vor;-. LI". .â€¢.,_.â€¢ -.

INSTITUTE OF MATHEMA'I iC-\L SClÂ£iNCES

LIBRARY

as Waverly Place, New York 3, N. Y.

IMM-NYU 211

May 1951^.

A THEORETICAL COMPARISOI^ OF THE EFFICIENCIES

OF TV/0 CLASSICAL METHODS AND A MONTE CARLO METHOD

FOR COMPUTING ONE COMPONENT OF THE SOLUTION

OF A SET OF LINEAR ALGEBRAIC EQUATIONS

by

J. H. Curtiss

Based on an invited address given on March 17, 195i|

at a Symposium on Monte Carlo Methods sponsored by

the Aeronautical Research Laboratory, Wright Air

Development Center, and conducted by the Statistical

Laboratory, University of Florida, at Gainesville,

Florida.

Contract No. DA-30-069-0RD-1257

RAD Project No. TB2-0001 (1089)

Research in the field of Probability,

Statistics, and Numerical Analysis

(Monte Carlo Method)

Technical Report No. 3

This report represents results obtained at the

Institute of Mathematical Sciences, New York University,

under the sponsorship of the Army Office of Ordnance

Research.

New York, 1951^

â€¢ f-l"

J:

â– :', aiiUj =

ERRATA FOR IMM-NYU 211

p. 5, line 11. For " J" a, ,''* read " T~ a, "

For " Â£3 a^ ,''* read " J"

X I

p. 11, line 10. For "it cannot be singular," read "then I *â– H

cannot be singular"*

p. 13, last line before footnote. For "vector," read "vectors".

p. 11+, formula (2,7). In first line, delete numeral 1 below first

plus sign in third member of the equation.

In Second line, delete comma and "N = 1,2,.,."

p. 26, line 2. Nximber this formula as (3.1^.) *

p. 27, formula (3Â«5)Â« Close parentheses after | |h| | but before

exponent 2 in denominator *

p. 27, line 13 i Insert "P = [|h^.M" before "therefore".

p. 28, line 1 . Delete "P = [|h^.|]".

11 n St t)

p. 28, formula (3.7) â€¢ For ||p||, read ||p|| in last two members

of the inequality.

p. 35 Â» line 13 . For "number", read "numbers".

p. 35, line 5 from bottom. Delete "of".

p. 38, formula (1;.6). For "2llo|l^", read Â«2||p||^".

p. 38, line 2 and line 3 from bottom. Delete "with high probability

(specifically, a probability of 95Â°^o),"

p. I4.I, line 10. Insert "i = " before "i^" .

p. l|l, line 11. For "1-th", read "i^-th".

p. Ul, line 12. For "i-th", read "i^-th".

p. I4.I, line 17. For element", read "component".

i.::-, :JYvI-MM

-, I :

IJri.,;. if:

ni

Ni

C JT!0 :

j: .:; .'

I ; m'

"r^M^ }.,..:,.-r'

'n^nc;u!:o-

ERPATA FOR IMM-NYU 211 (oont.)

p. [4.2, line 10 ff. Delete paragraph beginning "Now the key..."

p. I|.2, line 15. Insert "from formulas (5*1) , (5*2), and

(5.3)" before "that",

p. kk, Note (c). Change "ic""'-^" to "lO''-^" in second line,

p. [^.8, "formula {6.14.). Insert "^ " after "H " .

p. Skt line 3. In third member of inequality, change

N

Â»jjN-lâ€ž ^^ u^s -1â€ž ^

::!T^

A THEORETICAL C0hPARI30Ij OF THE EFFICIENCIES

OF TWO CLASSICAL rfiiThCDj AMD A KONTE CARLO MjiTKOD

FOR COIIPUTIEG ONE COMPONENT OF THE SOLUTION

OF A SET OF LINioAR ALGEBRAIC E.^UATIQNS

J. H. Curtiss

Summary Â« In this paper a basis is established for a

theoretical comparison of the amount of work required by

three widely different methods of calculating (to a given

accuracy) one component of the solution of a set of simul-

taneous linear algebraic equations. The equations are

assumed to be in the form f = E^: + y> where y is a given

n-dimensional vector and H is a given nz n real matrix.

The amount of vrork is measured by the number of multipli-

cations required under the most unfavorable conditions.

The three methods are (a) the Gauss elimination m.ethod,

(b) the particular stationary linear iterative method de-

fined by the recursion formula 0, Z p, =1. Then a random variable J is set up with

k: - ^ k

the probability distribution given by Pr(J = k) = p ,

k = 1,2, ... â€¢ The random variable z-. will obviously have

a theoretical mean value equal to the sum of the series

a-, + ap + ... . The statistic

z = -A 2. ^

where J.,, Jâ€ž, ... are independent replicas of J, furnishes

an estimator of the mean value of Z-^. (that i,:;. jf the solu-

o

tion of the problem) which has various weil - knovjn optim.um

statistical properties.

It is hard for some classically train'^d numerical

analysts to see hov; Monte Carlo methods can ever be advan-

tageous in such a problem. A sor.iewhat ovei'-simplif ied

5i

version of their reasoning might go as follows. Let s

s

be chosen so that 2a, gives a satisfactory approximation

to the sum of the scries. (If the series is finite, let

s be the number of terms.) Since each observed value of

z ,|. conditionally estimates just one of the numbers a, ,

there will have to be at least as many terms in the sum-

_ s

mation for Z as in Za, Â« Indeed, because of statistical

V 1 â€¢'^

fluctuations it will probably be necessary to make very

many more observations on z , than there are terms in the

s _

finite approximation Sa, , and even then, Z will probably

s

not be as good as 2a, . This means that the comit of

X ^

additions alone v/lll be very much greater for the stochastic

approximation than for the straightforward method of solu-

tion; and then too, there is the trouble of setting up the

z, 's and Pi^ ' s in advance, and of determining stochastically,

over and over again, the value of J to use in the obser-

vations.

The case against this argument can very conveniently

be stated in terms of a specific problem which is funda-

mental to this paper. It is the problem of calculating

N+1

the i-th component of the vector K 9, where K denotes

the 2n X 2n matrix [k. .] and 9 is the 2n-dimensional vector

(t,,...,tâ€ž ). The i-th element of K 0, which we write

1 ' 2n '

as CI 9]., is the sura of (2n)^ term.s of the type

k.. k. . ... k. . t. â€¢ The stochastic estim.ation"

11^ ^1^2 ^irN+1 ^N+1

"See Curtiss [3]Â» (The square brackets refer to the

references at the end of the paper.)

6.

is accomplished by selecting numbers z. . and p. .,

i,j = l,...,2n, such that z^jP^j ~ ^ii' "^^^ Pij = ^'

2p. . = 1 for all i. Then a family of random variables

J ,J ,J,,...,Jâ€ž , is set up in such a way that it repre-

sents a Markov process (or "random x^alk") with states

(resting places) designated by l,2,...,2n and with station-

ary transition probabilities. The specification of the

joint probability distribution is accomplished by assigning

an arbitrary distribution to J^ (but with Pr(J^=i) / 0),

and thereafter using the equations" Pr( J^^-j^= j | Jj^=i ) = p^.,

i, j = l,...,2n. Finally, a random variable

'^o"^l**"^N+l '^o'^l '^1*^2'** '^N'^N+l *^N +1

is set up. It is now easily seen from the definition of

mean value in probability theory, that

E(z1j =i) = K?^^"^9].. We use (Z^+. . .+Z^ )/v as the esti-

mator of K 9]., where Z,,...,Z denote v independent

determinations of Z.

The various possible values of k.. k k. k. t.

correspond to the values of a, in the previous more general

"By Pr(z|b) v;e mean the conditional probability of the

event a, given that the event b has occurred.

By E(xia), we mean the mean value of the conditional

distribution of the random variable X, given that the

event a has occurred.

formulation. Let us think of these possible values as

renumbered in a linear and serial order, using a sxngle

index k. There will be s = (2n)"' suci- numbers. (They

may not be all distinct.) Then the (2n)^ correspondingly

renumbered products P.- .â€¢ â€¢â€¢â€¢?.; a play the role of

^^1 ^N N+1

p, ,Pp,...,p in the previous formulation, and the various

possible values of the vector random variable

J = (J , J, , . . . , J.,^-, ) correspond to the values of J in the

= o 1 N+1

previous formulation.

It now begins to be evident that our formulation of

the general summation problem at the begin ing of the

section was deceptively over-simplified. In a multi-

dimensional summation problem, the following two factors

may come into play on the side of a Monte Carlo method of

solution:

(a) If the calculation of each a, is a complicated

one, it may be possible to arrange things so that the

calculation of each observation on z, is very much simpler

than the calculation of the corresponding term a,. This

will m particular be the case if the calculation of each

a, involves the formation of a continued product, because

k

then part of the worl: of calculating the product can be

sidestepped in the stochastic process by using the multi-

plicative law of probabilities.

(b) Some of the numbers a. In the finite approxima-

tion to Za, may be very unimportant and need not be

represented in the statistical estimator at all. The

stochastic estimation process, if properly set up, will

automatically take care of this by making the appearance

of the representative of a non-essential term a very rare

event .

We add here, more or less as an aside, the remark

that ^^rhen the calculation of each particular z is much

simpler than that of the corresponding aj, then the

problem of the accumulation of round-off errors may not

be nearly as serious for the stochastic method as it is

in the direct computation.

All of these factors favoring the Monte Carlo method are present in the case of the problem of [9], and in the related problem of solving systems of linear algebraic equations. The factor (a) is usually particularly in evidence in numerical problems of stochastic origin, and indeed it is in such problems that the Monte Carlo method has had its chief successes. Matrix problems can always be thrown into a form in which they are numerically equivalent to problems with a stochastic pedigree. We shall exploit that fact in the present paper to obtain a favorable environment for the comparisons to be made.

But we cannot conclude this introduction without making a remark which is on the negative side as far as Monte Carlo methods are concerned. It certainly would seem that whenever Monte Carlo methods appear to advantage in summation problems, factor (b) above must be playing an important role, because otherwise the criticism regarding the necessary number of addition operations required for any reasonable degree of accuracy in the Monte Carlo approach would be valid. But if that is so, why cannot a deterministic method be devised which will ignore the unimportant terms and be even more efficient? The author suspects that what we now need is a more highly developed deterministic theory of quadrature and of linear computation in many dimensions. When this becomes available, the Monte Carlo method may, at least for matrix problems, lose the very modest advantages which will be claimed for it in this paper.

2. The numerical problem and its non-stochastic solution.

Throughout the paper we shall continue to denote matrices by capital letters and their elements by the corresponding lower case letters; thus, for example, H = [h_{ij}]. We represent vectors by lower case Greek letters and their components by the corresponding Roman letters; thus, for example, ξ = (x_1, x_2, ..., x_n). We shall also find it convenient occasionally to designate the elements of a matrix by double subscripts affixed to the symbol for the matrix (thus, H_{ij} or [(I-H)^{-1}H^2]_{ij}). Furthermore we shall frequently designate the components of a vector by a similar subscript notation (thus ξ = ([ξ]_1, [ξ]_2, ..., [ξ]_n)). All vectors will be real and n-dimensional and all matrices will have only real elements.

By ||H|| we shall mean max_i Σ_j |h_{ij}|, and by ||ξ|| we shall mean max_i |x_i|. It is obvious that ||Hξ|| ≤ ||H|| ||ξ|| and that ||A+B|| ≤ ||A|| + ||B||. It is well known* that ||AB|| ≤ ||A|| ||B||; therefore, by induction, ||H^N|| ≤ ||H||^N.
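In modern terminology these are the matrix and vector ∞-norms (maximum absolute row sum and maximum absolute component). A minimal sketch (ours, not the paper's; the matrix H is invented) of the two norms and a check of the submultiplicative inequality ||H^2|| ≤ ||H||^2:

```python
# Sketch (hypothetical data): the norms used in the paper, in plain Python.
# ||H|| = max over rows i of sum over j of |h_ij|;  ||x|| = max_i |x_i|.

def mat_norm(H):
    return max(sum(abs(v) for v in row) for row in H)

def vec_norm(x):
    return max(abs(v) for v in x)

def mat_mul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

H = [[0.2, -0.3], [0.1, 0.4]]
H2 = mat_mul(H, H)
# by the submultiplicative law, mat_norm(H2) cannot exceed mat_norm(H)**2
```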

The numerical problem with which we shall be mainly concerned is that of solving the linear system

(2.1) Aξ = η

for ξ, where A is a given non-singular n × n matrix and η is a given vector. We assume that this system has been thrown into the form

(2.2) ξ = Hξ + γ,

where H = [h_{ij}] is an n × n matrix with the property that

(2.3) ||H|| = max_i Σ_j |h_{ij}| < 1.

*See for example Courant and Hilbert [1], p. 16, footnote.


It is beyond the scope of this paper to give an extended discussion of the methods of passing from (2.1) to (2.2), but a few general remarks on the subject are in order at this point. In the first place, it is always theoretically possible to transform the problem represented by (2.1) in this manner. Indeed, let H be any matrix whatsoever with the property (2.3). (For instance, let H = dI, where I is the unit matrix and d is a scalar lying between 0 and 1.) It is known that if H satisfies (2.3), then I - H cannot be singular**. Let M be defined by the equation I - MA = H. This says that M = (I-H)A^{-1}. Therefore M cannot be singular. The system

ξ = Hξ + Mη = (I-MA)ξ + Mη

is just the same as the system

MAξ = Mη,

and since M is non-singular, this system is precisely equivalent to (2.1).

But in practice it is not feasible to set up an arbitrary H satisfying (2.3), and then to determine M as

**O. Taussky-Todd, [9].


above. The reason is that the formula M = (I-H)A^{-1} presupposes a knowledge of A^{-1}, and in the presence of this the original problem becomes trivial. Thus it is natural to think of M as being chosen first, the choice being made in such a way that I - MA = H has suitable properties. There are a number of different procedures in the literature of linear computation for arriving at an appropriate choice of M. For example, if A has a dominant diagonal, that is, if |a_{ii}| > Σ_{j≠i} |a_{ij}|, i = 1,...,n, then M can be chosen as the inverse of the principal diagonal matrix whose principal diagonal is that of A. This will obviously insure that ||H|| < 1†.
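This choice of M can be sketched as follows (our illustration; the matrix A is invented). With M the inverse of the diagonal of A, the rows of H = I - MA have zero diagonal entries, and their absolute row sums fall below 1 exactly when A has a dominant diagonal:

```python
# Hypothetical sketch: for A with dominant diagonal, take M = D^{-1},
# the inverse of the diagonal part of A, so that H = I - MA has ||H|| < 1.

def mat_norm(H):
    return max(sum(abs(v) for v in row) for row in H)

A = [[4.0, 1.0, -1.0],
     [2.0, 5.0, 1.0],
     [0.0, -2.0, 6.0]]   # |a_ii| > sum of off-diagonal |a_ij| in each row

n = len(A)
# H = I - MA with M = diag(1/a_11, ..., 1/a_nn): h_ij = -a_ij/a_ii off
# the diagonal, and h_ii = 0.
H = [[(1.0 if i == j else 0.0) - A[i][j] / A[i][i] for j in range(n)]
     for i in range(n)]
```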

There are numerous non-stochastic methods of solving (2.1)††. We shall here restrict our considerations mainly to a class of methods known as linear iterative processes. An effective Monte Carlo method for the problem (2.1) can be based on this type of process, as we presently shall see.

The general stationary linear iterative process for solving (2.1) is arrived at by throwing (2.1) into an equivalent form which has the appearance of (2.2), but with H restricted only by the requirement that its eigenvalues all lie in the unit circle. An initial estimate ξ_0 of the

†Further discussion will be found in any good treatise on numerical analysis which deals with iterative methods of solving linear equations. See for example Householder [7] and Milne [8]. See Forsythe [6] for further references.

††See previous footnote for references.


solution is made, and successive approximations ξ_1, ξ_2, ... are then defined by

(2.4) ξ_{N+1} = Hξ_N + γ, N = 0,1,2,... .

If ξ_∞ denotes the solution of the equations (2.2), then clearly, since ξ_∞ = Hξ_∞ + γ,

(2.5) ξ_∞ - ξ_N = H(ξ_∞ - ξ_{N-1}) = H^2(ξ_∞ - ξ_{N-2}) = ⋯ = H^N(ξ_∞ - ξ_0).

Thus the condition for convergence for any starting vector ξ_0 is that lim_{N→∞} H^N = 0. The well-known necessary and sufficient condition [7] for lim_{N→∞} H^N = 0 is that all the eigenvalues of H should lie in the unit circle, which explains the requirement on H imposed earlier in this paragraph.

But in the present discussion, we go one step beyond this requirement and insist that ||H|| < 1. Since ||H^N|| ≤ ||H||^N, this condition will certainly insure that H^N → 0.
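The process (2.4) can be sketched as follows (ours, not the paper's; H and γ are invented data with ||H|| = 0.5 < 1, so convergence is guaranteed):

```python
# Sketch of the stationary iteration (2.4): xi_{N+1} = H xi_N + gamma.

def iterate(H, gamma, xi0, N):
    xi = xi0[:]
    for _ in range(N):
        xi = [sum(H[i][j] * xi[j] for j in range(len(xi))) + gamma[i]
              for i in range(len(gamma))]
    return xi

H = [[0.2, -0.3], [0.1, 0.4]]   # ||H|| = 0.5 < 1
gamma = [1.0, 2.0]
xi = iterate(H, gamma, [0.0, 0.0], 60)

# xi now approximates the solution of xi = H xi + gamma; its residual
# gamma - (I-H) xi should be negligibly small.
residual = [gamma[i] - (xi[i] - sum(H[i][j] * xi[j] for j in range(2)))
            for i in range(2)]
```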

For purposes of error analysis* it is advantageous now to introduce the residual vector

*By error in this paper, we shall always mean truncation error or statistical error or both at once. There will be no study of round-off error, nor of the effect of miscellaneous arithmetical mistakes.


(2.6) ρ_N = γ - (I-H)ξ_N = Hξ_N + γ - ξ_N = ξ_{N+1} - ξ_N, N = 0,1,2,... .

The vectors ρ_N are of course always computable at any stage in the iterative solution. If ξ_N → ξ_∞ = (I-H)^{-1}γ, then obviously ρ_N → 0. The converse is also true, because (I-H)^{-1}ρ_N = ξ_∞ - ξ_N. Thus ρ_N, or ||ρ_N||, are logical measures of the error in the N-th approximation to the solution. It is to be noted from (2.4) and (2.6) that

(2.7) ρ_N = ξ_{N+1} - ξ_N = (Hξ_N + γ) - (Hξ_{N-1} + γ) = H(ξ_N - ξ_{N-1}) = Hρ_{N-1} = ⋯ = H^N ρ_0, N = 1,2,... .

From (2.6) and (2.7) it follows that the successive approximations ξ_N generated by (2.4) can theoretically also be generated in the following manner: Select ξ_0 as before, and compute ρ_0 from the definition of residual vector, ρ_0 = γ - (I-H)ξ_0. Then conduct the iterations by means of the pair of formulas

(2.8) ξ_{N+1} = ξ_N + ρ_N, N = 0,1,...,

(2.9) ρ_{N+1} = Hρ_N, N = 0,1,... .


We note that by back substitution, we find that

(2.10) ξ_{N+1} = ξ_0 + ρ_0 + ρ_1 + ⋯ + ρ_N = ξ_0 + (I + H + ⋯ + H^N)ρ_0.

In actual practice, the customary running check on the accuracy of the solution consists in computing ||ρ_N|| from time to time to see how small it is. The iterations are stopped when ||ρ_N|| reaches a predetermined order of smallness. If (2.4) were used for the iterations, the computation of ρ_N would be done by using the formula ρ_N = ξ_{N+1} - ξ_N. If (2.8) and (2.9) were to be used, it would be advisable to compute test values of ||ρ_N|| from the definition of ρ_N (that is, from the formula ρ_N = γ - (I-H)ξ_N), rather than to accept the values of ||ρ_N|| given by (2.9). We shall come back to this point in a moment.
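The pair (2.8) and (2.9), together with the recommended independent check of ρ_N from its definition, can be sketched as follows (ours, not the paper's; H and γ are invented data with ||H|| < 1):

```python
# Sketch of the residual-form iteration (2.8)-(2.9):
#   xi_{N+1} = xi_N + rho_N,   rho_{N+1} = H rho_N,
# with rho_0 computed from the definition rho = gamma - (I-H) xi.

def mat_vec(H, x):
    return [sum(H[i][j] * x[j] for j in range(len(x))) for i in range(len(H))]

H = [[0.2, -0.3], [0.1, 0.4]]
gamma = [1.0, 2.0]

xi = [0.0, 0.0]                                              # xi_0
rho = [gamma[i] - (xi[i] - hv) for i, hv in enumerate(mat_vec(H, xi))]
for _ in range(60):
    xi = [x + r for x, r in zip(xi, rho)]                    # (2.8)
    rho = mat_vec(H, rho)                                    # (2.9)

# Independent check value, recomputed from the definition of the residual
# rather than taken from (2.9):
check = [gamma[i] - (xi[i] - hv) for i, hv in enumerate(mat_vec(H, xi))]
```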

An a priori truncation error analysis, made with the purpose of estimating the number of iterations which will be required to achieve a given accuracy, can be conducted either in terms of ρ_N or in terms of the error vector ξ_∞ - ξ_N. If the size of ||ρ_N|| is to be the criterion, then we use (2.7) to obtain the very simple estimate

(2.11) ||ρ_N|| ≤ ||H||^N ||ρ_0||.

But if the deviation of ξ_N from ξ_∞ seems to be a more appropriate or convenient measure of the truncation error, then we


use the fact that (I-H)^{-1}ρ_N = ξ_∞ - ξ_N. Substituting this into (2.7), we find that

(2.12) ξ_∞ - ξ_N = H^N (I-H)^{-1} ρ_0.

Since (I-H)^{-1} = I + H + H^2 + ⋯, it follows that ||(I-H)^{-1}|| ≤ ||I|| + ||H|| + ||H||^2 + ⋯ = (1 - ||H||)^{-1}. Therefore

(2.13) ||ξ_∞ - ξ_N|| ≤ (||H||^N / (1 - ||H||)) ||ρ_0||.
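Solving the right-hand side of (2.13) for N gives an a priori estimate of the number of iterations needed for a tolerance ε: N ≥ log(ε(1 - ||H||)/||ρ_0||) / log ||H||. A sketch (ours; the numbers ||H|| = 0.5, ||ρ_0|| = 2, ε = 10^{-6} are invented):

```python
import math

# From (2.13): ||xi_inf - xi_N|| <= ||H||**N / (1 - ||H||) * ||rho_0||.
# Setting the right side equal to eps and solving for N gives
#   N >= log(eps * (1 - ||H||) / ||rho_0||) / log ||H||.

def iterations_needed(h_norm, rho0_norm, eps):
    assert 0.0 < h_norm < 1.0
    return math.ceil(math.log(eps * (1.0 - h_norm) / rho0_norm)
                     / math.log(h_norm))

N = iterations_needed(0.5, 2.0, 1e-6)
# the bound (2.13) evaluated at this N is indeed below eps
bound = 0.5 ** N / (1.0 - 0.5) * 2.0
```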

It is perhaps worthwhile to point out here, by way of an aside, that the inequalities (2.11) and (2.13) hold for any one of the various matrix and vector norms in common use*, and not just for the norm ||H|| = max_i Σ_j |h_{ij}|. We are using this particular norm here because of a special application it has to the Monte Carlo method, which will be brought out in the next section.

The iterative formulas (2.8) and (2.9) are (so to speak) homogeneous, and therefore look easier to use than (2.4). But (2.8) and (2.9) have the great disadvantage of being not self-correcting in case a mistake is made at one stage, whereas (2.4) does have this property. Suppose for example that for

*See Householder [7], pp. 38-41.


some N, ρ_{N+1} is mistakenly computed as a zero vector in using (2.9). Then since all subsequent vectors ρ_N will equal zero, it is obvious from (2.10) that the final ξ_N will be irretrievably wrong. But if a mistake is made in computing ξ_{N+1} by (2.4), the subsequent effect is like starting over again with another ξ_0.
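The contrast can be seen numerically (our sketch, with invented data): inject a one-time mistake into each scheme and compare the final residuals.

```python
# Sketch: a one-time mistake is fatal for (2.8)-(2.9) but transient for (2.4).

def mat_vec(H, x):
    return [sum(H[i][j] * x[j] for j in range(len(x))) for i in range(len(H))]

H = [[0.2, -0.3], [0.1, 0.4]]
gamma = [1.0, 2.0]

# Scheme (2.4) with one iterate deliberately spoiled: the process recovers,
# as if restarted from a bad initial vector.
xi = [0.0, 0.0]
for N in range(60):
    xi = [hv + g for hv, g in zip(mat_vec(H, xi), gamma)]
    if N == 5:
        xi = [100.0, -100.0]          # mistake: like a fresh, bad xi_0
res_24 = [g - (x - hv) for g, x, hv in zip(gamma, xi, mat_vec(H, xi))]

# Scheme (2.8)-(2.9) with one rho mistakenly zeroed: every later rho stays
# zero, xi freezes, and the final answer is irretrievably wrong.
xi = [0.0, 0.0]
rho = [g - (x - hv) for g, x, hv in zip(gamma, xi, mat_vec(H, xi))]
for N in range(60):
    xi = [x + r for x, r in zip(xi, rho)]
    rho = mat_vec(H, rho)
    if N == 5:
        rho = [0.0, 0.0]              # mistake: all subsequent rho_N vanish
res_89 = [g - (x - hv) for g, x, hv in zip(gamma, xi, mat_vec(H, xi))]
```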

Therefore we are not proposing (2.8) and (2.9) as a practical substitute for (2.4) for a non-stochastic numerical solution of the problem ξ = Hξ + γ. Our real purpose in introducing the alternative iteration formulas was to develop the representation of ξ_N given by formula (2.10). This representation seems to be an advantageous one upon which to construct a Monte Carlo solution, as we shall see in the next section.

It is not without interest, however, to point out that if the amount of work involved in a computing job is measured in any sensible way, it theoretically requires no more work to use (2.8) and (2.9) up to a predetermined value of N than to use (2.4). This assumes that check values of ρ_N will not have to be computed from the definition of residual vector from time to time in the use of (2.8) and (2.9). In this paper, we shall measure amount of work by merely counting up the number of multiplications required under the worst conditions, assuming no zero or unit elements. Additions and subtractions will be ignored. A division or reciprocation will count as a single multiplication. To arrive at ξ_N by (2.4), starting with ξ_0, requires n^2 multiplications per iteration, or Nn^2 in all. To arrive


at ξ_N by (2.8) and (2.9) without computing any intermediate or final check values of ρ_N takes n^2 multiplications for ρ_0, and thereafter (N-1)n^2 multiplications for the iterative determinations of ρ_1, ..., ρ_{N-1}. So once again the count is Nn^2.

Actually, in later sections we are going to set up an error analysis which implies that ρ_N will always have to be computed. The use of (2.4) will be tacitly assumed, so n^2 extra multiplications will have to be added to the total count.

3. Stochastic solution of the problem. It should be apparent from the foregoing section that even if one restricts oneself to the class of stationary linear iterative processes, there