
NEW YORK UNIVERSITY

INSTITUTE OF MATHEMATICAL SCIENCES

23 Waverly Place, New York 3, N. Y.

IMM-NYU 269

MAY 1960

TWO EXPOSITORY NOTES ON STATISTICAL INFERENCE:

Generalized Maximum Likelihood Methods With

Exact Justifications on Two Levels

Confidence Curves: An Omnibus Technique for

Estimation and Testing Statistical Hypotheses

ALLAN BIRNBAUM


PREPARED UNDER

CONTRACT NO. NONR-285(38)

WITH THE

OFFICE OF NAVAL RESEARCH

UNITED STATES NAVY

REPRODUCTION IN WHOLE OR IN PART

IS PERMITTED FOR ANY PURPOSE

OF THE UNITED STATES GOVERNMENT.

IMM-NYU 269

May 1960

New York University

Institute of Mathematical Sciences

Two expository notes on statistical inference:

GENERALIZED MAXIMUM LIKELIHOOD METHODS WITH

EXACT JUSTIFICATIONS ON TWO LEVELS

CONFIDENCE CURVES: AN OMNIBUS TECHNIQUE FOR

ESTIMATION AND TESTING STATISTICAL HYPOTHESES

Allan Birnbaum

This report represents results obtained at the

Institute of Mathematical Sciences, New York

University, under the sponsorship of the Office

of Naval Research, Contract No. Nonr-285(38).

1960

Contributed paper to be read at the 32nd Session of the International Statistical Institute, Tokyo, May 30 - June 9, 1960.

GENERALIZED MAXIMUM LIKELIHOOD METHODS WITH

EXACT JUSTIFICATIONS ON TWO LEVELS

1. Introduction and summary. This paper is an expository account of recent extensions of the theory of estimation [1] and of the foundations of statistical inference [2]. This work exhibits in different ways, and on different theoretical levels, the central position of the likelihood function as the objective basis for efficient statistical inference, as well as giving new practical techniques of statistical inference.

2. Likelihood methods with objective justifications. We consider first the familiar problem of estimation of a real-valued parameter θ, from an outcome x of a specified statistical experiment E (which may be sequential), represented by probability density functions f(x,θ) (with respect to a fixed measure on a specified sample space S = {x}), θ in some interval Ω. A simple broad basis for appraising any estimator θ* = θ*(x) is given by the various probabilities of its errors of overestimation and underestimation by various amounts:

    a(u,θ,θ*) =  Prob[θ*(X) ≤ u | θ],  if u < θ,
                 Prob[θ*(X) ≥ u | θ],  if u > θ,
                 0,                    if u = θ,

for each θ and u. Such functions are called the risk curves of an estimator θ*, and are simply a representation of the cumulative distribution functions of the estimator. The general goal in the estimation problem is to choose θ* so as to minimize simultaneously, as far as possible, all the quantities a(u,θ,θ*). In non-trivial problems, such appraisal and comparison of estimators leads not to a simple ordering but to a partial ordering of estimators; for example, errors of overestimation can be reduced in general only at the cost of increasing errors of underestimation. In typical problems, one is led to consideration of a rather large class of admissible estimators, which includes confidence limit estimators as well as point estimators.
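As an illustrative sketch (not part of the paper), the risk curves a(u,θ,θ*) can be tabulated directly for a discrete case; here we assume the familiar estimator θ*(x) = x/n in a binomial experiment, and the function names are ours:

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of x successes in n independent Bernoulli(p) trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def risk_curve(u, theta, n):
    """a(u, theta, theta*) for the estimator theta*(x) = x/n:
    for u < theta, the probability of underestimating as far as u;
    for u > theta, the probability of overestimating as far as u."""
    if u < theta:
        return sum(binom_pmf(x, n, theta) for x in range(n + 1) if x / n <= u)
    if u > theta:
        return sum(binom_pmf(x, n, theta) for x in range(n + 1) if x / n >= u)
    return 0.0

# One point on one risk curve: P[theta* <= .25 | theta = .5] with n = 10.
print(risk_curve(0.25, 0.5, n=10))   # -> 0.0546875
```

Comparing such tabulated curves for two candidate estimators exhibits the partial ordering described above: neither estimator need dominate at every (u,θ).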

The simplest approach to the problem of jointly minimizing such error-probabilities begins with the consideration of three values θ₀, θ₁(θ₀), and θ₂(θ₀) of θ, with θ₁(θ₀) ≤ θ₀ ≤ θ₂(θ₀), and the two error-probabilities a(θ₀,θ₁,θ*) and a(θ₀,θ₂,θ*). This problem is solved by direct application of the fundamental lemma of Neyman and Pearson: We define generalized score statistics:

    S(x,θ₁,θ₂) = [log f(x,θ₂) - log f(x,θ₁)]/(θ₂ - θ₁)  if θ₂ ≠ θ₁,

and

    S(x,θ₁,θ₁) = S(x,θ₁) = (∂/∂θ) log f(x,θ) at θ = θ₁  if θ₂ = θ₁.

Then the two error-probabilities mentioned are jointly minimized in the usual sense by any estimator θ*(x) such that θ*(x) ≥ θ₀ if and only if S(x,θ₁,θ₂) ≥ c, for a suitably chosen constant c.
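A minimal numerical sketch of the generalized score statistic, assuming (for illustration only; the model and names are ours, not the paper's) a unit-variance normal density for f(x,θ); in that model S(x,θ₁,θ₂) reduces to x minus the midpoint of θ₁ and θ₂:

```python
import math

def S(x, t1, t2, logf):
    """Generalized score statistic of the text: the slope of the
    log-likelihood between parameter values t1 and t2."""
    if t1 == t2:
        # The t2 = t1 case is the derivative of the log-likelihood at t1;
        # a central difference approximates it here.
        h = 1e-6
        return (logf(x, t1 + h) - logf(x, t1 - h)) / (2 * h)
    return (logf(x, t2) - logf(x, t1)) / (t2 - t1)

# Unit-variance normal log-density.
logf = lambda x, t: -0.5 * (x - t) ** 2 - 0.5 * math.log(2 * math.pi)
print(S(1.7, 0.0, 1.0, logf))   # -> 1.2, i.e. 1.7 minus the midpoint 0.5
```

The Neyman-Pearson-type estimator of the text then sets θ*(x) ≥ θ₀ exactly when this statistic clears a cutoff, which for the normal mean is a cutoff on x itself.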

the contexts of statistical experiments and the situations where these are applied. Discussions of statistical inference problems which do not have specified statistical experiments as their frames of reference are usually considered unsatisfactory, and lacking in objectivity. Nevertheless there is continuing dissatisfaction and disagreement concerning the foundations of mathematical statistics as a theory of statistical inference. We shall illustrate here, first by discussion of a simple concrete example of a statistical experiment, and then in terms of a general mathematical theorem which the example illustrates, that for one important category of inference problems, the concept of a statistical experiment, with its probability terms interpreted in the usual objective ways, is lacking in objectivity in a relevant sense which can be demonstrated mathematically (and physically); and that mathematical analysis leads to a different basis which is more objective and satisfactory for such problems of statistical inference.

To simplify all but the central issues here, we consider binary statistical experiments (those in which just two simple hypotheses, H₁ or H₂, appear). A simple binary experiment is one in which the sample space S contains only two points; except in the trivial case that the hypotheses are equivalent, one point, to be called "positive", has larger probability under H₂ than under H₁; the other point will be called "negative". Each simple binary experiment is represented by a pair (α,β), where α is the probability of a "false positive" (or a Type I error), and β is the probability under H₂ of "negative", that is, the probability of a "false negative" (or a Type II error). For any α, (α,1-α) represents a trivial experiment in which H₁ and H₂ are equivalent. For applications such as the detection of presence or absence of some physical or biological condition in a person or material under investigation, a single application of any technique of measurement or observation which gives dichotomous outcomes is represented mathematically by a simple binary experiment (α,β). If such a technique is applied, with statistical independence, n times, the experiment is binary but no longer simple; its mathematical model is given by the binomial distributions:

    Prob(x|H₁) = (n choose x) α^x (1-α)^(n-x),  Prob(x|H₂) = (n choose x) (1-β)^x β^(n-x),  x = 0,1,...,n.

We denote any such binary experiment by the symbol (α,β)^n.

A symmetric simple binary experiment is one of the form (α,α). If various experiments of this form (without replication) are possible in a given application, these admit a simple ordering: (α,α) is more informative than (α′,α′) if 0 ≤ α < α′ ≤ ½. Correspondingly, an outcome from (½,½) is uninformative and irrelevant to the hypotheses. For 0 < α < ½, an outcome from (α,α) is incompletely informative. And if 0 ≤ α < α′ ≤ ½, then an outcome from (α,α) is more informative than one from (α′,α′). These terms concerning the value or strength of a specified outcome of a binary experiment, as evidence relevant to the hypotheses H₁ or H₂, are objectively defined, mathematically and physically, in the same sense as are the terms of modern probability theory referred to above; in fact, their objective character may be viewed as based directly, by definition, on the mathematically and physically objective characters of the symmetrical simple binary statistical experiments (α,α), 0 ≤ α ≤ ½.

It is convenient to employ the following (sufficient) statistic, defined on the sample space S = {x} of any binary experiment represented by any elementary probability functions f₁(x), f₂(x) respectively representing H₁ and H₂:

    r = r(x) = log f₂(x)/f₁(x).

Then in the case of any symmetrical simple binary experiment (α,α), the outcome "negative" gives r = -log (1-α)/α and the outcome "positive" gives r = log (1-α)/α. For such experiments, the algebraic sign of r(x) represents a qualitative property of any outcome, as favoring either H₁ or H₂; while the absolute value |r(x)| represents on a convenient scale, from 0 to ∞, its strength as evidence relevant to H₁ or H₂, with the value ∞ representing a completely informative outcome, and the value 0 representing a (completely) uninformative outcome. The interpretation of r(x) in other types of binary experiments remains to be discussed.
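The statistic r for symmetric simple binary experiments can be sketched as follows (an illustrative fragment; the function name is an assumption of this sketch):

```python
import math

def r_symmetric(outcome, a):
    """Evidence statistic r for a symmetric simple binary experiment (a, a):
    +log((1-a)/a) for "positive", -log((1-a)/a) for "negative"."""
    magnitude = math.log((1 - a) / a)
    return magnitude if outcome == "positive" else -magnitude

# The scale runs from 0 (a = 1/2, uninformative) toward infinity (a -> 0).
for a in (0.5, 0.0588, 0.0039):
    print(a, r_symmetric("positive", a))
```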

Example. Consider the "mixture" experiment E* defined as follows: With respective probabilities g₀ = .1536, g₁ = .4352, g₂ = .4112, select at random one of the experiments (.5,.5), (.0588,.0588), or (.0039,.0039); and obtain a single outcome ("positive" or "negative") by use of the selected experiment. The discussion above shows how any outcome of E* should be interpreted, for the purposes of inference being considered here; such interpretation depends only on the selected simple experiment and its outcome, and is otherwise independent of the mathematical structure of E*; it is easily verified that the sufficient statistic r, defined as above, of the mixture experiment E* automatically takes the same numerical values as does the corresponding statistic r defined on any selected simple experiment.

Consider alternatively the binomial experiment E: (.2,.2)^4, with possible outcomes x = 0,1,2,3,4. Consider the problem of interpreting corresponding values of the sufficient statistic r(x), as evidence relevant to H₁ or H₂. Since E is not a symmetrical simple binary experiment, the above discussion has not been shown to be relevant to the interpretation of numerical values of r(x). However, it is a mathematical fact, easily verified, that E is equivalent to E* in the sense that the sufficient statistics r of the two experiments have the same distributions, under H₁ and H₂ respectively. (The rational numbers required to define E* have been given here only to four-decimal accuracy.) Under this equivalence, outcomes of E and of E* are equivalent if and only if they give the same values to the sufficient statistics r. It follows that the outcome r(x) of E should be interpreted as if the same numerical value r had been obtained from a symmetrical simple binary experiment. The scope of interpretations of values r, found above, is extended in this way to the present experiment E; and in this sense the mathematical structure of E as a whole becomes irrelevant to the interpretation of outcomes, once the value of r(x) is given. It may fairly be said that the frame of reference of these interpretations continues to be the mathematical model of some experiment, namely the particular simple experiment chosen in the mixture experiment E* which mathematical analysis shows to be equivalent to the binomial experiment E.
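The asserted equivalence can be checked numerically. The sketch below (our verification, not the paper's) uses the exact component error rates 1/17 and 1/257, whose four-decimal values are .0588 and .0039, together with the mixture weights g₀ = .1536, g₁ = .4352, g₂ = .4112 that those rates force, and compares the distribution of r under each hypothesis:

```python
from math import comb, log, isclose

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Distribution of r under H1 and H2 for the binomial experiment E = (.2,.2)^4:
# r(x) = log f2(x)/f1(x) = (2x - 4) * log 4.
dist_E = {}
for x in range(5):
    r = (2 * x - 4) * log(4)
    p1 = binom_pmf(x, 4, 0.2)    # under H1
    p2 = binom_pmf(x, 4, 0.8)    # under H2
    dist_E[round(r, 6)] = (p1, p2)

# Distribution of r for the mixture E*: weights g over the symmetric
# simple experiments (.5,.5), (1/17,1/17), (1/257,1/257).
components = [(0.1536, 0.5), (0.4352, 1 / 17), (0.4112, 1 / 257)]
dist_Estar = {}
for g, a in components:
    m = log((1 - a) / a)
    # positive outcome: prob a under H1, 1-a under H2; negative: reversed.
    for r, p1, p2 in [(m, a, 1 - a), (-m, 1 - a, a)]:
        k = round(r, 6)
        q1, q2 = dist_Estar.get(k, (0.0, 0.0))
        dist_Estar[k] = (q1 + g * p1, q2 + g * p2)

for r in dist_E:
    assert isclose(dist_E[r][0], dist_Estar[r][0], abs_tol=1e-12)
    assert isclose(dist_E[r][1], dist_Estar[r][1], abs_tol=1e-12)
print("E and E* give r the same distribution under each hypothesis")
```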

Consider alternatively a different mixture experiment E** defined as follows: With respective probabilities g₀ = .1536, g₁ = .4232, g₂ = .4232, select at random one of the experiments (.5,.5), (.0037,.0623), or (.0623,.0037); and obtain a single outcome by use of the selected experiment. For our inference purposes, any outcome of this experiment should again be interpreted with the selected simple experiment as the frame of reference, and for these purposes the form of E** as a whole is otherwise irrelevant. However, it is easily verified that E** is mathematically equivalent to E* (in the sense defined above), and that each outcome of E** is equivalent to a certain outcome of E*. In particular, the outcome "positive" from (.0037,.0623) in E** is equivalent under this correspondence to the outcome "positive" from (.0039,.0039) in E*, and the outcome "positive" from (.0623,.0037) in E** is equivalent to the outcome "positive" from (.0588,.0588) in E*.

Thus we have found that, as a frame of reference for interpreting outcomes, "the selected simple experiment" in any mixture experiment is clearly more relevant and objective than the structure of the mixture experiment as a whole; and yet the objectivity of "the selected simple experiment" is in a sense illusory, since in different but equivalent mathematical models we find different simple experiments serving equally well as "objective" frames of reference for interpretations of the same outcome. What is in fact both objective and essentially relevant for such interpretations is only the numerical value of the sufficient statistic r on the observed outcome, with its objective interpretations as given above.

The generality of the features illustrated in this example is established in the

Theorem. Each binary experiment is equivalent to a mixture of simple binary experiments. (Most binary experiments, including the binomial example above, can be represented in an infinite number of different forms as mixtures of simple experiments.)

Such analysis leads to the following conclusion: For problems of statistical inference of the kind described above, given the numerical values of the likelihood function determined on the observed outcome of any specified binary experiment (that is, given f₁(x) and f₂(x) for the observed x, or, more concisely, given r(x) = log f₂(x)/f₁(x)), the structure of the experiment as a whole is irrelevant.

One result of this analysis is that a long-standing point of difference between Bayesian and non-Bayesian statisticians can be in part resolved as follows: for problems of the kind considered here, Bayesian statisticians can agree with non-Bayesians who follow the above analysis that r-values express in an objective sense the relevant evidence from the experimental outcome itself; the remaining questions concern only the various possible modes of interpretation of r-values in various inference situations.

The structure of any experiment is crucial for many other kinds of problems of inference or decision-making dealt with by mathematical statistics. And even for the kind of inference problem considered here, the structure of an experiment is crucial in the sense that it represents the design of an experiment: even if the interpretation of outcomes will leave aside the structure of an experiment, there remain the crucial problems of appraising, comparing, and choosing experimental designs for use in this way. A highly informative experiment is one which gives with high probability highly informative outcomes (large values of |r|, under each hypothesis). It is not clear that a numerical measure of informativeness of an experiment in this sense is necessary or that it could be fully adequate, since the distributions of r under respective hypotheses are basic and directly interpretable.

REFERENCES

[1] Birnbaum, A. "A unified theory of estimation, I." (revised and extended, 1960), Technical Report IMM-NYU 266, Institute of Mathematical Sciences, New York University.

[2] Birnbaum, A. "On the foundations of statistical inference, I." (1960), Technical Report IMM-NYU 267, Institute of Mathematical Sciences, New York University.

Résumé. This article is an exposition of some extensions of the theory of statistical estimation and of the foundations of mathematical statistics.

CONFIDENCE CURVES: AN OMNIBUS TECHNIQUE FOR

ESTIMATION AND TESTING STATISTICAL HYPOTHESES

1. Introduction. There is increasing awareness among applied and theoretical statisticians that many problems customarily formulated in terms of testing statistical hypotheses can be formulated more appropriately as problems of estimation. The recent expository paper by Natrella [1] describes this trend and some principal reasons for it, and illustrates how the close relationship between confidence intervals and tests facilitates a smooth shift of emphasis from the techniques and concepts of testing to those of estimation. The purpose of the present note is to describe a technique of estimation by confidence curves, which more formally incorporates the practical techniques of testing, along with those of point estimation and estimation by confidence limits and confidence intervals at various levels. In one-parameter problems, a confidence curve estimate can be interpreted flexibly, in any context of application for general-purpose informative inferences, so as to provide conveniently any number of valid inferences of the following forms: (a) confidence limits and confidence intervals, at various confidence levels, and a point estimate; (b) significance tests, one- or two-sided, of particular parameter values representing any hypothesis of interest; and for the latter tests, (c) the critical level of Type I (that is, the customary "P-level", the significance level at which the observed data would indicate rejection); and also (d) at each parameter value representing an alternative hypothesis of interest, a critical level of Type II (that is, the analogue of the customary "P-level" which corresponds to errors of Type II), which represents the power of the test in a form which can be interpreted conveniently as part of the over-all interpretation of observed data.

2. Definition of a confidence curve estimate, and an example. For typical problems in which one parameter is of primary interest, a confidence curve estimate is defined simply as a set of confidence limits at various confidence levels.

It is convenient to use the notation t for the observed value of the appropriate basic statistic in any specified experiment. For example, if n independent observations yᵢ are obtained, and if the sample mean is the appropriate statistic, then t = ȳ = Σᵢ yᵢ/n. Let θ denote the unknown value of the parameter of interest. Let γ denote any fixed number, 0 ≤ γ ≤ 1. For each γ > .5, let θ̂(t,γ) denote a lower confidence limit for θ, at the γ confidence level, based on the observed value t. For each γ < .5, let θ̂(t,γ) denote an upper confidence limit for θ, at the (1-γ) confidence level, based on t. For γ = .5, the corresponding mathematical definition of θ̂(t,γ) = θ̂(t,.5) can be interpreted more usefully as follows: θ̂(t,.5) is a point-estimator of θ which is median-unbiased. To avoid ambiguity, it is convenient to replace the usual term "unbiased" by mean-unbiased, to refer to the property that an estimator's mean value is the true parameter value being estimated. A median-unbiased estimator is one whose median is the true value being estimated; that is, a median-unbiased estimator has probabilities of overestimation and of underestimation each equal to ½.
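For a normal mean with known standard deviation (a standard case, assumed here only for illustration; the function name is ours), the family θ̂(t,γ) has a simple closed form:

```python
from statistics import NormalDist
from math import sqrt

def limit(t, gamma, n, sigma=1.0):
    """theta-hat(t, gamma) for a normal mean with known sigma:
    gamma > .5 gives a lower confidence limit at level gamma,
    gamma < .5 an upper confidence limit at level 1 - gamma,
    gamma = .5 the median-unbiased point estimate (here, t itself)."""
    z = NormalDist().inv_cdf(gamma)
    return t - z * sigma / sqrt(n)

# Observed mean t = 1.0 from n = 9 observations: a 95% lower limit,
# the point estimate, and a 95% upper limit.
for g in (0.95, 0.5, 0.05):
    print(g, limit(1.0, g, 9))
```

As γ decreases from 1 to 0, limit(t, γ, n) increases through the parameter range, which is exactly the monotonicity property noted in the next paragraph.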


All of these definitions are summed up by stating: For each γ, whatever may be the true value θ of the parameter, the estimator θ̂(t,γ) has the basic property that its value is less than θ with probability equal to γ (and hence its value exceeds θ with probability (1-γ); we leave aside the minor technicalities of cases where estimators have discontinuous distributions). In typical problems, the usual definitions of confidence limits provide the following additional property: For each possible observed t, as γ decreases from 1 to 0, the respective values of the estimates θ̂(t,γ) increase continuously through the range of possible values of θ.

The manner of computing and reporting such sets of estimates will naturally vary with problems and purposes. One form which is often convenient, and for which the term confidence curve seems particularly appropriate, may be defined as follows for typical problems: If a standard confidence limit method is applied to a given observed value t, each of the possible values of θ will be a lower confidence limit at some confidence level γ and also an upper confidence limit at some corresponding level (1-γ); for each θ, let c(θ,t) denote the smaller of these two values, γ or (1-γ). Then, for any observed value t, as θ increases through its range, the confidence curve c(θ,t) will increase continuously from 0 to ½, and then decrease continuously to 0. An alternative definition of the confidence curve c(θ,t) is the following: given the observed t, for each θ the value of c(θ,t) is the smaller of
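For a normal mean with known standard deviation (an illustrative assumption, not an example from the paper; names are ours), the confidence curve just defined can be computed directly as the smaller of the two one-sided confidence levels at which θ is a limit:

```python
from statistics import NormalDist
from math import sqrt

def confidence_curve(theta, t, n, sigma=1.0):
    """Confidence curve c(theta, t) for a normal mean with known sigma:
    the smaller of the two one-sided levels at which theta is a lower
    or an upper confidence limit, given the observed mean t."""
    gamma = NormalDist().cdf(sqrt(n) * (t - theta) / sigma)
    return min(gamma, 1 - gamma)

# Observed mean t = 1.0 from n = 9 observations: the curve rises to 1/2
# at theta = t (the median-unbiased point estimate) and falls toward 0
# on either side; small c(theta, t) marks parameter values that a
# two-sided test would reject.
for theta in (0.0, 0.45, 1.0, 1.55, 2.0):
    print(theta, round(confidence_curve(theta, 1.0, 9), 4))
```

Read as an omnibus summary, one plot of c(θ,t) against θ simultaneously displays the point estimate (its peak), confidence intervals at any level (horizontal cuts), and two-sided critical levels (its height at any hypothesized θ).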

INSTITUTE OF MATHEMATICAL SCIENCES

NO. 196059 LIBRARY

23 Waverly Place, New York 3, N. Y.

NEW YORK UNIVERSITY

INSTITUTE OF

MATHEMATICAL SCIENCES

IMM-NYU 269

MAY I960

TWO EXPOSITORY NOTES ON STATISTICAL INFERENCE:

Generalized Maximum Likelihood Methods With

Exact Justifications on Two Levels

Confidence Curves: An Omnibus Technique for

Estimation and Testing Statistical Hypotheses

ALLAN BIRNBAUM

I ^

h

PREPARED UNDER

CONTRACT NO. NONR-285(38)

WITH THE

OFFICE OF NAVAL RESEARCH

UNITED STATES NAVY

;{E."H0DLKT10N IN \l-MOLir. OR IN PART

13 fEKMlTTEU \\j\{ ANV (^URPO.SE

Of THE UNITED STATES C0VÂ£8MÂ«NT.

No. 196059 IMM-NYU 269

May I960

New York University

Institute of Mathematical Sciences

Two expository notes on statistical inference:

GENERALIZED MAXIMUM LIKELIHOOD METHODS WITH

EXACT JUSTIFICATIONS ON TWO LEVELS

CONFIDENCE CURVES: AN OMNIBUS TECHNIQUE FOR

ESTIMATION AND TESTING STATISTICAL HYPOTHESES

Allan BirnbaTom

This report represents results obtained at the

Institute of Mathematical Sciences, New York

University, under the sponsorship of the Office

of Naval Research, Contract No. Nonr-285(38).

I960

Contributed paper to be read at the 32

Session of the International Statistical

Institute, Tokyo, May 30 - June 9, I960.

GENERALIZED MAXIMUM LIKELIHOOD METHODS WITH

EXACT JUSTIFICATIONS ON TWO LEVELS

1. Introduction and summary. This paper is an expository

accovint of recent extensions of the theory of estimation [l] and of

the foundations of statistical inference [2], This work exhibits

in different ways, and on different theoretical levels, the central

position of the likelihood function as the objective basis for

efficient statistical inference, as well as giving new practical

techniques of statistical inference,

2Â« Likelihood methods with objective .1ustificatlons Â« We con-

sider first the familiar problem of estimation of a real-valued

parameter Q, from an outcome x of a specified statistical experiment

E (which may be sequential), represented by probability density

f\mctions f(x,P) (with respect to a fixed measure on a specified

sample space S = |x r ), Q in some interval Jl., A simple broad basis

for appraising, any estimator 0^"" = 0'"'{x) is given by the various

probabilities of its errors of overestlraation and underestimation

by various amounts :

iProb [o'Hx) ^ u|a], if u < 0,

Prob [P''''(X) > u|0], if u > 0,

0, if u = o,

for each and u. Such functions are called the risk curves of an

estimator i^'"' , and are simply a representation of the cumulative dis-

tribution functions of the estimator. The general goal in the

estimation problem is to choose O'^ so as to minimize simultaneously

as far as possible all the quantities a(u,a^a'''') , in non-trivial

problems, such appraisal and comparison of estimators leads not to

a simple ordering but to a partial ordering of estimators; for

example, errors of overestlmation can be reduced in general only at

the cost of increasing errors of underestimation. In typical

â€¢SJ .

> I-

r

2:

problems, one is led to consideration of a rather large class of

admissible estimators, which includes confidence limit estimators

as well as point estimators.

The simplest approach to the problem of jointly minimizing such

error-probabilities begins with the consideration of three values

Â®o' S^Â®o^' ^"^ ^2^^o^ Â°^ '^' ^l^^o^ = % = ^2^^o^' ^^^ *^Â® *^Â°

error-probabilities a(P ,Â©^,P'"") and a(P -Op,Â©'''), This problem is

solved by direct application of the fundamental lemma of Neyman and

Pearson: We define generalized score statistics:

S(x,0^,o^) = [log f{x,Q^) - log f(x,a^)]/(02 - Q^) if c^ ^ Q^,

and

S(x,o^,0^) = S(x,P^) =^ log f(x,0^) if Â©2 = a^.

Then the two error-probabilities mentioned are jointly minimized in

the usual sense by any estimator Q'"{x) such that 0""(x) ^ P if and

only if S(x,o^,P^) s ^ ,a

^L3-

"i"

the contexts of statistical experiments and the situations where

these are applied. Discussions of statistical inference problems

which do not have specified statistical experiraents as their frames

of reference are usually considered ujisatisfactory, and lacking in

objectivity, ilevertheless there is continuing dissatisfaction and

disagreement concerning the foiondations of mathematical statistics

as a theory of statistical inference. We shall illustrate here,

first by discussion of a simple concrete example of a statistical

experiment, and then in terms of a general mathematical theorem

which the example illustrates, that for one Important category of

inference problems, the concept of a statistical experiment, with

its probability terms interpreted in the usual objective ways, is

lacking; in objectivity in a relevant sense which can be demon -

strated mathematically (and physically); and that mathematical

analysis leads to a different basis which is more objective and

satisfactory for such problems of statistical inference .

To simplify all but the central issues here, we consider

binary statistical experiments (those in which just t;^J0 simple

hypotheses, H^ or Hp, appear). A simple binary experiment is one

in which the sample space S contains only two points 3 except in the

trivial case that the hypotheses are equivalent, one point, to be

called "positive", has larger probability imder H2 than under H, j

the other point vrill be called "negative". Each simple binary

experiment is represented by a pair (a,p), where a is the probability

of a "false positive" (or a Type I error), and p is the probability

\ander Hp of "negative", that is, the probability of a "false nega-

tive" (or a Type II error). For any a, (a,l-a) represents a trivial

Sc$j

experiment in ^^rhich H, and Hp are equivalent. For applications

such as the detection of presence or absence of some physical or

biological condition in a person or raaterial under investigation,

a single application of any technique of measiirement or

observation which gives dichotomous outcomes is represented

mathematically by a simple binary experiment (a,p). If such a

technique is applied, with statistical independence, n times, the

experiment is binary but no longer simple; its mathematical model

Is given by the binomial distributions:

Prob {x\e^) = (^)a^(l.a)^-^, Prob (x|h^) = (2)(l-p)''p""'' ,

X = 0,1, . ,. n.

We denote any such binary experiment by the symbol (a,p)^,

A symmetric simple binary experiment is one of the form (a,a).

If various experiments of this form (vxithout replication) are

possible in a given application, these admit a simple ordering:

( a,a) is more informative than (a', a') if ^ a < aÂ» gpÂ« Corres-

pondingly, an outcome from (â€¢^,3) is xminf ormat i ve and irrelevant to

the hypotheses. For < a < â€¢^, an outcome from (a, a) is incompletely

informative . And if ^ a < a' S "^9 then an outcome from (a, a) is

more informative than one from (aÂ«,a')Â« These terms concerning the

value or strength of a specified outcome of a binary experiment,

as evidence relevant to the hypotheses H, or lip, are objectively

defined, mathematically and physically, in the same sense as are the

terms of modern probability theory referred to above; in fact, their

i- i> i. â€¢' .

'â– T'C

. f-\

a^

5- C^ . >

objective character may be viexiied as based directly, by

definition, on the mathematically and physically objective charac-

ters of the symmetrical simple binary statistical experiments (a, a),

s a ^ ^.

It is convenient to employ the following (sufficient)

statistic, defined on the sample space S = fxl of any binary experi-

ment represented by any elementary probability fxinctions f, (x),

fp(x) respectively representing H, and Hp:

r = r(x) = log f2(x)/f^(x) ,

Then in the case of any symmetrical simple binary experiment (a, a),

the outcome "negative" gives r = - log (l-a)/a and the outcome

"positive" gives r = log (l-a)/a. For such experiments, the

algebraic sign of r(x) represents a qualitative property of any

outcome, as favoring either H, or Hoj while the absolute value

|r(x) I represents on a convenient scale, from to oo, its

strength as evidence relevant to H^ or Hp, with the value oo repres-

enting a completely informative outcome, and the value repres-

enting a (completely) unlnf ormative outcome. The interpretation

of r(x) in other types of binary experiments remains to be discussed*

Example Â» Consider the "mixture" experiment E'"' defined as

follows: V/ith respective probabilities g = ol536, g, = ,2Sl\l^,

gp = .5920, select at random one of the experiments iÂ»S,Â»5),

(.0588,. 0588), or ( .0039, .0039); and obtain a single outcome

("positive" or "negative") by use of the selected experiment. The

discussion above shows hovj any outcome of E""'" should be interpreted,

for the purposes of inference being considered here; such inter-

â€¢t- r-.^ .- 'â–

erf-f'

.00 nt i

8

pretatlon depends only on the selected simple experiment and its

outcome, and is otherwise independent of the mathematical structvire

of E'"'i it is easily verified that the sufficient statistic r,

defined as above, of the mixture experiment E'"' automatically takes

the same numerical values as does the corresponding statistic r

defined on any selected simple experiment.

Consider alternatively the binomial experiment E: (.2, ,2)^,

with possible outcomes x = 0,l,.j>,[|., Consider the problem of

interpreting corresponding values of the sufficient statistic r(x),

as evidence relevant to H^ or Hp, Since E is not a symmetrical

simple binary experiment, the above discussion has not been shovm

to be relevant to the interpretation of n-umerical values of r(x).

However, it is a mathematical fact, easily verified, that E is

equivalent to E"'^ in the sense that the sufficient statistics r of

the two experiments have the same distributions, vmder H^ and Hp

respectively, (The rational numbers required to define E*"' have

been given here only to four-decimal accuracy, ) Under this

equivalence, outcomes of E and of E'-' are equivalent if and only

if they give the same values to the sufficient statistics r. It

follows that the outcome r(x) of E should be interpreted as if the

same numerical value r had been obtained from a symmetrical simple

binary experiment. The scope of interpretations of values r, found

above, is extended in this way to the present experiment E; and

in this sanse.,. the mathematical structure of E as a whole becomes

irrelevant to the interpretation of outcomes, once the value of

r(x) is given. It may fairly be said that the frame of reference

of these interpretations continues to be the mathematical model of

t â– - - 4

t -*

:>B iBmioab-ijJO X

some experiment, namely the particular simple experiment chosen in

the mixtiire experiment E'"' which mathematical analysis shows to be

equivalent to the binomial experiment E,

Consider alternatively a different mixture experiment E**
defined as follows: with respective probabilities g1 = .1536,
g2 = .4232, g3 = .4232, select at random one of the experiments
(.5, .5), (.0037, .0623), or (.0623, .0037), and obtain a single
outcome by use of the selected experiment. For our inference
purposes, any outcome of this experiment should again be interpreted
with the selected simple experiment as the frame of reference, and
for these purposes the form of E** as a whole is otherwise
irrelevant. However, it is easily verified that E** is mathe-
matically equivalent to E* (in the sense defined above), and that
each outcome of E** is equivalent to a certain outcome of E*. In
particular, the outcome "positive" from (.0037, .0623) in E** is
equivalent under this correspondence to the outcome "positive" from
(.0039, .0039) in E*, and the outcome "positive" from (.0623, .0037)
in E** is equivalent to the outcome "positive" from (.0568, .0568)
in E*.

Thus we have found that, as a frame of reference for inter-
preting outcomes, "the selected simple experiment" in any mixture
experiment is clearly more relevant and objective than the
structure of the mixture experiment as a whole; and yet the
objectivity of "the selected simple experiment" is in a sense
illusory, since in different but equivalent mathematical models we
find different simple experiments serving equally well as "objective"
frames of reference for interpretations of the same outcome. What
is in fact both objective and essentially relevant for such inter-
pretations is only the numerical value of the sufficient statistic
r on the observed outcome, with its objective interpretations as
given above.

The generality of the features illustrated in this example is
established in the

Theorem. Each binary experiment is equivalent to a mixture
of simple binary experiments. (Most binary experiments, including
the binomial example above, can be represented in an infinite
number of different forms as mixtures of simple experiments.)

Such analysis leads to the following conclusion: For problems
of statistical inference of the kind described above, given the
numerical values of the likelihood function determined on the
observed outcome of any specified binary experiment (that is, given
f1(x) and f2(x) for the observed x, or, more concisely, given
r(x) = log f2(x)/f1(x)), the structure of the experiment as a
whole is irrelevant.

One result of this analysis is that a long-standing point of
difference between Bayesian and non-Bayesian statisticians can be
in part resolved as follows: for problems of the kind considered
here, Bayesian statisticians can agree with non-Bayesians who follow
the above analysis that r-values express in an objective sense the
relevant evidence from the experimental outcome itself; the remain-
ing questions concern only the various possible modes of inter-
pretation of r-values in various inference situations.

The structure of any experiment is crucial for many other
kinds of problems of inference or decision-making dealt with by
mathematical statistics. And even for the kind of inference
problem considered here, the structure of an experiment is crucial
in the sense that it represents the design of an experiment: even
if the interpretation of outcomes will leave aside the structure of
an experiment, there remain the crucial problems of appraising,
comparing, and choosing experimental designs for use in this way.

A highly informative experiment is one which gives with high
probability highly informative outcomes (large values of |r|,
under each hypothesis). It is not clear that a numerical measure
of informativeness of an experiment in this sense is necessary or
that it could be fully adequate, since the distributions
of r under respective hypotheses are basic and directly inter-
pretable.
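Since the distributions of r under the respective hypotheses are basic
and directly interpretable, one can simply tabulate them. A sketch,
again for a hypothetical binomial experiment with assumed hypotheses
p = 0.2 against p = 0.8 (illustrative values not taken from the text):

```python
from math import comb, log

def binom_pmf(x, n, p):
    """Binomial probability f(x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def r_distribution(n, p1, p2):
    """Tabulate r(x) = log f2(x)/f1(x) together with its probability
    under each hypothesis, for x = 0, ..., n."""
    rows = []
    for x in range(n + 1):
        f1, f2 = binom_pmf(x, n, p1), binom_pmf(x, n, p2)
        rows.append((log(f2 / f1), f1, f2))
    return rows

# Hypothetical binomial experiment, H1: p = 0.2 against H2: p = 0.8.
for r_val, f1, f2 in r_distribution(4, 0.2, 0.8):
    print(f"r = {r_val:+8.4f}   P(r | H1) = {f1:.4f}   P(r | H2) = {f2:.4f}")
```

An experiment is highly informative in the sense of the text when the
table above puts high probability on large positive r under H2 and on
large negative r under H1.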

REFERENCES

[1] Birnbaum, A., "A unified theory of estimation, I" (revised
and extended, 1960), Technical Report IMM-NYU 266, Institute
of Mathematical Sciences, New York University.

[2] Birnbaum, A., "On the foundations of statistical inference, I"
(1960), Technical Report IMM-NYU 267, Institute of Mathematical
Sciences, New York University.

Résumé

This article is an exposition of some extensions of the theory
of statistical estimation and of the foundations of mathematical
statistics.


1. Introduction. There is increasing awareness among applied
and theoretical statisticians that many problems customarily
formulated in terms of testing statistical hypotheses can be
formulated more appropriately as problems of estimation. The recent
expository paper by Natrella [1] describes this trend and some
principal reasons for it, and illustrates how the close relation-
ship between confidence intervals and tests facilitates a smooth
shift of emphasis from the techniques and concepts of testing to
those of estimation. The purpose of the present note is to describe
a technique of estimation by confidence curves, which more formally
incorporates the practical techniques of testing, along with those
of point estimation and estimation by confidence limits and
confidence intervals at various levels. In one-parameter problems,
a confidence curve estimate can be interpreted flexibly, in any
context of application for general-purpose informative inferences,
so as to provide conveniently any number of valid inferences of
the following forms: (a) confidence limits and confidence inter-
vals, at various confidence levels, and a point estimate;
(b) significance tests, one- or two-sided, of particular parameter
values representing any hypothesis of interest; and for the latter
tests, (c) the critical level of Type I (that is, the customary
"P-level", the significance level at which the observed data would
indicate rejection); and also (d) at each parameter value repres-
enting an alternative hypothesis of interest, a critical level of
Type II (that is, the analogue of the customary "P-level" which
corresponds to errors of Type II), which represents the power of
the test in a form which can be interpreted conveniently as part
of the over-all interpretation of observed data.
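For a concrete reading of (c) and (d), consider a one-sided test of a
normal mean with known standard deviation: the Type I critical level
is the probability, under the hypothesis tested, of a sample mean at
least as extreme as the one observed, and the Type II critical level at
an alternative is the corresponding probability computed there. A
minimal sketch with hypothetical numbers (none taken from the text):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def critical_levels(t, theta0, theta1, sigma, n):
    """For a one-sided test of theta = theta0 against larger
    alternatives, based on an observed mean t of n normal
    observations with known sigma, return the Type I critical level
    (the customary P-level) and the Type II critical level at the
    alternative theta1."""
    se = sigma / sqrt(n)
    type1 = 1 - phi((t - theta0) / se)   # customary P-level
    type2 = phi((t - theta1) / se)       # analogue for Type II errors
    return type1, type2

# Hypothetical numbers: observed mean 1.2 from n = 25 observations,
# sigma = 2, testing theta0 = 0 against the alternative theta1 = 2.
p1, p2 = critical_levels(1.2, 0.0, 2.0, 2.0, 25)
print(p1, p2)
```

A small Type I critical level indicates rejection of theta0 at the
customary levels, while a small Type II critical level conveys, in the
same P-level idiom, how strongly the data speak against the alternative.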

2. Definition of a confidence curve estimate, and an example.
For typical problems in which one parameter is of primary
interest, a confidence curve estimate is defined simply as a set
of confidence limits at various confidence levels.

It is convenient to use the notation t for the observed value
of the appropriate basic statistic in any specified experiment.
For example, if n independent observations y_i are obtained, and if
the sample mean is the appropriate statistic, then
t = ȳ = (y_1 + y_2 + ... + y_n)/n. Let θ denote the unknown value
of the parameter of interest. Let γ denote any fixed number,
0 ≤ γ ≤ 1. For each γ > .5, let θ(t,γ) denote a lower confidence
limit for θ, at the γ confidence level, based on the observed value
t. For each γ < .5, let θ(t,γ) denote an upper confidence limit
for θ, at the (1-γ) confidence level, based on t. For γ = .5, the
corresponding mathematical definition of θ(t,γ) = θ(t,.5) can be
interpreted more usefully as follows: θ(t,.5) is a point-estimator
of θ which is median-unbiased. To avoid ambiguity, it is convenient
to replace the usual term "unbiased" by mean-unbiased, to refer to
the property that an estimator's mean value is the true parameter
value being estimated. A median-unbiased estimator is one whose
median is the true value being estimated; that is, a median-unbiased
estimator has probabilities of overestimation and of underestimation
each equal to 1/2.
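The defining property of median-unbiasedness is easy to check by
simulation; for normal observations the sample mean itself is
median-unbiased, falling below the true parameter value with
probability 1/2. A sketch with hypothetical parameter values (chosen
for illustration only):

```python
import random
from statistics import mean

# For normal observations the sample mean is median-unbiased: it
# underestimates the true parameter value with probability 1/2.
# The parameter values below are hypothetical, for illustration.
random.seed(1)
theta, sigma, n, reps = 5.0, 2.0, 25, 20000
below = sum(
    mean(random.gauss(theta, sigma) for _ in range(n)) < theta
    for _ in range(reps)
)
print(below / reps)  # close to 0.5
```

For a skewed sampling distribution, by contrast, a mean-unbiased
estimator generally over- or under-estimates with unequal
probabilities, which is why the two notions must be distinguished.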


All of these definitions are summed up by stating: For each
γ, whatever may be the true value θ of the parameter, the estimator
θ(t,γ) has the basic property that its value is less than θ with
probability equal to γ (and hence its value exceeds θ with
probability (1-γ); we leave aside the minor technicalities of
cases where estimators have discontinuous distributions). In
typical problems, the usual definitions of confidence limits
provide the following additional property: For each possible
observed t, as γ decreases from 1 to 0, the respective values of
the estimates θ(t,γ) increase continuously through the range of
possible values of θ.

The manner of computing and reporting such sets of estimates
will naturally vary with problems and purposes. One form which is
often convenient, and for which the term confidence curve seems
particularly appropriate, may be defined as follows for typical
problems: If a standard confidence limit method is applied to a
given observed value t, each of the possible values of θ will be
a lower confidence limit at some confidence level γ and also an
upper confidence limit at some corresponding level (1-γ); for
each θ, let c(θ,t) denote the smaller of these two values, γ or
(1-γ). Then, for any observed value t, as θ increases through its
range, the confidence curve c(θ,t) will increase continuously from
0 to 1/2, and then decrease continuously to 0. An alternative
definition of the confidence curve c(θ,t) is the following: given
the observed t, for each θ the value of c(θ,t) is the smaller of
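The curve just described can be sketched explicitly for a normal mean
with known σ: a given θ is a lower confidence limit at level
γ = Φ((t − θ)√n/σ) and an upper limit at level 1 − γ, so c(θ,t) is the
smaller of the two. A minimal sketch with hypothetical numbers (not
taken from the text):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def confidence_curve(theta, t, sigma, n):
    """Confidence curve c(theta, t) for a normal mean with known
    sigma: theta is a lower confidence limit at level
    gamma = phi((t - theta) * sqrt(n) / sigma) and an upper limit
    at level 1 - gamma; c is the smaller of the two."""
    gamma = phi((t - theta) * sqrt(n) / sigma)
    return min(gamma, 1 - gamma)

# Hypothetical data: observed mean t = 10.0 with sigma = 3, n = 9,
# so the standard error is 1.  The curve peaks at 1/2 at theta = t
# and falls off symmetrically on either side.
for theta in (8.0, 9.0, 10.0, 11.0, 12.0):
    print(theta, round(confidence_curve(theta, 10.0, 3.0, 9), 4))
```

Reading the curve horizontally at height γ < 1/2 recovers the
(1 − 2γ)-level confidence interval, and its peak gives the
median-unbiased point estimate, which is how a single curve carries
the inferences (a)-(d) of the introduction.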
