Allan Birnbaum.

Two expository notes on statistical inference: Generalized maximum likelihood methods; Confidence curves



NO. 196059 LIBRARY

23 Waverly Place, New York 3, N. Y.


May 1960


Generalized Maximum Likelihood Methods With
Exact Justifications on Two Levels

Confidence Curves: An Omnibus Technique for
Estimation and Testing Statistical Hypotheses



No. 196059 IMM-NYU 269

May 1960

New York University
Institute of Mathematical Sciences

Two expository notes on statistical inference:



Allan Birnbaum

This report represents results obtained at the
Institute of Mathematical Sciences, New York
University, under the sponsorship of the Office
of Naval Research, Contract No. Nonr-285(38).


Contributed paper to be read at the 32nd
Session of the International Statistical
Institute, Tokyo, May 30 - June 9, 1960.


1. Introduction and summary. This paper is an expository
account of recent extensions of the theory of estimation [1] and of
the foundations of statistical inference [2]. This work exhibits,
in different ways and on different theoretical levels, the central
position of the likelihood function as the objective basis for
efficient statistical inference, and also gives new practical
techniques of statistical inference.

2. Likelihood methods with objective justifications. We con-
sider first the familiar problem of estimation of a real-valued
parameter $\theta$, from an outcome $x$ of a specified statistical
experiment $E$ (which may be sequential), represented by probability
density functions $f(x,\theta)$ (with respect to a fixed measure on a
specified sample space $S = \{x\}$), $\theta$ in some interval $\Omega$.
A simple broad basis for appraising any estimator
$\theta^* = \theta^*(x)$ is given by the various probabilities of its
errors of overestimation and underestimation by various amounts:

$$
a(u,\theta,\theta^*) =
\begin{cases}
\mathrm{Prob}[\theta^*(X) \le u \mid \theta], & \text{if } u < \theta, \\
\mathrm{Prob}[\theta^*(X) > u \mid \theta], & \text{if } u > \theta, \\
0, & \text{if } u = \theta,
\end{cases}
$$

for each $\theta$ and $u$. Such functions are called the risk curves
of an estimator $\theta^*$, and are simply a representation of the
cumulative distribution functions of the estimator. The general goal
in the estimation problem is to choose $\theta^*$ so as to minimize
simultaneously, as far as possible, all the quantities
$a(u,\theta,\theta^*)$. In non-trivial problems, such appraisal and
comparison of estimators leads not to a simple ordering but to a
partial ordering of estimators; for example, errors of overestimation
can be reduced in general only at the cost of increasing errors of
underestimation. In typical problems, one is led to consideration of
a rather large class of admissible estimators, which includes
confidence limit estimators as well as point estimators.
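
The risk curve defined above can be tabulated directly once the distribution of an estimator is known. The following is a minimal sketch under an assumption not made in the text: the estimator is taken to be normally distributed about the true parameter value with a known standard error `se` (as for the mean of normal observations with known variance). The function name `risk_curve` is hypothetical.

```python
from statistics import NormalDist

def risk_curve(u, theta, se):
    """a(u, theta, theta*) for an estimator theta* ~ Normal(theta, se**2).

    Assumption (not from the text): theta* is normally distributed
    about the true value theta with known standard error se.
    """
    dist = NormalDist(mu=theta, sigma=se)
    if u < theta:
        return dist.cdf(u)        # Prob[theta* <= u | theta]: underestimation
    if u > theta:
        return 1.0 - dist.cdf(u)  # Prob[theta* > u | theta]: overestimation
    return 0.0                    # defined as 0 at u = theta

# Tabulate the risk curve at a few error amounts around theta = 0.
theta, se = 0.0, 1.0
for u in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"u = {u:+.1f}   a = {risk_curve(u, theta, se):.4f}")
```

Reducing `se` lowers the curve on both sides at once; shifting the estimator upward lowers the underestimation branch only at the cost of raising the overestimation branch, which is the partial ordering described above.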

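The simplest approach to the problem of jointly minimizing such
error-probabilities begins with the consideration of three values
$\theta_0$, $\theta_1(\theta_0)$, and $\theta_2(\theta_0)$ of $\theta$,
with $\theta_1(\theta_0) \le \theta_0 \le \theta_2(\theta_0)$, and the two
error-probabilities $a(\theta_0,\theta_1,\theta^*)$ and
$a(\theta_0,\theta_2,\theta^*)$. This problem is solved by direct
application of the fundamental lemma of Neyman and Pearson. We define
generalized score statistics:

$$
S(x,\theta_1,\theta_2) = \frac{\log f(x,\theta_2) - \log f(x,\theta_1)}{\theta_2 - \theta_1}
\quad \text{if } \theta_1 \ne \theta_2,
$$

$$
S(x,\theta_1,\theta_1) = S(x,\theta_1) = \frac{\partial}{\partial\theta_1} \log f(x,\theta_1)
\quad \text{if } \theta_2 = \theta_1.
$$

Then the two error-probabilities mentioned are jointly minimized in
the usual sense by any estimator $\theta^*(x)$ such that
$\theta^*(x) \le \theta_0$ if and only if
$S(x,\theta_1,\theta_2) \le \ldots$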



... the contexts of statistical experiments and the situations where
these are applied. Discussions of statistical inference problems
which do not have specified statistical experiments as their frames
of reference are usually considered unsatisfactory, and lacking in
objectivity. Nevertheless there is continuing dissatisfaction and
disagreement concerning the foundations of mathematical statistics
as a theory of statistical inference. We shall illustrate here,
first by discussion of a simple concrete example of a statistical
experiment, and then in terms of a general mathematical theorem
which the example illustrates, that for one important category of
inference problems, the concept of a statistical experiment, with
its probability terms interpreted in the usual objective ways, is
lacking in objectivity in a relevant sense which can be demonstrated
mathematically (and physically); and that mathematical
analysis leads to a different basis which is more objective and
satisfactory for such problems of statistical inference.

To simplify all but the central issues here, we consider
binary statistical experiments (those in which just two simple
hypotheses, $H_1$ or $H_2$, appear). A simple binary experiment is one
in which the sample space $S$ contains only two points; except in the
trivial case that the hypotheses are equivalent, one point, to be
called "positive", has larger probability under $H_2$ than under $H_1$;
the other point will be called "negative". Each simple binary
experiment is represented by a pair $(\alpha,\beta)$, where $\alpha$ is
the probability under $H_1$ of "positive", that is, the probability of
a "false positive" (or a Type I error), and $\beta$ is the probability
under $H_2$ of "negative", that is, the probability of a "false
negative" (or a Type II error). For any $\alpha$, $(\alpha,1-\alpha)$
represents a trivial experiment in which $H_1$ and $H_2$ are
equivalent. For applications such as the detection of presence or
absence of some physical or biological condition in a person or
material under investigation, a single application of any technique
of measurement or observation which gives dichotomous outcomes is
represented mathematically by a simple binary experiment
$(\alpha,\beta)$. If such a technique is applied, with statistical
independence, $n$ times, the experiment is binary but no longer
simple; its mathematical model is given by the binomial distributions:

$$
\mathrm{Prob}(x \mid H_1) = \binom{n}{x}\alpha^x(1-\alpha)^{n-x}, \qquad
\mathrm{Prob}(x \mid H_2) = \binom{n}{x}(1-\beta)^x\beta^{n-x}, \qquad
x = 0,1,\ldots,n.
$$

We denote any such binary experiment by the symbol $(\alpha,\beta)^n$.
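
The binomial model above is easy to tabulate. The following is a minimal sketch; the function name `binary_experiment_pmf` and the numerical values used in the example call are hypothetical, chosen only for illustration, and x is read as the number of "positive" outcomes among the n applications.

```python
from math import comb

def binary_experiment_pmf(alpha, beta, n):
    """Distributions of x, the number of "positive" outcomes in n
    independent applications, under H1 and under H2."""
    # Under H1 a "positive" has probability alpha (a false positive).
    p1 = [comb(n, x) * alpha**x * (1 - alpha)**(n - x) for x in range(n + 1)]
    # Under H2 a "positive" has probability 1 - beta.
    p2 = [comb(n, x) * (1 - beta)**x * beta**(n - x) for x in range(n + 1)]
    return p1, p2

# Illustrative values only (not from the text).
p1, p2 = binary_experiment_pmf(alpha=0.1, beta=0.3, n=5)
for x, (a, b) in enumerate(zip(p1, p2)):
    print(f"x = {x}   Prob(x|H1) = {a:.4f}   Prob(x|H2) = {b:.4f}")
```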

A symmetric simple binary experiment is one of the form
$(\alpha,\alpha)$. If various experiments of this form (without
replication) are possible in a given application, these admit a
simple ordering: $(\alpha,\alpha)$ is more informative than
$(\alpha',\alpha')$ if $0 \le \alpha < \alpha' \le \tfrac12$.
Correspondingly, an outcome from $(\tfrac12,\tfrac12)$ is uninformative
and irrelevant to the hypotheses. For $0 < \alpha < \tfrac12$, an
outcome from $(\alpha,\alpha)$ is incompletely informative. And if
$0 \le \alpha < \alpha' \le \tfrac12$, then an outcome from
$(\alpha,\alpha)$ is more informative than one from
$(\alpha',\alpha')$. These terms concerning the value or strength of a
specified outcome of a binary experiment, as evidence relevant to the
hypotheses $H_1$ or $H_2$, are objectively defined, mathematically and
physically, in the same sense as are the terms of modern probability
theory referred to above; in fact, their objective character may be
viewed as based directly, by definition, on the mathematically and
physically objective characters of the symmetrical simple binary
statistical experiments $(\alpha,\alpha)$,
$0 \le \alpha \le \tfrac12$.

It is convenient to employ the following (sufficient)
statistic, defined on the sample space $S = \{x\}$ of any binary
experiment represented by any elementary probability functions
$f_1(x)$, $f_2(x)$ respectively representing $H_1$ and $H_2$:

$$
r = r(x) = \log \left[ f_2(x)/f_1(x) \right].
$$

Then in the case of any symmetrical simple binary experiment
$(\alpha,\alpha)$, the outcome "negative" gives
$r = -\log[(1-\alpha)/\alpha]$ and the outcome "positive" gives
$r = \log[(1-\alpha)/\alpha]$. For such experiments, the
algebraic sign of $r(x)$ represents a qualitative property of any
outcome, as favoring either $H_1$ or $H_2$; while the absolute value
$|r(x)|$ represents on a convenient scale, from $0$ to $\infty$, its
strength as evidence relevant to $H_1$ or $H_2$, with the value
$\infty$ representing a completely informative outcome, and the value
$0$ representing a (completely) uninformative outcome. The
interpretation of $r(x)$ in other types of binary experiments remains
to be discussed.
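
As a small numerical illustration of this scale, the following sketch evaluates r for either outcome of a symmetric simple binary experiment. The function name `r_symmetric` and the string labels for the outcomes are hypothetical conveniences, and natural logarithms are assumed.

```python
from math import inf, log

def r_symmetric(alpha, outcome):
    """r = log f2/f1 for an outcome of the symmetric experiment (alpha, alpha).

    alpha is assumed to lie in [0, 1/2]; outcome is "positive" or "negative".
    """
    if outcome not in ("positive", "negative"):
        raise ValueError("outcome must be 'positive' or 'negative'")
    # Strength of evidence |r|: infinite at alpha = 0, zero at alpha = 1/2.
    strength = inf if alpha == 0 else log((1 - alpha) / alpha)
    return strength if outcome == "positive" else -strength

for alpha in (0.0, 0.01, 0.2, 0.5):
    print(alpha, r_symmetric(alpha, "positive"), r_symmetric(alpha, "negative"))
```

The sign of the result indicates which hypothesis the outcome favors, and the absolute value decreases from infinity to zero as alpha increases from 0 to 1/2, in agreement with the ordering of symmetric experiments given earlier.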

Example. Consider the "mixture" experiment $E^*$ defined as
follows: With respective probabilities $g_0 = .1536$, $g_1 = .2544$,
$g_2 = .5920$, select at random one of the experiments $(.5,.5)$,
$(.0588,.0588)$, or $(.0039,.0039)$; and obtain a single outcome
("positive" or "negative") by use of the selected experiment. The
discussion above shows how any outcome of $E^*$ should be interpreted,
for the purposes of inference being considered here; such
interpretation depends only on the selected simple experiment and its
outcome, and is otherwise independent of the mathematical structure
of $E^*$; it is easily verified that the sufficient statistic $r$,
defined as above, of the mixture experiment $E^*$ automatically takes
the same numerical values as does the corresponding statistic $r$
defined on any selected simple experiment.

Consider alternatively the binomial experiment $E$: $(.2,.2)^4$,
with possible outcomes $x = 0,1,\ldots,4$. Consider the problem of
interpreting corresponding values of the sufficient statistic $r(x)$,
as evidence relevant to $H_1$ or $H_2$. Since $E$ is not a symmetrical
simple binary experiment, the above discussion has not been shown
to be relevant to the interpretation of numerical values of $r(x)$.
However, it is a mathematical fact, easily verified, that $E$ is
equivalent to $E^*$ in the sense that the sufficient statistics $r$ of
the two experiments have the same distributions, under $H_1$ and $H_2$
respectively. (The rational numbers required to define $E^*$ have
been given here only to four-decimal accuracy.) Under this
equivalence, outcomes of $E$ and of $E^*$ are equivalent if and only
if they give the same values to the sufficient statistics $r$. It
follows that the outcome $r(x)$ of $E$ should be interpreted as if the
same numerical value $r$ had been obtained from a symmetrical simple
binary experiment. The scope of interpretations of values $r$, found
above, is extended in this way to the present experiment $E$; and
in this sense, the mathematical structure of $E$ as a whole becomes
irrelevant to the interpretation of outcomes, once the value of
$r(x)$ is given. It may fairly be said that the frame of reference
of these interpretations continues to be the mathematical model of
some experiment, namely the particular simple experiment chosen in
the mixture experiment $E^*$ which mathematical analysis shows to be
equivalent to the binomial experiment $E$.
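
The correspondence between outcomes of the binomial experiment and outcomes of symmetric simple experiments can be sketched numerically. The code below computes r(x) for each outcome of the binary experiment with parameters .2, .2 and n = 4, and then solves |r| = log[(1 - a)/a] for the symmetric experiment (a, a) whose outcomes have the same strength. The function names are hypothetical and natural logarithms are assumed.

```python
from math import exp, log

def r_binomial(x, alpha, beta, n):
    """r(x) = log f2(x)/f1(x) for the binary experiment (alpha, beta)^n.

    The binomial coefficients cancel in the likelihood ratio.
    """
    return x * log((1 - beta) / alpha) + (n - x) * log(beta / (1 - alpha))

def equivalent_symmetric_alpha(r):
    """The a for which an outcome of (a, a) has strength |r|,
    from solving |r| = log((1 - a) / a)."""
    return 1.0 / (1.0 + exp(abs(r)))

for x in range(5):
    r = r_binomial(x, 0.2, 0.2, 4)
    a = equivalent_symmetric_alpha(r)
    print(f"x = {x}   r = {r:+.4f}   symmetric experiment ({a:.4f}, {a:.4f})")
```

The values of a obtained in this way are 1/257, 1/17, and 1/2, which agree to four decimals with the simple experiments (.0039, .0039), (.0588, .0588), and (.5, .5) named in the example.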

Consider alternatively a different mixture experiment $E^{**}$
defined as follows: With respective probabilities $g_0 = .1536$,
$g_1 = .4232$, $g_2 = .4232$, select at random one of the experiments
$(.5,.5)$, $(.0037,.0623)$, or $(.0623,.0037)$; and obtain a single
outcome by use of the selected experiment. For our inference
purposes, any outcome of this experiment should again be interpreted
with the selected simple experiment as the frame of reference, and
for these purposes the form of $E^{**}$ as a whole is otherwise
irrelevant. However, it is easily verified that $E^{**}$ is
mathematically equivalent to $E^*$ (in the sense defined above), and
that each outcome of $E^{**}$ is equivalent to a certain outcome of
$E^*$. In particular, the outcome "positive" from $(.0037,.0623)$ in
$E^{**}$ is equivalent under this correspondence to the outcome
"positive" from $(.0039,.0039)$ in $E^*$, and the outcome "positive"
from $(.0623,.0037)$ in $E^{**}$ is equivalent to the outcome
"positive" from $(.0588,.0588)$ in $E^*$.
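
The two correspondences between "positive" outcomes stated above can be checked by comparing values of r. The sketch below uses exact fractions 1/273, 17/273, 1/17, and 1/257 in place of the four-decimal values .0037, .0623, .0588, and .0039; these fractions are an assumption, chosen because they round to the quoted decimals. The function name `r_values` is hypothetical.

```python
from math import log

def r_values(alpha, beta):
    """(r for "positive", r for "negative") in the simple experiment (alpha, beta)."""
    r_pos = log((1 - beta) / alpha)   # "positive": alpha under H1, 1 - beta under H2
    r_neg = log(beta / (1 - alpha))   # "negative": 1 - alpha under H1, beta under H2
    return r_pos, r_neg

# Assumed exact values behind the rounded decimals in the text.
first_pos, first_neg = r_values(1 / 273, 17 / 273)    # (.0037, .0623)
second_pos, second_neg = r_values(17 / 273, 1 / 273)  # (.0623, .0037)
sym_257_pos, _ = r_values(1 / 257, 1 / 257)           # (.0039, .0039)
sym_17_pos, _ = r_values(1 / 17, 1 / 17)              # (.0588, .0588)

print(first_pos, sym_257_pos)   # both equal log 256 up to rounding error
print(second_pos, sym_17_pos)   # both equal log 16 up to rounding error
```

Under these assumed values, each asymmetric experiment pairs one strong outcome with one weaker outcome of opposite sign: the "negative" outcome of the first has r equal to minus log 16, and the "negative" outcome of the second has r equal to minus log 256.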

Thus we have found that, as a frame of reference for
interpreting outcomes, "the selected simple experiment" in any mixture
experiment is clearly more relevant and objective than the
structure of the mixture experiment as a whole; and yet the
objectivity of "the selected simple experiment" is in a sense
illusory, since in different but equivalent mathematical models we
find different simple experiments serving equally well as "objective"
frames of reference for interpretations of the same outcome. What
is in fact both objective and essentially relevant for such
interpretations is only the numerical value of the sufficient statistic
$r$ on the observed outcome, with its objective interpretations as
given above.

The generality of the features illustrated in this example is
established in the

Theorem. Each binary experiment is equivalent to a mixture
of simple binary experiments. (Most binary experiments, including
the binomial example above, can be represented in an infinite
number of different forms as mixtures of simple experiments.)

Such analysis leads to the following conclusion: For problems
of statistical inference of the kind described above, given the
numerical values of the likelihood function determined on the
observed outcome of any specified binary experiment (that is, given
$f_1(x)$ and $f_2(x)$ for the observed $x$, or, more concisely, given
$r(x) = \log[f_2(x)/f_1(x)]$), the structure of the experiment as a
whole is irrelevant.

One result of this analysis is that a long-standing point of
difference between Bayesian and non-Bayesian statisticians can be
in part resolved as follows: for problems of the kind considered
here, Bayesian statisticians can agree with non-Bayesians who follow
the above analysis that $r$-values express in an objective sense the
relevant evidence from the experimental outcome itself; the
remaining questions concern only the various possible modes of
interpretation of $r$-values in various inference situations.

The structure of any experiment is crucial for many other
kinds of problems of inference or decision-making dealt with by
mathematical statistics. And even for the kind of inference
problem considered here, the structure of an experiment is crucial
in the sense that it represents the design of an experiment: even
if the interpretation of outcomes will leave aside the structure of
an experiment, there remain the crucial problems of appraising,
comparing, and choosing experimental designs for use in this way.
A highly informative experiment is one which gives with high
probability highly informative outcomes (large values of $|r|$,
under each hypothesis). It is not clear that a numerical measure
of informativeness of an experiment in this sense is necessary or
that it could be fully adequate, since the distributions
of $r$ under respective hypotheses are basic and directly
interpretable.


[1] Birnbaum, A., "A unified theory of estimation, I" (revised
and extended, 1960), Technical Report IMM-NYU 266, Institute
of Mathematical Sciences, New York University.
[2] Birnbaum, A., "On the foundations of statistical inference, I"
(1960), Technical Report IMM-NYU 267, Institute of Mathematical
Sciences, New York University.

This article is an exposition of some extensions of the
theory of statistical estimation and of the foundations of the theory
of mathematical statistics.


1. Introduction. There is increasing awareness among applied
and theoretical statisticians that many problems customarily
formulated in terms of testing statistical hypotheses can be
formulated more appropriately as problems of estimation. The recent
expository paper by Natrella [1] describes this trend and some
principal reasons for it, and illustrates how the close
relationship between confidence intervals and tests facilitates a smooth
shift of emphasis from the techniques and concepts of testing to
those of estimation. The purpose of the present note is to describe
a technique of estimation by confidence curves, which more formally
incorporates the practical techniques of testing, along with those
of point estimation and estimation by confidence limits and
confidence intervals at various levels. In one-parameter problems,
a confidence curve estimate can be interpreted flexibly, in any
context of application for general-purpose informative inferences,
so as to provide conveniently any number of valid inferences of
the following forms: (a) confidence limits and confidence
intervals, at various confidence levels, and a point estimate;
(b) significance tests, one- or two-sided, of particular parameter
values representing any hypothesis of interest; and for the latter
tests, (c) the critical level of Type I (that is, the customary
"P-level", the significance level at which the observed data would
indicate rejection); and also (d) at each parameter value
representing an alternative hypothesis of interest, a critical level of
Type II (that is, the analogue of the customary "P-level" which
corresponds to errors of Type II), which represents the power of
the test in a form which can be interpreted conveniently as part
of the over-all interpretation of observed data.

2. Definition of a confidence curve estimate, and an example.
For typical problems in which one parameter is of primary
interest, a confidence curve estimate is defined simply as a set
of confidence limits at various confidence levels.

It is convenient to use the notation $t$ for the observed value
of the appropriate basic statistic in any specified experiment.
For example, if $n$ independent observations $y_i$ are obtained, and if
the sample mean is the appropriate statistic, then
$t = \bar{y} = \sum_{i=1}^{n} y_i/n$. Let $\theta$ denote the unknown
value of the parameter of interest. Let $\gamma$ denote any fixed
number, $0 \le \gamma \le 1$. For each $\gamma > .5$, let
$\theta(t,\gamma)$ denote a lower confidence limit for $\theta$, at the
$\gamma$ confidence level, based on the observed value $t$. For each
$\gamma < .5$, let $\theta(t,\gamma)$ denote an upper confidence limit
for $\theta$, at the $(1-\gamma)$ confidence level, based on $t$. For
$\gamma = .5$, the corresponding mathematical definition of
$\theta(t,\gamma) = \theta(t,.5)$ can be interpreted more
usefully as follows: $\theta(t,.5)$ is a point-estimator of $\theta$
which is median-unbiased. To avoid ambiguity, it is convenient to
replace the usual term "unbiased" by mean-unbiased, to refer to the
property that an estimator's mean value is the true parameter value
being estimated. A median-unbiased estimator is one whose median is
the true value being estimated; that is, a median-unbiased estimator
has probabilities of overestimation and of underestimation each
equal to $\tfrac12$.

All of these definitions are summed up by stating: For each
$\gamma$, whatever may be the true value $\theta$ of the parameter, the
estimator $\theta(t,\gamma)$ has the basic property that its value is
less than $\theta$ with probability equal to $\gamma$ (and hence its
value exceeds $\theta$ with probability $(1-\gamma)$; we leave aside
the minor technicalities of cases where estimators have discontinuous
distributions). In typical problems, the usual definitions of
confidence limits provide the following additional property: For each
possible observed $t$, as $\gamma$ decreases from 1 to 0, the
respective values of the estimates $\theta(t,\gamma)$ increase
continuously through the range of possible values of $\theta$.
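
A family of estimates of this kind can be sketched for one standard case. The assumption here, which is not taken from the text, is that t is normally distributed about the true parameter value with a known standard error `se`; the usual one-sided normal confidence limits then give the whole family in a single formula. The function name `confidence_limit` is hypothetical.

```python
from statistics import NormalDist

def confidence_limit(t, gamma, se):
    """theta(t, gamma) when t ~ Normal(theta, se**2) with se known (assumed).

    gamma > .5: lower confidence limit at level gamma.
    gamma < .5: upper confidence limit at level 1 - gamma.
    gamma = .5: the median-unbiased point estimate, here t itself.
    gamma must lie strictly between 0 and 1.
    """
    return t - NormalDist().inv_cdf(gamma) * se

t, se = 10.0, 2.0
for gamma in (0.975, 0.9, 0.5, 0.1, 0.025):
    print(f"gamma = {gamma:5.3f}   theta(t, gamma) = {confidence_limit(t, gamma, se):.3f}")
```

In this sketch the estimate falls below the true value with probability gamma, and the printed values increase as gamma decreases, which are the two properties stated above.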

The manner of computing and reporting such sets of estimates
will naturally vary with problems and purposes. One form which is
often convenient, and for which the term confidence curve seems
particularly appropriate, may be defined as follows for typical
problems: If a standard confidence limit method is applied to a
given observed value $t$, each of the possible values of $\theta$ will
be a lower confidence limit at some confidence level $\gamma$ and also
an upper confidence limit at some corresponding level $(1-\gamma)$; for
each $\theta$, let $c(\theta,t)$ denote the smaller of these two values,
$\gamma$ or $(1-\gamma)$. Then, for any observed value $t$, as $\theta$
increases through its range, the confidence curve $c(\theta,t)$ will
increase continuously from $0$ to $\tfrac12$, and then decrease
continuously to $0$. An alternative definition of the confidence curve
$c(\theta,t)$ is the following: given the observed $t$, for each
$\theta$ the value of $c(\theta,t)$ is the smaller of ...
