Dale Weldeau Jorgenson.

# Nonlinear three stage least squares pooling of cross section and average time series data


WORKING PAPER
ALFRED P. SLOAN SCHOOL OF MANAGEMENT

NONLINEAR THREE STAGE LEAST SQUARES POOLING OF
CROSS SECTION AND AVERAGE TIME SERIES DATA

Dale W. Jorgenson*
and
Thomas M. Stoker**

April 1982
(Revised: August 1983)

Sloan School of Management Working Paper #1293-82

MASSACHUSETTS

INSTITUTE OF TECHNOLOGY

50 MEMORIAL DRIVE

CAMBRIDGE, MASSACHUSETTS 02139

*Department of Economics, Harvard University
Cambridge, Massachusetts 02138

**Sloan School of Management, Massachusetts Institute of Technology
Cambridge, Massachusetts 02139

NONLINEAR THREE STAGE LEAST SQUARES POOLING
OF CROSS SECTION AND TIME SERIES OBSERVATIONS

by
Dale W. Jorgenson and Thomas M. Stoker

1. Introduction. The purpose of this paper is to discuss the pooling of
cross section and average time series data by the method of nonlinear three
stage least squares introduced by Jorgenson and Laffont (1974). We consider
applications of this method to exact aggregation models, where there is a
unique correspondence between individual and aggregate behavior. This
correspondence makes exact aggregation models appropriate for the analysis of
individual data, average data, or both in combination.

We consider observations on K individuals, indexed by k = 1, 2 ... K, for
T time periods, indexed by t = 1, 2 ... T. We can represent the structural
form of an exact aggregation model for the kth individual in the tth time
period by:

$$y_{nkt} = x_{kt}'\,\beta_n(p_t, \theta), \qquad (n = 1, 2 \ldots N).$$

The observations $y_{nkt}$ and $x_{kt}$ vary over both individuals and time periods,
while the vector of observations $p_t$ varies over time periods, but is the same
for all individuals in a given time period. The coefficients $\beta_n(p_t, \theta)$ are
functions of the observations $p_t$ and the vector of L structural parameters
$\theta' = (\theta_1, \theta_2 \ldots \theta_L)$. Restrictions on the parameters are embodied in the
forms of these functions.

We can write the exact aggregation model for the kth individual in vector
form:

$$y_{kt} = (I_N \otimes x_{kt}')\,\beta(p_t, \theta), \tag{1}$$

where $y_{kt}$ is a vector of N observations, $\beta(p_t, \theta)$ is a vector of N
coefficients, and $I_N$ is the identity matrix of order N. By averaging the
model (1) over all individuals for each time period, we obtain the structural
form of the exact aggregation model for averaged data:

$$\bar{y}_t = (I_N \otimes \bar{x}_t')\,\beta(p_t, \theta), \tag{2}$$

where $\bar{y}_t$ and $\bar{x}_t'$ are vectors of M observations on averages of $y_{kt}$ and
$x_{kt}'$ over all individuals.

The models for individual cross section and average time series
observations contain the same parameter vector $\theta$ and the same coefficient
vector $\beta(p_t, \theta)$. This reflects the correspondence between individual and
aggregate behavior that characterizes exact aggregation models. The forms of
the individual and aggregate models (1) and (2) are necessary and sufficient
for exact aggregation, provided that the population distribution of $x_{kt}$ is
unrestricted.
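The averaging argument behind (1) and (2) can be checked numerically. The following is a minimal sketch, assuming hypothetical dimensions (`N`, `J`, `K`) and randomly drawn coefficients and regressors that are not taken from the paper. It only illustrates that, because the model is linear in $x_{kt}$ for given $p_t$, averaging the individual equations reproduces the aggregate equation exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, K = 3, 4, 500   # equations, regressors per individual, individuals (all hypothetical)

# Hypothetical coefficient vectors beta_n(p_t, theta) for one period t.
beta_n = rng.normal(size=(N, J))   # row n holds beta_n(p_t, theta)'
beta = beta_n.reshape(-1)          # stacked vector beta(p_t, theta)

x = rng.normal(size=(K, J))        # row k holds x_kt'

def exact_aggregation_rhs(x_row, beta):
    """Right-hand side of (1) or (2): (I_N kron x') beta."""
    return np.kron(np.eye(N), x_row[None, :]) @ beta

# Individual model (1) for every k, then the average over individuals.
y = np.array([exact_aggregation_rhs(x[k], beta) for k in range(K)])
y_bar = y.mean(axis=0)

# Aggregate model (2) evaluated at the average regressors.
x_bar = x.mean(axis=0)
aggregate_rhs = exact_aggregation_rhs(x_bar, beta)

print(np.allclose(y_bar, aggregate_rhs))
```

The check holds for any distribution of the rows of `x`, which is the sense in which the population distribution of $x_{kt}$ is left unrestricted.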

As an example of exact aggregation models we first consider the linear
model that underlies previous discussions of pooling cross section and time
series data:

$$y_{nkt} = p_t'\,\theta_{1n} + z_{kt}'\,\theta_{2n}, \qquad (n = 1, 2 \ldots N),$$

where $\theta_{1n}$ and $\theta_{2n}$ are vectors of parameters. In this example the vector of
parameters $\theta$ of models (1) and (2) includes the elements of $\theta_{1n}$ and $\theta_{2n}$
(n = 1, 2 ... N). The vector of coefficients $\beta_n(p_t, \theta)'$ is
$(p_t'\theta_{1n},\ \theta_{2n}')$ and the vector of observations $x_{kt}'$ is $(1,\ z_{kt}')$.

Demand analysis provides many examples of nonlinear exact aggregation
models. In each of these examples the theory of consumer behavior implies
constraints on the parameters of the model that are incorporated through the
form of the coefficients $\beta_n(p_t, \theta)$ (n = 1, 2 ... N). Demand systems generated
by the Gorman polar form of the indirect utility function are nonlinear exact
aggregation models. Specific examples include the linear expenditure system
introduced by Klein and Rubin (1947-1948) and implemented by Stone (1954), the
S-branch utility tree of Brown and Heien (1972), and the generalization of the
S-branch utility tree of Blackorby, Boyce, and Russell (1978).

As an illustration, the linear expenditure system can be written in exact
aggregation form as follows:

$$y_{nkt} = \Big(p_{nt}\,c_n - b_n \sum_j c_j\,p_{jt}\Big) + b_n\,M_{kt}, \qquad (n = 1, 2 \ldots N),$$

where $y_{nkt}$ denotes expenditure on the nth commodity by the kth individual in
period t and $p_{nt}$ is the price of this commodity (n = 1, 2 ... N); $M_{kt}$ is total
expenditure on all commodities. The vector of parameters $\theta$ includes the
parameters $b_n$ and $c_n$ (n = 1, 2 ... N), the vector of coefficients $\beta_n(p_t, \theta)'$
is $(p_{nt} c_n - b_n \sum_j c_j p_{jt},\ b_n)$ and the vector of observations $x_{kt}'$ is $(1,\ M_{kt})$.
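The linear expenditure system above can be sketched in code to make the split between coefficients $\beta_n(p_t, \theta)$ and observations $x_{kt}' = (1, M_{kt})$ concrete. This is a minimal illustration with made-up prices, parameters, and expenditure levels; the function names are hypothetical, and the normalization $\sum_n b_n = 1$ is assumed so that expenditures add up to $M_{kt}$.

```python
import numpy as np

def les_coefficients(p, b, c):
    """Row n is beta_n(p_t, theta)' = (p_n c_n - b_n * sum_j c_j p_j, b_n)."""
    committed = np.sum(c * p)                      # sum_j c_j p_jt
    return np.column_stack([p * c - b * committed, b])

def les_expenditure(p, M, b, c):
    """Expenditures y_nkt = x_kt' beta_n with x_kt' = (1, M_kt)."""
    x = np.array([1.0, M])
    return les_coefficients(p, b, c) @ x

# Illustrative values only (not from the paper).
p = np.array([1.0, 2.0, 4.0])       # prices p_nt
b = np.array([0.5, 0.3, 0.2])       # marginal budget shares, assumed to sum to one
c = np.array([10.0, 5.0, 2.0])      # committed quantities
M_k = np.array([60.0, 100.0, 200.0])  # total expenditure of three individuals

y = np.array([les_expenditure(p, m, b, c) for m in M_k])
y_bar = y.mean(axis=0)
y_at_mean = les_expenditure(p, M_k.mean(), b, c)

print(y)
print(np.allclose(y_bar, y_at_mean))
```

Because prices enter only through the coefficients and individual variation enters only through $M_{kt}$, average expenditure depends on the $M_{kt}$ solely through their mean.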

More complex nonlinear exact aggregation models have recently been
introduced by Deaton and Muellbauer (1980a, 1980b) and by Jorgenson, Lau, and
Stoker (1980, 1981, 1982). The AIDS models of Deaton and Muellbauer can be
written:

$$y_{nkt} = \Big(a_n + \sum_j c_{nj} \ln p_{jt}\Big) M_{kt} + b_n\,M_{kt} \ln \frac{M_{kt}}{P_t}, \qquad (n = 1, 2 \ldots N),$$

where $y_{nkt}$, $M_{kt}$, and $p_{nt}$ are defined as in the linear expenditure system and:

$$\ln P_t = \sum_n a_n \ln p_{nt} + \frac{1}{2} \sum_n \sum_j c_{nj} \ln p_{nt} \ln p_{jt}$$

is a price index. The vector of parameters $\theta$ includes the parameters
$a_n$, $b_n$, $c_{nj}$ (n, j = 1, 2 ... N), the vector of coefficients $\beta_n(p_t, \theta)'$ is
$(a_n + \sum_j c_{nj} \ln p_{jt} - b_n \ln P_t,\ b_n)$ and the vector of observations $x_{kt}'$ is
$(M_{kt},\ M_{kt} \ln M_{kt})$.

The translog model of Jorgenson, Lau and Stoker can be represented in the
form:

$$y_{nkt} = \frac{a_n + \sum_j b_{nj} \ln p_{jt}}{D(p_t)}\,M_{kt} - \frac{b_{Mn}}{D(p_t)}\,M_{kt} \ln M_{kt} + \sum_s \frac{b_{ns}}{D(p_t)}\,M_{kt} A_{skt}, \qquad (n = 1, 2 \ldots N),$$

where $y_{nkt}$, $M_{kt}$ and $p_{nt}$ are defined as above, $A_{skt}$ (s = 1, 2 ... S) represents
demographic characteristics such as family size, age of head of household, and
so on, and:

$$D(p_t) = -1 + \sum_j b_{Mj} \ln p_{jt}.$$

In this example the vector $\theta$ consists of the parameters $a_n$, $b_{nj}$, $b_{Mj}$, $b_{ns}$
(n, j = 1, 2 ... N; s = 1, 2 ... S), the vector of coefficients $\beta_n(p_t, \theta)'$ is

$$\left(\frac{a_n + \sum_j b_{nj} \ln p_{jt}}{D(p_t)},\ -\frac{b_{Mn}}{D(p_t)},\ \frac{b_{n1}}{D(p_t)},\ \frac{b_{n2}}{D(p_t)} \ldots \frac{b_{nS}}{D(p_t)}\right)$$

and the vector of observations $x_{kt}'$ is
$(M_{kt},\ M_{kt} \ln M_{kt},\ M_{kt} A_{1kt},\ M_{kt} A_{2kt} \ldots M_{kt} A_{Skt})$.

In this paper we focus on the implications of nonlinearity for the pool-
ing of cross section and average time series data. In Section 2 we consider
the stochastic specification of exact aggregation models (1) and (2). In Sec-
tion 3 we present and characterize the nonlinear three stage least squares
estimator for pooled time series and cross section observations. In Section 4
we discuss hypothesis testing and in Section 5 we consider estimation subject
to inequality constraints. We close with a brief summary of the results and a
discussion of applications.

2. Stochastic Specification. We begin by considering average
observations for T time periods and a single cross section of K individual
observations. We assume that the observations are generated by exact
aggregation models (1) and (2) with additive disturbance terms. Given the
stochastic specification of the disturbance terms, the observations must be
transformed to obtain disturbances that are homoscedastic and uncorrelated
across observations.

For pooling of cross section and average time series data the
transformation of observations to obtain homoscedastic and uncorrelated
disturbances can be divided into two steps. The first step separates the data
sets by transforming the average data so that time series disturbances are
uncorrelated with cross section disturbances. The second step transforms the
resulting data sets to a form where disturbances in each data set are
homoscedastic and uncorrelated. We present the transformation for the first
step explicitly, indicating the features of this transformation that result in
increased efficiency. The second step involves standard techniques for
transformation, which we illustrate by example.

We assume that individual observations are generated by the exact
aggregation model (1) with an additive random component, say $\varepsilon_{kt}$:

$$y_{kt} = (I_N \otimes x_{kt}')\,\beta(p_t, \theta) + \varepsilon_{kt}. \tag{1'}$$

We assume that the disturbance term $\varepsilon_{kt}$ is distributed with mean zero and is
uncorrelated across individuals, so that:

$$E(\varepsilon_{kt}\,\varepsilon_{k't'}') = 0, \qquad k \neq k'.$$

Any systematic correlation among individuals is assumed to be captured by
selection of the variables $x_{kt}$. The disturbance term $\varepsilon_{kt}$ is assumed to have
variance $\Omega_\varepsilon$ and time series covariance structure
$E(\varepsilon_{kt}\,\varepsilon_{kt'}') = c_{tt'}\,\Omega_\varepsilon$. A wide variety of alternative time series
structures for $\varepsilon_{kt}$ can be represented by choosing an appropriate form for
the matrix $\{c_{tt'}\}$.

We could obtain a stochastic version of the exact aggregation model (2)
by averaging the individual observations in (1') for each time period. This
would be the appropriate procedure if the average data were constructed by
averaging the individual observations. However, we must allow for alternative
methods for constructing the aggregate data. In demand analysis, for example,
data on aggregate personal consumption expenditures are obtained from produc-
tion accounts for the economy as a whole rather than by direct observation of
quantities consumed by the entire population of individual households.

To allow for differences in methods of construction of the individual and
aggregate data we introduce an additive random component $\nu_t$ into the exact
aggregation model (2) for each time period. The model relating the averaged
data $\bar{y}_t$, $\bar{x}_t$ and $p_t$ is then:

$$\bar{y}_t = (I_N \otimes \bar{x}_t')\,\beta(p_t, \theta) + u_t, \tag{2'}$$

where $u_t = \nu_t + \bar{\varepsilon}_t$ and $\bar{\varepsilon}_t$ is a vector of N averaged disturbances
$\{\varepsilon_{kt}\}$. The stochastic term $\nu_t$ is assumed to be distributed independently
of $\varepsilon_{kt}$ with mean zero, variance $\Omega_\nu$, and time series covariance structure
$E(\nu_t\,\nu_{t'}') = \Omega_\nu^{tt'}$ for $t \neq t'$. To accommodate a variety of time series
covariance structures for $u_t$ we have:

$$E(u_t\,u_{t'}') = \Omega_\nu^{tt'} + \frac{1}{K}\,c_{tt'}\,\Omega_\varepsilon.$$

In order to present methods for pooling cross section and time series
data we consider a sample of K' individual observations. We can "stack" the
equations (1') to obtain:

$$Y = (I_N \otimes X)\,\beta(p_{t_0}, \theta) + \varepsilon, \tag{3}$$

where $Y$ is the vector of observations $\{y_{nkt_0}\}$, $X$ is the matrix with rows
$\{x_{kt_0}'\}$ and $\varepsilon$ is the vector of disturbances with mean zero and covariance
matrix $\Omega_\varepsilon \otimes I_{K'}$. Similarly, we can represent the equations (2') in the form:

$$\bar{Y} = f(\theta) + u, \tag{4}$$

where $\bar{Y}$ is the vector of averaged observations $\{\bar{y}_t\}$,

$$f(\theta) = \begin{bmatrix} (I_N \otimes \bar{x}_1')\,\beta(p_1, \theta) \\ (I_N \otimes \bar{x}_2')\,\beta(p_2, \theta) \\ \vdots \\ (I_N \otimes \bar{x}_T')\,\beta(p_T, \theta) \end{bmatrix},$$

and $u$ is the vector of disturbances.

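The first step in the transformation of observations eliminates the
correlation between $\varepsilon$ and $u$:

$$E(u_t\,\varepsilon_{kt_0}') = \frac{1}{K}\,c_{tt_0}\,\Omega_\varepsilon, \qquad (k = 1, 2 \ldots K';\ t = 1, 2 \ldots T). \tag{5}$$

This correlation is removed by a nonsingular transformation of (3) and (4),
which is equivalent to replacing $\bar{y}_t$, $\bar{x}_t$ and $u_t$ in (2') by:

$$\begin{aligned}
\bar{y}_t^{o} &= \bar{y}_t - \frac{K'}{K}\,c_{tt_0}\,\bar{y}_{cs}, \\
\bar{x}_t^{o} &= \bar{x}_t - \frac{K'}{K}\,c_{tt_0}\,\bar{x}_{cs}, \\
u_t^{o} &= u_t - \frac{K'}{K}\,c_{tt_0}\,\bar{\varepsilon}_{cs},
\end{aligned} \tag{6}$$

where $\bar{y}_{cs}$, $\bar{x}_{cs}$ and $\bar{\varepsilon}_{cs}$ denote the cross section averages of $y_{kt_0}$, $x_{kt_0}$ and
$\varepsilon_{kt_0}$. The resulting disturbances $u_t^{o}$ are now uncorrelated with $\varepsilon_{kt_0}$
(k = 1, 2 ... K'), but have a more complicated time series structure than the
original disturbances:

$$E(u_t^{o}\,u_{t'}^{o\prime}) = \Omega_\nu^{tt'} + \frac{1}{K}\,c_{tt'}\,\Omega_\varepsilon - \frac{K'}{K^2}\,c_{tt_0}\,c_{t't_0}\,\Omega_\varepsilon. \tag{7}$$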
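The algebra behind the first-step transformation can be verified with a few matrix operations. The sketch below is illustrative only: the numerical values of `K`, `K_prime`, the covariance matrices, and the `c` coefficients are invented, and it assumes that the cross-section individuals belong to the population of size K, so that the covariance between $u_t$ and each $\varepsilon_{kt_0}$ is $c_{tt_0}\Omega_\varepsilon / K$ as in (5). It checks that subtracting $(K'/K)\,c_{tt_0}\,\bar{\varepsilon}_{cs}$ removes that covariance and that the resulting covariance between two periods has a correction term $-(K'/K^2)\,c_{tt_0} c_{t't_0} \Omega_\varepsilon$.

```python
import numpy as np

K, K_prime = 1_000, 100                    # population and cross-section sizes (illustrative)
Omega_eps = np.array([[2.0, 0.5], [0.5, 1.0]])
Omega_nu_tt = np.array([[0.3, 0.1], [0.1, 0.2]])   # Omega_nu^{tt'} for one pair (t, t')
c_tt, c_t0, c_s0 = 0.6, 0.8, 0.5           # c_{tt'}, c_{t t0}, c_{t' t0} (illustrative)

# Moments implied by the stochastic specification.
cov_u = Omega_nu_tt + c_tt * Omega_eps / K   # E(u_t u_t'')
cov_u_eps_t = c_t0 * Omega_eps / K           # E(u_t eps_{k t0}'), equation (5)
cov_u_eps_s = c_s0 * Omega_eps / K           # same for period t'
var_eps_cs = Omega_eps / K_prime             # variance of the cross-section average
cov_epscs_eps = Omega_eps / K_prime          # E(eps_bar_cs eps_{k t0}')

# Weights used in transformation (6).
a_t = K_prime / K * c_t0
a_s = K_prime / K * c_s0

# Covariance of the transformed disturbance with an individual disturbance.
cov_uo_eps = cov_u_eps_t - a_t * cov_epscs_eps

# Covariance of transformed disturbances for periods t and t'.
# The average eps_bar_cs has the same covariance with u_t as each individual term.
cov_uo = cov_u - a_s * cov_u_eps_t - a_t * cov_u_eps_s + a_t * a_s * var_eps_cs

formula_7 = cov_u - K_prime / K**2 * c_t0 * c_s0 * Omega_eps

print(np.allclose(cov_uo_eps, 0.0), np.allclose(cov_uo, formula_7))
```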
The second step in the transformation of observations is to apply a non-
singular transform to the average data in (4) to obtain disturbances that are
homoscedastic and uncorrelated. We illustrate this transformation below by
example. We assume that the transformation has been performed, altering the
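model (4) to:

$$\bar{Y}^* = f^*(\theta^*) + u^*, \tag{8}$$

where $u^*$ is distributed with mean zero and variance $\Omega_{u^*} \otimes I_T$. For estimation,
we stack the equation systems (3) and (8):

$$\mathbf{Y} = \phi(\theta^*) + U, \tag{9}$$

where $U' = (\varepsilon',\ u^{*\prime})$, which is distributed with mean zero and variance:

$$\begin{bmatrix} \Omega_\varepsilon \otimes I_{K'} & 0 \\ 0 & \Omega_{u^*} \otimes I_T \end{bmatrix}.$$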

The implementation of the transformations described above requires
consistent estimates of the variances and covariances $\Omega_\varepsilon$, $c_{tt'}$, $\Omega_\nu^{tt'}$
(t, t' = 1, 2 ... T). In general, these estimates require specific models of
the processes generating the disturbances. The purpose of the transformations
is to assure efficiency in estimation. Equation (2') shows that the
contribution of the individual errors $\varepsilon_{kt}$ to the covariance structure of $u_t$
is likely to be negligible unless the matrices $\Omega_\nu^{tt'}$ are the same order of
magnitude as $\frac{1}{K}\,c_{tt'}\,\Omega_\varepsilon$, where K is population size. The benefits of
performing the transformation (6) depend on the size of the cross section
relative to the population. In many applications, K'/K will be extremely small
so that the transformation (6) leaves the observations unaffected. Typical
numbers for an analysis of U.S. household demand behavior are K' = 10,000 and
K = 70 million. Consequently, only when the cross section sample size is of
the same order of magnitude as the size of the population will the correction
yield significant benefits; otherwise it can be ignored.
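The orders of magnitude quoted above can be made explicit with a short calculation. This is only worked arithmetic on the two figures given in the text (K' = 10,000 and K = 70 million); setting the time series coefficient to one is an assumption made to obtain an upper bound on the weight.

```python
K_prime = 10_000        # cross-section sample size quoted in the text
K = 70_000_000          # population size quoted in the text

# Weight on the cross-section averages in transformation (6), taking c = 1.
ratio = K_prime / K

# Size of the correction term (K'/K^2) relative to the (1/K) term, taking c = 1.
relative_correction = (K_prime / K**2) / (1 / K)

print(f"K'/K = {ratio:.3e}")
print(f"relative correction = {relative_correction:.3e}")
```

With these figures the weight is 1/7000, so the transformed averages differ from the original averages by roughly one part in seven thousand of the cross-section means.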

The following examples illustrate different error structures, where we
assume K'/K is very small. We take $c_{tt'} = 0$, $t \neq t'$, for simplicity,
deferring further discussion of this time series structure until we have
presented the examples. In Examples 1 and 2 we take $\Omega_\nu^{tt'} = 0$ for $t \neq t'$.

Example 1 (Random Individual Errors): Suppose that $\nu_t$ arises because of
an additional random component $\nu_{kt}$ at the individual level, which is
distributed with mean zero and variance $\Omega_\nu$, so that $\nu_t = \sum_k \nu_{kt}/K$. Then
$u_t = \sum_k (\nu_{kt} + \varepsilon_{kt})/K$, with:

$$E(u_t\,u_{t'}') = \begin{cases} \dfrac{1}{K}\,(\Omega_\nu + \Omega_\varepsilon), & t = t', \\ 0, & t \neq t'. \end{cases}$$

The second stage transformation is just a grouping correction, with $u_t^*$ of
(8) given as $u_t^* = \sqrt{K}\,u_t$, with $\Omega_{u^*} = \Omega_\nu + \Omega_\varepsilon$.

Example 2 (Common Time Effect): Suppose that $\nu_t$ represents a common
disturbance in the aggregate data with $\Omega_\nu^{tt} = \Omega_\nu$ for all t. In practice one
will usually encounter $K\,\Omega_\nu \gg \Omega_\varepsilon$, so that $u_t = \nu_t$ for purposes of
estimation. Here no second stage correction is necessary, with $u_t^* = u_t$ and
$\Omega_{u^*} = \Omega_\nu$.

Example 3 (Autocorrelated Common Time Effect): Suppose that Example 2 is
modified to $\nu_t = \gamma\,\nu_{t-1} + \omega_t$, where $\omega_t$ is distributed with mean zero,
variance $\Omega_\omega$ and uncorrelated over time, with $K\,\Omega_\omega \gg \Omega_\varepsilon$. Then
$\Omega_\nu^{tt} = \Omega_\omega / (1 - \gamma^2)$. As above, the contribution of $\sum_k \varepsilon_{kt}/K$ to $u_t$ is
negligible, so that $u_t = \nu_t$. The second stage correction is now quasi-first
differencing, replacing $\bar{y}_t$ and $\bar{x}_t$ by $\bar{y}_t - \gamma\,\bar{y}_{t-1}$ and $\bar{x}_t - \gamma\,\bar{x}_{t-1}$ (with
the standard adjustment to the first observation). Of course, $u_t^* = \omega_t$
and $\Omega_{u^*} = \Omega_\omega$ in this case.
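The quasi-first differencing step of Example 3 can be illustrated for a scalar disturbance. The sketch below assumes that the "standard adjustment to the first observation" is the usual scaling of the first observation by $\sqrt{1 - \gamma^2}$ (the Prais-Winsten form); the text does not spell this out, and the values of `T`, `gamma`, and `sigma2_omega` are invented. It builds the stationary AR(1) covariance matrix and checks that the transformation returns homoscedastic, uncorrelated disturbances.

```python
import numpy as np

def ar1_covariance(T, gamma, sigma2_omega):
    """Covariance of a stationary AR(1) process: sigma2 / (1 - gamma^2) * gamma^|t - t'|."""
    idx = np.arange(T)
    lags = np.abs(idx[:, None] - idx[None, :])
    return sigma2_omega / (1.0 - gamma**2) * gamma**lags

def quasi_difference_matrix(T, gamma):
    """Rows 2..T take v_t - gamma * v_{t-1}; row 1 is scaled by sqrt(1 - gamma^2)."""
    P = np.eye(T)
    P[0, 0] = np.sqrt(1.0 - gamma**2)
    P[np.arange(1, T), np.arange(T - 1)] = -gamma
    return P

T, gamma, sigma2_omega = 6, 0.7, 2.0        # illustrative values
V = ar1_covariance(T, gamma, sigma2_omega)
P = quasi_difference_matrix(T, gamma)
transformed = P @ V @ P.T                   # covariance after the transformation

print(np.round(transformed, 10))
```

In practice $\gamma$ is unknown, which is why the text goes on to treat it as an additional component of the parameter vector.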

Now suppose that $c_{tt'} \neq 0$ so that we have a nontrivial time series
correlation structure for $\varepsilon_{kt}$. In Examples 2 and 3 above, the effect of
$c_{tt'} \neq 0$ would be negligible, due to the unimportance of $\sum_k \varepsilon_{kt}/K$ in $u_t$.
In Example 1, however, the time series structure is potentially important,
since $\sqrt{K}\,u_t$ will have the same time series covariance structure as $\varepsilon_{kt}$
and would require consideration in the second stage of the transformation of
observations.

Example 3 illustrates the cost of pooling with very general error
structures. In particular, in Example 3 the parameter $\gamma$ is best relabeled as
a component of $\theta$, with the transformed error covariance structure now
determined by $\Omega_\varepsilon$ and $\Omega_{u^*} = \Omega_\omega$. The treatment of autocorrelation will
involve augmenting the list of parameters to be estimated, with the remaining
error structure characterized by $\Omega_\varepsilon$ and $\Omega_{u^*}$. This modeling approach is
standard practice in time series analysis. Consequently, in Section 3 we
discuss only the consistent estimation of the parameters $\Omega_\varepsilon$ and $\Omega_{u^*}$, which
we will regard as positive definite but otherwise unrestricted.

Before discussing the additional assumptions required for estimation of
the complete model, we introduce instrumental variables. It is often
appropriate to treat the variables $x_{kt}$ and $p_t$ as endogenous for the individual
observations, the aggregate observations, or both. This can occur when the
model is a simultaneous equations model in exact aggregation form or part of a
larger system of simultaneous equations. For example, in demand analysis
observations on prices can reflect both supply and demand influences,
requiring aggregate instruments. Alternatively, in a study of savings, errors
in variables may necessitate instruments for the individual data, while in the
average data such errors may be negligible.

We assume that there are vectors of observations on instrumental
variables, say $z_{kt_0}$ and $\bar{z}_t$. Denote as $Z_1$ and $Z_2$ the matrices with rows $z_{kt_0}'$
and $\bar{z}_t'$ respectively, and as $Z$ the matrix:

$$Z = \begin{bmatrix} Z_1 & 0 \\ 0 & Z_2 \end{bmatrix}.$$

Finally, we must introduce regularity assumptions in order to
characterize the NL3SLS estimator. We include these in the Appendix. The
assumptions are that the coefficient functions $\beta(p_t, \theta)$ are twice
continuously differentiable in the components of $\theta$, that the moment matrices
defining the NL3SLS objective function converge to stable, well behaved
limits, and that the parameter vector $\theta$ is identified. We collect all
components of $\theta$ identified in the cross section in a set $\theta_1$, all parameters
identified in the time series in a set $\theta_2$, and all the remaining parameters
in a set $\theta_3$.

3. The Nonlinear Three Stage Least Squares Estimator. The NL3SLS
estimator $\hat{\theta}$ of $\theta^*$ is found as the value of $\theta$ which minimizes:

$$S(\theta) = (\mathbf{Y} - \phi(\theta))'\,\big[\hat{\Sigma}^{-1} \otimes Z(Z'Z)^{-1}Z'\big]\,(\mathbf{Y} - \phi(\theta)), \tag{11}$$

where :

E T.

u* T,


is a consistent estimator of $\Sigma$ as $K', T \to \infty$.
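To make the structure of the criterion concrete, the sketch below evaluates the familiar single-sample NL3SLS objective, $r'[\hat{\Sigma}^{-1} \otimes Z(Z'Z)^{-1}Z']\,r$, for n observations on N equations. It is an assumption-laden illustration, not the pooled estimator of (11): it uses one data set, one instrument matrix, and randomly generated residuals, and the function name `nl3sls_objective` is hypothetical. In an application the residuals would be $\mathbf{Y} - \phi(\theta)$ evaluated at a trial value of $\theta$, and the function would be passed to a numerical minimizer.

```python
import numpy as np

def nl3sls_objective(residuals, Z, Sigma_hat):
    """Quadratic form r' [Sigma_hat^{-1} kron P_Z] r.

    residuals: (n, N) array, column j holds the residuals of equation j.
    Z:         (n, q) matrix of instruments.
    Sigma_hat: (N, N) estimate of the cross-equation covariance matrix.
    """
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)      # projection on the instruments
    r = residuals.T.reshape(-1)                  # stack residuals equation by equation
    W = np.kron(np.linalg.inv(Sigma_hat), P_Z)   # weighting matrix
    return float(r @ W @ r)

# Illustrative inputs only.
rng = np.random.default_rng(1)
n, N, q = 50, 2, 4
Z = rng.normal(size=(n, q))
R = rng.normal(size=(n, N))
Sigma_hat = np.array([[1.0, 0.3], [0.3, 2.0]])

S = nl3sls_objective(R, Z, Sigma_hat)

# Equivalent trace form, which avoids building the (nN x nN) Kronecker product.
P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
S_trace = float(np.trace(np.linalg.inv(Sigma_hat) @ R.T @ P_Z @ R))

print(np.isclose(S, S_trace))
```

The trace form is the one to prefer when n is large, as with a cross section of several thousand households, since the Kronecker product then becomes too large to store.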