Ch. 11 Panel Data Model

Data sets that combine time series and cross sections are common in econometrics. For example, the published statistics of the OECD contain numerous series of economic aggregates observed yearly for many countries. The Panel Study of Income Dynamics (PSID) is a study of roughly 6000 families and 15000 individuals who have been interviewed periodically from 1968 to the present. Panel data sets are more oriented toward cross-section analysis; they are wide but typically relatively short. Heterogeneity across units is an integral part of the analysis.

Recall that the (multiple) linear model is used to study the relationship between a dependent variable and several independent variables. That is,
\[
y = f(x_1, x_2, \dots, x_k) + \varepsilon = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon = \mathbf{x}'\boldsymbol\beta + \varepsilon,
\]
where $y$ is the dependent or explained variable, $x_i,\ i = 1,\dots,k$ are the independent or explanatory variables, and $\beta_i,\ i = 1,\dots,k$ are unknown coefficients that we are interested in learning about, either through estimation or through hypothesis testing. The term $\varepsilon$ is an unobservable random disturbance. In the following, we will see that panel data sets provide a richer source of information, but also require more complex stochastic specifications.

The fundamental advantage of a panel data set over a cross section is that it allows the researcher greater flexibility in modeling differences in behavior across individuals. The basic framework for this statistical model is of the form
\[
y_{it} = \mathbf{x}_{it}'\boldsymbol\beta + \mathbf{z}_i'\boldsymbol\alpha + \varepsilon_{it}, \qquad i = 1,2,\dots,N,\quad t = 1,2,\dots,T.
\]
There are $k$ regressors in $\mathbf{x}_{it}$, not including a constant term. The heterogeneity, or individual effect, is $\mathbf{z}_i'\boldsymbol\alpha$, where $\mathbf{z}_i$ contains a constant term and a set of individual- or group-specific variables, which may be observed, such as race, sex, location, and so on, or unobserved, such as family-specific characteristics, individual heterogeneity in skill or preference, and so on, all of which are taken to be constant over time $t$. The various cases we will consider are listed below; a simulation sketch follows the list.

1. Pooled Regression: If $\mathbf{z}_i$ contains only a constant term, then there are no individual-specific characteristics in this model. All we need to do is pool the data,
\[
y_{it} = \mathbf{x}_{it}'\boldsymbol\beta + \alpha + \varepsilon_{it}, \qquad i = 1,\dots,N,\ t = 1,\dots,T,
\]
and OLS provides consistent and efficient estimates of the common $\alpha$ and $\boldsymbol\beta$.

2. Fixed Effects: If $\mathbf{z}_i'\boldsymbol\alpha = \alpha_i$, then the fixed effects approach takes $\alpha_i$ as a group-specific constant term in the regression model,
\[
y_{it} = \mathbf{x}_{it}'\boldsymbol\beta + \alpha_i + \varepsilon_{it}, \qquad i = 1,\dots,N,\ t = 1,\dots,T.
\]

3. Random Effects: If the unobserved individual heterogeneity can be assumed to be uncorrelated with the included variables, then the model may be formulated as
\[
y_{it} = \mathbf{x}_{it}'\boldsymbol\beta + E(\mathbf{z}_i'\boldsymbol\alpha) + [\mathbf{z}_i'\boldsymbol\alpha - E(\mathbf{z}_i'\boldsymbol\alpha)] + \varepsilon_{it} = \mathbf{x}_{it}'\boldsymbol\beta + \alpha + u_i + \varepsilon_{it}, \qquad i = 1,\dots,N,\ t = 1,\dots,T.
\]
The random effects approach specifies that $u_i$ is a group-specific random element, similar to $\varepsilon_{it}$ except that for each group there is but a single draw that enters the regression identically in each period.
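As a concrete illustration of these specifications, the sketch below simulates a hypothetical balanced panel of the random effects form (case 3) and fits the pooled regression of case 1 by OLS. All sample sizes, parameter values, and variable names are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Hypothetical balanced panel: N units, T periods, one regressor, true
# intercept alpha and slope beta; u_i is the unit-specific random effect.
rng = np.random.default_rng(0)
N, T, alpha, beta = 50, 10, 1.0, 2.0

x = rng.normal(size=(N, T))
u = rng.normal(scale=0.5, size=(N, 1))        # random heterogeneity u_i
eps = rng.normal(size=(N, T))                 # idiosyncratic disturbance
y = alpha + beta * x + u + eps                # y_it = alpha + beta*x_it + u_i + eps_it

# Case 1 (pooled regression): stack all NT observations, regress y on [1, x].
X = np.column_stack([np.ones(N * T), x.ravel()])
coef_pooled, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
print("pooled OLS (alpha, beta):", coef_pooled)
```

Because $u_i$ is drawn independently of $x_{it}$ here, pooled OLS is consistent; the later sections show what changes when the heterogeneity is treated as fixed or is correlated with the regressors.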
1 Fixed Effects

This formulation of the model assumes that differences across units can be captured in differences in the constant term. Each $\alpha_i$ is treated as an unknown parameter to be estimated. Let $\mathbf{y}_i$ and $X_i$ be the $T$ observations for the $i$th unit, $\mathbf{i}$ be a $T\times 1$ column of ones, and let $\boldsymbol\varepsilon_i$ be the associated $T\times 1$ vector of disturbances. Then
\[
\mathbf{y}_i = X_i\boldsymbol\beta + \alpha_i\,\mathbf{i} + \boldsymbol\varepsilon_i, \qquad i = 1,2,\dots,N.
\]
It is also assumed that the disturbance terms are well behaved, that is,
\[
E(\boldsymbol\varepsilon_i) = \mathbf{0}, \qquad E(\boldsymbol\varepsilon_i\boldsymbol\varepsilon_i') = \sigma_\varepsilon^2 I_T, \qquad \text{and}\qquad E(\boldsymbol\varepsilon_i\boldsymbol\varepsilon_j') = \mathbf{0}\ \text{ if } i\neq j.
\]
Observations on all the cross sections can be rewritten as
\[
\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_N \end{bmatrix}
=
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}\boldsymbol\beta
+
\begin{bmatrix} \mathbf{i} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{i} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{i} \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_N \end{bmatrix}
+
\begin{bmatrix} \boldsymbol\varepsilon_1 \\ \boldsymbol\varepsilon_2 \\ \vdots \\ \boldsymbol\varepsilon_N \end{bmatrix},
\]
or, in more compact form,
\[
\mathbf{y} = X\boldsymbol\beta + D\boldsymbol\alpha + \boldsymbol\varepsilon,
\]
where $\mathbf{y}$ and $\boldsymbol\varepsilon$ are $NT\times 1$, $X$ is $NT\times k$, $\boldsymbol\beta$ is $k\times 1$, and $D = [\mathbf{d}_1\ \mathbf{d}_2\ \cdots\ \mathbf{d}_N]$ is $NT\times N$, with $\mathbf{d}_i$ a dummy variable indicating the $i$th unit. This model is usually referred to as the least squares dummy variable (LSDV) model. Since this model satisfies the ideal conditions, the OLS estimator is BLUE. Using the familiar partitioned regression of Ch. 6, the slope estimator is
\[
\hat{\boldsymbol\beta} = (X'M_DX)^{-1}X'M_D\mathbf{y}, \qquad \text{where } M_D = I_{NT} - D(D'D)^{-1}D'.
\]

Lemma:
\[
M_D = I_{NT} - D(D'D)^{-1}D' =
\begin{bmatrix} M^0 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & M^0 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & M^0 \end{bmatrix},
\]
where $M^0 = I_T - \frac{1}{T}\mathbf{i}\mathbf{i}'$ is the demeaning matrix.

Proof: By definition,
\[
D'D = \begin{bmatrix} \mathbf{i}'\mathbf{i} & & \\ & \ddots & \\ & & \mathbf{i}'\mathbf{i} \end{bmatrix} = T\,I_N,
\]
and therefore
\[
I_{NT} - D(D'D)^{-1}D' = I_{NT} - \frac{1}{T}\begin{bmatrix} \mathbf{i}\mathbf{i}' & & \\ & \ddots & \\ & & \mathbf{i}\mathbf{i}' \end{bmatrix}
= \begin{bmatrix} M^0 & & \\ & \ddots & \\ & & M^0 \end{bmatrix}. \qquad\blacksquare
\]

It is easy to see that the matrix $M_D$ is idempotent and that
\[
M_D\mathbf{y} = \begin{bmatrix} M^0\mathbf{y}_1 \\ \vdots \\ M^0\mathbf{y}_N \end{bmatrix}
= \begin{bmatrix} \mathbf{y}_1 - \bar{y}_1\mathbf{i} \\ \vdots \\ \mathbf{y}_N - \bar{y}_N\mathbf{i} \end{bmatrix}
\qquad\text{and}\qquad
M_DX = \begin{bmatrix} M^0X_1 \\ \vdots \\ M^0X_N \end{bmatrix},
\]
where the scalar $\bar{y}_i = \frac{1}{T}\sum_{t=1}^T y_{it}$, $i = 1,\dots,N$. Writing $X_i = [\mathbf{x}_{i1}\ \mathbf{x}_{i2}\ \cdots\ \mathbf{x}_{ik}]$ column by column, we have $M^0X_i = [M^0\mathbf{x}_{i1}\ M^0\mathbf{x}_{i2}\ \cdots\ M^0\mathbf{x}_{ik}]$ with
\[
M^0\mathbf{x}_{ij} = \mathbf{x}_{ij} - \bar{x}_{ij}\mathbf{i}, \qquad j = 1,\dots,k, \qquad \bar{x}_{ij} = \frac{1}{T}\sum_{t=1}^T x_{ijt}.
\]
Denoting $\bar{\mathbf{x}}_i = [\bar{x}_{i1}\ \bar{x}_{i2}\ \cdots\ \bar{x}_{ik}]'$, the least squares regression of $M_D\mathbf{y}$ on $M_DX$ is therefore equivalent to a regression of $[y_{it} - \bar{y}_i]$ on $[\mathbf{x}_{it} - \bar{\mathbf{x}}_i]$.

The dummy variable coefficients can be recovered from
\[
D'\mathbf{y} = D'X\hat{\boldsymbol\beta} + D'D\hat{\boldsymbol\alpha} + D'\mathbf{e},
\]
or
\[
\hat{\boldsymbol\alpha} = (D'D)^{-1}D'(\mathbf{y} - X\hat{\boldsymbol\beta}),
\]
since $D'\mathbf{e} = \mathbf{0}$. This implies that
\[
\begin{bmatrix} \hat\alpha_1 \\ \hat\alpha_2 \\ \vdots \\ \hat\alpha_N \end{bmatrix}
= \frac{1}{T}\begin{bmatrix} \mathbf{i}' & & \\ & \ddots & \\ & & \mathbf{i}' \end{bmatrix}
\begin{bmatrix} \mathbf{y}_1 - X_1\hat{\boldsymbol\beta} \\ \vdots \\ \mathbf{y}_N - X_N\hat{\boldsymbol\beta} \end{bmatrix}
= \begin{bmatrix} \frac{1}{T}\sum_{t=1}^T (y_{1t} - \mathbf{x}_{1t}'\hat{\boldsymbol\beta}) \\ \vdots \\ \frac{1}{T}\sum_{t=1}^T (y_{Nt} - \mathbf{x}_{Nt}'\hat{\boldsymbol\beta}) \end{bmatrix}
= \begin{bmatrix} \bar{y}_1 - \bar{\mathbf{x}}_1'\hat{\boldsymbol\beta} \\ \vdots \\ \bar{y}_N - \bar{\mathbf{x}}_N'\hat{\boldsymbol\beta} \end{bmatrix}.
\]
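Continuing the simulated panel from the sketch above, a minimal sketch of the LSDV computation: the slope comes from the demeaned regression just derived, and the group-specific constants are recovered as $\hat\alpha_i = \bar{y}_i - \bar{\mathbf{x}}_i'\hat{\boldsymbol\beta}$ (with a single regressor no matrix inversion is needed):

```python
# Within (LSDV) estimator: OLS of M_D y on M_D X is just OLS on the
# group-demeaned data; the dummy coefficients follow from the group means.
x_dm = x - x.mean(axis=1, keepdims=True)      # M0 x_i, group by group
y_dm = y - y.mean(axis=1, keepdims=True)      # M0 y_i
beta_w = (x_dm.ravel() @ y_dm.ravel()) / (x_dm.ravel() @ x_dm.ravel())
alpha_hat = y.mean(axis=1) - beta_w * x.mean(axis=1)  # alpha_i = ybar_i - xbar_i*beta
print("within slope:", beta_w, " first few alpha_i:", alpha_hat[:3])
```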
Exercise: Let the fixed effects model be partitioned as $\mathbf{y} = X\hat{\boldsymbol\beta} + D\hat{\boldsymbol\alpha} + \mathbf{e}$; show that the variance of $\hat{\boldsymbol\beta}$ is $\operatorname{Var}(\hat{\boldsymbol\beta}) = \sigma_\varepsilon^2(X'M_DX)^{-1}$.

Proof: Since
\[
\hat{\boldsymbol\beta} = (X'M_DX)^{-1}X'M_D\mathbf{y} = \boldsymbol\beta + (X'M_DX)^{-1}X'M_D\boldsymbol\varepsilon,
\]
we have
\begin{align*}
\operatorname{Var}(\hat{\boldsymbol\beta}) &= E[(\hat{\boldsymbol\beta} - \boldsymbol\beta)(\hat{\boldsymbol\beta} - \boldsymbol\beta)'] \\
&= E[(X'M_DX)^{-1}X'M_D\,\boldsymbol\varepsilon\boldsymbol\varepsilon'\,M_DX(X'M_DX)^{-1}] \\
&= \sigma_\varepsilon^2\,(X'M_DX)^{-1}X'M_D I_{NT} M_DX(X'M_DX)^{-1} \\
&= \sigma_\varepsilon^2\,(X'M_DX)^{-1}X'M_DX(X'M_DX)^{-1} \\
&= \sigma_\varepsilon^2\,(X'M_DX)^{-1},
\end{align*}
where the last two steps use the idempotency of $M_D$. $\blacksquare$

With the above results, the appropriate estimator of $\operatorname{Var}(\hat{\boldsymbol\beta})$ is therefore
\[
\widehat{\operatorname{Var}}(\hat{\boldsymbol\beta}) = s^2(X'M_DX)^{-1},
\]
where the disturbance variance estimator $s^2$ is
\[
s^2 = \frac{(\mathbf{y} - X\hat{\boldsymbol\beta} - D\hat{\boldsymbol\alpha})'(\mathbf{y} - X\hat{\boldsymbol\beta} - D\hat{\boldsymbol\alpha})}{NT - N - k}
= \frac{\sum_{i=1}^N\sum_{t=1}^T (y_{it} - \mathbf{x}_{it}'\hat{\boldsymbol\beta} - \hat\alpha_i)^2}{NT - N - k}.
\]

Exercise: Show that
\[
\operatorname{Var}(\hat\alpha_i) = \frac{\sigma_\varepsilon^2}{T} + \bar{\mathbf{x}}_i'\operatorname{Var}(\hat{\boldsymbol\beta})\,\bar{\mathbf{x}}_i.
\]

1.1 Testing the Significance of the Group Effects

Consider the null hypothesis $H_0: \alpha_1 = \alpha_2 = \dots = \alpha_N = \alpha$. Under this null hypothesis, the efficient estimator is pooled least squares. The F ratio used for the test is
\[
F_{N-1,\,NT-N-k} = \frac{(R^2_{LSDV} - R^2_{Pooled})/(N-1)}{(1 - R^2_{LSDV})/(NT - N - k)},
\]
where $R^2_{LSDV}$ indicates the $R^2$ from the dummy variable model and $R^2_{Pooled}$ indicates the $R^2$ from the pooled or restricted model with only a single overall constant.

Example: Example 13.2 at p. 292 of Greene, where N = 6, k = 3, and T = 15 (see Ex. 7.2 on p. 118).

Exercise: Reproduce the first, third and fourth rows of the results in Table 13.1 on p. 292 of Greene.

1.2 The Within- and Between-Groups Estimators

We could formulate a pooled regression model in three ways. First, the original formulation is
\[
y_{it} = \alpha + \mathbf{x}_{it}'\boldsymbol\beta + \varepsilon_{it}, \qquad i = 1,\dots,N,\ t = 1,\dots,T. \tag{1}
\]
In terms of deviations from the group means,
\[
y_{it} - \bar{y}_i = (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)'\boldsymbol\beta + \varepsilon_{it} - \bar\varepsilon_i, \tag{2}
\]
and in terms of the group means,
\[
\bar{y}_i = \alpha + \bar{\mathbf{x}}_i'\boldsymbol\beta + \bar\varepsilon_i, \qquad i = 1,\dots,N. \tag{3}
\]
To estimate $\boldsymbol\beta$ by OLS, in (1) we would use the total sums of squares and cross products,
\[
S^{Total}_{xx} = \sum_{i=1}^N\sum_{t=1}^T (\mathbf{x}_{it} - \bar{\mathbf{x}})(\mathbf{x}_{it} - \bar{\mathbf{x}})'
\quad\text{and}\quad
S^{Total}_{xy} = \sum_{i=1}^N\sum_{t=1}^T (\mathbf{x}_{it} - \bar{\mathbf{x}})(y_{it} - \bar{y}),
\]
where $\bar{\mathbf{x}} = \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T \mathbf{x}_{it}$ and $\bar{y} = \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T y_{it}$. In (2), the moment matrices we use are the within-group (i.e., deviations from the group means) sums of squares and cross products,
\[
S^{Within}_{xx} = \sum_{i=1}^N\sum_{t=1}^T (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)(\mathbf{x}_{it} - \bar{\mathbf{x}}_i)'
\quad\text{and}\quad
S^{Within}_{xy} = \sum_{i=1}^N\sum_{t=1}^T (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)(y_{it} - \bar{y}_i).
\]
Finally, for (3), the mean of the group means is the overall mean (i.e., $\frac{1}{N}\sum_{i=1}^N \bar{y}_i = \bar{y}$). Therefore the moment matrices are the between-groups sums of squares and cross products,
\[
S^{Between}_{xx} = \sum_{i=1}^N T(\bar{\mathbf{x}}_i - \bar{\mathbf{x}})(\bar{\mathbf{x}}_i - \bar{\mathbf{x}})'
\quad\text{and}\quad
S^{Between}_{xy} = \sum_{i=1}^N T(\bar{\mathbf{x}}_i - \bar{\mathbf{x}})(\bar{y}_i - \bar{y}).
\]
It is easy to verify that
\[
S^{Total}_{xx} = S^{Within}_{xx} + S^{Between}_{xx}
\quad\text{and}\quad
S^{Total}_{xy} = S^{Within}_{xy} + S^{Between}_{xy}.
\]
Therefore, there are three possible least squares estimators of $\boldsymbol\beta$ corresponding to these decompositions. The least squares estimator in the pooled regression is
\[
\hat{\boldsymbol\beta}^{Total} = (S^{Total}_{xx})^{-1}S^{Total}_{xy} = (S^{Within}_{xx} + S^{Between}_{xx})^{-1}(S^{Within}_{xy} + S^{Between}_{xy}).
\]
The within-groups estimator is
\[
\hat{\boldsymbol\beta}^{Within} = (S^{Within}_{xx})^{-1}S^{Within}_{xy}.
\]
This is the LSDV estimator computed earlier. An alternative estimator is the between-groups estimator,
\[
\hat{\boldsymbol\beta}^{Between} = (S^{Between}_{xx})^{-1}S^{Between}_{xy}.
\]
This is the least squares estimator based on the N sets of group means.

From the preceding expressions, $S^{Within}_{xy} = S^{Within}_{xx}\hat{\boldsymbol\beta}^{Within}$ and $S^{Between}_{xy} = S^{Between}_{xx}\hat{\boldsymbol\beta}^{Between}$, so
\begin{align*}
\hat{\boldsymbol\beta}^{Total} &= (S^{Within}_{xx} + S^{Between}_{xx})^{-1}(S^{Within}_{xx}\hat{\boldsymbol\beta}^{Within} + S^{Between}_{xx}\hat{\boldsymbol\beta}^{Between}) \\
&= F^{Within}\hat{\boldsymbol\beta}^{Within} + F^{Between}\hat{\boldsymbol\beta}^{Between},
\end{align*}
where $F^{Within} = (S^{Within}_{xx} + S^{Between}_{xx})^{-1}S^{Within}_{xx}$ and $F^{Within} + F^{Between} = (S^{Within}_{xx} + S^{Between}_{xx})^{-1}(S^{Within}_{xx} + S^{Between}_{xx}) = I$. That is, the pooled OLS estimator is a matrix weighted average of the within- and between-groups estimators.
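The decomposition and the weighted-average identity are easy to check numerically on the simulated panel above; with a single regressor the moment matrices are scalars:

```python
# Verify S_total = S_within + S_between and that the pooled slope is a
# weighted average of the within and between slopes (scalar regressor case).
xbar_i, ybar_i = x.mean(axis=1), y.mean(axis=1)
Sxx_w = np.sum((x - xbar_i[:, None]) ** 2)
Sxy_w = np.sum((x - xbar_i[:, None]) * (y - ybar_i[:, None]))
Sxx_b = T * np.sum((xbar_i - x.mean()) ** 2)
Sxy_b = T * np.sum((xbar_i - x.mean()) * (ybar_i - y.mean()))
b_within, b_between = Sxy_w / Sxx_w, Sxy_b / Sxx_b
b_total = (Sxy_w + Sxy_b) / (Sxx_w + Sxx_b)
F_w = Sxx_w / (Sxx_w + Sxx_b)                 # scalar analogue of F^Within
assert np.isclose(b_total, F_w * b_within + (1 - F_w) * b_between)
print("total, within, between slopes:", b_total, b_within, b_between)
```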
2 Random Effects

Consider the model
\[
y_{it} = \mathbf{x}_{it}'\boldsymbol\beta + \alpha + u_i + \varepsilon_{it}, \qquad i = 1,\dots,N,\ t = 1,\dots,T,
\]
where there are $k$ regressors including a constant, and now the single constant term is the mean of the unobserved heterogeneity, $E(\mathbf{z}_i'\boldsymbol\alpha)$. The component $u_i$ is the random heterogeneity specific to the $i$th observation and is constant through time. We assume further that
\[
E(\varepsilon_{it}) = E(u_i) = 0, \qquad E(\varepsilon_{it}^2) = \sigma_\varepsilon^2, \qquad E(u_i^2) = \sigma_u^2,
\]
\[
E(\varepsilon_{it}u_j) = 0 \ \text{for all } i, t, j, \qquad E(\varepsilon_{it}\varepsilon_{js}) = 0 \ \text{if } t\neq s \text{ or } i\neq j, \qquad E(u_iu_j) = 0 \ \text{if } i\neq j.
\]
Denote $\eta_{it} = \varepsilon_{it} + u_i$, let $\mathbf{y}_i$ and $X_i$ (including the constant term) be the $T$ observations for the $i$th unit, $\mathbf{i}_T$ be a $T\times 1$ column of ones, and let $\boldsymbol\eta_i = [\eta_{i1}, \eta_{i2}, \dots, \eta_{iT}]'$; then
\[
\mathbf{y}_i = X_i\boldsymbol\beta + \boldsymbol\eta_i, \qquad i = 1,\dots,N,
\]
and the covariance matrix of the disturbances is
\[
\Omega = E(\boldsymbol\eta_i\boldsymbol\eta_i') =
\begin{bmatrix}
\sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\
\sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \cdots & \sigma_u^2 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_u^2 & \sigma_u^2 & \cdots & \sigma_\varepsilon^2 + \sigma_u^2
\end{bmatrix}
= \sigma_\varepsilon^2 I_T + \sigma_u^2\,\mathbf{i}_T\mathbf{i}_T'.
\]
Stacking the observations on all the cross sections as before gives, in compact form,
\[
\mathbf{y} = X\boldsymbol\beta + \boldsymbol\eta,
\]
where $\mathbf{y}$ and $\boldsymbol\eta$ are $NT\times 1$, $X$ is $NT\times k$, $\boldsymbol\beta$ is $k\times 1$, and
\[
V = E(\boldsymbol\eta\boldsymbol\eta') =
\begin{bmatrix} \Omega & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \Omega & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \Omega \end{bmatrix}
= I_N\otimes\Omega.
\]

2.1 Generalized Least Squares

The generalized least squares estimator of the slope parameters is
\[
\tilde{\boldsymbol\beta} = (X'V^{-1}X)^{-1}X'V^{-1}\mathbf{y}
= \left(\sum_{i=1}^N X_i'\Omega^{-1}X_i\right)^{-1}\left(\sum_{i=1}^N X_i'\Omega^{-1}\mathbf{y}_i\right).
\]
As with many generalized least squares problems, it is convenient to find a transformation matrix $V^{-1/2} = [I_N\otimes\Omega]^{-1/2}$ so that OLS can be applied to the transformed model. Fuller and Battese (1973) suggest
\[
\Omega^{-1/2} = \frac{1}{\sigma_\varepsilon}\left[I_T - \frac{\theta}{T}\,\mathbf{i}_T\mathbf{i}_T'\right],
\qquad\text{where}\qquad
\theta = 1 - \frac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2 + T\sigma_u^2}}.
\]
The transformation of $\mathbf{y}_i$ and $X_i$ for GLS is therefore
\[
\Omega^{-1/2}\mathbf{y}_i = \frac{1}{\sigma_\varepsilon}
\begin{bmatrix} y_{i1} - \theta\bar{y}_i \\ y_{i2} - \theta\bar{y}_i \\ \vdots \\ y_{iT} - \theta\bar{y}_i \end{bmatrix},
\]
and likewise for the rows of $X_i$. Note the similarity of this procedure to the computation in the LSDV model, which uses $\theta = 1$. One would interpret $\theta = 1$ as the case that would remain if $\sigma_\varepsilon = 0$, because the only effect then would be $u_i$. In this case, the fixed and the random effects models would be indistinguishable, so this result makes sense.
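A sketch of the Fuller and Battese transformation on the simulated panel above, assuming the variance components are known (the true simulation values are plugged in for illustration; note that the constant column is transformed to $1 - \theta$):

```python
# Random-effects GLS by partial demeaning with known variance components.
sig_e, sig_u = 1.0, 0.5
theta = 1 - sig_e / np.sqrt(sig_e ** 2 + T * sig_u ** 2)
y_t = (y - theta * y.mean(axis=1, keepdims=True)).ravel()  # y_it - theta*ybar_i
x_t = (x - theta * x.mean(axis=1, keepdims=True)).ravel()
const_t = np.full(N * T, 1 - theta)           # constant column transforms too
Xg = np.column_stack([const_t, x_t])
coef_gls, *_ = np.linalg.lstsq(Xg, y_t, rcond=None)
print("theta:", theta, " GLS (alpha, beta):", coef_gls)
```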
2.2 FGLS when $\Omega$ is Unknown

If the variance components are known, generalized least squares can be computed as shown earlier. Of course, this is unlikely, so, as usual, we must first estimate the variance components and then use an FGLS procedure. A heuristic approach to estimating the variance components is as follows. In levels,
\[
y_{it} = \alpha + \mathbf{x}_{it}'\boldsymbol\beta + \varepsilon_{it} + u_i, \qquad i = 1,\dots,N,\ t = 1,\dots,T, \tag{4}
\]
and in terms of group means,
\[
\bar{y}_i = \alpha + \bar{\mathbf{x}}_i'\boldsymbol\beta + \bar\varepsilon_i + u_i, \qquad i = 1,\dots,N. \tag{5}
\]
Therefore, taking deviations from the group means removes the heterogeneity:
\[
y_{it} - \bar{y}_i = (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)'\boldsymbol\beta + \varepsilon_{it} - \bar\varepsilon_i. \tag{6}
\]
Since
\[
E\left[\sum_{t=1}^T(\varepsilon_{it} - \bar\varepsilon_i)^2\right] = (T-1)\sigma_\varepsilon^2,
\]
if $\boldsymbol\beta$ were observable (and therefore $\varepsilon$ observed), an unbiased estimator of $\sigma_\varepsilon^2$ based on the $T$ observations in group $i$ would be
\[
\hat\sigma_\varepsilon^2(i) = \frac{\sum_{t=1}^T(\varepsilon_{it} - \bar\varepsilon_i)^2}{T-1}.
\]
Since $\boldsymbol\beta$ must be estimated, we may use the residuals from the LSDV estimator (which is consistent and unbiased in general) and correct the degrees of freedom to form the estimator
\[
s_e^2(i) = \frac{\sum_{t=1}^T(e_{it} - \bar{e}_i)^2}{T - k - 1}.
\]
We have $N$ such estimators, so we average them to obtain
\[
s_e^2 = \frac{1}{N}\sum_{i=1}^N s_e^2(i)
= \frac{1}{N}\sum_{i=1}^N\left[\frac{\sum_{t=1}^T(e_{it} - \bar{e}_i)^2}{T - k - 1}\right]
= \frac{\sum_{i=1}^N\sum_{t=1}^T(e_{it} - \bar{e}_i)^2}{NT - Nk - N}.
\]
The degrees of freedom correction in $s_e^2$ is excessive because it assumes that $\alpha$ and $\boldsymbol\beta$ are reestimated for each $i$. The unbiased estimator would be
\[
\hat\sigma_\varepsilon^2 = s^2_{LSDV} = \frac{\sum_{i=1}^N\sum_{t=1}^T(e_{it} - \bar{e}_i)^2}{NT - N - k},
\]
where $\bar{e}_i = 0$ because the LSDV residuals sum to zero within each group.

It remains to estimate $\sigma_u^2$. Back to the original model,
\[
y_{it} = \alpha + \mathbf{x}_{it}'\boldsymbol\beta + \varepsilon_{it} + u_i, \qquad i = 1,\dots,N,\ t = 1,\dots,T. \tag{7}
\]
In spite of the correlation across observations, this is a classical regression model in which the OLS slope and variance estimators are both consistent and, in general, unbiased. Therefore, using the OLS residuals from the model with only a single overall constant, we have
\[
\operatorname{plim}\, s^2_{Pooled} = \operatorname{plim}\frac{\mathbf{e}'\mathbf{e}}{NT - k - 1} = \sigma_\varepsilon^2 + \sigma_u^2.
\]
This provides the two estimators needed for the variance components; the second would be
\[
\hat\sigma_u^2 = s^2_{Pooled} - s^2_{LSDV}.
\]

Example: Example 13.4 of Greene at p. 299.

2.3 Hausman's Specification Test for the Random Effects Model

The fixed effects model is costly in terms of degrees of freedom lost, while the random effects model offers little justification for treating the individual effects as uncorrelated with the other regressors.

The specification test developed by Hausman (1978) is used to test for orthogonality of the random effects and the regressors. Under the null hypothesis of no correlation, both OLS in the LSDV model, $\hat{\boldsymbol\beta}$, and GLS in the random effects model, $\tilde{\boldsymbol\beta}$, are consistent, but OLS is inefficient (referring to the GLS matrix weighted average given earlier, the efficient weight uses $\theta$, whereas OLS in the LSDV model sets $\theta = 1$), whereas under the alternative, OLS is consistent but GLS is not. Therefore, under the null hypothesis, the two estimates should not differ systematically, and a test can be based on the difference. The essential ingredient for the test is the covariance matrix of the difference vector $[\hat{\boldsymbol\beta} - \tilde{\boldsymbol\beta}]$:
\[
\operatorname{Var}[\hat{\boldsymbol\beta} - \tilde{\boldsymbol\beta}]
= \operatorname{Var}[\hat{\boldsymbol\beta}] + \operatorname{Var}[\tilde{\boldsymbol\beta}]
- \operatorname{Cov}[\hat{\boldsymbol\beta}, \tilde{\boldsymbol\beta}]
- \operatorname{Cov}[\hat{\boldsymbol\beta}, \tilde{\boldsymbol\beta}]'.
\]
Hausman's essential result is that the covariance of an efficient estimator with its difference from an inefficient estimator is zero, which implies
\[
\operatorname{Cov}[(\hat{\boldsymbol\beta} - \tilde{\boldsymbol\beta}), \tilde{\boldsymbol\beta}]
= \operatorname{Cov}[\hat{\boldsymbol\beta}, \tilde{\boldsymbol\beta}] - \operatorname{Var}[\tilde{\boldsymbol\beta}] = 0, \tag{8}
\]
or that
\[
\operatorname{Cov}[\hat{\boldsymbol\beta}, \tilde{\boldsymbol\beta}] = \operatorname{Var}[\tilde{\boldsymbol\beta}].
\]
Inserting this result into the expression for $\operatorname{Var}[\hat{\boldsymbol\beta} - \tilde{\boldsymbol\beta}]$ produces the required covariance matrix for the test,
\[
\operatorname{Var}[\hat{\boldsymbol\beta} - \tilde{\boldsymbol\beta}]
= \operatorname{Var}[\hat{\boldsymbol\beta}] - \operatorname{Var}[\tilde{\boldsymbol\beta}] = \Psi.
\]
The chi-squared test is based on the Wald criterion:
\[
W = [\hat{\boldsymbol\beta} - \tilde{\boldsymbol\beta}]'\,\hat\Psi^{-1}\,[\hat{\boldsymbol\beta} - \tilde{\boldsymbol\beta}].
\]
For $\hat\Psi$, we use the estimated covariance matrix of the slope estimator in the LSDV model and the estimated covariance matrix from the random effects model, excluding the constant term. Under the null hypothesis, $W$ is asymptotically distributed as chi-squared with $k$ degrees of freedom.

Exercise: Reproduce the results of Example 13.5 and Table 13.2 on p. 302 of Greene.
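A sketch of the Hausman statistic on the simulated panel above, for the slope only (constant excluded). The variance estimates below are simple textbook choices; in finite samples the difference $\widehat{\operatorname{Var}}(\hat\beta) - \widehat{\operatorname{Var}}(\tilde\beta)$ can fail to be positive, so this is illustrative rather than library-grade code:

```python
# Hausman test: W = d' Psi^{-1} d with Psi = Var(beta_within) - Var(beta_GLS);
# scalar here (one regressor), so W is chi-squared(1) under the null.
e_w = y_dm.ravel() - beta_w * x_dm.ravel()    # within residuals
s2_lsdv = (e_w @ e_w) / (N * T - N - 1)       # k = 1 regressor
var_w = s2_lsdv / (x_dm.ravel() @ x_dm.ravel())

e_g = y_t - Xg @ coef_gls                     # residuals of transformed model
s2_g = (e_g @ e_g) / (N * T - 2)
var_g = s2_g * np.linalg.inv(Xg.T @ Xg)[1, 1] # slope element of Var(beta_GLS)

W = (beta_w - coef_gls[1]) ** 2 / (var_w - var_g)
print("Hausman W:", W, "(chi-squared(1) 5% critical value is 3.84)")
```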