Ch. 7 Violations of the Ideal Conditions

1 Specification

1.1 Selection of Variables

Consider an initial model, which we assume to be
$$Y = X_1\beta_1 + \varepsilon.$$
It is not unusual to begin with some such formulation and then contemplate adding more variables (regressors) to the model:
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon.$$
Let $R_1^2$ be the R-square of the model with fewer regressors and $R_{12}^2$ be the R-square of the model with more regressors. As we have shown earlier, $R_{12}^2 \geq R_1^2$. Clearly, it would be possible to push $R^2$ as high as desired simply by adding regressors. This problem motivates the use of the adjusted R-square,
$$\bar{R}^2 = 1 - \frac{T-1}{T-k}(1 - R^2).$$
It has been suggested that the adjusted R-square does not penalize the loss of degrees of freedom heavily enough; two alternatives that have been proposed for comparing models are
$$\tilde{R}_j^2 = 1 - \frac{T+k_j}{T-k_j}(1 - R_j^2)$$
and Akaike's information criterion:
$$AIC_j = \ln\frac{e_j'e_j}{T} + \frac{2k_j}{T} = \ln\hat{\sigma}_j^2 + \frac{2k_j}{T}.$$
Although intuitively appealing, these measures are a bit unorthodox in that they have no firm basis in theory (unless they are used in time-series models). Perhaps a somewhat more palatable alternative is the method of stepwise regression; however, economists have tended to avoid stepwise regression because the usual inference procedures break down under it.

1.2 Omission of Relevant Variables

Suppose that the correctly specified regression model is
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon,$$
where the two parts of $X$ have $k_1$ and $k_2$ columns, respectively. If we regress $Y$ on $X_1$ without including $X_2$, that is, if we estimate the model
$$Y = X_1\beta_1 + \varepsilon,$$
we obtain the estimator
$$\hat{\beta}_1 = (X_1'X_1)^{-1}X_1'Y = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + \varepsilon) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\varepsilon.$$
Taking the expectation, we see that unless $X_1'X_2 = 0$ or $\beta_2 = 0$, $\hat{\beta}_1$ is biased:
$$E(\hat{\beta}_1) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2.$$
The variance of $\hat{\beta}_1$ is
$$Var(\hat{\beta}_1) = \sigma^2(X_1'X_1)^{-1}.$$
If we had computed the correct regression, including $X_2$, then the slope estimator on $X_1$, denoted $\hat{\beta}_{1.2}$, would have a covariance matrix equal to the upper-left block of $\sigma^2(X'X)^{-1}$, i.e.
$$Var(\hat{\beta}) = \sigma^2(X'X)^{-1} = \sigma^2\begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1},$$
whose upper-left block is
$$Var(\hat{\beta}_{1.2}) = \sigma^2\left[X_1'X_1 - X_1'X_2(X_2'X_2)^{-1}X_2'X_1\right]^{-1}.$$
We can compare the covariance matrices of $\hat{\beta}_1$ and $\hat{\beta}_{1.2}$ more easily by comparing their inverses:
$$Var(\hat{\beta}_1)^{-1} - Var(\hat{\beta}_{1.2})^{-1} = (1/\sigma^2)\,X_1'X_2(X_2'X_2)^{-1}X_2'X_1,$$
which is nonnegative definite. We conclude that although $\hat{\beta}_1$ is biased, it has a smaller variance than $\hat{\beta}_{1.2}$.

Lemma: Let $A$ be a positive definite $(n \times n)$ matrix and let $B$ denote any nonzero $(n \times m)$ matrix. Then $B'AB$ is nonnegative definite.

Proof: Let $x$ be any nonzero vector and define $\tilde{x} \equiv Bx$. Then $\tilde{x}$ can be any vector, including the zero vector, so
$$x'B'ABx = \tilde{x}'A\tilde{x} \geq 0$$
from the positive definiteness of $A$.

For statistical inference, it would be necessary to estimate $\sigma^2$. Proceeding as usual, we would use
$$s^2 = \frac{e_1'e_1}{T-k_1}.$$
But
$$e_1 = M_1Y = M_1(X_1\beta_1 + X_2\beta_2 + \varepsilon) = M_1X_2\beta_2 + M_1\varepsilon,$$
since $M_1X_1 = 0$. Thus
$$E(e_1'e_1) = \beta_2'X_2'M_1X_2\beta_2 + \sigma^2\,\mathrm{tr}(M_1) = \beta_2'X_2'M_1X_2\beta_2 + \sigma^2(T-k_1).$$
It is simple to see that $\beta_2'X_2'M_1X_2\beta_2$ is positive (how?), so $s^2$ is biased upward. The conclusion is that if we omit relevant variables from the regression, our estimates of both $\beta_1$ and $\sigma^2$ are biased, although it is possible that $\hat{\beta}_1$ is more precise than $\hat{\beta}_{1.2}$.
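The bias and variance comparison above is easy to check by simulation. Below is a minimal sketch in Python (the data-generating values and variable names are illustrative assumptions, not from the text): $X$ is held fixed across replications, as in the classical model, and $X_2$ is correlated with $X_1$ so that $X_1'X_2 \neq 0$.

\begin{verbatim}
import numpy as np

# Monte Carlo sketch of omitted-variable bias (illustrative values).
# X is held fixed, as in the classical model; only the errors are redrawn.
rng = np.random.default_rng(0)
T, b1, b2, sigma, reps = 200, 1.0, 0.5, 1.0, 5000
x1 = rng.normal(size=T)
x2 = 0.9 * x1 + rng.normal(size=T)      # correlated with x1: X1'X2 != 0
X = np.column_stack([x1, x2])
short_est, long_est = [], []
for _ in range(reps):
    y = b1 * x1 + b2 * x2 + sigma * rng.normal(size=T)
    short_est.append((x1 @ y) / (x1 @ x1))                    # omits x2
    long_est.append(np.linalg.lstsq(X, y, rcond=None)[0][0])  # full model
print("mean (short):", np.mean(short_est))  # ~ b1 + bias, not b1
print("mean (long): ", np.mean(long_est))   # ~ b1: unbiased
print("var  (short):", np.var(short_est))   # smaller ...
print("var  (long): ", np.var(long_est))    # ... than this
\end{verbatim}

The short regression settles near $b_1 + b_2 \cdot x_1'x_2/x_1'x_1$ rather than $b_1$, yet has the smaller variance, exactly as the comparison of the inverse covariance matrices predicts.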
1.3 Inclusion of Irrelevant Variables

If the correct regression model is
$$Y = X_1\beta_1 + \varepsilon$$
and we estimate
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon,$$
then from the partitioned regression estimator we obtain
$$\hat{\beta}_1 = (X_1'M_2X_1)^{-1}X_1'M_2Y = (X_1'M_2X_1)^{-1}X_1'M_2(X_1\beta_1 + \varepsilon) = \beta_1 + (X_1'M_2X_1)^{-1}X_1'M_2\varepsilon$$
and
$$\hat{\beta}_2 = (X_2'M_1X_2)^{-1}X_2'M_1Y = (X_2'M_1X_2)^{-1}X_2'M_1(X_1\beta_1 + \varepsilon) = 0 + (X_2'M_1X_2)^{-1}X_2'M_1\varepsilon.$$
Therefore, $E(\hat{\beta}_1) = \beta_1$ and $E(\hat{\beta}_2) = 0$.

Exercise: Show that $s^2$ is unbiased:
$$E\left(\frac{e'e}{T-k_1-k_2}\right) = \sigma^2.$$

Then what's the problem? It would seem that one would generally want to "overfit" the model. However, the cost is a reduction in the precision of the estimates. As we have seen, the covariance matrix of the estimator in the shorter regression is never larger than the covariance matrix of the estimator obtained in the presence of the superfluous variables.

2 Functional Form

2.1 Dummy Variables

One of the most useful devices in regression analysis is the binary, or dummy, variable, which takes only the values 0 and 1.

2.1.1 Comparing Two Means

Suppose a model describes the salary function by
$$y = \alpha + x'\beta + \varepsilon,$$
where $\alpha$ can be regarded as the "initial pay" given to everyone, even individuals with different academic degrees. This model can be made more realistic by dividing the "initial pay" into two categories: individuals attending college and those not attending college. Formally,
$$y = \alpha + \gamma d_i + x'\beta + \varepsilon,$$
where
$$d_i = 1 \text{ if attending college}, \qquad d_i = 0 \text{ if not attending college}.$$
Logically $\gamma > 0$, and $d_i$ is the dummy variable. The above model can also be written equivalently as
$$y = \alpha_1 d_{1i} + \alpha_2 d_{2i} + x'\beta + \varepsilon,$$
where
$$d_{1i} = 1 \text{ if attending college}, \qquad d_{1i} = 0 \text{ if not attending college}$$
and
$$d_{2i} = 0 \text{ if attending college}, \qquad d_{2i} = 1 \text{ if not attending college},$$
but not as
$$y = \alpha + \alpha_1 d_{1i} + \alpha_2 d_{2i} + x'\beta + \varepsilon,$$
so as to avoid the dummy variable trap. Therefore, to remove seasonal effects we need 4 dummies without a common intercept, or 3 dummies with a common intercept (see eq. 7-1 at p. 118).

2.2 Nonlinearity in the Variables

The linear model we have proposed is not as "limited" as it appears at first glance. By using logarithms, exponentials, reciprocals, transcendental functions, polynomials, and so on, this "linear" model can take the general form
$$g(y) = \beta_1 f_1(z) + \beta_2 f_2(z) + \cdots + \beta_k f_k(z) + \varepsilon = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon = x'\beta + \varepsilon,$$
which can be tailored to any number of situations.

2.2.1 Log-Linear Model

A commonly used form of regression model is the log-linear model:
$$y = \alpha \prod_k z_k^{\beta_k} e^{\varepsilon},$$
or
$$\ln y = \ln\alpha + \sum_k \beta_k \ln z_k + \varepsilon = \beta_1 + \sum_k \beta_k x_k + \varepsilon.$$
All you have to do is take natural logarithms of the data before running the regression.

3 Stochastic Regressors

This section considers the linear regression model $Y = X\beta + \varepsilon$. It will be assumed that the full ideal conditions hold except that the regressor matrix $X$ is random.

3.1 Independent Stochastic Linear Regression Model

First of all, consider the case in which $X$ and $\varepsilon$ are independent. In this case the distribution of $\varepsilon$ conditional on $X$ is the same as its marginal distribution; specifically, $f(\varepsilon|X) = f(\varepsilon) \sim N(0, \sigma^2 I)$ and $E(\varepsilon|X) = \int \varepsilon f(\varepsilon|X)\,d\varepsilon = 0$. We now investigate the statistical properties of the OLS estimator under this assumption.

3.1.1 Unbiasedness?

Using the law of iterated expectations, the expected value of $\hat{\beta}$ is
$$E(\hat{\beta}) = E_X\{E[\beta + (X'X)^{-1}X'\varepsilon \mid X]\} = E_X[\beta + (X'X)^{-1}X'E(\varepsilon|X)] = E_X(\beta) = \beta.$$
The variance-covariance matrix of $\hat{\beta}$ is slightly different from the previous model, however:
$$Var(\hat{\beta}) = E[(\hat{\beta}-\beta)(\hat{\beta}-\beta)'] = E_X\{E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \mid X]\} = E_X\{(X'X)^{-1}X'E[\varepsilon\varepsilon'|X]X(X'X)^{-1}\} = E_X\{(X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1}\} = \sigma^2 E_X[(X'X)^{-1}] = \sigma^2 E[(X'X)^{-1}],$$
provided, of course, that $\sigma^2 E[(X'X)^{-1}]$ exists. The variance-covariance matrix of $\hat{\beta}$ is thus $\sigma^2$ times the expected value of $(X'X)^{-1}$, since $(X'X)^{-1}$ takes different values with each new random sample.
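As a quick numerical check of the last two results, the following sketch (parameter values are illustrative assumptions) redraws $X$ in every replication and compares the empirical covariance matrix of $\hat{\beta}$ with $\sigma^2$ times the sample average of $(X'X)^{-1}$.

\begin{verbatim}
import numpy as np

# Sketch: with X random but independent of the errors, bhat is unbiased
# and its covariance is sigma^2 * E[(X'X)^{-1}] (illustrative design).
rng = np.random.default_rng(1)
T, sigma = 50, 1.0
beta = np.array([1.0, -2.0, 0.5])
reps = 10000
bhats, xtx_invs = [], []
for _ in range(reps):
    X = rng.normal(size=(T, 3))          # a fresh stochastic X each sample
    y = X @ beta + sigma * rng.normal(size=T)
    bhats.append(np.linalg.solve(X.T @ X, X.T @ y))
    xtx_invs.append(np.linalg.inv(X.T @ X))
bhats = np.array(bhats)
print("mean of bhat:", bhats.mean(axis=0))      # ~ beta
print("empirical cov of bhat:\n", np.cov(bhats.T))
print("sigma^2 * average (X'X)^{-1}:\n", sigma**2 * np.mean(xtx_invs, axis=0))
\end{verbatim}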
The OLS estimator of the disturbance variance,
$$s^2 = \frac{e'e}{T-k},$$
remains unbiased, since
$$E(e'e) = E_X[E(\varepsilon'M\varepsilon \mid X)] = E_X[\sigma^2(T-k)] = \sigma^2(T-k);$$
therefore $E(s^2) = \sigma^2$.

3.1.2 Efficiency?

The Gauss-Markov theorem can be established logically from the results of the preceding paragraph. We have shown that $Var(\hat{\beta}\mid X) \leq Var(\tilde{\beta}\mid X)$ for any other linear unbiased estimator $\tilde{\beta} \neq \hat{\beta}$ and for the specific $X$ in our sample. But if this inequality holds for every particular $X$, then it must hold for
$$Var(\hat{\beta}) = E_X[Var(\hat{\beta}\mid X)].$$
That is, if it holds for every particular $X$, then it must hold on average over the values of $X$.

Theorem (Gauss-Markov Theorem with Stochastic Regressors): In the classical linear regression model, the least squares estimator $\hat{\beta}$ is the minimum variance linear unbiased estimator of $\beta$, whether $X$ is stochastic or nonstochastic.

3.1.3 Consistency?

Writing
$$X = \begin{bmatrix} X_1' \\ X_2' \\ \vdots \\ X_T' \end{bmatrix},$$
we have
$$\frac{1}{T}X'X = \frac{1}{T}\sum_{t=1}^T X_t X_t'.$$
If we assume that
$$\operatorname{plim}\frac{1}{T}X'X = \operatorname{plim}\frac{1}{T}\sum_{t=1}^T X_t X_t' = Q$$
is finite and nonsingular, then by a law of large numbers $Q = E(X_t X_t')$; that is, the second moments of the regressors are finite (this assumption is violated when $X$ is an I(1), or unit root, process).

The independence assumption implies that $\operatorname{plim}(X'\varepsilon/T) = 0$. This follows from the fact that $E(X'\varepsilon/T) = 0$ and
$$E\left[\left(\frac{X'\varepsilon}{T}\right)\left(\frac{X'\varepsilon}{T}\right)'\right] = \frac{\sigma^2}{T}\frac{E(X'X)}{T} = \frac{\sigma^2}{T}\left(\frac{E(\sum_{t=1}^T X_t X_t')}{T}\right) = \frac{\sigma^2}{T}\left(\frac{\sum_{t=1}^T E(X_t X_t')}{T}\right) = \frac{\sigma^2}{T}\frac{TQ}{T} = \frac{\sigma^2}{T}Q,$$
so that
$$\lim_{T\to\infty} E\left[\left(\frac{X'\varepsilon}{T}\right)\left(\frac{X'\varepsilon}{T}\right)'\right] = \lim_{T\to\infty} \frac{\sigma^2}{T}Q = 0.$$
But the facts that $E(X'\varepsilon/T) = 0$ and $\lim_{T\to\infty} E[(X'\varepsilon/T)(X'\varepsilon/T)'] = 0$ imply that $\operatorname{plim} X'\varepsilon/T = 0$ (convergence in mean square implies convergence in probability). Recall that
$$\hat{\beta} = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'\varepsilon = \beta + \left(\frac{X'X}{T}\right)^{-1}\frac{X'\varepsilon}{T};$$
therefore
$$\operatorname{plim}\hat{\beta} = \beta + Q^{-1}\operatorname{plim}\frac{X'\varepsilon}{T} = \beta.$$

Remark: To prove the consistency of OLS under stochastic regressors, we needed to show that $\operatorname{plim}(X'\varepsilon/T) = 0$. To show this, however, we only require that $E(X'\varepsilon) = \sum_{t=1}^T E(X_t\varepsilon_t) = 0$, or $E(X_{ti}\varepsilon_t) = 0$, $i = 1, 2, \ldots, k$; that is, each regressor must be uncorrelated with the disturbance at time $t$, i.e. no contemporaneous correlation. There are three common circumstances in which a stochastic regressor at time $t$ is correlated with the disturbance of the same period, i.e. $E(X_{ti}\varepsilon_t) \neq 0$: lagged dependent variables with serially correlated disturbances, models with unobservable variables, and simultaneous equations models. Under these circumstances OLS is not a consistent estimator, and the instrumental variables (IV) estimator is proposed to deal with this problem.
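A small simulation illustrates the consistency result; the design below (normal regressors drawn independently of the disturbances) is an illustrative assumption.

\begin{verbatim}
import numpy as np

# Sketch of consistency: bhat approaches beta as T grows when X is
# stochastic but uncorrelated with the errors (illustrative design).
rng = np.random.default_rng(2)
beta = np.array([1.0, 0.5])
for T in [50, 500, 5000, 50000]:
    X = rng.normal(size=(T, 2))
    y = X @ beta + rng.normal(size=T)
    bhat = np.linalg.solve(X.T @ X, X.T @ y)
    print(T, bhat)   # deviations from beta shrink roughly like 1/sqrt(T)
\end{verbatim}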
3.1.4 Distribution of the Estimators

Since
$$\hat{\beta} = \beta + (X'X)^{-1}X'\varepsilon,$$
the distribution of $\hat{\beta}$ depends on the stochastic properties of $X$. It would be pessimistic to conclude that the usual hypothesis tests are therefore invalid; as we will see in the following, that is not the case.

3.1.5 Hypothesis Tests

We now consider the validity of our usual test statistics and inference procedures when $X$ is stochastic. First consider the conventional t statistic for testing $H_0: \beta_i = \beta_i^0$. Under the null hypothesis,
$$t \mid X = \frac{\hat{\beta}_i - \beta_i^0}{[s^2 (X'X)^{-1}_{ii}]^{1/2}} \sim t_{T-k}.$$
However, what interests us is the marginal, that is, the unconditional, distribution of $t$. Remember that if $W \sim t(n)$, then the density function of $W$ is
$$f(w; n) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{w^2}{n}\right)^{-(n+1)/2}, \qquad n > 0, \; w \in \mathbb{R}.$$
Therefore we see that the conditional density $f(t|x)$ of the random variable $(t|X)$ is not a function of $X$. Let $g(x)$ be the density function of $X$; the joint pdf of $X$ and $t$ is
$$f(t, x) = f(t|x)g(x),$$
and therefore the marginal density of $t$ is
$$f(t) = \int f(t, x)\,dx = \int f(t|x)g(x)\,dx = f(t|x)\int g(x)\,dx = f(t|x),$$
since $f(t|x)$ is not a function of $x$. We have the surprising result that, regardless of the distribution of $X$, or even of whether $X$ is stochastic or nonstochastic, the marginal distribution of the $t$ statistic is still the $t$ distribution. The same reasoning can be used to deduce that the usual F ratio for testing linear restrictions is valid whether $X$ is stochastic or not.

Remark: This conclusion holds only under the assumption that the disturbances are normally distributed; otherwise we can only deduce that $\hat{\beta}$ is asymptotically normal.

4 Non-Normal Disturbances

In this section we suppose that all of the ideal conditions hold except that the disturbances are not normally distributed. In particular, we still suppose that the disturbances $\varepsilon_t$ are independent and identically distributed with zero mean and finite variance $\sigma^2$; however, we no longer suppose that their distribution is normal.

4.1 Unbiasedness, Efficiency, and Consistency?

It is easy to show the following OLS properties.

Theorem: $\hat{\beta}$ is unbiased, BLUE, consistent, and has covariance matrix $\sigma^2(X'X)^{-1}$; $s^2$ is unbiased and consistent.

4.2 Hypothesis Testing

Since $\varepsilon$ is not normally distributed, $\hat{\beta}$ is not normally distributed, and therefore $(T-k)s^2/\sigma^2$ does not have a $\chi^2$ distribution.

Theorem: The hypothesis tests developed in Section 4 of Chapter 6 are not valid when $\varepsilon$ is not normally distributed.

4.3 Asymptotic Normality

The following results show that the usual test procedures are asymptotically justified whether or not the disturbances are normal.

Lemma: Assume that $\varepsilon_t$, $t = 1, 2, \ldots, T$, are independent and identically distributed with mean 0 and variance $\sigma^2 < \infty$, and that $\lim_{T\to\infty}(X'X/T) = Q$, a finite positive definite matrix. Then
$$\frac{1}{\sqrt{T}}X'\varepsilon \xrightarrow{d} N(0, \sigma^2 Q).$$

Proof:
$$\frac{1}{\sqrt{T}}X'\varepsilon = \sqrt{T}\,(\bar{w} - E(\bar{w})),$$
where
$$\bar{w} = \frac{1}{T}\sum_{t=1}^T X_t\varepsilon_t$$
is the average of $T$ independent random vectors $X_t\varepsilon_t$ with mean $E(\bar{w}) = 0$ and variance
$$Var(X_t\varepsilon_t) = \sigma^2 X_t X_t' = \sigma^2 Q_t.$$
The variance of $\sqrt{T}\bar{w}$ is
$$\sigma^2\bar{Q}_T = \sigma^2\frac{1}{T}[Q_1 + Q_2 + \cdots + Q_T] = \sigma^2\frac{1}{T}\sum_{t=1}^T X_t X_t' = \sigma^2\frac{X'X}{T}.$$
Assuming that $\lim_{T\to\infty}(X'X/T) = Q$, a positive definite matrix, we have
$$\lim_{T\to\infty}\sigma^2\bar{Q}_T = \sigma^2 Q.$$
Applying the Lindeberg-Feller CLT to the vector $\sqrt{T}\bar{w}$ gives the result.

Theorem: The asymptotic distribution of $\sqrt{T}(\hat{\beta} - \beta)$ is $N(0, \sigma^2 Q^{-1})$, where $Q = \lim_{T\to\infty}(X'X/T)$.

Proof: Note that
$$\hat{\beta} = \beta + (X'X)^{-1}X'\varepsilon = \beta + \left(\frac{X'X}{T}\right)^{-1}\frac{X'\varepsilon}{T},$$
so that
$$\sqrt{T}(\hat{\beta} - \beta) = \left(\frac{X'X}{T}\right)^{-1}\sqrt{T}\,\frac{X'\varepsilon}{T} = \left(\frac{X'X}{T}\right)^{-1}\frac{1}{\sqrt{T}}X'\varepsilon.$$
Then, as $T \to \infty$,
$$\sqrt{T}(\hat{\beta} - \beta) \xrightarrow{d} Q^{-1}N(0, \sigma^2 Q) \sim N(0, \sigma^2 Q^{-1}QQ^{-1}) = N(0, \sigma^2 Q^{-1}).$$
The above result shows that $\hat{\beta}$ has the same asymptotic distribution whether or not the disturbances are normal, as long as they are iid with mean zero and finite variance. The following result shows that this implies that the usual t tests are asymptotically valid; the t distribution is of course replaced by the $N(0,1)$ distribution, however.

Theorem: Suppose that the elements of $\varepsilon$ are iid with zero mean and finite variance, and that $Q$ is finite and nonsingular. Let $R$ be a known $1 \times k$ vector and $r$ a known scalar. Then under the null hypothesis that $R\beta = r$, the test statistic
$$\frac{R\hat{\beta} - r}{\sqrt{s^2 R(X'X)^{-1}R'}} \xrightarrow{d} N(0, 1).$$

Proof: Under the null hypothesis, $R\hat{\beta} - r = R(\hat{\beta} - \beta)$, so the test statistic is
$$\frac{\sqrt{T}R(\hat{\beta} - \beta)}{\sqrt{s^2 R(X'X/T)^{-1}R'}}.$$
Now, since $\sqrt{T}(\hat{\beta} - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1})$, it is clear that
$$\sqrt{T}R(\hat{\beta} - \beta) \xrightarrow{d} N(0, \sigma^2 RQ^{-1}R').$$
Also, the denominator of the test statistic has probability limit
$$\sqrt{\sigma^2 RQ^{-1}R'},$$
since $s^2 \to \sigma^2$ and $(X'X/T)^{-1} \to Q^{-1}$. But this is a scalar which is in fact the standard deviation of the asymptotic distribution of the numerator random variable $\sqrt{T}R(\hat{\beta} - \beta)$. Hence the test statistic does indeed converge in distribution to $N(0, 1)$.
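To see the asymptotic normality result at work, the sketch below (an illustrative design, not from the text) uses centered exponential disturbances, which are skewed and decidedly non-normal; the t-type statistic nevertheless behaves like a standard normal in large samples.

\begin{verbatim}
import numpy as np

# Sketch: with iid non-normal errors (centered exponential: mean 0,
# variance 1), the statistic (bhat - b0)/se is approximately N(0,1)
# for large T.  All values are illustrative.
rng = np.random.default_rng(3)
T, b0, reps = 1000, 1.0, 5000
tstats = []
for _ in range(reps):
    x = rng.normal(size=T)
    e = rng.exponential(size=T) - 1.0    # non-normal, mean 0, variance 1
    y = b0 * x + e
    bhat = (x @ y) / (x @ x)
    resid = y - bhat * x
    s2 = resid @ resid / (T - 1)
    tstats.append((bhat - b0) / np.sqrt(s2 / (x @ x)))
tstats = np.array(tstats)
print("mean, var:", tstats.mean(), tstats.var())         # ~ 0 and ~ 1
print("P(|t| > 1.96):", np.mean(np.abs(tstats) > 1.96))  # ~ 0.05
\end{verbatim}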
5 Instrumental Variables Estimator

We again consider the model $Y = X\beta + \varepsilon$. However, we will now be concerned with the case in which the regressor matrix $X$ is random and
$$\operatorname{plim}\frac{1}{T}X'\varepsilon \neq 0,$$
so that the OLS estimator $\hat{\beta} = (X'X)^{-1}X'Y$ is inconsistent.

Example (Lagged Dependent Variable with Autocorrelated Disturbances): Let the regression be
$$y_t = \beta y_{t-1} + \varepsilon_t, \qquad \varepsilon_t = \rho\varepsilon_{t-1} + u_t.$$
In this model the regressor and the disturbance are correlated:
$$Cov(y_{t-1}, \varepsilon_t) = Cov(y_{t-1}, \rho\varepsilon_{t-1} + u_t) = \rho\,Cov(y_{t-1}, \varepsilon_{t-1}) = \frac{\rho\,\sigma_u^2}{(1-\beta\rho)(1-\rho^2)}.$$
Since
$$\operatorname{plim}\hat{\beta} = \beta + \frac{Cov(y_{t-1}, \varepsilon_t)}{Var(y_t)},$$
we have
$$\operatorname{plim}\hat{\beta} = \beta + \frac{\rho(1-\beta^2)}{1+\beta\rho}.$$

Theorem: Suppose that there exists a set of variables $Z_t$ such that
$$Q_{ZX} = \operatorname{plim}\frac{1}{T}Z'X$$
is finite and nonsingular, and such that $Z'\varepsilon/\sqrt{T}$ converges in distribution to $N(0, \Phi)$. Then the estimator
$$\tilde{\beta} = (Z'X)^{-1}Z'Y$$
is consistent, and the asymptotic distribution of $\sqrt{T}(\tilde{\beta} - \beta)$ is $N(0, Q_{ZX}^{-1}\Phi Q_{XZ}^{-1})$.

Proof: Note that
$$\tilde{\beta} = \beta + \left(\frac{Z'X}{T}\right)^{-1}\frac{Z'\varepsilon}{T}.$$
Since $Z'\varepsilon/\sqrt{T}$ has a well-defined asymptotic distribution, it follows that
$$\operatorname{plim}\frac{Z'\varepsilon}{T} = 0.$$
Hence $\operatorname{plim}\tilde{\beta} = \beta$. To prove the second part of the theorem, note that
$$\sqrt{T}(\tilde{\beta} - \beta) = \left(\frac{Z'X}{T}\right)^{-1}\frac{Z'\varepsilon}{\sqrt{T}}.$$
The result follows immediately.

Definition: The above estimator $\tilde{\beta}$ is the instrumental variables (IV) estimator of $\beta$; the matrix $Z$ is the set of instruments for $X$.

Comment: The typical case is the one in which $Z'\varepsilon/\sqrt{T}$ has asymptotic distribution $N(0, \sigma^2 Q_{ZZ})$, where
$$Q_{ZZ} = \operatorname{plim}\frac{1}{T}Z'Z.$$
If $Z$ contains more variables than $X$, we may choose as instruments the projection of the columns of $X$ onto the column space of $Z$:
$$\hat{X} = Z(Z'Z)^{-1}Z'X.$$
With this choice of instrumental variables, $\hat{X}$ in place of $Z$, we have
$$\tilde{\beta} = (\hat{X}'X)^{-1}\hat{X}'Y = [X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'Y.$$
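The last formula is straightforward to implement. The sketch below contrasts OLS with the IV (2SLS) estimator $\tilde{\beta} = (\hat{X}'X)^{-1}\hat{X}'Y$; the data-generating process, with one endogenous regressor and two instruments, is an illustrative assumption, not from the text.

\begin{verbatim}
import numpy as np

# 2SLS sketch: x is endogenous (its error component u also enters e),
# so OLS is inconsistent; z is a valid instrument set (illustrative DGP).
rng = np.random.default_rng(4)
T = 100000
z = rng.normal(size=(T, 2))                   # instruments
u = rng.normal(size=T)
e = 0.8 * u + rng.normal(size=T)              # disturbance correlated with x
x = z @ np.array([1.0, 0.5]) + u              # E(x_t e_t) != 0
y = 2.0 * x + e
X = x[:, None]
ols = np.linalg.solve(X.T @ X, X.T @ y)
Xhat = z @ np.linalg.solve(z.T @ z, z.T @ X)  # Xhat = Z(Z'Z)^{-1}Z'X
iv = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)
print("OLS :", ols)   # converges to a value above the true 2.0
print("2SLS:", iv)    # ~ 2.0, consistent
\end{verbatim}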