Ch. 6 The Linear Model Under Ideal Conditions
The (multiple) linear model is used to study the relationship between a dependent variable (Y) and several independent variables (X1, X2, ..., Xk). That is,
\[
\begin{aligned}
Y &= f(X_1, X_2, \ldots, X_k) + \varepsilon && \text{(assume a linear function)} \\
  &= \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon \\
  &= x'\beta + \varepsilon,
\end{aligned}
\]
where Y is the dependent or explained variable, x = [X1 X2 ... Xk]′ are the independent or explanatory variables, and β = [β1 β2 ... βk]′ are unknown coefficients that we are interested in learning about, either through estimation or through hypothesis testing. The term ε is an unobservable random disturbance.
Suppose we have a sample of size T (allowing for non-random observations)¹ on the scalar dependent variable Y_t and the vector of explanatory variables x_t = (X_{t1}, X_{t2}, ..., X_{tk})′, i.e.,
\[
Y_t = x_t'\beta + \varepsilon_t, \qquad t = 1,2,\ldots,T.
\]
In matrix form, this relationship is written as
\[
y =
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_T \end{bmatrix}
=
\begin{bmatrix}
X_{11} & X_{12} & \cdots & X_{1k} \\
X_{21} & X_{22} & \cdots & X_{2k} \\
\vdots & \vdots &        & \vdots \\
X_{T1} & X_{T2} & \cdots & X_{Tk}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{bmatrix}
=
\begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_T' \end{bmatrix}\beta + \varepsilon
= X\beta + \varepsilon,
\]
where y is a T × 1 vector, X is a T × k matrix with rows x_t′, and ε is a T × 1 vector with elements ε_t.
¹Recall from Chapter 2 that we cannot postulate the probability model Φ if the sample is non-random; the probability model must then be defined in terms of the sample's joint distribution.
Our goal is to regard the last equation as a parametric probability and sampling model, and to make inferences about the unknown β_i's and the parameters governing ε.
1 The Probability Model: Gauss Linear Model
Assume that ε ~ N(0, Σ). If X is not stochastic, then by the results on "functions of random variables" (an n → n transformation) we have y ~ N(Xβ, Σ). That is, we have specified a probability and sampling model for y:
(Probability and Sampling Model)
\[
y \sim N\!\left(
\begin{bmatrix}
X_{11} & X_{12} & \cdots & X_{1k} \\
X_{21} & X_{22} & \cdots & X_{2k} \\
\vdots & \vdots &        & \vdots \\
X_{T1} & X_{T2} & \cdots & X_{Tk}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix},\;
\begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1T} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2T} \\
\vdots & \vdots &        & \vdots \\
\sigma_{T1} & \sigma_{T2} & \cdots & \sigma_T^2
\end{bmatrix}
\right)
\equiv N(X\beta, \Sigma).
\]
That is, the sample joint density function is
\[
f(y;\theta) = (2\pi)^{-T/2}\,|\Sigma|^{-1/2}\exp\left\{-\tfrac{1}{2}(y - X\beta)'\Sigma^{-1}(y - X\beta)\right\},
\]
where θ = (β1, β2, ..., βk, σ1², σ12, ..., σT²)′. It is easily seen that the number of parameters in θ is larger than the sample size T. Therefore, some restrictions must be imposed on the probability and sampling model for estimation to be possible, as we shall see in the sequel.
One kind of restriction on θ is that Σ is a scalar matrix, Σ = σ²I. In that case, maximizing the likelihood of the sample (with respect to β) is equivalent to minimizing (y − Xβ)′(y − Xβ) = ε′ε = Σ_{t=1}^{T} ε_t², the sum of squared residuals; this constitutes the foundation of ordinary least squares estimation.
To summarize the discussion so far, we have made the following assumptions:
(a) The model y = Xβ + ε is correct (no problem of model misspecification);
(b) X is nonstochastic (hence regression came first from experimental science);
(c) E(ε) = 0 (easily satisfied by adding a constant to the regression);
(d) Var(ε) = E(εε′) = σ²·I (the disturbances have the same variance and are not autocorrelated);
(e) Rank(X) = k (for model identification);
(f) ε is normally distributed.
The above six assumptions are usually called the classical ordinary least squares assumptions, or the ideal conditions.
2 Estimation: Ordinary Least Squares Estimator
2.1 Estimation of β
Let us first consider the ordinary least squares (OLS) estimator, which is the value of β that minimizes the sum of squared errors (residuals), denoted SSE (recall the principle of estimation in Ch. 3):
\[
SSE(\beta) = (y - X\beta)'(y - X\beta) = \sum_{t=1}^{T}(Y_t - x_t'\beta)^2 = y'y - 2y'X\beta + \beta'X'X\beta.
\]
The first order conditions for a minimum are
\[
\frac{\partial SSE(\beta)}{\partial \beta} = -2X'y + 2X'X\beta = 0.
\]
If X′X is nonsingular (which is guaranteed by assumption (e) of the ideal conditions and Ch. 1, Sec. 3.5), this system of k equations in k unknowns can be uniquely solved for the ordinary least squares (OLS) estimator
\[
\hat{\beta} = (X'X)^{-1}X'y = \left[\sum_{t=1}^{T} x_t x_t'\right]^{-1}\sum_{t=1}^{T} x_t Y_t. \tag{1}
\]
To ensure that β̂ is indeed a minimizer, we require that
\[
\frac{\partial^2 SSE(\beta)}{\partial\beta\,\partial\beta'} = 2X'X
\]
be a positive definite matrix. This condition is satisfied by assumption (e) and Ch. 1, Sec. 5.6.1.
Denote by e the T × 1 vector of least squares residuals,
\[
e = y - X\hat{\beta}.
\]
Then it is obvious that
\[
X'e = X'(y - X\hat{\beta}) = X'y - X'X(X'X)^{-1}X'y = 0, \tag{2}
\]
i.e., the regressors are orthogonal to the OLS residuals. Therefore, if one of the regressors is a constant term, the sum of the residuals is zero, since the first element of X′e would be
\[
\begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_T \end{bmatrix}
= \sum_{t=1}^{T} e_t = 0 \quad \text{(a scalar)}.
\]
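As a numerical illustration of (1) and (2), here is a minimal sketch in Python/NumPy; the data are simulated and all variable names are purely illustrative, not part of the text's example.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 100, 3                        # sample size and number of regressors (illustrative)
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # first column: constant
beta = np.array([1.0, 2.0, -0.5])    # "true" coefficients used only for the simulation
y = X @ beta + rng.normal(size=T)

# OLS estimator from the normal equations: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals and the orthogonality condition X'e = 0 of equation (2)
e = y - X @ beta_hat
print(beta_hat)
print(X.T @ e)     # numerically zero
print(e.sum())     # zero because a constant is included
```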
2.2 Estimation of σ²
At this point we have the following notation:
\[
y = X\beta + \varepsilon = X\hat{\beta} + e.
\]
To estimate σ², the variance of ε, a simple and intuitive idea is to use the information in the sample residuals e.
Lemma:
The matrix M_X = I − X(X′X)⁻¹X′ is symmetric and idempotent. Furthermore, M_X X = 0.
Lemma:
e = M_X y = M_X ε. That is, we can interpret M_X as a matrix that produces the vector of least squares residuals in the regression of y on X.
Proof:
\[
\begin{aligned}
e &= y - X\hat{\beta} \\
  &= y - X(X'X)^{-1}X'y \\
  &= (I - X(X'X)^{-1}X')y \\
  &= M_X y \\
  &= M_X X\beta + M_X\varepsilon \\
  &= M_X\varepsilon.
\end{aligned}
\]
Using the fact that M_X is symmetric and idempotent, we have
Lemma:
e′e = ε′M_X′M_Xε = ε′M_Xε.
Theorem 1:
E(e′e) = σ²(T − k).
Proof:
\[
\begin{aligned}
E(e'e) &= E(\varepsilon'M_X\varepsilon) \\
&= E[\operatorname{trace}(\varepsilon'M_X\varepsilon)] && (\varepsilon'M_X\varepsilon \text{ is a scalar, hence equals its trace}) \\
&= E[\operatorname{trace}(M_X\varepsilon\varepsilon')] \\
&= \operatorname{trace}\,E(M_X\varepsilon\varepsilon') && \text{(why?)} \\
&= \operatorname{trace}(M_X\sigma^2 I_T) \\
&= \sigma^2\operatorname{trace}(M_X),
\end{aligned}
\]
but
\[
\operatorname{trace}(M_X) = \operatorname{trace}(I_T) - \operatorname{trace}(X(X'X)^{-1}X')
= \operatorname{trace}(I_T) - \operatorname{trace}((X'X)^{-1}X'X) = T - k.
\]
Corollary:
An unbiased estimator of σ² is
\[
s^2 = \frac{e'e}{T-k}.
\]
Exercise:
Reproduce the estimation results in Table 4.2, p. 52, for β̂, s²(X′X)⁻¹, and e′e.
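Since Table 4.2 is not reproduced here, the following sketch only shows how the quantities named in the exercise would be computed on simulated data (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

sse = e @ e                                   # e'e, the sum of squared residuals
s2 = sse / (T - k)                            # unbiased estimator of sigma^2
cov_beta_hat = s2 * np.linalg.inv(X.T @ X)    # s^2 (X'X)^{-1}
print(sse, s2)
print(np.sqrt(np.diag(cov_beta_hat)))         # coefficient standard errors
```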
2.3 Partitioned Regression Estimation
It is common to specify a multiple regression model when, in fact, interest centers on only one or a subset of the full set of variables. Let k₁ + k₂ = k; we can express the OLS result in partitioned form as
\[
y = X\hat{\beta} + e
= \begin{bmatrix} X_1 & X_2 \end{bmatrix}
\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} + e
= X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + e,
\]
where X₁ and X₂ are T × k₁ and T × k₂, respectively, and β̂₁ and β̂₂ are k₁ × 1 and k₂ × 1, respectively.
What is the algebraic solution for β̂₂? Denote M₁ = I − X₁(X₁′X₁)⁻¹X₁′; then
\[
M_1 y = M_1X_1\hat{\beta}_1 + M_1X_2\hat{\beta}_2 + M_1 e = M_1X_2\hat{\beta}_2 + e,
\]
using the fact that M₁X₁ = 0 and M₁e = e. Premultiplying the above equation by X₂′ and using the fact that
\[
X'e = \begin{bmatrix} X_1' \\ X_2' \end{bmatrix} e
= \begin{bmatrix} X_1'e \\ X_2'e \end{bmatrix} = 0,
\]
we have
\[
X_2'M_1 y = X_2'M_1X_2\hat{\beta}_2 + X_2'e = X_2'M_1X_2\hat{\beta}_2.
\]
Therefore β̂₂ can be expressed in isolation as
\[
\hat{\beta}_2 = (X_2'M_1X_2)^{-1}X_2'M_1 y = (\tilde{X}_2'\tilde{X}_2)^{-1}\tilde{X}_2'\tilde{y},
\]
where
\[
\tilde{X}_2 = M_1X_2 \quad \text{and} \quad \tilde{y} = M_1 y
\]
are the residuals from the regressions of X₂ and y on X₁, respectively.
Theorem 2 (Frisch-Waugh):
The subvector β̂₂ is the set of coefficients obtained when the residuals from a regression of y on X₁ alone are regressed on the set of residuals obtained when each column of X₂ is regressed on X₁.
Example:
Consider a simple regression with a constant; the slope estimator can then also be obtained from a regression on demeaned data without a constant.
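A minimal numerical sketch of the Frisch-Waugh result on simulated data (variable names are illustrative): the coefficients on X₂ from the full regression coincide with those obtained by regressing the M₁-residuals of y on the M₁-residuals of X₂.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])   # T x k1, includes a constant
X2 = rng.normal(size=(T, 2))                              # T x k2
y = X1 @ np.array([1.0, 0.5]) + X2 @ np.array([2.0, -1.0]) + rng.normal(size=T)

# Full regression of y on [X1 X2]
X = np.hstack([X1, X2])
b_full = np.linalg.solve(X.T @ X, X.T @ y)
b2_full = b_full[X1.shape[1]:]                # coefficients on X2

# Frisch-Waugh: residualize y and X2 on X1, then regress residuals on residuals
M1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
y_tilde = M1 @ y
X2_tilde = M1 @ X2
b2_fw = np.linalg.solve(X2_tilde.T @ X2_tilde, X2_tilde.T @ y_tilde)

print(np.allclose(b2_full, b2_fw))            # True
```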
2.4 The Restricted Least Squares Estimators
Suppose that we explicitly impose the restrictions of a hypothesis in the regression (as in the LM test, for example). The restricted least squares estimator is obtained as the solution to
\[
\min_{\beta}\; SSE(\beta) = (y - X\beta)'(y - X\beta) \quad \text{subject to } R\beta = q,
\]
where R is a known J × k matrix and q is the vector of values of these linear restrictions.
A Lagrangean function for this problem can be written as
\[
L^*(\beta,\lambda) = (y - X\beta)'(y - X\beta) + 2\lambda'(R\beta - q), \qquad \text{where } \lambda \text{ is } J \times 1.
\]
The solutions β̂* and λ̂ will satisfy the necessary conditions
\[
\frac{\partial L^*}{\partial \hat{\beta}_*} = -2X'(y - X\hat{\beta}_*) + 2R'\hat{\lambda} = 0,
\]
\[
\frac{\partial L^*}{\partial \hat{\lambda}} = 2(R\hat{\beta}_* - q) = 0
\qquad \left(\text{remember } \frac{\partial a'x}{\partial x} = a\right).
\]
Dividing through by 2 and expanding terms produces the partitioned matrix equation
\[
\begin{bmatrix} X'X & R' \\ R & 0 \end{bmatrix}
\begin{bmatrix} \hat{\beta}_* \\ \hat{\lambda} \end{bmatrix}
= \begin{bmatrix} X'y \\ q \end{bmatrix},
\]
or
\[
W d^* = v.
\]
Assuming that the partitioned matrix W is nonsingular,
\[
d^* = W^{-1}v.
\]
Using the partitioned inverse rule,
\[
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1}
= \begin{bmatrix}
A_{11}^{-1}(I + A_{12}F_2A_{21}A_{11}^{-1}) & -A_{11}^{-1}A_{12}F_2 \\
-F_2A_{21}A_{11}^{-1} & F_2
\end{bmatrix},
\qquad F_2 = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1},
\]
we have the restricted least squares estimator
\[
\hat{\beta}_* = \hat{\beta} - (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(R\hat{\beta} - q),
\]
and
\[
\hat{\lambda} = [R(X'X)^{-1}R']^{-1}(R\hat{\beta} - q).
\]
Exercise:
Show that Var(β̂*) − Var(β̂) is a nonpositive definite matrix.
The above result holds whether or not the restrictions are true. One way to interpret this reduction in variance is as the value of the information contained in the restrictions. See Table 6.2 at p. 103.
Let e* = y − Xβ̂*, i.e., the residual vector from the restricted least squares estimator. Then, using the familiar device,
\[
e_* = y - X\hat{\beta} - X(\hat{\beta}_* - \hat{\beta}) = e - X(\hat{\beta}_* - \hat{\beta}).
\]
The 'restricted' sum of squared residuals is
\[
e_*'e_* = e'e + (\hat{\beta}_* - \hat{\beta})'X'X(\hat{\beta}_* - \hat{\beta}) \ge e'e \tag{3}
\]
since X′X is a positive definite matrix.
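The restricted estimator β̂* and the inequality (3) can be checked numerically. The sketch below imposes an illustrative restriction Rβ = q on simulated data (the restriction and all names are assumptions for the illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=T)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                        # unrestricted OLS

# Illustrative restriction: beta_2 + beta_3 = 1, i.e. R beta = q
R = np.array([[0.0, 1.0, 1.0]])
q = np.array([1.0])

A = XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T)
b_star = b - A @ (R @ b - q)                 # restricted least squares estimator

e = y - X @ b
e_star = y - X @ b_star
print(R @ b_star - q)                        # zero: the restriction holds exactly
print(e_star @ e_star >= e @ e)              # True, as in (3)
```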
2.5 Measurement of Goodness of Fit
Denote the dependent variable's 'fitted values', computed from the explanatory variables and the OLS estimator, by ŷ = Xβ̂, so that y = ŷ + e.
Lemma:
e′e = y′y − ŷ′ŷ.
Proof:
Using the fact that X′y = X′Xβ̂, we have
\[
e'e = y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta} = y'y - \hat{y}'\hat{y}.
\]
Three measures of variation are defined as follows:
\[
\text{(a) } SST \text{ (total variation)} = \sum_{t=1}^{T}(Y_t - \bar{Y})^2 = y'M^0y,
\]
\[
\text{(b) } SSR \text{ (regression variation)} = \sum_{t=1}^{T}(\hat{Y}_t - \bar{\hat{Y}})^2 = \hat{y}'M^0\hat{y},
\]
\[
\text{(c) } SSE \text{ (error variation)} = \sum_{t=1}^{T}(Y_t - \hat{Y}_t)^2 = e'e,
\]
where $\bar{Y} = \frac{1}{T}\sum_{t=1}^{T} Y_t$, $\bar{\hat{Y}} = \frac{1}{T}\sum_{t=1}^{T}\hat{Y}_t$, and $M^0 = I_T - \frac{1}{T}ii'$ is the matrix that transforms observations into deviations from their sample means (i being a column of ones).
Lemma:
If one of the regressors is a constant, then $\bar{Y} = \bar{\hat{Y}}$.
Proof:
Writing
\[
y = \hat{y} + e = X\hat{\beta} + e
= \begin{bmatrix} i & X_2 \end{bmatrix}
\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} + e
= i\hat{\beta}_1 + X_2\hat{\beta}_2 + e,
\]
where i is a column of ones, and using the fact that i′e = 0, we obtain the result.
Lemma:
If one of the regressors is a constant, then SST = SSR + SSE.
Proof:
Premultiplying y = ŷ + e by M⁰, we have
\[
M^0y = M^0\hat{y} + M^0e = M^0\hat{y} + e, \quad \text{since } M^0e = e \text{ (why?)}.
\]
Therefore,
\[
y'M^0y = \hat{y}'M^0\hat{y} + 2\hat{y}'M^0e + e'e = \hat{y}'M^0\hat{y} + e'e = SSR + SSE,
\]
using the fact that $\hat{y}'M^0e = \hat{\beta}'X'M^0e = \hat{\beta}'X'e = 0$.
Definition:
If one of the regressors is a constant, the coefficient of determination is defined as
\[
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}.
\]
From (3) we know that e*′e* ≥ e′e. One kind of restriction is of the form Rβ = 0, which we may think of as a model with fewer regressors (but with the same dependent variable). It is apparent that the coefficient of determination from this restricted model, say R²*, is smaller. (Thus the R² in the longer regression cannot be smaller.) It is tempting to exploit this result by simply adding variables to the model; R² will continue to rise toward its limit. In view of this result, we sometimes report an adjusted R², computed as
\[
\bar{R}^2 = 1 - \frac{e'e/(T-k)}{y'M^0y/(T-1)}.
\]
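A short sketch of these goodness-of-fit measures on simulated data (the demeaning is done directly rather than through M⁰; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=T)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

sst = np.sum((y - y.mean()) ** 2)          # y'M0y
sse = e @ e                                # e'e
r2 = 1.0 - sse / sst                       # coefficient of determination
r2_adj = 1.0 - (sse / (T - k)) / (sst / (T - 1))   # adjusted R^2
print(r2, r2_adj)
```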
3 Statistical Properties of the OLS Estimators
We now investigate the statistical properties of the OLS estimators β̂ and s².
3.1 Finite Sample Properties
3.1.1 Unbiasedness
Based on the six classical assumptions, the expected values of β̂ and s² are
\[
E(\hat{\beta}) = E[(X'X)^{-1}X'y] = E[(X'X)^{-1}X'(X\beta + \varepsilon)]
= E[\beta + (X'X)^{-1}X'\varepsilon]
= \beta + (X'X)^{-1}X'E(\varepsilon) = \beta
\]
(using assumptions (b) and (c)), and, by construction,
\[
E(s^2) = E\!\left(\frac{e'e}{T-k}\right) = \frac{(T-k)\sigma^2}{T-k} = \sigma^2.
\]
Therefore both β̂ and s² are unbiased estimators.
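Unbiasedness can be illustrated by a small Monte Carlo experiment with a fixed (nonstochastic) design matrix; this is only a sketch, and the number of replications is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
T, k, sigma = 50, 3, 1.5
beta = np.array([1.0, 2.0, -0.5])
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # held fixed across replications

n_rep = 10_000
betas = np.empty((n_rep, k))
s2s = np.empty(n_rep)
for r in range(n_rep):
    y = X @ beta + rng.normal(scale=sigma, size=T)   # new disturbances each replication
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    betas[r] = b
    s2s[r] = e @ e / (T - k)

print(betas.mean(axis=0))   # close to beta
print(s2s.mean())           # close to sigma^2 = 2.25
```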
3.1.2 Efficiency
To investigate the efficiency of these two estimators, we first derive their variances. The variance-covariance matrix of β̂ is
\[
\begin{aligned}
\operatorname{Var}(\hat{\beta}) &= E[(\hat{\beta}-\beta)(\hat{\beta}-\beta)'] \\
&= E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}] \\
&= (X'X)^{-1}X'E[\varepsilon\varepsilon']X(X'X)^{-1} \\
&= (X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1} \\
&= \sigma^2(X'X)^{-1}
\end{aligned}
\]
(using assumptions (b) and (d)).
With assumption (f) and the results on idempotent quadratic forms, we have
\[
\frac{e'e}{\sigma^2} = \frac{\varepsilon'M_X\varepsilon}{\sigma^2} \sim \chi^2_{T-k},
\]
that is,
\[
\operatorname{Var}\!\left(\frac{e'e}{\sigma^2}\right) = 2(T-k),
\]
or Var(e′e) = 2(T − k)σ⁴. The variance of s² (= e′e/(T − k)) is therefore
\[
\operatorname{Var}\!\left(\frac{e'e}{T-k}\right) = \frac{2\sigma^4}{T-k}.
\]
Theorem (Gauss-Markov):
The OLS estimator β̂ is the best linear unbiased estimator (BLUE) of β.
Proof:
Consider any estimator linear in y, say β̃ = Cy, and write C = (X′X)⁻¹X′ + D. Then
\[
E(\tilde{\beta}) = E[((X'X)^{-1}X' + D)(X\beta + \varepsilon)] = \beta + DX\beta,
\]
so that for β̃ to be unbiased we require DX = 0. Then the covariance matrix of β̃ is
\[
\begin{aligned}
E[(\tilde{\beta}-\beta)(\tilde{\beta}-\beta)']
&= E\{[(X'X)^{-1}X' + D]\varepsilon\varepsilon'[X(X'X)^{-1} + D']\} \\
&= \sigma^2[(X'X)^{-1}X'IX(X'X)^{-1} + DIX(X'X)^{-1} + (X'X)^{-1}X'ID' + DID'] \\
&= \sigma^2(X'X)^{-1} + \sigma^2 DD', \quad \text{since } DX = 0.
\end{aligned}
\]
Since DD′ is a positive semidefinite matrix (see Ch. 1, p. 20), the covariance matrix of β̃ equals the covariance matrix of β̂ plus a positive semidefinite matrix. Hence β̂ is efficient relative to any other linear unbiased estimator of β.
In fact, we can go a step further in discussing the efficiency of β̂.
Theorem:
Let the linear regression y = Xβ + ε satisfy the classical assumptions. Then the Cramér-Rao lower bounds for unbiased estimators of β and σ² are σ²(X′X)⁻¹ and 2σ⁴/T, respectively.
Proof:
The log-likelihood is
\[
\ln L(\beta,\sigma^2;y) = -\frac{T}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).
\]
Therefore,
\[
\frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2}(X'y - X'X\beta) = \frac{1}{\sigma^2}X'(y - X\beta),
\qquad
\frac{\partial \ln L}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)'(y - X\beta),
\]
\[
\frac{\partial^2 \ln L}{\partial\beta\,\partial\beta'} = -\frac{1}{\sigma^2}X'X; \qquad
-E\!\left[\frac{\partial^2 \ln L}{\partial\beta\,\partial\beta'}\right] = \frac{X'X}{\sigma^2};
\]
\[
\frac{\partial^2 \ln L}{\partial(\sigma^2)^2} = \frac{T}{2\sigma^4} - \frac{1}{\sigma^6}(y - X\beta)'(y - X\beta); \qquad
-E\!\left[\frac{\partial^2 \ln L}{\partial(\sigma^2)^2}\right] = \frac{T}{2\sigma^4} \quad \text{(how?)};
\]
\[
\frac{\partial^2 \ln L}{\partial\beta\,\partial\sigma^2} = -\frac{1}{\sigma^4}X'(y - X\beta); \qquad
-E\!\left[\frac{\partial^2 \ln L}{\partial\beta\,\partial\sigma^2}\right] = 0.
\]
Therefore, the information matrix is
\[
I_T(\beta,\sigma^2) =
\begin{bmatrix} \dfrac{X'X}{\sigma^2} & 0 \\ 0 & \dfrac{T}{2\sigma^4} \end{bmatrix},
\]
and, in turn, the Cramér-Rao lower bounds for unbiased estimators of β and σ² are σ²(X′X)⁻¹ and 2σ⁴/T.
From the above theorem, the OLS estimator β̂ attains the Cramér-Rao bound and is therefore an absolutely efficient estimator, while s² is not. However, it can be shown that s² is indeed minimum variance unbiased via the alternative approach of complete, sufficient statistics; see, for example, Schmidt (1976), p. 14.
3.1.3 Exact Distribution of β̂ and s²
We now investigate the finite sample distributions of the OLS estimators.
Theorem:
β̂ has a multivariate normal distribution with mean β and covariance matrix σ²(X′X)⁻¹.
Proof:
By assumptions (c), (d) and (f), we know that ε ~ N(0, σ²I). Therefore, by the results on linear functions of a normal vector (Ch. 2, p. 27), we have
\[
\beta + (X'X)^{-1}X'\varepsilon \sim N\!\left(\beta,\;(X'X)^{-1}X'\sigma^2 IX(X'X)^{-1}\right),
\]
or
\[
\hat{\beta} \sim N\!\left(\beta,\;\sigma^2(X'X)^{-1}\right).
\]
Theorem:
s² is distributed as a χ² random variable multiplied by a constant:
\[
s^2 \sim \frac{\sigma^2}{T-k}\,\chi^2_{T-k}.
\]
Proof:
Since we have shown that e′e/σ² ~ χ²_{T−k}, the result follows immediately.
3.1.4 Independence of β̂ and s²
Lemma:
Let Q be a symmetric, idempotent T × T matrix, B an m × T matrix such that BQ = 0, and ε ~ N(0, σ²I_T). Then Bε and ε′Qε are distributed independently.
Proof:
See Section 7.2.4 of Chapter 2.
Theorem:
β̂ and s² are independent.
Proof:
s² = ε′M_Xε/(T − k) and β̂ − β = (X′X)⁻¹X′ε. Since (X′X)⁻¹X′M_X = 0, the above lemma implies that β̂ and s² are independent.
3.2 Asymptotic Properties
3.2.1 Consistency
We now investigate the properties of the OLS estimators as the sample size goes to infinity, T → ∞.
Theorem:
The OLS estimator β̂ is consistent.
Proof:
Denote lim_{T→∞}(X′X/T) = lim_{T→∞}(1/T)Σ_{t=1}^{T} x_t x_t′ by Q, and assume that Q is finite and nonsingular. (What does this mean?) Then lim_{T→∞}(X′X/T)⁻¹ is also finite. Therefore
\[
\lim_{T\to\infty}(X'X)^{-1}
= \lim_{T\to\infty}\frac{1}{T}\left(\frac{X'X}{T}\right)^{-1}
= \lim_{T\to\infty}\frac{1}{T}Q^{-1} = 0.
\]
Since β̂ is unbiased and its covariance matrix, σ²(X′X)⁻¹, vanishes asymptotically, it converges in probability to β and is therefore consistent.
Alternative proof:
Note that
\[
\hat{\beta} = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon
= \beta + \left(\frac{X'X}{T}\right)^{-1}\frac{X'\varepsilon}{T}.
\]
Observe that E(X′ε/T) = 0. Also,
\[
E\!\left(\frac{X'\varepsilon}{T}\right)\!\left(\frac{X'\varepsilon}{T}\right)'
= \frac{\sigma^2}{T}\left(\frac{X'X}{T}\right),
\]
so that
\[
\lim_{T\to\infty} E\!\left(\frac{X'\varepsilon}{T}\right)\!\left(\frac{X'\varepsilon}{T}\right)'
= \lim_{T\to\infty}\frac{\sigma^2}{T}Q = 0.
\]
But the facts that E(X′ε/T) = 0 and lim_{T→∞} E(X′ε/T)(X′ε/T)′ = 0 imply that plim X′ε/T = 0. Therefore
\[
\operatorname{plim}\hat{\beta} = \beta + Q^{-1}\operatorname{plim}\frac{X'\varepsilon}{T} = \beta.
\]
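Consistency can be visualized by re-estimating β on growing samples; the following sketch uses one simulated path with a fixed true β (all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
beta = np.array([1.0, 2.0, -0.5])
T_max = 100_000
X_all = np.column_stack([np.ones(T_max), rng.normal(size=(T_max, 2))])
y_all = X_all @ beta + rng.normal(size=T_max)

for T in (100, 1_000, 10_000, 100_000):
    X, y = X_all[:T], y_all[:T]
    b = np.linalg.solve(X.T @ X, X.T @ y)
    print(T, np.max(np.abs(b - beta)))   # the maximum deviation shrinks as T grows
```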
3.2.2 Asymptotic Normality
Since, by assumption, X′X is O_p(T), we have (X′X)⁻¹ → 0. To express the limiting distribution of β̂, we need the following theorem.
Theorem:
The asymptotic distribution of √T(β̂ − β) is N(0, σ²Q⁻¹), where Q = lim_{T→∞}(X′X/T).
Proof:
For any sample size T, the distribution of √T(β̂ − β) is N(0, σ²(X′X/T)⁻¹). The limiting result is therefore immediate.
Theorem:
The asymptotic distribution of √T(s² − σ²) is N(0, 2σ⁴).
Proof:
The distribution of e′e/σ² is χ² with (T − k) degrees of freedom. Therefore
\[
\frac{e'e}{\sigma^2} = \sum_{t=1}^{T-k} v_t^2,
\]
where the v_t² are i.i.d. χ² variables with one degree of freedom. This is a sum of i.i.d. terms with mean 1 and variance 2; according to the Lindeberg-Lévy central limit theorem, it follows that
\[
\frac{1}{\sqrt{T-k}}\sum_{t=1}^{T-k}\left(\frac{v_t^2 - 1}{\sqrt{2}}\right) \xrightarrow{L} N(0,1).
\]
But this is equivalent to saying that
\[
\frac{1}{\sqrt{T-k}}\left(\frac{e'e}{\sigma^2} - (T-k)\right) \xrightarrow{L} N(0,2),
\]
or that
\[
\sqrt{T-k}\,(s^2 - \sigma^2) \xrightarrow{L} N(0, 2\sigma^4),
\]
or that
\[
\sqrt{T}\,(s^2 - \sigma^2) \xrightarrow{L} N(0, 2\sigma^4).
\]
From the above results we find that, although the variance of s² does not attain the Cramér-Rao lower bound in finite samples, it does so asymptotically.
Theorem:
s² is asymptotically efficient.
Proof:
The asymptotic variance of s² is 2σ⁴/T, which equals the Cramér-Rao lower bound.
4 Hypothesis Testing
4.1 Tests of a Single Linear Restriction on β: Tests Based on the t Distribution
Lemma:
Let R be a 1 × k vector, and define s* by
\[
s_* = \sqrt{s^2 R(X'X)^{-1}R'};
\]
then R(β̂ − β)/s* has a t distribution with (T − k) degrees of freedom.
Proof:
Clearly R(β̂ − β) is a scalar random variable with zero mean and variance σ²R(X′X)⁻¹R′; call this variance σ*². Then R(β̂ − β)/σ* ~ N(0,1), but this statistic is not a pivot since it contains the unknown parameter σ. We need some transformation of this statistic to remove the parameter.
We know that (T − k)s²/σ² ~ χ²_{T−k}; therefore
\[
\frac{s_*^2}{\sigma_*^2} = \frac{s^2}{\sigma^2} \sim \chi^2_{T-k}/(T-k).
\]
Finally, then,
\[
\frac{R(\hat{\beta}-\beta)}{s_*}
= \frac{R(\hat{\beta}-\beta)/\sigma_*}{\sqrt{s_*^2/\sigma_*^2}}
= \frac{N(0,1)}{\sqrt{\chi^2_{T-k}/(T-k)}} \sim t_{T-k}.
\]
The above result relies on the numerator and denominator being independent; this condition was shown to hold in Section 3.1.4 of this chapter.
Theorem (Test of a single linear restriction on β):
Let R be a known 1 × k vector and r a known scalar. Then, under the null hypothesis that Rβ = r, the test statistic
\[
\frac{R\hat{\beta} - r}{s_*} \sim t_{T-k}.
\]
Proof:
Under the null hypothesis,
\[
\frac{R\hat{\beta} - r}{s_*} = \frac{R\hat{\beta} - R\beta}{s_*} \sim t_{T-k}.
\]
Corollary (Test of significance of βᵢ):
Let
\[
s_{\hat{\beta}_i} = \sqrt{s^2\,(X'X)^{-1}_{ii}};
\]
then, under the null hypothesis that βᵢ = 0, the test statistic
\[
\frac{\hat{\beta}_i}{s_{\hat{\beta}_i}} \sim t_{T-k}.
\]
Proof:
This is a special case of the last theorem, with r = 0 and R a vector of zeros except for a one in the i-th position.
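A sketch of the t test of H₀: βᵢ = 0 on simulated data, using SciPy only for the t distribution tail probability (the data and the choice of i are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=T)   # last coefficient is truly zero

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (T - k)

i = 2                                          # test H0: beta_i = 0
t_stat = b[i] / np.sqrt(s2 * XtX_inv[i, i])    # beta_hat_i / its standard error
p_value = 2 * stats.t.sf(np.abs(t_stat), df=T - k)
print(t_stat, p_value)
```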
Example:
Reproduce all the results in Greene (5th ed.), p. 103, Table 6.2.
4.2 Tests of Several Linear Restrictions on β: Tests Based on the F Distribution
Theorem:
Let R be a known matrix of dimension m × k with rank m, and q a known m × 1 vector. Then, under the null hypothesis that Rβ = q, the statistic
\[
\frac{(R\hat{\beta} - q)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta} - q)/m}{e'e/(T-k)}
= \frac{(R\hat{\beta} - q)'[s^2 R(X'X)^{-1}R']^{-1}(R\hat{\beta} - q)}{m}
\sim F_{m,\,T-k}.
\]
Proof:
From the results on linear functions of a normal vector, we have
\[
R\hat{\beta} \sim N\!\left(R\beta,\;\sigma^2 R(X'X)^{-1}R'\right).
\]
Further, by the results on quadratic forms in a normal vector (Sec. 6.2.2 of Ch. 2), we have
\[
(R\hat{\beta} - R\beta)'[\sigma^2 R(X'X)^{-1}R']^{-1}(R\hat{\beta} - R\beta) \sim \chi^2_m.
\]
Then, under the null hypothesis that Rβ = q,
\[
(R\hat{\beta} - q)'[\sigma^2 R(X'X)^{-1}R']^{-1}(R\hat{\beta} - q) \sim \chi^2_m. \tag{4}
\]
However, the statistic in (4) is not a pivot since it contains the unknown parameter σ². We need some transformation of this statistic to remove the parameter, as in the single-restriction test.
Recall that (T − k)s²/σ² = e′e/σ² ~ χ²_{T−k}. Forming the ratio below removes the unknown parameter σ² from (4):
\[
\frac{(R\hat{\beta} - q)'[\sigma^2 R(X'X)^{-1}R']^{-1}(R\hat{\beta} - q)/m}{[(T-k)s^2/\sigma^2]/(T-k)}
= \frac{(R\hat{\beta} - q)'[s^2 R(X'X)^{-1}R']^{-1}(R\hat{\beta} - q)}{m} \tag{5}
\]
\[
\equiv
\frac{(R\hat{\beta} - q)'[\sigma^2 R(X'X)^{-1}R']^{-1}(R\hat{\beta} - q)/m}{(e'e/\sigma^2)/(T-k)}
= \frac{(R\hat{\beta} - q)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta} - q)/m}{e'e/(T-k)}, \tag{6}
\]
whose numerator and denominator are distributed as χ²_m/m and χ²_{T−k}/(T − k), respectively. The statistics in (5) and (6) are therefore distributed as F_{m,T−k} provided the two χ² variables are independent; this is indeed the case, as can be proven along the same lines as in the single-restriction test.
Exercise:
Show that the two χ² variables in the last theorem are independent.
4.2.1 Tests of Several Linear Restrictions on β from the Restricted Least Squares Estimator
Recall that β̂* = β̂ − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(Rβ̂ − q) and e*′e* = e′e + (β̂* − β̂)′X′X(β̂* − β̂), where β̂* and e* are the estimator and residuals from the restricted least squares problem. We find that
\[
\begin{aligned}
e_*'e_* - e'e &= (\hat{\beta}_* - \hat{\beta})'X'X(\hat{\beta}_* - \hat{\beta}) \\
&= (R\hat{\beta} - q)'[R(X'X)^{-1}R']^{-1}R(X'X)^{-1}X'X(X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(R\hat{\beta} - q) \\
&= (R\hat{\beta} - q)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta} - q).
\end{aligned}
\]
Therefore, under the null hypothesis that Rβ = q, we have a third statistic, from (6), that is also distributed as F_{m,T−k}:
\[
\frac{(R\hat{\beta} - q)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta} - q)/m}{e'e/(T-k)}
= \frac{(e_*'e_* - e'e)/m}{e'e/(T-k)} \sim F_{m,\,T-k}, \tag{7}
\]
and a fourth statistic, from (7),
\[
\frac{\left(\dfrac{e_*'e_*}{y'M^0y} - \dfrac{e'e}{y'M^0y}\right)\!\Big/ m}{\left(\dfrac{e'e}{y'M^0y}\right)\!\Big/(T-k)}
= \frac{(R^2 - R_*^2)/m}{(1 - R^2)/(T-k)} \sim F_{m,\,T-k}, \tag{8}
\]
where R²* is the R-squared from the restricted estimation.
Corollary (Test of the significance of a regression):
If all the slope coefficients (i.e., all coefficients except the constant term) are zero, then R is (k − 1) × k (so m = k − 1) and q = 0. In this case R²* = 0. The statistic for testing the significance of the regression, H₀: Rβ = 0, therefore follows from (8): under the null hypothesis,
\[
\frac{R^2/(k-1)}{(1 - R^2)/(T-k)} \sim F_{k-1,\,T-k}.
\]
Exercise:
Use each of the above four F-ratios to compute the test statistic F = 109.84 in Greene (5th ed.), p. 99.
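A sketch computing the F statistic in the equivalent forms (6), (7), and (8) on simulated data; the restriction below is purely illustrative, not the one behind F = 109.84:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=T)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

# H0: both slope coefficients are zero (m = 2)
R = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
q = np.zeros(2)
m = R.shape[0]
d = R @ b - q

# Form (6): Wald-type quadratic form
F_wald = (d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / m) / (e @ e / (T - k))

# Form (7): restricted vs. unrestricted sums of squared residuals
b_star = b - XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, d)
e_star = y - X @ b_star
F_sse = ((e_star @ e_star - e @ e) / m) / (e @ e / (T - k))

# Form (8): in terms of R^2 (here R^2_* = 0 because only the constant remains)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - (e @ e) / sst
r2_star = 1.0 - (e_star @ e_star) / sst
F_r2 = ((r2 - r2_star) / m) / ((1.0 - r2) / (T - k))

print(F_wald, F_sse, F_r2)              # numerically identical
print(stats.f.sf(F_wald, m, T - k))     # p-value
```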
4.2.2 Tests of Structural Change
One of the more common applications of the F tests is in tests of structural
change. In specifying a regression model, we assume that its assumptions apply
to all the observations in our sample. It is straightforward, however, to test the
hypothesis that some or all of the regression coefficients are different in different
subsets of the data.
Theorem (Chow Test):
Suppose that one has T1 observations on a regression equation
y1 = X1β1 + ε1,
and T2 observations on another regression equation
y2 = X2β2 + ε2.
Suppose that X1 and X2 are made up of k regressors. Let SSE1 denote the
sum of squared errors in the regression of y1 on X1 and SSE2 denote the sum
of squared errors in the regression of y2 on X2. Finally let the ”joint regression”
equation be
\[
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\beta
+ \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix} \tag{9}
\]
and SSE be the sum of squared errors in the joint regression. Then, under the null hypothesis that β₁ = β₂ and that ε = [ε₁′ ε₂′]′ is distributed as N(0, σ²I_T), the statistic
\[
\frac{(SSE - SSE_1 - SSE_2)/k}{(SSE_1 + SSE_2)/(T - 2k)}
\]
is distributed as F_{k, T−2k}.
Proof:
The 'separated regression' model can be written as
\[
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}
+ \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}
= X\beta + \varepsilon. \tag{10}
\]
The OLS estimator of the separated model is therefore
\[
\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix}
= \hat{\beta} = (X'X)^{-1}X'y
= \begin{bmatrix} X_1'X_1 & 0 \\ 0 & X_2'X_2 \end{bmatrix}^{-1}
\begin{bmatrix} X_1'y_1 \\ X_2'y_2 \end{bmatrix}
= \begin{bmatrix} (X_1'X_1)^{-1}X_1'y_1 \\ (X_2'X_2)^{-1}X_2'y_2 \end{bmatrix}, \tag{11}
\]
and the residual vector is
\[
\begin{bmatrix} e_1 \\ e_2 \end{bmatrix}
= e = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
- \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix}
\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix}
= \begin{bmatrix} y_1 - X_1\hat{\beta}_1 \\ y_2 - X_2\hat{\beta}_2 \end{bmatrix}.
\]
The sum of squared residuals of the separated regression is e′e = e₁′e₁ + e₂′e₂ = SSE₁ + SSE₂, i.e., the combined sum of squared residuals from the two separate regressions, and it can be regarded as the 'error from the unrestricted model' relative to the joint regression (9). Regarding the sum of squared errors SSE from the joint regression as the error from the 'restricted' model, the result is apparent from (7).
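A sketch of the Chow test on simulated data with a deliberate break in the coefficients (all names and numbers are illustrative):

```python
import numpy as np
from scipy import stats

def sse_ols(y, X):
    """Sum of squared residuals from an OLS regression of y on X."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

rng = np.random.default_rng(8)
T1, T2, k = 60, 80, 2
X1 = np.column_stack([np.ones(T1), rng.normal(size=T1)])
X2 = np.column_stack([np.ones(T2), rng.normal(size=T2)])
y1 = X1 @ np.array([1.0, 2.0]) + rng.normal(size=T1)
y2 = X2 @ np.array([0.0, 3.0]) + rng.normal(size=T2)   # different coefficients: a break

sse1 = sse_ols(y1, X1)
sse2 = sse_ols(y2, X2)
sse_joint = sse_ols(np.concatenate([y1, y2]), np.vstack([X1, X2]))

T = T1 + T2
F = ((sse_joint - sse1 - sse2) / k) / ((sse1 + sse2) / (T - 2 * k))
print(F, stats.f.sf(F, k, T - 2 * k))   # large F, tiny p-value: reject beta1 = beta2
```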
5 Prediction
Let us consider a set of T₀ observations not included in the original sample of T observations. Specifically, let X₀ denote these T₀ observations on the regressors, and y₀ the corresponding observations on y.
Now let y₀ be forecast by
\[
\hat{y}_0 = X_0\hat{\beta},
\]
where β̂ = (X′X)⁻¹X′y is the OLS estimator based on the original T observations. Finally, let v₀ be the vector of forecast errors defined by
\[
v_0 = y_0 - X_0\hat{\beta} = y_0 - X_0(X'X)^{-1}X'y.
\]
Theorem:
E(v₀) = 0 and E(v₀v₀′) = σ²(I_{T₀} + X₀(X′X)⁻¹X₀′).
Proof:
\[
E(v_0) = E(y_0 - X_0\hat{\beta}) = E[X_0(\beta - \hat{\beta}) + \varepsilon_0] = 0,
\]
and
\[
\begin{aligned}
E(v_0v_0') &= E\{[X_0(\beta - \hat{\beta}) + \varepsilon_0][X_0(\beta - \hat{\beta}) + \varepsilon_0]'\} \\
&= E\{[\varepsilon_0 - X_0(X'X)^{-1}X'\varepsilon][\varepsilon_0 - X_0(X'X)^{-1}X'\varepsilon]'\} \\
&= \sigma^2 X_0(X'X)^{-1}X_0' + \sigma^2 I_{T_0} \quad (\text{since } E(\varepsilon\varepsilon_0') = 0) \\
&= \sigma^2\left(I_{T_0} + X_0(X'X)^{-1}X_0'\right).
\end{aligned}
\]
Theorem:
Suppose that we wish to predict a single value of Y₀ (T₀ = 1) associated with a 1 × k regressor vector X₀ = x₀′. Then
\[
\frac{v_0}{\sqrt{s^2\left(1 + x_0'(X'X)^{-1}x_0\right)}} \sim t_{T-k},
\]
where v₀ = Y₀ − x₀′β̂.
Proof:
Since v₀ is a linear function of a normal vector,
\[
v_0 \sim N\!\left(0,\;\sigma^2\left(I_{T_0} + X_0(X'X)^{-1}X_0'\right)\right),
\]
which here reduces to
\[
v_0 \sim N\!\left(0,\;\sigma^2\left(1 + x_0'(X'X)^{-1}x_0\right)\right).
\]
Then
\[
\frac{v_0\big/\sqrt{\sigma^2\left(1 + x_0'(X'X)^{-1}x_0\right)}}{\sqrt{[(T-k)s^2/\sigma^2]/(T-k)}}
= \frac{v_0}{\sqrt{s^2\left(1 + x_0'(X'X)^{-1}x_0\right)}} \sim t_{T-k}.
\]
Corollary:
The forecast interval for Y₀ would be formed using
\[
\text{forecast interval} = \hat{Y}_0 \pm t_{\alpha/2}\sqrt{s^2\left(1 + x_0'(X'X)^{-1}x_0\right)}.
\]
Example:
See Greene (5th ed.), p. 111.
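Since Greene's table is not reproduced here, the following sketch only shows how a point forecast and its interval would be computed for one out-of-sample observation on simulated data (x₀ and all names are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=T)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (T - k)

x0 = np.array([1.0, 0.5, -1.0])                  # out-of-sample regressor values (1 x k)
y0_hat = x0 @ b                                  # point forecast
se_forecast = np.sqrt(s2 * (1.0 + x0 @ XtX_inv @ x0))
t_crit = stats.t.ppf(0.975, df=T - k)            # alpha = 0.05
print(y0_hat - t_crit * se_forecast, y0_hat + t_crit * se_forecast)
```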
5.1 Measuring the Accuracy of Forecasts
Various measures have been proposed for assessing the predictive accuracy of
forecast models. Two that are based on the residuals from the forecasts are the
root mean squared error
\[
RMSE = \sqrt{\frac{1}{T_0}\sum_i (Y_i - \hat{Y}_i)^2}
\]
and the mean absolute error
\[
MAE = \frac{1}{T_0}\sum_i |Y_i - \hat{Y}_i|,
\]
where T0 is the number of periods being forecasted.
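A minimal sketch of the two accuracy measures, given vectors of realized values and forecasts (the numbers below are illustrative only):

```python
import numpy as np

def rmse(y_actual, y_forecast):
    """Root mean squared forecast error."""
    d = np.asarray(y_actual) - np.asarray(y_forecast)
    return np.sqrt(np.mean(d ** 2))

def mae(y_actual, y_forecast):
    """Mean absolute forecast error."""
    d = np.asarray(y_actual) - np.asarray(y_forecast)
    return np.mean(np.abs(d))

y_actual = np.array([2.1, 1.8, 2.5, 3.0])
y_forecast = np.array([2.0, 2.0, 2.3, 2.8])
print(rmse(y_actual, y_forecast), mae(y_actual, y_forecast))
```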
Keep in mind, however, that the RMSE and MAE are themselves random variables. To compare predictive accuracy we therefore need a test statistic for the equality of forecast accuracy; see, for example, Diebold and Mariano (1995, JBES, p. 253).