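Analysis of Covariance
Analysis of covariance with one covariate
Example: To study the fattening effect of three feeds (A1 (g=1), A2 (g=2), A3 (g=3)) on pigs, each feed was given to 8 pigs; the initial body weight (x) of the pigs was not controlled. After a period of feeding, the weight gain (y) of each pig was recorded. The data are shown in Table 2-1. Test whether the three feeds have the same fattening effect.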
Data structure (file name: covariance1.dta):
   x     y    g
  15    85    1
  13    83    1
  11    65    1
  12    76    1
  12    80    1
  16    91    1
  14    84    1
  17    90    1
  17    97    2
  16    90    2
  18   100    2
  18    95    2
  21   103    2
  22   106    2
  19    99    2
  18    94    2
  22    89    3
  24    91    3
  20    83    3
  23    95    3
  25   100    3
  27   102    3
  30   105    3
  32   110    3
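An analysis that compares the three feeds while ignoring initial weight is a one-way analysis of variance (One-way ANOVA). Because the weight gain of a pig depends on its initial weight, the comparison of the three feeds should adjust for the effect of initial weight on weight gain. The adjustment assumes that weight gain is linearly related to initial weight and that initial weight and feed do not interact, that is, the regression slope is the same in every feed group. The adjusting variable (initial weight) is called the covariate, and the grouping variable is called the factor. The problem can therefore be handled by analysis of covariance, with the following corner-point (reference-cell) model: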
                                    A1 (g=1)    A2 (g=2)    A3 (g=3)
Not adjusted for initial weight
Adjusted for initial weight
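One way to write the corner-point model for this table is sketched below. It assumes that A3 is the reference level, as in the Stata output of this example, where g=3 is the dropped level. The symbols $\mu$, $\alpha_g$, $\beta$ and $\varepsilon_{gi}$ are assumed notation introduced for this sketch.

```latex
% Assumed notation: \mu = intercept of the reference group A3,
% \alpha_g = effect of feed g relative to A3 (so \alpha_3 = 0),
% \beta = common slope of weight gain on initial weight x.
\begin{align*}
\text{Not adjusted (one-way ANOVA):}\quad
  & y_{gi} = \mu + \alpha_g + \varepsilon_{gi}, \\
\text{Adjusted (ANCOVA):}\quad
  & y_{gi} = \mu + \alpha_g + \beta x_{gi} + \varepsilon_{gi},
  \qquad \alpha_3 = 0,\; g = 1, 2, 3 .
\end{align*}
```

Under this parameterization the expected gain at initial weight $x$ is $\mu+\alpha_1+\beta x$ for A1, $\mu+\alpha_2+\beta x$ for A2 and $\mu+\beta x$ for A3, so $\alpha_1$ and $\alpha_2$ are the differences from A3 at equal initial weight.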
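The Stata commands are as follows. The class() option is the syntax of Stata 10 and earlier; in Stata 11 and later the same model is written anova y g c.x g#c.x.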
anova y g x g*x,class(g)
Number of obs = 24 R-squared = 0.9297
Root MSE = 3.15855 Adj R-squared = 0.9102
Source | Partial SS df MS F Prob > F
Model | 2376.3819 5 475.27638 47.64 0.0000
|
g | 24.4661579 2 12.233079 1.23 0.3168
x | 830.415407 1 830.415407 83.24 0.0000
g*x | 48.0381359 2 24.019068 2.41 0.1184
|
Residual | 179.576433 18 9.97646848
Total | 2555.95833 23 111.128623
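The P value of the g*x term is 0.1184 > 0.05, so there is no evidence of an interaction between initial weight and feed. The interaction term is dropped and a common slope is assumed: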
anova y g x,class(g)
Number of obs = 24 R-squared = 0.9109
Root MSE = 3.37353 Adj R-squared = 0.8976
Source | Partial SS df MS F Prob > F
Model | 2328.34376 3 776.114588 68.20 0.0000
g | 707.218765 2 353.609382 31.07 0.0000
x | 1010.76043 1 1010.76043 88.81 0.0000
Residual | 227.614568 20 11.3807284
Total | 2555.95833 23 111.128623
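The F statistic on the g*x line of the first table can be recovered from the residual sums of squares of the two tables, because the interaction test is an extra-sum-of-squares comparison of the model with g*x against the model without it. The sketch below is a hand check in Python rather than Stata; the function name `extra_ss_f` is made up for illustration, and the numbers are copied from the tables above.

```python
# Residual sums of squares and degrees of freedom copied from the two ANOVA tables.
rss_full, df_full = 179.576433, 18        # model with g, x and g*x
rss_reduced, df_reduced = 227.614568, 20  # model with g and x only


def extra_ss_f(rss_reduced, df_reduced, rss_full, df_full):
    """F statistic for the terms in the full model that are absent from the reduced one."""
    extra_ss = rss_reduced - rss_full  # sum of squares explained by the extra terms
    extra_df = df_reduced - df_full    # number of extra parameters
    return (extra_ss / extra_df) / (rss_full / df_full)


f_interaction = extra_ss_f(rss_reduced, df_reduced, rss_full, df_full)
print(f"F for g*x = {f_interaction:.2f}")
```

The difference of the two residual sums of squares also matches the Partial SS that Stata prints for g*x (48.0381359).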
regress
      Source |       SS       df       MS              Number of obs =      24
-------------+------------------------------           F(  3,    20) =   68.20
       Model |  2328.34376     3  776.114588           Prob > F      =  0.0000
    Residual |  227.614568    20  11.3807284           R-squared     =  0.9109
-------------+------------------------------           Adj R-squared =  0.8976
       Total |  2555.95833    23  111.128623           Root MSE      =  3.3735
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   35.93518   6.575471     5.47   0.000     22.21899    49.65137
           g |
          1  |   12.79324   3.408989     3.75   0.001     5.682214    19.90427
          2  |   17.33559   2.409151     7.20   0.000     12.31019    22.36099
          3  |  (dropped)
           x |   2.401569   .2548332     9.42   0.000     1.869996    2.933142
------------------------------------------------------------------------------
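A1 vs A3: $H_0: \alpha_1 = 0$ vs $H_1: \alpha_1 \neq 0$
The P value is 0.001 < 0.05, so the two population means are taken to differ. The 95% confidence interval for $\alpha_1$ (5.68 to 19.90) lies entirely above zero, so at equal initial weight the mean weight gain under A1 is higher than under A3, and the difference is statistically significant.
A2 vs A3: $H_0: \alpha_2 = 0$ vs $H_1: \alpha_2 \neq 0$
The P value is less than 0.001, so the two population means are taken to differ. The 95% confidence interval for $\alpha_2$ (12.31 to 22.36) lies entirely above zero, so at equal initial weight the mean weight gain under A2 is higher than under A3, and the difference is statistically significant.
A1 vs A2: $H_0: \alpha_1 - \alpha_2 = 0$ vs $H_1: \alpha_1 - \alpha_2 \neq 0$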
test _b[g[1]]-_b[g[2]]=0
( 1) g[1] - g[2] = 0.0
F( 1,20) = 4.70
Prob > F = 0.0424
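The P value is 0.0424 < 0.05, so the two population means are taken to differ. The point estimates satisfy $\hat\alpha_1 = 12.79 < \hat\alpha_2 = 17.34$, so at equal initial weight the mean weight gain under A2 is higher than under A1, and the difference is statistically significant.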
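Conclusions:
1) After adjusting for initial weight, pigs fed A2 gained the most weight, and pigs fed A1 also gained more than pigs fed A3. All pairwise differences are statistically significant (all P values below 0.05).
2) Weight gain is positively associated with initial weight (slope 2.40, P < 0.001).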
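The coefficients reported by regress in this example can be cross-checked by hand. With one factor and one covariate, the common slope is the pooled within-group slope, and each group effect is the difference between that group's intercept and the intercept of the reference group. The sketch below does this in plain Python on the data of this example; the function `ancova_common_slope` and its argument names are made up for illustration and are not part of Stata.

```python
# (initial weight x, weight gain y) for each feed group, copied from the data table.
data = {
    1: [(15, 85), (13, 83), (11, 65), (12, 76), (12, 80), (16, 91), (14, 84), (17, 90)],
    2: [(17, 97), (16, 90), (18, 100), (18, 95), (21, 103), (22, 106), (19, 99), (18, 94)],
    3: [(22, 89), (24, 91), (20, 83), (23, 95), (25, 100), (27, 102), (30, 105), (32, 110)],
}


def mean(values):
    return sum(values) / len(values)


def ancova_common_slope(data, reference):
    """Fit y = mu + alpha_g + beta * x with one common slope and alpha_reference = 0."""
    x_bar = {g: mean([x for x, _ in rows]) for g, rows in data.items()}
    y_bar = {g: mean([y for _, y in rows]) for g, rows in data.items()}
    # Pooled within-group sum of squares of x and cross-products of x and y.
    sxx = sum((x - x_bar[g]) ** 2 for g, rows in data.items() for x, _ in rows)
    sxy = sum((x - x_bar[g]) * (y - y_bar[g]) for g, rows in data.items() for x, y in rows)
    beta = sxy / sxx
    # Intercept of each group's parallel line; the reference group's intercept is mu.
    intercept = {g: y_bar[g] - beta * x_bar[g] for g in data}
    mu = intercept[reference]
    alpha = {g: intercept[g] - mu for g in data}
    return mu, alpha, beta


mu, alpha, beta = ancova_common_slope(data, reference=3)
print(f"_cons = {mu:.5f}, slope on x = {beta:.6f}")
for g in (1, 2):
    print(f"g={g} vs g=3: {alpha[g]:.5f}")
print(f"A2 - A1 = {alpha[2] - alpha[1]:.5f}")
```

The printed values can be compared with the _cons, g and x rows of the regress table; the last line gives the point estimate behind the A1 vs A2 test.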
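Adjusting for a confounder when evaluating two interventions
Two interventions are used to treat hypertension. Taking systolic blood pressure as the only outcome, we discuss how to evaluate the treatment effect.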
 No.   group   x1 (before treatment)   x2 (after treatment)
   1     1           131.4                   125.2
   2     1           140.2                   133.5
   3     1           138.8                   132.7
   4     1           139.5                   132.4
   5     1           140.8                   133.9
   6     1           130.5                   124.5
   7     1           139.8                   133.4
   8     1           128.7                   122.9
   9     1           138.8                   131.8
  10     1           144.7                   137
  11     1           134                     127.3
  12     1           127.7                   121.5
  13     1           136.8                   130.5
  14     1           145.6                   140.1
  15     1           138.3                   131.2
  16     2           144                     136.1
  17     2           133.1                   126.2
  18     2           138.9                   131.2
  19     2           134.2                   127.1
  20     2           147.7                   139.3
  21     2           134.4                   127
  22     2           130.8                   123.5
  23     2           136.6                   129.7
  24     2           141.5                   134.5
  25     2           144.8                   136.9
  26     2           137.4                   129.9
  27     2           136.8                   128.8
  28     2           145.5                   139
  29     2           128.5                   121.4
  30     2           140.7                   132.6
Step 1. Compute the change from before to after treatment: gen d=x1-x2
Step 2. Compute the mean change in each group: tab group,su(d)
            |            Summary of d
      group |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   6.5133347   .58415673          15
          2 |   7.4466665   .53966211          15
------------+------------------------------------
      Total |   6.9800006   .72843602          30
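On average, systolic pressure fell 0.9333318 mmHg more in group 2 than in group 1 (7.4466665 - 6.5133347); that is, the unadjusted difference in treatment effect between the two groups is 0.9333318 mmHg.
To adjust for the pre-treatment level, use the analysis of covariance model: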
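The two steps above (gen d=x1-x2, then the group means of d) can be reproduced outside Stata as a quick check. The following is a minimal sketch in plain Python using the 30 records of the data table; the variable names `records` and `change` are chosen here for illustration.

```python
from statistics import mean, stdev

# (group, x1 before treatment, x2 after treatment), copied from the data table.
records = [
    (1, 131.4, 125.2), (1, 140.2, 133.5), (1, 138.8, 132.7), (1, 139.5, 132.4),
    (1, 140.8, 133.9), (1, 130.5, 124.5), (1, 139.8, 133.4), (1, 128.7, 122.9),
    (1, 138.8, 131.8), (1, 144.7, 137.0), (1, 134.0, 127.3), (1, 127.7, 121.5),
    (1, 136.8, 130.5), (1, 145.6, 140.1), (1, 138.3, 131.2),
    (2, 144.0, 136.1), (2, 133.1, 126.2), (2, 138.9, 131.2), (2, 134.2, 127.1),
    (2, 147.7, 139.3), (2, 134.4, 127.0), (2, 130.8, 123.5), (2, 136.6, 129.7),
    (2, 141.5, 134.5), (2, 144.8, 136.9), (2, 137.4, 129.9), (2, 136.8, 128.8),
    (2, 145.5, 139.0), (2, 128.5, 121.4), (2, 140.7, 132.6),
]

# Step 1: change score d = x1 - x2 for every subject, collected by group.
change = {1: [], 2: []}
for group, before, after in records:
    change[group].append(before - after)

# Step 2: mean, standard deviation and count of d in each group.
for group, values in change.items():
    print(group, round(mean(values), 4), round(stdev(values), 4), len(values))

unadjusted_diff = mean(change[2]) - mean(change[1])
print("group 2 minus group 1:", round(unadjusted_diff, 4))
```

Stata stores d as a single-precision float, which is why its printed means (6.5133347, 7.4466665) differ from the exact values 97.7/15 and 111.7/15 in the seventh decimal place.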
anova d group x1,class(group)
                  Number of obs =      30     R-squared     =  0.5159
                  Root MSE      = .525257     Adj R-squared =  0.4801
     Source |  Partial SS    df       MS           F     Prob > F
 -----------+----------------------------------------------------
      Model |  7.93878728     2  3.96939364      14.39     0.0001
            |
      group |  5.74578681     1  5.74578681      20.83     0.0001
         x1 |  1.40547531     1  1.40547531       5.09     0.0323
            |
   Residual |  7.44916457    27  .275894984
 -----------+----------------------------------------------------
      Total |  15.3879518    29  .530619029
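Interpretation: the treatment effect depends on the baseline level (x1, P = 0.0323), and the effects of the two interventions differ (group, P = 0.0001).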
. regress
      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  2,    27) =   14.39
       Model |  7.93878728     2  3.96939364           Prob > F      =  0.0001
    Residual |  7.44916457    27  .275894984           R-squared     =  0.5159
-------------+------------------------------           Adj R-squared =  0.4801
       Total |  15.3879518    29  .530619029           Root MSE      =  .52526
------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   1.876583   2.471592     0.76   0.454    -3.194705    6.947871
       group |
          1  |  -.8815209   .1931656    -4.56   0.000    -1.277864   -.4851778
          2  |  (dropped)
          x1 |   .0402676   .0178409     2.26   0.032     .0036612     .076874
------------------------------------------------------------------------------
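After adjusting for baseline, the difference in treatment effect between the two groups is 0.8815209 mmHg (at equal baseline pressure, group 1 falls less than group 2), and the difference is statistically significant (P < 0.001).
(Note: the unadjusted difference between the two groups was 0.9333318 mmHg.)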
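The link between the unadjusted and adjusted differences can be written out explicitly. In a common-slope analysis of covariance, the adjusted group difference equals the raw difference in mean change minus the slope times the difference in baseline means. The baseline means used below (137.04 and 138.3267 mmHg) are computed from the x1 column of the data table and are not printed in the original output; the notation $\bar d_k$, $\bar x_{1,k}$ and $\hat\beta$ is assumed for this sketch.

```latex
% k = 1, 2 indexes the treatment group; \hat\beta is the coefficient of x1.
\begin{align*}
\hat\delta_{\text{adj}}
  &= (\bar d_1 - \bar d_2) - \hat\beta\,(\bar x_{1,1} - \bar x_{1,2}) \\
  &= (6.5133 - 7.4467) - 0.0402676 \times (137.04 - 138.3267) \\
  &= -0.9333 + 0.0518 = -0.8815 .
\end{align*}
```

The adjustment is small here because the two groups had similar baseline pressures, so the adjusted and unadjusted differences are close.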
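Analysis of covariance with one covariate and two factors
Example 2-2: A horticulturist studies the effect of flower variety (factor A: variety LP (a=1) and variety WB (a=2)) and humidity (factor B: low humidity (b=1) and high humidity (b=2)) on the quantity of flowers sold (y). Because the experimental plots differ in size, plot size (x) is used as a covariate. Each combination of variety and humidity is replicated on 6 plots, and the data are as given in the textbook (p28). Analyze the relationship between the quantity of flowers sold and the two factors.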
Data structure:
   y     x    a    b
  98    15    1    1
  60     4    1    1
  77     7    1    1
  80     9    1    1
  95    14    1    1
  64     5    1    1
  71    10    1    2
  80    12    1    2
  86    14    1    2
  82    13    1    2
  46     2    1    2
  55     3    1    2
  55     4    2    1
  60     5    2    1
  75     8    2    1
  65     7    2    1
  87    13    2    1
  78    11    2    1
  76    11    2    2
  68    10    2    2
  43     2    2    2
  47     3    2    2
  62     7    2    2
  70     9    2    2
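Because plot size (x) can also affect the quantity of flowers sold (y), analysis of covariance is used:
                         Variety      Low humidity (b=1)    High humidity (b=2)
Without x (ANOVA)        LP (a=1)
                         WB (a=2)
With x (Co-ANOVA)        LP (a=1)
                         WB (a=2)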
anova y a b a*b x a*b*x,class(a b)
                  Number of obs =      24     R-squared     =  0.9786
                  Root MSE      = 2.60706     Adj R-squared =  0.9693
     Source |  Partial SS    df       MS           F     Prob > F
 -----------+----------------------------------------------------
      Model |  4977.25219     7  711.036027     104.61     0.0000
            |
          a |  38.8083118     1  38.8083118       5.71     0.0295
          b |  44.0347332     1  44.0347332       6.48     0.0216
        a*b |  .027656949     1  .027656949       0.00     0.9499
          x |   3703.3091     1   3703.3091     544.87     0.0000
      a*b*x |  10.7333748     3   3.5777916       0.53     0.6704
            |
   Residual |  108.747808    16    6.796738
 -----------+----------------------------------------------------
      Total |     5086.00    23  221.130435
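The test of the a*b*x term, which captures the interaction between the covariate x and the factors a and b, gives P = 0.6704 > 0.05, so there is no evidence that the slope on x differs across the combinations of a and b. The term is dropped and the model is refitted:
anova y a b x a*b,class(a b)
Number of obs = 24 R-squared = 0.9765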
Root MSE = 2.50768 Adj R-squared = 0.9716
Source | Partial SS df MS F Prob > F
Model | 4966.51882 4 1241.6297 197.45 0.0000
|
a | 96.6018263 1 96.6018263 15.36 0.0009
b | 323.849473 1 323.849473 51.50 0.0000
x | 3994.51882 1 3994.51882 635.21 0.0000
a*b | 16.0422442 1 16.0422442 2.55 0.1267
|
Residual | 119.481183 19 6.28848331
Total | 5086.00 23 221.130435
The a*b interaction is not statistically significant either (P = 0.1267 > 0.05), so it is dropped and the additive model is fitted:
anova y a b x,class(a b)
Number of obs = 24 R-squared = 0.9734
Root MSE = 2.60311 Adj R-squared = 0.9694
Source | Partial SS df MS F Prob > F
Model | 4950.47657 3 1650.15886 243.52 0.0000
|
a | 97.5515084 1 97.5515084 14.40 0.0011
b | 324.433906 1 324.433906 47.88 0.0000
x | 3978.47657 1 3978.47657 587.13 0.0000
|
Residual | 135.523427 20 6.77617135
Total | 5086.00 23 221.130435
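regress
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   37.33802   1.341875    27.83   0.000     34.53892    40.13712
           a |
          1  |   4.104418    1.08175     3.79   0.001     1.847928    6.360908
          2  |  (dropped)
           b |
          1  |   7.368139   1.064846     6.92   0.000     5.146909     9.58937
          2  |  (dropped)
           x |   3.263722   .1346936    24.23   0.000     2.982756    3.544687
------------------------------------------------------------------------------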
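Interpretation: for plots of the same size, sales of variety LP are higher than sales of variety WB, and the difference is statistically significant (P = 0.001 < 0.05).
For plots of the same size, sales under low humidity are higher than sales under high humidity, and the difference is statistically significant (P < 0.001).
The larger the plot, the higher the sales; the association is statistically significant (P < 0.001).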
test _b[a[1]]=_b[b[1]]
( 1) a[1] - b[1] = 0.0
F( 1,20) = 4.68
Prob > F = 0.0428
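By comparison, the effect of humidity on sales (7.37) is larger than the effect of flower variety (4.10), and the difference between the two effects is statistically significant (P = 0.0428).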
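The additive model anova y a b x is an ordinary least-squares regression of y on a constant, an indicator for a=1, an indicator for b=1 and the covariate x, with a=2 and b=2 as reference levels. As a cross-check of the coefficient table, the sketch below solves the normal equations in plain Python on the data of Example 2-2. The helper names `solve` and `ols` are made up for illustration, and Stata itself uses a different, numerically more careful algorithm.

```python
# (y, x, a, b) rows copied from the data table of Example 2-2.
rows = [
    (98, 15, 1, 1), (60, 4, 1, 1), (77, 7, 1, 1), (80, 9, 1, 1), (95, 14, 1, 1), (64, 5, 1, 1),
    (71, 10, 1, 2), (80, 12, 1, 2), (86, 14, 1, 2), (82, 13, 1, 2), (46, 2, 1, 2), (55, 3, 1, 2),
    (55, 4, 2, 1), (60, 5, 2, 1), (75, 8, 2, 1), (65, 7, 2, 1), (87, 13, 2, 1), (78, 11, 2, 1),
    (76, 11, 2, 2), (68, 10, 2, 2), (43, 2, 2, 2), (47, 3, 2, 2), (62, 7, 2, 2), (70, 9, 2, 2),
]


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def ols(design, response):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, response)) for i in range(k)]
    return solve(xtx, xty)


# Design columns: constant, indicator of a=1, indicator of b=1, covariate x.
design = [[1.0, float(a == 1), float(b == 1), float(x)] for _, x, a, b in rows]
response = [float(y) for y, _, _, _ in rows]
cons, coef_a1, coef_b1, coef_x = ols(design, response)
print(f"_cons = {cons:.5f}")
print(f"a=1 vs a=2: {coef_a1:.6f}")
print(f"b=1 vs b=2: {coef_b1:.6f}")
print(f"slope on x: {coef_x:.6f}")
```

The printed values can be compared with the _cons, a, b and x rows of the regress table.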