2.5 SAS Programs for the Analysis of Variance of Common Experimental Designs
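The textbook covers the analysis of variance only for the completely randomized design and the crossed (factorial) design. Many other experimental designs also require analysis of variance to process their data, for example the randomized complete block design, the Latin square design, the split-plot design, the nested design and the orthogonal design. These designs are described in many textbooks. For reasons of space they are not introduced in detail here; only the linear statistical model, the expected mean squares and the test statistics are given. The SAS program for the completely randomized design was introduced in §2.4; this section gives SAS programs for the analysis of variance of several other designs. Before reading the following material, please read Chapter 1, "Basic Operations of the SAS Software".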
2.5.1 Analysis of Variance of a Three-Factor Crossed Experiment
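Section 9.5.3 of the textbook gives the expected mean squares and test statistics of a three-factor crossed design under a mixed model (A and C fixed, B random). The following example uses a general three-factor crossed experiment to illustrate the SAS program for its analysis of variance.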
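Example 2.10 Factors A, B and C form a three-factor crossed experiment in which A and C are fixed and B is random. Factor A has three levels (A1 to A3), factor B has four levels (B1 to B4) and factor C has five levels (C1 to C5); the experiment is replicated twice. Two dependent variables, R1 and R2, were recorded (that is, experimental responses such as plant height or ear length of a crop, or blood pressure or blood viscosity of a person); the original records are not reproduced here. An external data file is created in which the values of each observation are written in the order A, B, C, R1, R2; its path and file name are a:\2-6data.dat.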
1 1 1 18.0 24.1 1 1 2 19.6 24.7 1 1 3 17.5 24.7 1 1 4 17.9 25.8
1 1 5 19.1 25.2 1 2 1 23.4 33.4 1 2 2 23.0 33.2 1 2 3 23.9 32.9
1 2 4 23.2 34.3 1 2 5 27.0 35.0 1 3 1 24.5 29.6 1 3 2 23.7 30.8
1 3 3 23.5 31.7 1 3 4 21.2 32.2 1 3 5 25.7 31.9 1 4 1 19.4 27.6
1 4 2 17.3 27.8 1 4 3 18.1 28.0 1 4 4 18.8 28.7 1 4 5 18.8 28.4
2 1 1 18.8 28.7 2 1 2 19.6 28.6 2 1 3 18.6 29.8 2 1 4 18.2 30.1
2 1 5 20.8 31.0 2 2 1 24.2 38.2 2 2 2 24.4 37.9 2 2 3 25.3 38.3
2 2 4 24.0 38.6 2 2 5 27.3 33.7 2 3 1 25.9 35.1 2 3 2 23.6 34.4
2 3 3 23.8 36.1 2 3 4 21.1 35.9 2 3 5 26.4 36.4 2 4 1 18.9 34.2
2 4 2 21.9 31.9 2 4 3 23.5 32.3 2 4 4 20.0 33.0 2 4 5 20.4 33.3
3 1 1 19.2 31.2 3 1 2 19.6 30.6 3 1 3 19.2 32.5 3 1 4 18.9 33.1
3 1 5 20.0 32.3 3 2 1 22.6 38.7 3 2 2 23.4 39.4 3 2 3 25.5 41.0
3 2 4 24.2 41.2 3 2 5 28.3 42.4 3 3 1 25.3 36.3 3 3 2 23.9 37.2
3 3 3 23.8 36.9 3 3 4 21.2 38.4 3 3 5 25.4 37.4 3 4 1 17.2 30.9
3 4 2 17.9 32.0 3 4 3 20.8 31.8 3 4 4 18.2 33.1 3 4 5 16.4 31.5
1 1 1 18.3 24.4 1 1 2 19.2 24.2 1 1 3 18.4 25.5 1 1 4 18.1 26.3
1 1 5 19.2 25.3 1 2 1 23.3 33.2 1 2 2 23.0 32.9 1 2 3 25.1 34.2
1 2 4 24.6 35.6 1 2 5 26.0 34.0 1 3 1 24.5 29.5 1 3 2 23.1 30.0
1 3 3 23.0 31.1 1 3 4 20.3 31.3 1 3 5 25.5 31.4 1 4 1 19.6 27.4
1 4 2 19.8 25.9 1 4 3 22.2 27.3 1 4 4 19.5 28.5 1 4 5 19.6 28.1
2 1 1 18.0 28.0 2 1 2 19.6 28.4 2 1 3 19.3 30.6 2 1 4 18.0 30.0
2 1 5 20.1 30.3 2 2 1 24.0 38.8 2 2 2 23.8 37.4 2 2 3 24.2 36.9
2 2 4 24.2 38.9 2 2 5 27.8 37.0 2 3 1 25.6 34.7 2 3 2 23.4 34.0
2 3 3 23.7 35.7 2 3 4 20.6 35.3 2 3 5 26.1 35.9 2 4 1 20.4 32.3
2 4 2 24.6 34.6 2 4 3 23.9 32.8 2 4 4 21.1 34.1 2 4 5 20.0 33.0
3 1 1 18.3 30.1 3 1 2 19.8 31.0 3 1 3 17.6 30.6 3 1 4 17.9 31.9
3 1 5 20.8 32.8 3 2 1 23.4 39.8 3 2 2 23.4 39.4 3 2 3 26.5 41.7
3 2 4 24.4 41.6 3 2 5 27.1 41.3 3 3 1 25.6 36.6 3 3 2 23.5 37.0
3 3 3 23.7 37.9 3 3 4 21.4 38.4 3 3 5 25.5 37.5 3 4 1 17.5 31.5
3 4 2 19.5 31.6 3 4 3 21.7 32.4 3 4 4 18.4 33.4 3 4 5 16.5 31.5
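Each line of this file holds four observations of five values, and the program in the solution reads it with input a b c r1 r2 @@;, where the @@ line-hold specifier keeps SAS reading from the same line until all of its values are used. As a rough analogue only (not how SAS itself works), the Python sketch below treats the file contents as one stream of values and starts a new observation every five values. The function name read_observations and the decision to reject a trailing partial observation are assumptions made for this illustration.

```python
FIELDS = ("a", "b", "c", "r1", "r2")
CLASS_FIELDS = {"a", "b", "c"}   # factor levels are read as integers, responses as floats


def read_observations(text):
    """Group whitespace-separated values into observations of len(FIELDS) values each."""
    tokens = text.split()
    if len(tokens) % len(FIELDS):
        # Design choice for this sketch: refuse a trailing partial observation.
        raise ValueError("number of values is not a multiple of the number of fields")
    observations = []
    for start in range(0, len(tokens), len(FIELDS)):
        chunk = tokens[start:start + len(FIELDS)]
        observations.append({name: int(v) if name in CLASS_FIELDS else float(v)
                             for name, v in zip(FIELDS, chunk)})
    return observations


# The first two lines of the data file listed above.
sample = """1 1 1 18.0 24.1 1 1 2 19.6 24.7 1 1 3 17.5 24.7 1 1 4 17.9 25.8
1 1 5 19.1 25.2 1 2 1 23.4 33.4 1 2 2 23.0 33.2 1 2 3 23.9 32.9"""

obs = read_observations(sample)
print(len(obs))   # 8
print(obs[5])     # {'a': 1, 'b': 2, 'c': 1, 'r1': 23.4, 'r2': 33.4}
```

Line breaks play no role in the grouping, which is the behavior @@ gives in the DATA step.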
Solution: The SAS program is as follows:
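options linesize=76;
data example;
infile 'a:\2-6data.dat';   /* external file: a b c r1 r2 for each observation */
input a b c r1 r2 @@;      /* @@ reads several observations from one line */
run;
proc anova;
class a b c;
model r1 r2 = a b c a*b a*c b*c a*b*c;
test h = a e = a*b;        /* F_A  = MS_A  / MS_AB  */
test h = c e = b*c;        /* F_C  = MS_C  / MS_BC  */
test h = a*c e = a*b*c;    /* F_AC = MS_AC / MS_ABC */
means a / duncan e = a*b alpha = 0.01;   /* Duncan test for A, error term MS_AB */
means c / lsd e = b*c alpha = 0.01;      /* LSD test for C, error term MS_BC */
run;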
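The program differs only slightly from the SAS program for a one-way analysis of variance. Because the number of factors increases from one to three, there are now three class variables. The MODEL statement r1 r2 = a b c a*b a*c b*c a*b*c; requests the contributions of the three main effects a, b and c, of all two-factor interactions and of the three-factor interaction to the dependent variables r1 and r2. In fact, two analyses of variance are performed and two ANOVA tables are produced, one for r1 and one for r2. Only part of the model may be fitted if desired, for example r1 r2 = a b c b*c or r2 = a b c a*b a*b*c; the sums of squares of the omitted terms are then pooled into the error term.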
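For balanced data, each of the two ANOVA tables rests on the standard partition of the total sum of squares into the seven effects in the MODEL statement plus error. The Python sketch below computes that partition from cell and marginal means for any balanced a × b × c layout with n replicates. It is an illustration written for this note, not the code SAS uses, and it is run on randomly generated numbers with the shape of Example 2.10 (3 × 4 × 5, two replicates), not on the real measurements.

```python
import itertools
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def three_way_ss(y):
    """Sums of squares for a balanced a x b x c crossed design.

    y[i][j][k] is the list of n replicate observations in cell (A=i, B=j, C=k).
    """
    a, b, c, n = len(y), len(y[0]), len(y[0][0]), len(y[0][0][0])
    cells = list(itertools.product(range(a), range(b), range(c)))

    # Cell means, grand mean and all marginal means (exact for balanced data).
    m = {(i, j, k): mean(y[i][j][k]) for i, j, k in cells}
    g = mean(m.values())
    ma = {i: mean(m[i, j, k] for j in range(b) for k in range(c)) for i in range(a)}
    mb = {j: mean(m[i, j, k] for i in range(a) for k in range(c)) for j in range(b)}
    mc = {k: mean(m[i, j, k] for i in range(a) for j in range(b)) for k in range(c)}
    mab = {(i, j): mean(m[i, j, k] for k in range(c)) for i in range(a) for j in range(b)}
    mac = {(i, k): mean(m[i, j, k] for j in range(b)) for i in range(a) for k in range(c)}
    mbc = {(j, k): mean(m[i, j, k] for i in range(a)) for j in range(b) for k in range(c)}

    ss = {
        "A": b * c * n * sum((ma[i] - g) ** 2 for i in range(a)),
        "B": a * c * n * sum((mb[j] - g) ** 2 for j in range(b)),
        "C": a * b * n * sum((mc[k] - g) ** 2 for k in range(c)),
        "A*B": c * n * sum((mab[i, j] - ma[i] - mb[j] + g) ** 2
                           for i in range(a) for j in range(b)),
        "A*C": b * n * sum((mac[i, k] - ma[i] - mc[k] + g) ** 2
                           for i in range(a) for k in range(c)),
        "B*C": a * n * sum((mbc[j, k] - mb[j] - mc[k] + g) ** 2
                           for j in range(b) for k in range(c)),
        "A*B*C": n * sum((m[i, j, k] - mab[i, j] - mac[i, k] - mbc[j, k]
                          + ma[i] + mb[j] + mc[k] - g) ** 2 for i, j, k in cells),
        "Error": sum((v - m[i, j, k]) ** 2 for i, j, k in cells for v in y[i][j][k]),
    }
    df = {"A": a - 1, "B": b - 1, "C": c - 1,
          "A*B": (a - 1) * (b - 1), "A*C": (a - 1) * (c - 1), "B*C": (b - 1) * (c - 1),
          "A*B*C": (a - 1) * (b - 1) * (c - 1), "Error": a * b * c * (n - 1)}
    total = sum((v - g) ** 2 for i, j, k in cells for v in y[i][j][k])
    return ss, df, total


# Randomly generated data with the shape of Example 2.10 (NOT the real measurements).
random.seed(2)
y = [[[[random.gauss(20.0, 2.0) for _ in range(2)] for _ in range(5)]
      for _ in range(4)] for _ in range(3)]
ss, df, total = three_way_ss(y)
print(sum(df.values()))                                     # 119 = 120 - 1
print(sum(v for name, v in df.items() if name != "Error"))  # 59, the Model DF
print(abs(sum(ss.values()) - total) < 1e-8)                 # True
```

The degrees of freedom 59 (model) and 60 (error) agree with the Model and Error rows in Table 2-14.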
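In the TEST statement, h = a e = a*b means that the effect of factor A is tested against the A×B interaction, that is, $F_A = MS_A / MS_{AB}$. The other two TEST statements give $F_C = MS_C / MS_{BC}$ and $F_{AC} = MS_{AC} / MS_{ABC}$. Unless specified otherwise, every effect is tested against $MS_e$ (see Section 9.5.3 of the textbook). If the model changes, the test statistics change accordingly and the TEST statements must be modified to match.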
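The choice of denominators can be checked by hand from the printed mean squares. The Python sketch below copies the R1 mean squares from Table 2-14 and divides each effect by the error term named in the TEST statements, falling back to the error mean square for every other effect. The names ms_r1, ERROR_TERM and f_value are hypothetical; p-values are not computed here, since that would require the F distribution from a statistics library.

```python
# Mean squares for R1 copied from Table 2-14 ("Mean Square" column and the error MS).
ms_r1 = {
    "A": 10.804000, "B": 249.592306, "C": 17.001667, "A*B": 5.751889,
    "A*C": 0.754417, "B*C": 10.779389, "A*B*C": 0.851056, "Error": 0.48592,
}

# Denominators implied by the three TEST statements (A, C fixed; B random).
# Every effect not listed here is tested against the error mean square.
ERROR_TERM = {"A": "A*B", "C": "B*C", "A*C": "A*B*C"}


def f_value(ms, effect, error_term=ERROR_TERM):
    """F ratio of an effect against the error term chosen for it."""
    return ms[effect] / ms[error_term.get(effect, "Error")]


for effect in ms_r1:
    if effect != "Error":
        denom = ERROR_TERM.get(effect, "Error")
        print(f"{effect:6} / {denom:6} F = {f_value(ms_r1, effect):.2f}")
```

The ratios for A, C and A*C (1.88, 1.58 and 0.89) match the three "Tests of Hypotheses" blocks, while the others match the first table. Passing error_term={} reproduces the default test against $MS_e$, for example F = 22.23 for A.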
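In the MEANS statement, the option e = a*b specifies that Duncan's test for factor A uses $MS_{AB}$ as the error mean square; without this option $MS_e$ would be used. Likewise, e = b*c in the second MEANS statement makes the LSD test for factor C use $MS_{BC}$, and alpha = 0.01 sets the significance level of both tests.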
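The effect of e = b*c is visible in the least significant difference itself, $LSD = t_{\alpha}\sqrt{2\,MS/n}$, where MS is the error mean square named in the MEANS statement and t has that mean square's degrees of freedom. The Python sketch below reproduces the two LSD values printed in Table 2-14 from $MS_{BC}$ (12 df), with n = 24 observations behind each level mean of C. The critical value 3.0545 is the two-sided 1% point of t with 12 df taken from a standard t table (SAS prints it rounded to 3.05); the function and constant names are hypothetical.

```python
import math


def lsd(t_crit, ms_error, n_per_mean):
    """Least significant difference between two means, each based on n_per_mean observations."""
    return t_crit * math.sqrt(2.0 * ms_error / n_per_mean)


# Two-sided critical t for alpha = 0.01 with 12 df (the df of MS_BC), from a t table.
T_001_12 = 3.0545
N_PER_C_MEAN = 24   # each level of C averages a * b * n = 3 * 4 * 2 observations

print(f"{lsd(T_001_12, 10.779389, N_PER_C_MEAN):.3f}")   # R1: 2.895
print(f"{lsd(T_001_12, 0.626083, N_PER_C_MEAN):.4f}")    # R2: 0.6977
```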
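If any experimental results are missing, the affected observations are omitted from the analysis of variance. PROC ANOVA is intended for balanced data, so the data should be complete; unbalanced data should be analyzed with PROC GLM instead.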
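Because PROC ANOVA assumes balanced data, it can be worth checking the data before running it. The Python sketch below counts the complete observations (no missing response) in every combination of factor levels and reports whether all counts are equal and nonzero. The function name check_balance and the small 2 × 2 data set, including its missing value, are hypothetical.

```python
from collections import Counter
from itertools import product


def check_balance(rows, factors, responses):
    """True if every combination of factor levels has the same, nonzero number of complete rows."""
    complete = [r for r in rows if all(r.get(v) is not None for v in responses)]
    counts = Counter(tuple(r[f] for f in factors) for r in complete)
    levels = [sorted({r[f] for r in rows}) for f in factors]
    cell_counts = [counts.get(cell, 0) for cell in product(*levels)]
    return len(set(cell_counts)) == 1 and cell_counts[0] > 0


# Hypothetical 2 x 2 layout with two replicates; one response value is missing (None).
rows = [
    {"a": 1, "b": 1, "r1": 18.0}, {"a": 1, "b": 1, "r1": 18.3},
    {"a": 1, "b": 2, "r1": 23.4}, {"a": 1, "b": 2, "r1": 23.3},
    {"a": 2, "b": 1, "r1": 18.8}, {"a": 2, "b": 1, "r1": 18.0},
    {"a": 2, "b": 2, "r1": 24.2}, {"a": 2, "b": 2, "r1": None},
]
print(check_balance(rows, ("a", "b"), ("r1",)))   # False: cell (2, 2) has one complete row
rows[-1]["r1"] = 24.0
print(check_balance(rows, ("a", "b"), ("r1",)))   # True
```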
Running the program above produces the output shown in Table 2-14.
Table 2-14 ANOVA output for Example 2.10
The SAS System
Analysis of Variance Procedure
Class Level Information
Class    Levels    Values
A             3    1 2 3
B             4    1 2 3 4
C             5    1 2 3 4 5
Number of observations in data set = 120
The SAS System
Analysis of Variance Procedure
Dependent Variable: R1
                                 Sum of            Mean
Source               DF         Squares          Square    F Value    Pr > F
Model                59      1028.71625        17.43587      35.88    0.0001
Error                60        29.15500         0.48592
Corrected Total     119      1057.87125
            R-Square            C.V.        Root MSE         R1 Mean
            0.972440        3.199437         0.69708         21.7875
Source               DF        Anova SS     Mean Square    F Value    Pr > F
A                     2       21.608000       10.804000      22.23    0.0001
B                     3      748.776917      249.592306     513.65    0.0001
C                     4       68.006667       17.001667      34.99    0.0001
A*B                   6       34.511333        5.751889      11.84    0.0001
A*C                   8        6.035333        0.754417       1.55    0.1586
B*C                  12      129.352667       10.779389      22.18    0.0001
A*B*C                24       20.425333        0.851056       1.75    0.0412
Tests of Hypotheses using the Anova MS for A*B as an error term
Source               DF        Anova SS     Mean Square    F Value    Pr > F
A                     2      21.6080000      10.8040000       1.88    0.2326
Tests of Hypotheses using the Anova MS for B*C as an error term
Source               DF        Anova SS     Mean Square    F Value    Pr > F
C                     4      68.0066667      17.0016667       1.58    0.2432
Tests of Hypotheses using the Anova MS for A*B*C as an error term
Source               DF        Anova SS     Mean Square    F Value    Pr > F
A*C                   8      6.03533333      0.75441667       0.89    0.5421
The SAS System
Analysis of Variance Procedure
Dependent Variable: R2
                                 Sum of            Mean
Source               DF         Squares          Square    F Value    Pr > F
Model                59      2224.52967        37.70389      85.85    0.0001
Error                60        26.35000         0.43917
Corrected Total     119      2250.87967
            R-Square            C.V.        Root MSE         R2 Mean
            0.988293        2.014173         0.66270         32.9017
Source               DF        Anova SS     Mean Square    F Value    Pr > F
A                     2       779.20117       389.60058     887.14    0.0001
B                     3      1314.66700       438.22233     997.85    0.0001
C                     4        38.03300         9.50825      21.65    0.0001
A*B                   6        53.47350         8.91225      20.29    0.0001
A*C                   8         5.84050         0.73006       1.66    0.1266
B*C                  12         7.51300         0.62608       1.43    0.1798
A*B*C                24        25.80150         1.07506       2.45    0.0027
Tests of Hypotheses using the Anova MS for A*B as an error term
Source               DF        Anova SS     Mean Square    F Value    Pr > F
A                     2      779.201167      389.600583      43.72    0.0003
Tests of Hypotheses using the Anova MS for B*C as an error term
Source               DF        Anova SS     Mean Square    F Value    Pr > F
C                     4      38.0330000       9.5082500      15.19    0.0001
Tests of Hypotheses using the Anova MS for A*B*C as an error term
Source               DF        Anova SS     Mean Square    F Value    Pr > F
A*C                   8      5.84050000      0.73006250       0.68    0.7052
The SAS System
Analysis of Variance Procedure
Duncan's Multiple Range Test for variable: R1
NOTE: This test controls the type I comparisonwise error rate, not
the experimentwise error rate
Alpha=0.01 df=6 MSE=5.751889
Number of Means        2        3
Critical Range     1.988    2.063
Means with the same letter are not significantly different.
Duncan Grouping          Mean      N    A
              A       22.3775     40    2
              A
              A       21.5875     40    3
              A
              A       21.3975     40    1
The SAS System
Analysis of Variance Procedure
Duncan's Multiple Range Test for variable: R2
NOTE: This test controls the type I comparisonwise error rate, not
the experimentwise error rate
Alpha=0.01 df=6 MSE=8.91225
Number of Means        2        3
Critical Range     2.475    2.567
Means with the same letter are not significantly different.
Duncan Grouping          Mean      N    A
              A       35.3975     40    3
              A
              A       33.9050     40    2
              B       29.4025     40    1
The SAS System
Analysis of Variance Procedure
T tests (LSD) for variable: R1
NOTE: This test controls the type I comparisonwise error rate not
the experimentwise error rate.
Alpha=0.01 df=12 MSE=10.77939
Critical Value of T=3.05
Least Significant Difference=2.895
Means with the same letter are not significantly different.
T Grouping          Mean      N    C
         A       22.9083     24    5
         A
         A       22.2000     24    3
         A
         A       21.6917     24    2
         A
         A       21.4958     24    1
         A
         A       20.6417     24    4
The SAS System
Analysis of Variance Procedure
T tests (LSD) for variable: R2
NOTE: This test controls the type I comparisonwise error rate not
the experimentwise error rate.
Alpha=0.01 df=12 MSE=0.626083
Critical Value of T=3.05
Least Significant Difference= 0.6977
Means with the same letter are not significantly different.
T Grouping          Mean      N    C
         A       33.7375     24    4
         A
       B A       33.1917     24    5
       B
       B         33.0292     24    3
         C       32.2875     24    2
         C
         C       32.2625     24    1
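The SAS program for a two-factor crossed experiment is simpler than the one for a three-factor crossed experiment, so no separate example is given here.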
2.5.2 Analysis of Variance of a Randomized Complete Block Experiment
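A completely randomized design requires the experimental conditions or experimental material to be homogeneous; otherwise the experimental error may become so large that it masks differences that really exist between treatments. In experiments with many treatments this homogeneity is sometimes hard to achieve. To keep the results reliable, the whole experiment is divided into several blocks. Within each block the experimental conditions or material must be homogeneous, and each block must contain every treatment exactly once, so the n replicates of a completely randomized experiment become n blocks. Such a design is called a randomized complete block design. It is still a single-factor design: the purpose of the blocks is to separate the sum of squares caused by the blocks (that is, by the lack of homogeneity) from the error sum of squares of the completely randomized design. This reduces the error sum of squares and makes the test of the treatment effects more efficient.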
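The linear statistical model of the randomized complete block design is
$x_{ij} = \mu + \alpha_i + \delta_j + \varepsilon_{ij}, \quad i = 1, 2, \dots, a; \; j = 1, 2, \dots, b$
where $\alpha_i$ is the effect of the i-th treatment, $\delta_j$ is the effect of the j-th block and $\varepsilon_{ij}$ is the random error.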
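In a randomized complete block design, treatments are randomized only within each block rather than across all ab experimental units, so the analysis of variance can test only the treatment effect, not the block effect. The statistical hypotheses are therefore
$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$
$H_A: \alpha_i \neq 0$ for at least one $i$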
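Treatments are generally fixed, so that $\sum_{i=1}^{a} \alpha_i = 0$. Blocks may be either fixed or random. When blocks are fixed, $\sum_{j=1}^{b} \delta_j = 0$; when blocks are random, $\delta_j \sim N(0, \sigma_\delta^2)$.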
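The expected mean squares and test statistics can be derived using Sections 9.5.2 and 9.5.3 of the textbook. For fixed treatments and random blocks the results are as follows: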
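Treatments fixed (a levels, subscript i); blocks random (b levels, subscript j); n replicates of each treatment within a block (subscript k).
Effect        Expected mean square
αi            $\sigma^2 + n\sigma_{\alpha\delta}^2 + \frac{bn}{a-1}\sum_{i=1}^{a}\alpha_i^2$
δj            $\sigma^2 + an\sigma_\delta^2$
(αδ)ij        $\sigma^2 + n\sigma_{\alpha\delta}^2$
ε(ij)k        $\sigma^2$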
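In the table above, k indexes the replicates of a treatment within a block. Because each treatment occurs only once in each block (n = 1, so k takes only the value 1), $\sigma^2$ cannot be estimated separately: the error degrees of freedom are $df_e = ab(n-1) = 0$. If blocks are assumed not to interact with the treatment effect, the remaining (residual) mean square estimates $\sigma^2$, and the test statistic for the treatment effect is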
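$F_A = MS_A / MS_e$, where $MS_e$ is the residual (block × treatment) mean square with $(a-1)(b-1)$ degrees of freedom.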
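A test of $MS_{block}$ should be treated with great caution. As explained above, because randomization is restricted to within blocks, a test of blocks lacks a sound statistical basis. An $F_{block}$ can of course be computed, but it only indicates whether the blocks differ, which is useful as a reference when deciding whether to use blocks in similar experiments in the future. The SAS program for the analysis of variance of a randomized complete block experiment is similar to the SAS program for a two-factor crossed experiment.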
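The partition behind this analysis can be sketched directly. Using the yields of Example 2.11 (the data in its CARDS statement), the Python code below splits the total sum of squares into varieties, blocks and the residual. Because each variety occurs only once per block, the residual serves as the error term with (a − 1)(b − 1) degrees of freedom. The function name rcbd_anova is hypothetical, and F_block is returned only as the descriptive quantity discussed above.

```python
# Yields of Example 2.11: rows are blocks 1-3, columns are varieties 1-5.
yields = [
    [18, 36, 31, 21, 30],
    [23, 30, 34, 18, 30],
    [22, 30, 34, 18, 42],
]


def rcbd_anova(table):
    b = len(table)        # number of blocks
    a = len(table[0])     # number of treatments (varieties)
    grand = sum(map(sum, table)) / (a * b)
    treat_means = [sum(row[i] for row in table) / b for i in range(a)]
    block_means = [sum(row) / a for row in table]
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_block = a * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_treat - ss_block   # residual = block x treatment
    df_treat, df_block, df_error = a - 1, b - 1, (a - 1) * (b - 1)
    ms_treat = ss_treat / df_treat
    ms_block = ss_block / df_block
    ms_error = ss_error / df_error
    return {
        "SS": (ss_treat, ss_block, ss_error, ss_total),
        "df": (df_treat, df_block, df_error),
        "F_A": ms_treat / ms_error,
        "F_block": ms_block / ms_error,   # descriptive only, see the text
    }


res = rcbd_anova(yields)
print([round(s, 1) for s in res["SS"]])            # [620.4, 14.8, 131.2, 766.4]
print(res["df"])                                   # (4, 2, 8)
print(round(res["F_A"], 2), round(res["F_block"], 2))   # 9.46 0.45
```

The variety sum of squares 620.4, the error sum of squares 131.2 with 8 df and the total 766.4 agree with Table 2-15; the model sum of squares there (635.2) is 620.4 + 14.8.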
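Example 2.11 In a variety trial laid out as a randomized complete block design, five varieties were compared for yield in three blocks. The results are given in the CARDS statement of the program.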
Solution: The SAS program for the analysis of variance is as follows:
options linesize = 76;
data wheat;
input block variety yield @@;   /* block, variety and yield of each plot */
cards;
1 1 18  1 2 36  1 3 31  1 4 21  1 5 30
2 1 23  2 2 30  2 3 34  2 4 18  2 5 30
3 1 22  3 2 30  3 3 34  3 4 18  3 5 42
;
proc anova;
class block variety;
model yield = variety block;    /* treatment + block; the residual is the error term */
means variety / duncan;
run;
The output is shown in Table 2-15.
Table 2-15 ANOVA results of the variety trial
The SAS System
Analysis of Variance Procedure
Class Level Information
Class      Levels    Values
BLOCK           3    1 2 3
VARIETY         5    1 2 3 4 5
Number of observations in data set = 15
The SAS System
Analysis of Variance Procedure
Dependent Variable: YIELD
                                 Sum of            Mean
Source               DF         Squares          Square    F Value    Pr > F
Model                 6      635.200000      105.866667       6.46    0.0096
Error                 8      131.200000       16.400000
Corrected Total      14      766.400000
            R-Square            C.V.        Root MSE      YIELD Mean
            0.828810        14.56724         4.04969         27.8000
Source |
DF |
Anova SS |
Mean Square |
F Value |
Pr > F |
|
|
|
|
|
|
VARIETY |
4 |
620.400000 |
155.100000 |
|