The Randomized Block Design
When introducing ANOVA, we mentioned that this model allows us to include more than one categorical (explanatory) factor or confounding variable in the model.
As a first step we now include a block variable (factor). This is usually a confounding variable, i.e. not of interest by itself, but one that influences the response variable and should for this reason be included.
Sometimes a study is designed to include such a variable in order to reduce the variability in the response variable and therefore to require a smaller sample size. Generally each treatment is used exactly once within each block; hence:
if we have k treatments and b blocks, then the total sample size is n = b · k.
The concept originates from agricultural studies of the yields of certain grains under different growing conditions.
Example 1
(Yield and Early Growth Responses to Starter Fertilizer in No-Till Corn Assessed with Precision Agriculture Technologies, Manuel Bermudez and Antonio P. Mallarino (2002)) Several trials were conducted in the 1990’s to evaluate corn yield and early growth responses to starter fertilizer in Iowa farmers’ fields that had 8 to 14 yr of no-till management. Soil series represented in the experimental areas varied across fields and were among typical agricultural soil series of Iowa and neighboring states.
To illustrate, assume three different starters were used. In order to limit the number of fields that needed to be planted, 10 locations were chosen; each field was divided into three parts, and each part was treated with a different starter (randomly assigning the treatment to each part). Besides the fertilizer, all other agricultural practices were to be the same.
For this example:
- response variable = yield (kg/m²)
- treatment (factor) variable = starter (A, B, C)
- block (confounding) variable = location (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Example 2
The cutting speeds of four types of tools are being compared in an experiment. Five cutting materials of varying degrees of hardness are to be used as experimental blocks. The data giving the measurements are:
| Treatment | Block 1 | Block 2 | Block 3 | Block 4 | Block 5 | Treatment means |
|---|---|---|---|---|---|---|
| 1 | 12 | 2 | 8 | 1 | 7 | x̄T1 = 6 |
| 2 | 20 | 14 | 17 | 12 | 17 | x̄T2 = 16 |
| 3 | 13 | 7 | 13 | 8 | 14 | x̄T3 = 11 |
| 4 | 11 | 5 | 10 | 3 | 6 | x̄T4 = 7 |
| Block means | x̄B1 = 14 | x̄B2 = 7 | x̄B3 = 12 | x̄B4 = 6 | x̄B5 = 11 | x̄ = 10 |
The Randomized Block Design Model
xij= the measurement for treatment i in block j (remember there is precisely one such measurement)
then
xij = µ + αi + βj + eij
where: µ = overall mean
αi = effect of treatment i (difference with µ)
βj = effect of block j (difference with µ)
eij = error in measurement for treatment i and block j.
A positive value for αi indicates that the mean of the response variable is greater than the overall mean for treatment i.
Assume eij ∼ N (0, σ) for all measurements.
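To make the model concrete, here is a minimal simulation sketch in Python; the particular effect sizes (alpha, beta, sigma) are illustrative assumptions, not values from any example in this text:

```python
import numpy as np

rng = np.random.default_rng(0)

k, b = 4, 5          # number of treatments and blocks
mu = 10.0            # overall mean
alpha = np.array([-4.0, 6.0, 1.0, -3.0])       # treatment effects (sum to 0), illustrative
beta = np.array([4.0, -3.0, 2.0, -4.0, 1.0])   # block effects (sum to 0), illustrative
sigma = 1.5          # error standard deviation, illustrative

# One measurement per treatment-block combination:
# x_ij = mu + alpha_i + beta_j + e_ij,  with e_ij ~ N(0, sigma)
x = mu + alpha[:, None] + beta[None, :] + rng.normal(0.0, sigma, size=(k, b))
print(x.round(1))
```

Each row of `x` is one treatment observed once in every block, exactly the n = b · k layout described above.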
In the analysis of variance, instead of explaining the variance only through error and treatment, we also include the block as a possible source of variance in the data. For that reason we now also include a
Sum of Squares for Block (SSB) in our analysis:
- Sum of Squares for treatment:
\[SST=\sum\limits_{i=1}^{k}b\,(\bar{x}_{Ti}-\bar{x})^{2},\quad df_{T}=k-1\]
- Sum of Squares for block:
\[SSB=\sum\limits_{j=1}^{b}k\,(\bar{x}_{Bj}-\bar{x})^{2},\quad df_{B}=b-1\]
- Total Sum of Squares:
\[TotalSS=\sum\limits_{i,j}(x_{ij}-\bar{x})^{2},\quad df_{Total}=n-1\]
- Sum of Squares for error:
\[SSE = TotalSS - SST - SSB,\quad df_{E}=n-b-k+1\]
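Assuming numpy is available, these sums of squares can be computed directly from a data table; a minimal sketch using the cutting-speed data from Example 2:

```python
import numpy as np

# Rows = treatments (k = 4), columns = blocks (b = 5);
# data taken from the cutting-speed table in Example 2.
x = np.array([
    [12.0,  2.0,  8.0,  1.0,  7.0],
    [20.0, 14.0, 17.0, 12.0, 17.0],
    [13.0,  7.0, 13.0,  8.0, 14.0],
    [11.0,  5.0, 10.0,  3.0,  6.0],
])

k, b = x.shape
xbar = x.mean()                                  # overall mean

SST = b * np.sum((x.mean(axis=1) - xbar) ** 2)   # treatment sum of squares
SSB = k * np.sum((x.mean(axis=0) - xbar) ** 2)   # block sum of squares
TotalSS = np.sum((x - xbar) ** 2)
SSE = TotalSS - SST - SSB                        # error sum of squares

print(SST, SSB, TotalSS, SSE)   # 310.0 184.0 518.0 24.0
```

The row means are the treatment means and the column means are the block means from the table above.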
ANOVA Table for a Randomized Block Design

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Treatments | k − 1 | SST | MST = SST/(k − 1) | MST/MSE |
| Blocks | b − 1 | SSB | MSB = SSB/(b − 1) | MSB/MSE |
| Error | n − k − b + 1 | SSE | MSE = SSE/(n − k − b + 1) | |
| Total | n − 1 | TotalSS | | |
Example 3
Let us find the ANOVA table for the cutting example:
- Sum of Squares for treatment:
\[SST=\sum\limits_{i=1}^{k}b\,(\bar{x}_{Ti}-\bar{x})^{2} = 5(6-10)^{2}+5(16-10)^{2}+5(11-10)^{2}+5(7-10)^{2} = 310,\quad df_{T}=k-1=3\]
- Sum of Squares for block:
\[SSB=\sum\limits_{j=1}^{b}k\,(\bar{x}_{Bj}-\bar{x})^{2} = 4(14-10)^{2}+\ldots+4(11-10)^{2} = 184,\quad df_{B}=b-1=4\]
- Total Sum of Squares:
\[TotalSS=\sum\limits_{i,j}(x_{ij}-\bar{x})^{2} = (12-10)^{2}+(2-10)^{2}+\ldots+(6-10)^{2} = 518,\quad df_{Total}=n-1=5(4)-1=19\]
- Sum of Squares for error:
\[SSE = TotalSS - SST - SSB = 518-310-184 = 24,\quad df_{E}=n-b-k+1=12\]
ANOVA Table for a Randomized Block Design

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Tool | 3 | 310 | MST = 103.3 | 51.7 |
| Material | 4 | 184 | MSB = 46 | 23 |
| Error | 12 | 24 | MSE = 2 | |
| Total | 19 | 518 | | |
Based on the statistics in the ANOVA table, we can now test for a treatment effect and for a block
effect.
The ANOVA F-Test(Randomized Block Design)
- The Hypotheses are
H0 : α1 = α2 = . . . = αk = 0 versus
Ha : at least one αi differs from 0
- Assumption: The population follows a normal distribution with means µ1, µ2, . . . , µk and equal variance σ² for all bk combinations of treatments and blocks. The samples are independent random samples in b independent blocks from each population.
- Test statistic:
F0 =MST/MSE
based on df1 = (k − 1) and df2 = (n − k − b + 1)
- P-value: P(F > F0 ), where F follows an F-distribution with df1 = (k − 1) and df2 = (n − k −
b + 1).
- Decision:
If P-value ≤ α, then reject H0
If P-value> α, then do not reject H0
- Put into context.

In order to test for a difference between the blocks:
- The Hypotheses are
H0 : β1 = β2 = . . . = βb = 0 versus
Ha : at least one βj differs from 0
- Assumption: same as above
- Test statistic:
F0 =MSB/MSE
based on df1 = (b − 1) and df2 = (n − k − b + 1).
- P-value: P(F > F0), where F follows an F-distribution with df1 = (b − 1) and df2 = (n − k −
b + 1).
- Decision: same as above
- Put into context.
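Both F-tests can be carried out numerically; here is a sketch using scipy (assumed available), with the sums of squares taken from the cutting-speed example:

```python
from scipy.stats import f

# Dimensions and sums of squares from the cutting-speed example
k, b = 4, 5
n = b * k
df_T, df_B, df_E = k - 1, b - 1, n - k - b + 1      # 3, 4, 12

MST, MSB, MSE = 310 / df_T, 184 / df_B, 24 / df_E   # mean squares

F_treat = MST / MSE     # test statistic for the treatment effect
F_block = MSB / MSE     # test statistic for the block effect

# P-values: P(F > F0) under the F-distribution with the matching df
p_treat = f.sf(F_treat, df_T, df_E)
p_block = f.sf(F_block, df_B, df_E)

print(round(F_treat, 1), round(F_block, 1))   # 51.7 23.0
```

`f.sf` is the survival function 1 − CDF, i.e. exactly the upper-tail probability P(F > F0) used as the P-value.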
Once we have determined with the F-test that a difference between the treatment means exists, we use a multiple comparison analysis to determine where the differences occur.
Use Bonferroni.
The CIs for the pairwise comparisons are the same as for the one-way ANOVA; the only change is in the degrees of freedom.
Let γ = C1µT1 + C2µT2 + . . . + CkµTk be a contrast, with sample contrast c = C1x̄T1 + C2x̄T2 + . . . + Ckx̄Tk.
(1 − α)100% Confidence Interval for γ:
\[c \pm t^{df}_{1-\alpha/2}\,SE(c)\]
where t with df = n − k − b + 1 degrees of freedom is evaluated at its 1 − α/2 percentile.
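As a sketch, such an interval can be computed numerically; the contrast γ = µT1 − µT4 below is an illustrative choice, and SE(c) = √(MSE · ΣCi²/b) is used since each treatment mean is based on b measurements:

```python
from scipy.stats import t

# Cutting-speed example: treatment means, number of blocks, MSE, error df
xbar_T = [6.0, 16.0, 11.0, 7.0]
C = [1.0, 0.0, 0.0, -1.0]     # coefficients for mu_T1 - mu_T4 (illustrative choice)
b, MSE, df = 5, 2.0, 12

c = sum(Ci * xi for Ci, xi in zip(C, xbar_T))        # sample contrast
SE = (MSE * sum(Ci ** 2 for Ci in C) / b) ** 0.5     # standard error of c
t_crit = t.ppf(1 - 0.025, df)                        # 97.5th percentile, df = 12

lower, upper = c - t_crit * SE, c + t_crit * SE      # 95% CI for gamma
print(round(lower, 2), round(upper, 2))
```

Note that this single interval uses the plain 95% level; for several simultaneous contrasts the level would be Bonferroni-adjusted as described above.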
Continue Example
We test whether the mean cutting speed is independent of the tool used, that is:
- The Hypotheses are
H0 : α1 = α2 = α3 = α4 = 0 versus
Ha : at least one αi differs from 0, at significance level α = 0.05
- Assumption: The cutting speed follows a normal distribution with equal variance σ² for all bk combinations of treatments and blocks.
It seems reasonable that the cutting speed follows a normal distribution for a given tool cutting a certain material.
It also seems reasonable to assume that the standard deviations are the same, since the variation is caused by the same factors.
The cuts were made independently, so the samples are independent.
- Test statistic:
F0 =MST/MSE = 51.7
based on df1 = 3 and df2 = 20 − 4 − 5 + 1 = 12.
- P-value: P(F > F0) < 0.005, using table IX (since 51.7>7.23).
- Decision: Since P-value< α = 0.05 reject H0.
- The mean cutting speeds are not the same for the four tools
\[ME = t^{12}_{0.005}\sqrt{MSE\left(\frac{1}{b}+\frac{1}{b}\right)} = 3.055\sqrt{2\cdot\frac{2}{5}} = 2.73\]
Now rank the sample means and underline those that differ by less than 2.73:
means 6 7 11 16
tools T1 T4 T2 T3
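The ranking-and-comparison step can be sketched programmatically; the 0.005 tail level mirrors the text's (approximately Bonferroni-adjusted) choice for the six pairwise comparisons:

```python
from itertools import combinations
from scipy.stats import t

means = {"T1": 6.0, "T2": 16.0, "T3": 11.0, "T4": 7.0}   # treatment means
b, MSE, df_E = 5, 2.0, 12

t_crit = t.ppf(1 - 0.005, df_E)                  # about 3.055
ME = t_crit * (MSE * (1 / b + 1 / b)) ** 0.5     # margin of error, about 2.73

# Compare every pair of treatment means against the margin of error
for (u, mu_u), (v, mu_v) in combinations(means.items(), 2):
    diff = abs(mu_u - mu_v)
    verdict = "differ" if diff > ME else "not significantly different"
    print(f"{u} vs {v}: |difference| = {diff} -> {verdict}")
```

Only the pair T1, T4 (difference 1 < 2.73) fails to reach significance, matching the underlining above.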
With an experiment wise error rate of 0.05 the data provide sufficient evidence that the mean cutting speed for tools 1 and 4 are significantly different from the mean cutting speed for tool 2 as well as tool 3, further the mean cutting speed for tools 2 and 3 are significantly different. Mean cutting speeds for tools 1 and 4 are not significantly different.
Also, the mean cutting speeds for tools 1 and 4 are significantly lower than for the two other tools.
A test for a difference in the means of the blocks is of no interest, because the materials tested are not relevant, they were only helpful for testing the tools for a variety of materials.