Twoway Analysis of Variance
© 1998 by Dr. Thomas W. MacFarland -- All Rights Reserved
************
two_anov.doc
************
Background: Analysis of Variance (ANOVA) methodology is
quite effective in determining if differences between
two or more group means are due to chance, or if
observed differences are indeed the result of true
differences between phenomena. As useful as it may
be to test for differences among multiple groups on
a single factor:
-- ANOVA analysis is not limited only to
studies involving one single variable.
-- On the contrary, ANOVA can be used to examine
differences with two or more factors (i.e.,
independent variables) at the same time.
A common use of ANOVA methodology is to use a
Twoway ANOVA statistical test to determine
differences (and possible interactions) when
each of two factors has two or more categories.
When Twoway ANOVA is used, it is possible to
answer two questions:
1. Is there a difference because of variables
acting independently of each other (i.e.,
main effects)?
2. Is there a difference because of joint
effects (i.e., interaction)?
Twoway ANOVA designs can become quite complex, not
only to carry out but also to interpret. Yet, this
highly useful methodology should not be avoided
merely because it is not "user friendly." On the
contrary, Twoway ANOVA should perhaps be used more
than it is, because it makes greater use of
available resources while modeling real-world
scenarios.
Twoway ANOVA designs are often presented in a
manner similar to other factorial analyses, such
as the Chi-square analysis. Like the chi-square
analysis, Twoway ANOVA uses a factorial organization
with data placed in cells. The information within
each cell provides the necessary data for later
analysis.
A graphic representation of a factorial design is
presented below in Figure 1. When reviewing this
representation, be sure to recall that interval or
ratio data are used with a Twoway ANOVA design, as
opposed to the use of nominal data with a chi-square
analysis, which is also organized along the format
of a factorial design.
                       Variable A
                Category 1         Category 2
           ______________________________________
           |                  |                 |
           |                  |                 |
Category 1 |   n1, n2, ...    |   n1, n2, ...   |
           |                  |                 |
           |                  |                 |
Variable B |------------------|-----------------|
           |                  |                 |
           |                  |                 |
Category 2 |   n1, n2, ...    |   n1, n2, ...   |
           |                  |                 |
           |__________________|_________________|
Figure 1
Comparative Study of Two Variables
(Variable A and Variable B), with Two
Categories (Category 1 and Category 2)
for Each Variable
Thus, when using a Twoway ANOVA, be sure to remember
that it is possible to examine three separate
hypotheses:
1. if the population means for the categories of
Variable A are equal (main effect of Variable A)
2. if the population means for the categories of
Variable B are equal (main effect of Variable B)
3. if there is interaction between Variable A
and Variable B
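The three hypotheses listed above can also be written in model form. The notation below is a common textbook convention and is an assumption added here, not notation used elsewhere in this tutorial: i indexes the a categories of Variable A, j indexes the b categories of Variable B, and k indexes the observations within a cell.

```latex
\begin{align*}
y_{ijk}  &= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \\
H_0^{A}  &: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0 \\
H_0^{B}  &: \beta_1 = \beta_2 = \cdots = \beta_b = 0 \\
H_0^{AB} &: (\alpha\beta)_{ij} = 0 \text{ for all } i, j
\end{align*}
```

Each null hypothesis is tested with its own F statistic, which is why a Twoway ANOVA table reports one F for each variable and one F for the interaction.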
As such, Twoway ANOVA designs are often used to help
explain "real-world" scenarios, where interaction
is often found. This more complex design differs
from single-factor designs, which cannot detect
interaction between factors.
The decision to use a Twoway ANOVA is the decision
to see if complex issues can be understood, and
possibly acted upon.
Scenario: This study examines if there are differences in
final examination test scores (the dependent
variable) for students in a university senior-
level software engineering course on two separate
factors:
-- The first factor addresses the method
of instruction, with:
-- The first group of students was taught
by traditional lecture (Method Code
= 1).
-- The second group of students was taught
by Computer Based Training (Method Code
= 2).
-- The third group of students was taught
by the use of instructional videotapes
(Method Code = 3).
-- The fourth group of students was
enrolled through independent study
(Method Code = 4).
-- The second factor addresses each student's
possible prior graduation from a community
college:
-- Some students in the senior-level course
had previously attended and graduated
from a community college (Grd_CC Code
= 1).
-- Other students in the senior-level course
did not graduate from a community college
(Grd_CC Code = 2).
This coding scheme (Grd_CC Code = 2) is
discrete and therefore includes students who
may have attended a community college but
did not complete the full curriculum needed
to receive an associate's degree.
Students were all from a university senior-level
software engineering course and were randomly
assigned to one of four groups: instruction
by traditional lecture, instruction by CBT
(Computer Based Training), instruction by
the use of instructional videotapes, and
independent study. The teacher worked with
the registrar's office to obtain information
about prior graduation from a community
college.
The teacher was confident that final examination
scores represented interval data (i.e., the data
are parametric, with the difference between "89"
and "90" equal to the difference between "75"
and "76"). The teacher also wanted to learn
more about the effects of teaching method,
prior graduation from a community college, and
the possible effect of interaction between these
two factors on final examination scores. As
such, Twoway ANOVA (Analysis of Variance) was
correctly judged to be the appropriate test for
this analysis.
Data from this study are summarized in Table 1.
Table 1
Final Examination Test Scores in a Senior-Level
Software Engineering Course by Instructional
Method (Traditional Lecture, Computer Based
Training, Instructional Videotape, and Independent
Study) and by Prior Graduation from a Community
College
====================================================
Instructional
Method
=============
1 = Lecture CC Graduate
2 = CBT ===========
3 = Video 1 = Yes
Student Number 4 = IDS 2 = No Final Score
----------------------------------------------------
01 1 1 089
02 1 1 081
03 1 2 073
04 1 1 084
05 1 2 070
06 1 2 056
07 1 1 070
08 1 2 081
09 1 2 078
10 1 1 069
11 1 1 089
12 1 2 088
13 1 2 045
14 1 2 083
15 1 1 095
16 1 2 077
17 1 1 069
18 1 1 080
19 2 2 093
20 2 1 086
21 2 1 089
22 2 2 095
23 2 2 089
24 2 1 088
25 2 1 098
26 2 1 089
27 2 2 094
28 2 1 095
29 2 2 095
30 2 2 098
31 2 2 087
32 2 2 085
33 2 1 098
34 2 1 093
35 2 2 087
36 2 1 095
37 2 1 093
38 2 2 093
39 3 2 095
40 3 1 096
41 3 2 083
42 3 2 089
43 3 1 088
44 3 1 087
45 3 1 094
46 3 2 097
47 3 1 095
48 3 2 093
49 3 2 085
50 3 2 095
51 3 1 092
52 3 2 082
53 3 1 086
54 3 1 087
55 3 2 089
56 3 2 097
57 3 1 100
58 3 2 093
59 3 1 096
60 4 2 084
61 4 1 085
62 4 2 073
63 4 1 092
64 4 2 057
65 4 1 063
66 4 1 069
67 4 2 073
68 4 2 091
69 4 1 065
70 4 1 074
71 4 2 071
72 4 2 068
73 4 2 062
74 4 1 056
75 4 1 085
----------------------------------------------------
Note. Notice how the N (i.e., number of subjects or
group members) for each instructional group
does not have to be equal.
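Before running the analysis, it can help to see how the 75 scores in Table 1 fall into the eight Method by Grd_CC cells. The following is a minimal Python sketch, offered only as an illustration and not as part of the SPSS or MINITAB workflow. The dictionary name CELLS and the regrouping of the Table 1 scores by cell are choices made for this example.

```python
from statistics import mean

# Scores from Table 1, regrouped by (Method, Grd_CC) cell.
# The name CELLS is an assumption made for this sketch.
CELLS = {
    (1, 1): [89, 81, 84, 70, 69, 89, 95, 69, 80],
    (1, 2): [73, 70, 56, 81, 78, 88, 45, 83, 77],
    (2, 1): [86, 89, 88, 98, 89, 95, 98, 93, 95, 93],
    (2, 2): [93, 95, 89, 94, 95, 98, 87, 85, 87, 93],
    (3, 1): [96, 88, 87, 94, 95, 92, 86, 87, 100, 96],
    (3, 2): [95, 83, 89, 97, 93, 85, 95, 82, 89, 97, 93],
    (4, 1): [85, 92, 63, 69, 65, 74, 56, 85],
    (4, 2): [84, 73, 57, 73, 91, 71, 68, 62],
}

# Number of students and mean final score for every cell.
cell_summary = {
    cell: (len(scores), mean(scores)) for cell, scores in CELLS.items()
}

for (method, grd_cc), (n, avg) in sorted(cell_summary.items()):
    print(f"Method {method}, Grd_CC {grd_cc}: n = {n:2d}, mean = {avg:.3f}")
```

The cell sizes range from 8 to 11, which is the unequal N that the note under Table 1 refers to.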
Ho: Null Hypothesis: There is no difference in final
examination test scores of students enrolled in a
university senior-level software engineering course
by instructional method (instruction by traditional
lecture, instruction by Computer Based Training,
instruction by the use of instructional videotapes,
and independent study) or by graduation status from
a community college (either graduated from a
community college or did not graduate from a
community college), and there is no interaction
between these two factors (p <= .05).
Files: 1. two_anov.doc
2. two_anov.dat
3. two_anov.r01
4. two_anov.o01
5. two_anov.con
6. two_anov.lis
Command: At the Unix prompt (%), key:
%spss -m < two_anov.r01 > two_anov.o01
************
two_anov.dat
************
01 1 1 089
02 1 1 081
03 1 2 073
04 1 1 084
05 1 2 070
06 1 2 056
07 1 1 070
08 1 2 081
09 1 2 078
10 1 1 069
11 1 1 089
12 1 2 088
13 1 2 045
14 1 2 083
15 1 1 095
16 1 2 077
17 1 1 069
18 1 1 080
19 2 2 093
20 2 1 086
21 2 1 089
22 2 2 095
23 2 2 089
24 2 1 088
25 2 1 098
26 2 1 089
27 2 2 094
28 2 1 095
29 2 2 095
30 2 2 098
31 2 2 087
32 2 2 085
33 2 1 098
34 2 1 093
35 2 2 087
36 2 1 095
37 2 1 093
38 2 2 093
39 3 2 095
40 3 1 096
41 3 2 083
42 3 2 089
43 3 1 088
44 3 1 087
45 3 1 094
46 3 2 097
47 3 1 095
48 3 2 093
49 3 2 085
50 3 2 095
51 3 1 092
52 3 2 082
53 3 1 086
54 3 1 087
55 3 2 089
56 3 2 097
57 3 1 100
58 3 2 093
59 3 1 096
60 4 2 084
61 4 1 085
62 4 2 073
63 4 1 092
64 4 2 057
65 4 1 063
66 4 1 069
67 4 2 073
68 4 2 091
69 4 1 065
70 4 1 074
71 4 2 071
72 4 2 068
73 4 2 062
74 4 1 056
75 4 1 085
************
two_anov.r01
************
SET WIDTH = 80
SET LENGTH = NONE
SET CASE = UPLOW
SET HEADER = NO
TITLE = Twoway Analysis of Variance (TWOWAY ANOVA)
COMMENT = This file examines if there are differences
in final examination test scores (the
dependent variable) for students in a
university senior-level software engineering
course on two separate factors:
-- The first factor addresses the method
of instruction, with:
-- the first group of students was taught
by traditional lecture (Method Code
= 1).
-- the second group of students was taught
by Computer Based Training (Method Code
= 2).
-- the third group of students was taught
by the use of instructional videotapes
(Method Code = 3).
-- the fourth group of students was
enrolled through independent study
(Method Code = 4).
-- The second factor addresses each student's
possible prior graduation from a community
college:
-- Some students in the senior-level course
had previously attended and graduated
from a community college (Grd_CC Code
= 1).
-- Other students in the senior-level course
did not graduate from a community college
(Grd_CC Code = 2), which includes students
who may have attended a community college
but did not complete the full curriculum
needed to receive an associate's degree.
Students were all from a university senior-
level software engineering course who were
assigned, through random selection, to
placement into one of four groups: instruction
by traditional lecture, instruction by CBT
(Computer Based Training), instruction by
the use of instructional videotapes, and
independent study. The teacher worked with
the registrar's office to obtain information
about prior graduation from a community
college.
The teacher was confident that final examination
scores represented interval data (i.e., the data
are parametric, with the difference between "89"
and "90" equal to the difference between "75"
and "76"). The teacher also wanted to learn
more about the effects of teaching method,
prior graduation from a community college, and
the possible effect of interaction between these
two factors on final examination scores. As
such, Twoway ANOVA (Analysis of Variance) was
correctly judged to be the appropriate test for
this analysis.
DATA LIST FILE = 'two_anov.dat' FIXED
/ Stu_Code 20-21
Method 35
Grd_CC 45
Score 58-60
Variable Labels
Stu_Code "Student Code"
/ Method "Method: Lecture, CBT, Video, IDS"
/ Grd_CC "Graduated from Community College: Y or N"
/ Score "Final Examination Score"
Value Labels
Method 1 'Lecture: Traditional Lecture'
2 'CBT: Computer-Based Training'
3 'Video: Instructional Videotape'
4 'IDS: Independent Study'
/ Grd_CC 1 'Grd_Yes: Graduated from a CC'
2 'Grd_No : Did NOT Graduate from a CC'
ANOVA Score BY Method(1,4) Grd_CC (1,2)
/ STATISTICS = ALL
/ FORMAT = LABELS
COMMENT = Please note in this analysis how I need to
identify which methods (1, 2, 3, and 4)
and community college status (1 and 2) to
analyze.
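The DATA LIST command above reads each record by fixed column position: Stu_Code in columns 20-21, Method in column 35, Grd_CC in column 45, and Score in columns 58-60. The Python sketch below shows what that kind of column slicing amounts to. It is an illustration only; the helper names parse_record and make_record are hypothetical, and make_record simply builds a 60-column record in the layout that DATA LIST expects.

```python
# Column positions taken from the DATA LIST command (1-based, inclusive).
FIELDS = {
    "Stu_Code": (20, 21),
    "Method": (35, 35),
    "Grd_CC": (45, 45),
    "Score": (58, 60),
}


def make_record(stu_code, method, grd_cc, score):
    """Build a hypothetical 60-column record in the DATA LIST layout."""
    cols = [" "] * 60
    values = {
        "Stu_Code": f"{stu_code:02d}",
        "Method": str(method),
        "Grd_CC": str(grd_cc),
        "Score": f"{score:03d}",
    }
    for name, text in values.items():
        start, end = FIELDS[name]
        cols[start - 1:end] = list(text)
    return "".join(cols)


def parse_record(line):
    """Slice one fixed-width record into integer fields."""
    return {
        name: int(line[start - 1:end]) for name, (start, end) in FIELDS.items()
    }


record = make_record(1, 1, 1, 89)   # student 01 from Table 1
parsed = parse_record(record)
print(parsed)
```

Because the fields are located by column rather than by separator, a record whose values are shifted even one column to the left or right would be read incorrectly.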
************
two_anov.o01
************
1 SET WIDTH = 80
2 SET LENGTH = NONE
3 SET CASE = UPLOW
4 SET HEADER = NO
5 TITLE = Twoway Analysis of Variance (TWOWAY ANOVA)
6 COMMENT = This file examines if there are differences
7 in final examination test scores (the
8 dependent variable) for students in a
9 university senior-level software engineering
10 course on two separate factors:
11
12 -- The first factor addresses the method
13 of instruction, with:
14
15 -- the first group of students was taught
16 by traditional lecture (Method Code
17 = 1).
18
19 -- the second group of students was taught
20 by Computer Based Training (Method Code
21 = 2).
22
23 -- the third group of students was taught
24 by the use of instructional videotapes
25 (Method Code = 3).
26
27 -- the fourth group of students was
28 enrolled through independent study
29 (Method Code = 4).
30
31 -- The second factor addresses each student's
32 possible prior graduation from a community
33 college:
34
35 -- Some students in the senior-level course
36 had previously attended and graduated
37 from a community college (Grd_CC Code
38 = 1).
39
40 -- Other students in the senior-level course
41 did not graduate from a community college
42 (Grd_CC Code = 2), which includes students
43 who may have attended a community college
44 but did not complete the full curriculum
45 needed to receive an associate's degree.
46
47
48 Students were all from a university senior-
49 level software engineering course who were
50 assigned, through random selection, to
51 placement into one of four groups: instruction
52 by traditional lecture, instruction by CBT
53 (Computer Based Training), instruction by
54 the use of instructional videotapes, and
55 independent study. The teacher worked with
56 the registrar's office to obtain information
57 about prior graduation from a community
58 college.
59
60 The teacher was confident that final examination
61 scores represented interval data (i.e., the data
62 are parametric, with the difference between "89"
63 and "90" equal to the difference between "75"
64 and "76"). The teacher also wanted to learn
65 more about the effects of teaching method,
66 prior graduation from a community college, and
67 the possible effect of interaction between these
68 two factors on final examination scores. As
69 such, Twoway ANOVA (Analysis of Variance) was
70 correctly judged to be the appropriate test for
71 this analysis.
72 DATA LIST FILE = 'two_anov.dat' FIXED
73 / Stu_Code 20-21
74 Method 35
75 Grd_CC 45
76 Score 58-60
77
This command will read 1 records from two_anov.dat
Variable Rec Start End Format
STU_CODE 1 20 21 F2.0
METHOD 1 35 35 F1.0
GRD_CC 1 45 45 F1.0
SCORE 1 58 60 F3.0
78 Variable Labels
79 Stu_Code "Student Code"
80 / Method "Method: Lecture, CBT, Video, IDS"
81 / Grd_CC "Graduated from Community College: Y or N"
82 / Score "Final Examination Score"
83
84 Value Labels
85 Method 1 'Lecture: Traditional Lecture'
86 2 'CBT: Computer-Based Training'
87 3 'Video: Instructional Videotape'
88 4 'IDS: Independent Study'
89
90 / Grd_CC 1 'Grd_Yes: Graduated from a CC'
91 2 'Grd_No : Did NOT Graduate from a CC'
92
93 ANOVA Score BY Method(1,4) Grd_CC (1,2)
94 / STATISTICS = ALL
95 / FORMAT = LABELS
96 COMMENT = Please note in this analysis how I need to
97 identify which methods (1, 2, 3, and 4)
98 and community college status (1 and 2) to
99 analyze.
100
* * * A N A L Y S I S O F V A R I A N C E * * *
SCORE Final Examination Score
by METHOD Method: Lecture, CBT, Video, IDS
GRD_CC Graduated from Community College: Y or N
UNIQUE sums of squares
All effects entered simultaneously
                               Sum of                Mean              Sig
Source of Variation           Squares     DF       Square        F    of F
Main Effects                 5521.491      4     1380.373   18.308    .000
   METHOD                    5379.834      3     1793.278   23.784    .000
   GRD_CC                     160.120      1      160.120    2.124    .150
2-Way Interactions            177.981      3       59.327     .787    .505
   METHOD   GRD_CC            177.981      3       59.327     .787    .505
Explained                    5704.155      7      814.879   10.808    .000
Residual                     5051.632     67       75.397
Total                       10755.787     74      145.348
75 cases were processed.
0 cases (.0 pct) were missing.
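The Explained, Residual, and Total rows of the table above can be checked directly against the raw scores. The Python sketch below is only a cross-check under stated assumptions, not the way SPSS computes its table: the scores are the Table 1 values regrouped by cell, and CELLS and the other variable names are chosen for this example. The rows for METHOD, GRD_CC, and the interaction depend on the UNIQUE sums of squares approach and are not reproduced here.

```python
# Scores from Table 1, regrouped by (Method, Grd_CC) cell.
CELLS = {
    (1, 1): [89, 81, 84, 70, 69, 89, 95, 69, 80],
    (1, 2): [73, 70, 56, 81, 78, 88, 45, 83, 77],
    (2, 1): [86, 89, 88, 98, 89, 95, 98, 93, 95, 93],
    (2, 2): [93, 95, 89, 94, 95, 98, 87, 85, 87, 93],
    (3, 1): [96, 88, 87, 94, 95, 92, 86, 87, 100, 96],
    (3, 2): [95, 83, 89, 97, 93, 85, 95, 82, 89, 97, 93],
    (4, 1): [85, 92, 63, 69, 65, 74, 56, 85],
    (4, 2): [84, 73, 57, 73, 91, 71, 68, 62],
}

all_scores = [score for scores in CELLS.values() for score in scores]
grand_mean = sum(all_scores) / len(all_scores)

# Total SS: squared deviations of every score from the grand mean.
ss_total = sum((score - grand_mean) ** 2 for score in all_scores)

# Residual SS: squared deviations of every score from its own cell mean.
ss_residual = 0.0
for scores in CELLS.values():
    cell_mean = sum(scores) / len(scores)
    ss_residual += sum((score - cell_mean) ** 2 for score in scores)

# Explained SS is whatever the eight cell means account for.
ss_explained = ss_total - ss_residual

df_total = len(all_scores) - 1               # 75 - 1
df_residual = len(all_scores) - len(CELLS)   # 75 - 8
df_explained = df_total - df_residual

print(f"Explained {ss_explained:10.3f}  DF {df_explained}")
print(f"Residual  {ss_residual:10.3f}  DF {df_residual}")
print(f"Total     {ss_total:10.3f}  DF {df_total}")
```

The three sums of squares agree with the SPSS table to three decimal places, and the degrees of freedom come out as 7, 67, and 74.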
************
two_anov.con
************
Outcome: The SPSS Analysis of Variance output file for a
Twoway ANOVA is rather complex and at first it
may appear somewhat difficult to interpret.
Look at the following edited section of the output
file to see just what you need to examine to
determine if mean scores differ across the
categories of each variable and if there is any
interaction between the variables:
                                         Sig
Source of Variation              F      of F
Main Effects
   METHOD                   23.784      .000
   GRD_CC                    2.124      .150
2-Way Interactions
   METHOD   GRD_CC            .787      .505
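Each F value in this excerpt comes from the full SPSS table: the Mean Square for an effect is its Sum of Squares divided by its DF, and F is that Mean Square divided by the Residual Mean Square. The Python sketch below repeats that arithmetic using the values printed in the SPSS output. The variable names are assumptions made for this example.

```python
# (Sum of Squares, DF) for each effect, copied from the SPSS output.
EFFECTS = {
    "METHOD": (5379.834, 3),
    "GRD_CC": (160.120, 1),
    "METHOD by GRD_CC": (177.981, 3),
}

# Residual row from the SPSS output.
SS_RESIDUAL, DF_RESIDUAL = 5051.632, 67
ms_residual = SS_RESIDUAL / DF_RESIDUAL

# F = (effect Mean Square) / (Residual Mean Square).
f_ratios = {}
for effect, (ss, df) in EFFECTS.items():
    mean_square = ss / df
    f_ratios[effect] = mean_square / ms_residual

for effect, f_value in f_ratios.items():
    print(f"{effect:18s} F = {f_value:7.3f}")
```

Rounded to three decimal places, the results match the F column above (23.784, 2.124, and .787).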
Although you could compare the calculated F
statistics to criterion F statistics, it is
usually easier and just as informative to use
the probability (i.e., Significance of F) values
to determine if differences exist. In this
example:
METHOD (Instructional Method)
-- The calculated Method p value is .000 (i.e.,
less than .0005).
-- The declared Method p value is .05.
The calculated p value is less than the declared p
value and there is, accordingly, a statistically
significant difference in final examination scores
based on instructional method.
GRD_CC (Graduated from a Community College)
-- The calculated Grd_CC p value is .150.
-- The declared Grd_CC p value is .05.
The calculated p value exceeds the declared p
value and there is, accordingly, no statistically
significant difference in final examination scores
based on prior graduation from a community college.
METHOD by GRD_CC (2-Way Interaction)
-- The calculated interaction p value is .505.
-- The declared interaction p value is .05.
The calculated p value exceeds the declared p
value and there is, accordingly, no statistically
significant interaction between instructional
method and prior graduation from a community college.
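The same decision rule is applied to all three tests: compare the calculated p value to the declared p value of .05. A minimal Python sketch of that rule follows. The p values are the Sig of F values as printed by SPSS (so .000 stands for a value below .0005), and the names ALPHA, p_values, and decisions are assumptions made for this example.

```python
ALPHA = 0.05  # declared p value

# Sig of F values as printed in the SPSS output.
p_values = {
    "METHOD": 0.000,
    "GRD_CC": 0.150,
    "METHOD by GRD_CC": 0.505,
}

# Reject Ho for an effect only when its calculated p value
# does not exceed the declared p value.
decisions = {
    effect: "reject Ho" if p <= ALPHA else "fail to reject Ho"
    for effect, p in p_values.items()
}

for effect, decision in decisions.items():
    print(f"{effect:18s} p = {p_values[effect]:.3f}  ->  {decision}")
```

Only the test for instructional method leads to rejection of the null hypothesis.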
************
two_anov.lis
************
% minitab
MTB > outfile 'two_anov.lis'
Collecting Minitab session in file: two_anov.lis
MTB > # MINITAB Addendum to 'two_anov.dat'
MTB > #
MTB > read 'two_anov.dat' c1 c2 c3 c4
Entering data from file: two_anov.dat
75 rows read.
MTB > print c1 c2 c3 c4
ROW C1 C2 C3 C4
1 1 1 1 89
2 2 1 1 81
3 3 1 2 73
4 4 1 1 84
5 5 1 2 70
6 6 1 2 56
7 7 1 1 70
8 8 1 2 81
9 9 1 2 78
10 10 1 1 69
11 11 1 1 89
12 12 1 2 88
13 13 1 2 45
14 14 1 2 83
15 15 1 1 95
16 16 1 2 77
17 17 1 1 69
18 18 1 1 80
19 19 2 2 93
20 20 2 1 86
21 21 2 1 89
Continue? y
22 22 2 2 95
23 23 2 2 89
24 24 2 1 88
25 25 2 1 98
26 26 2 1 89
27 27 2 2 94
28 28 2 1 95
29 29 2 2 95
30 30 2 2 98
31 31 2 2 87
32 32 2 2 85
33 33 2 1 98
34 34 2 1 93
35 35 2 2 87
36 36 2 1 95
37 37 2 1 93
38 38 2 2 93
39 39 3 2 95
40 40 3 1 96
41 41 3 2 83
42 42 3 2 89
43 43 3 1 88
44 44 3 1 87
Continue? y
45 45 3 1 94
46 46 3 2 97
47 47 3 1 95
48 48 3 2 93
49 49 3 2 85
50 50 3 2 95
51 51 3 1 92
52 52 3 2 82
53 53 3 1 86
54 54 3 1 87
55 55 3 2 89
56 56 3 2 97
57 57 3 1 100
58 58 3 2 93
59 59 3 1 96
60 60 4 2 84
61 61 4 1 85
62 62 4 2 73
63 63 4 1 92
64 64 4 2 57
65 65 4 1 63
66 66 4 1 69
67 67 4 2 73
Continue? y
68 68 4 2 91
69 69 4 1 65
70 70 4 1 74
71 71 4 2 71
72 72 4 2 68
73 73 4 2 62
74 74 4 1 56
75 75 4 1 85
MTB > # Before I attempt a TWOWAY ANOVA on this data set,
MTB > # I first need to determine if the design is balanced
MTB > # or unbalanced.
MTB > #
MTB > # Look carefully at the various groups and you will see
MTB > # that the number of students in each teaching method
MTB > # group is not consistent. Because the numbers are
MTB > # not consistent, this design is unbalanced. The same
MTB > # issue applies to the number of students with a prior
MTB > # community college associate's degree.
MTB > #
MTB > # I will use MINITAB's help command to determine the
MTB > # proper command for a TWOWAY ANOVA on an unbalanced
MTB > # design.
MTB > help
* You are using MINITAB Statistical Software, Standard Version *
To see: Type:
----------------------------- ---------------------------------
A list of all command topics HELP COMMANDS
A list of all overview topics HELP OVERVIEW
Information on a command HELP commandname [subcommandname]
----------------------------- ---------------------------------
For example: HELP COMMANDS
HELP PLOT
HELP PLOT TITLE
To leave Minitab, type STOP.
MTB > help commands
To list the Minitab commands for any category below, type
HELP COMMANDS followed by the appropriate number. For example,
to list available regression commands, type: HELP COMMANDS 7.
1 General Information 10 Nonparametrics
2 Files, Data, and Printing 11 Tables
3 Editing and Manipulating Data 12 Times Series
4 Arithmetic 13 Statistical Process Control
5 Plotting Data 14 Exploratory Data Analysis
6 Basic Statistics 15 Distributions and Random Data
7 Regression 16 Matrices
8 Analysis of Variance 17 Miscellaneous Features
9 Multivariate Analysis 18 Macros
* * * Enhanced Version * * *
19 Professional Graphics
20 Enhanced Statistical Process Control
21 Graphical Options for Control Charts
22 Analysis of Means
23 Design and Analysis of Experiments
MTB > help commands 8
Analysis of Variance
AOVONEWAY.....does one way analysis of variance, with each
group in separate columns
ONEWAYAOV.....does one way analysis of variance, with the
response in one column, subscripts in another
TWOWAYAOV.....does balanced two way analysis of variance
ANOVA.........does univariate and multivariate analysis of
variance with balanced designs
ANCOVA........analyzes orthogonal designs (including latin
squares and crossover designs) with crossed
and nested factors and additive covariates
GLM...........does univariate and multivariate analysis
of variance with balanced and unbalanced
designs, analysis of covariance, and regression
NESTED........experimental command that analyzes fully nested
(hierarchical) designs
INDICATOR.....creates indicator or dummy variables
MTB > # And I see that glm is used when the design is unbalanced.
MTB > #
MTB > # FYI ... the two options here are:
MTB > #
MTB > # -- for a balanced design anova c4 = c2 | c3
MTB > # -- for an unbalanced design glm c4 = c2 | c3
MTB > #
MTB > # I recommend that you stay away from the standard command of
MTB > # twoway c4 c2 c3 when dealing with a balanced design.
MTB > #
MTB > # If you use this command, you will need to do manual
MTB > # calculations to obtain F values.
MTB > #
MTB > # I will now use glm on this unbalanced design.
MTB > #
MTB > glm c4 = c2 | c3
Factor    Levels  Values
C2             4  1 2 3 4
C3             2  1 2
Analysis of Variance for C4
Source      DF     Seq SS     Adj SS     Adj MS       F       P
C2           3    5372.33    5379.83    1793.28   23.78   0.000
C3           1     153.84     160.12     160.12    2.12   0.150
C2*C3        3     177.98     177.98      59.33    0.79   0.505
Error       67    5051.63    5051.63      75.40
Total       74   10755.79
Unusual Observations for C4
Obs.        C4        Fit  Stdev.Fit   Residual   St.Resid
13      45.000     72.333      2.894    -27.333     -3.34R
63      92.000     73.625      3.070     18.375      2.26R
68      91.000     72.375      3.070     18.625      2.29R
74      56.000     73.625      3.070    -17.625     -2.17R
Continue? y
R denotes an obs. with a large st. resid.
MTB > # And you will notice that the significance of F (or
MTB > # the p values in MINITAB's printout) are the same
MTB > # as what you previously saw in the SPSS printout:
MTB > #
MTB > # -- Method p = .000
MTB > # -- Grd_CC p = .150
MTB > # -- 2-Way Interaction (Method * Grd_CC) p = .505
MTB > #
MTB > # Let me demonstrate the use of the anova command on
MTB > # this unbalanced design, just to show how the analysis
MTB > # will not continue.
MTB > #
MTB > anova c4 = c2 | c3
* ERROR * Unequal cell counts.
MTB > stop
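The "Unequal cell counts" error in the session above reflects the definition of a balanced design: every combination of Method and Grd_CC must hold the same number of observations. The Python sketch below runs that check outside of MINITAB, as an illustration only. GRD_CC_BY_METHOD is a compact encoding chosen for this example; each string lists the Grd_CC codes of the students in one Method group, in the order they appear in two_anov.dat.

```python
from collections import Counter

# Grd_CC codes (1 = graduate, 2 = non-graduate) for the students in
# each Method group, in data-file order. The encoding is an assumption
# made for this sketch.
GRD_CC_BY_METHOD = {
    1: "112122122112221211",     # students 01-18
    2: "21122111212222112112",   # students 19-38
    3: "212211121222121122121",  # students 39-59
    4: "2121211221122211",       # students 60-75
}

# Count the observations in every (Method, Grd_CC) cell.
cell_counts = Counter(
    (method, int(code))
    for method, codes in GRD_CC_BY_METHOD.items()
    for code in codes
)

# A design is balanced only if all cells have the same count.
is_balanced = len(set(cell_counts.values())) == 1

for cell in sorted(cell_counts):
    print(f"Method {cell[0]}, Grd_CC {cell[1]}: n = {cell_counts[cell]}")
print("Balanced design" if is_balanced else "Unbalanced design")
```

With cell counts ranging from 8 to 11, the check reports an unbalanced design, which is why glm was the appropriate MINITAB command for these data.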
--------------------------
Disclaimer: All care was used to prepare the information in this
tutorial. Even so, the author does not and cannot guarantee the
accuracy of this information. The author disclaims any and all
injury that may come about from the use of this tutorial. As
always, students and all others should check with their advisor(s)
and/or other appropriate professionals for any and all assistance
on research design, analysis, selected levels of significance, and
interpretation of output file(s).
The author is entitled to exclusive distribution of this tutorial.
Readers have permission to print this tutorial for individual use,
provided that the copyright statement appears and that there is no
redistribution of this tutorial without permission.
Prepared 980316
Revised 980914
end-of-file 'two_anov.ssi'