Student's t-Test for Independent Samples
© 1998 by Dr. Thomas W. MacFarland -- All Rights Reserved
************
studen_t.doc
************
Background: Student's t-test is a very common (and possibly
overused) test for determining differences
between two groups. The t-test (published in
1908 by W. S. Gosset, writing as "Student" while
employed by the Guinness brewery of Dublin) was
developed for small samples, conventionally
those with fewer than 30 observations. And recall:
-- Student's t-test is still an appropriate test
with more than 30 observations.
-- With n >= 30 observations, t approximates z.
When using Student's t-test to determine if
the difference between two groups is indeed
a true difference, or if the difference between
the two groups is due only to chance:
-- Scores in both groups should be approximately
normally distributed.
-- Random selection should ideally be used
for all members of each of the two groups.
Scenario: This study examines if there are differences
in final examination test scores between two
groups of students in a data structures programming
course:
-- One group of students was taught by Computer
Based Training.
-- The other group of students was taught by
traditional lecture.
Students were all from a university freshman-
level data structures course (using C++ as the
programming language platform) who were assigned,
through random selection, to placement into one
of two groups: instruction by CBT (Computer
Based Training) vs. instruction by traditional
lecture.
Because the teacher was confident that final
examination scores represented interval data
(i.e., equal differences between scores mean the
same thing anywhere on the scale, with the
difference between "89" and "90" equal to the
difference between "75" and "76"), a parametric
test was suitable, and Student's t-Test for
Independent Samples was judged to be the
appropriate test for this analysis of
differences between two groups.
Test scores for both groups of students are
summarized in Table 1.
Table 1
Final Examination Test Scores in a Data Structures
Course by Student Group: Students Taught by Computer
Based Training and Students Taught by Traditional
Lecture
====================================================
                    Teaching Method
                    ===============
                    1 = CBT
Student Number      2 = Traditional      Final Score
----------------------------------------------------
      01                   1                 089
      02                   1                 081
      03                   1                 092
      04                   1                 094
      05                   1                 074
      06                   1                 056
      07                   1                 077
      08                   1                 085
      09                   1                 078
      10                   1                 069
      11                   1                 089
      12                   1                 088
      13                   1                 045
      14                   1                 083
      15                   1                 095
      16                   2                 091
      17                   2                 057
      18                   2                 089
      19                   2                 083
      20                   2                 080
      21                   2                 083
      22                   2                 091
      23                   2                 084
      24                   2                 084
      25                   2                 094
      26                   2                 096
      27                   2                 088
      28                   2                 097
      29                   2                 091
      30                   2                 094
----------------------------------------------------
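A quick summary of the Table 1 scores can be sketched in Python. This
is not part of the original SPSS and MINITAB files, and the variable
and function names are illustrative only. The sketch uses the standard
library to report the mean, median, standard deviation, and quartiles
for each group, which gives a rough first look at the normality
assumption noted in the Background.

```python
import statistics

# Final examination scores from Table 1, by teaching method.
cbt = [89, 81, 92, 94, 74, 56, 77, 85, 78, 69, 89, 88, 45, 83, 95]
lecture = [91, 57, 89, 83, 80, 83, 91, 84, 84, 94, 96, 88, 97, 91, 94]

def summarize(scores):
    """Return basic descriptive statistics for one group of scores."""
    # statistics.quantiles defaults to the "exclusive" method, which
    # places quartiles at positions based on (n + 1).
    q1, median, q3 = statistics.quantiles(scores, n=4)
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": median,
        "stdev": statistics.stdev(scores),  # sample SD (n - 1 divisor)
        "min": min(scores),
        "q1": q1,
        "q3": q3,
        "max": max(scores),
    }

summary = {"CBT": summarize(cbt), "Lecture": summarize(lecture)}
for name, stats in summary.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

In both groups the mean falls below the median, which points to some
left skew caused by a few low scores (45 and 56 in the CBT group, 57
in the lecture group).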
Ho: Null Hypothesis: There is no difference in mean final
examination test scores between students in a data
structures course taught by Computer Based Training
and their counterparts who were taught by the use
of traditional lecture (alpha = .05, two-tailed).
Files: 1. studen_t.doc
2. studen_t.dat
3. studen_t.r01
4. studen_t.o01
5. studen_t.con
6. studen_t.lis
Command: At the Unix prompt (%), key:
%spss -m < studen_t.r01 > studen_t.o01
************
studen_t.dat
************
                   01             1              089
                   02             1              081
                   03             1              092
                   04             1              094
                   05             1              074
                   06             1              056
                   07             1              077
                   08             1              085
                   09             1              078
                   10             1              069
                   11             1              089
                   12             1              088
                   13             1              045
                   14             1              083
                   15             1              095
                   16             2              091
                   17             2              057
                   18             2              089
                   19             2              083
                   20             2              080
                   21             2              083
                   22             2              091
                   23             2              084
                   24             2              084
                   25             2              094
                   26             2              096
                   27             2              088
                   28             2              097
                   29             2              091
                   30             2              094
************
studen_t.r01
************
SET WIDTH = 80
SET LENGTH = NONE
SET CASE = UPLOW
SET HEADER = NO
TITLE = Student's t-Test for Independent Samples
COMMENT = This file examines if there are differences
          in final examination test scores between two
          groups of students in a data structures
          programming course: one group of students
          was taught by Computer Based Training and the
          other group of students was taught by
          traditional lecture.

          Students were all from a university freshman-
          level data structures course (using C++ as the
          programming language platform) who were assigned,
          through random selection, to placement into one
          of two groups: instruction by CBT (Computer
          Based Training) vs. instruction by traditional
          lecture. Because the teacher was confident that
          final examination scores represented interval
          data ( i.e., the data are parametric, with the
          difference between "89" and "90" equal to the
          difference between "75" and "76"), Student's
          t-Test for Independent Samples was correctly
          judged to be the appropriate test for this
          analysis of differences between two groups.
DATA LIST FILE = 'studen_t.dat' FIXED
   / Stu_Code 20-21
     Method   35
     Score    50-52

Variable Labels
     Stu_Code "Student Code"
   / Method   "Method: CBT vs. Lecture"
   / Score    "Final Examination Score"

Value Labels
     Method 1 'CBT: Computer Based Training'
            2 'Traditional Lecture'

T-TEST GROUPS = Method(1,2)
   / VARIABLES = Score
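The DATA LIST command above reads each record by fixed column
position: Stu_Code from columns 20-21, Method from column 35, and
Score from columns 50-52. As an illustration only, the same slicing
can be sketched in Python. The function names are hypothetical, and
make_record simply builds a 52-character line in that layout so the
example can run without the data file.

```python
def parse_record(line):
    """Slice one fixed-width record at the DATA LIST column positions."""
    # SPSS columns are 1-based and inclusive; Python slices are 0-based
    # and exclude the end index, so columns 20-21 become line[19:21].
    return {
        "stu_code": int(line[19:21]),  # columns 20-21
        "method": int(line[34:35]),    # column 35
        "score": int(line[49:52]),     # columns 50-52
    }

def make_record(stu_code, method, score):
    """Hypothetical helper: build one record in the assumed layout."""
    return (
        f"{stu_code:02d}".rjust(21)    # student code ends at column 21
        + str(method).rjust(14)        # method lands in column 35
        + f"{score:03d}".rjust(17)     # score ends at column 52
    )

record = parse_record(make_record(1, 1, 89))
print(record)
```

If the columns in the data file are shifted by even one position, a
fixed-format read picks up the wrong characters, so it is worth
checking a few parsed records against Table 1 before running the test.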
************
studen_t.o01
************
1 SET WIDTH = 80
2 SET LENGTH = NONE
3 SET CASE = UPLOW
4 SET HEADER = NO
5 TITLE = Student's t-Test for Independent Samples
6 COMMENT = This file examines if there are differences
7 in final examination test scores between two
8 groups of students in a data structures
9 programming course: one group of students
10 was taught by Computer Based Training and the
11 other group of students was taught by
12 traditional lecture.
13
14 Students were all from a university freshman-
15 level data structures course (using C++ as the
16 programming language platform) who were assigned,
17 through random selection, to placement into one
18 of two groups: instruction by CBT (Computer
19 Based Training) vs. instruction by traditional
20 lecture. Because the teacher was confident that
21 final examination scores represented interval
22 data ( i.e., the data are parametric, with the
23 difference between "89" and "90" equal to the
24 difference between "75" and "76"), Student's
25 t-Test for Independent Samples was correctly
26 judged to be the appropriate test for this
27 analysis of differences between two groups.
28 DATA LIST FILE = 'studen_t.dat' FIXED
29 / Stu_Code 20-21
30 Method 35
31 Score 50-52
32
This command will read 1 records from studen_t.dat
Variable   Rec   Start     End         Format
STU_CODE     1      20      21         F2.0
METHOD       1      35      35         F1.0
SCORE        1      50      52         F3.0
33 Variable Labels
34 Stu_Code "Student Code"
35 / Method "Method: CBT vs. Lecture"
36 / Score "Final Examination Score"
37
38 Value Labels
39 Method 1 'CBT: Computer Based Training'
40 2 'Traditional Lecture'
41
42 T-TEST GROUPS = Method(1,2)
43 / VARIABLES = Score
T-TEST requires 72 bytes of workspace for execution.
t-tests for Independent Samples of METHOD    Method: CBT vs. Lecture
                               Number
Variable                      of Cases      Mean        SD   SE of Mean
-----------------------------------------------------------------------
SCORE   Final Examination Score
CBT: Computer Base                15     79.6667    14.130        3.648
Traditional Lecture               15     86.8000     9.748        2.517
-----------------------------------------------------------------------
Mean Difference = -7.1333
Levene's Test for Equality of Variances: F= 1.768   P= .194
t-test for Equality of Means                                      95%
Variances   t-value       df   2-Tail Sig   SE of Diff    CI for Diff
-------------------------------------------------------------------------------
Equal         -1.61       28         .119        4.432    (-16.213,1.946)
Unequal       -1.61    24.87         .120        4.432    (-16.265,1.998)
-------------------------------------------------------------------------------
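The "Equal" variances row of the SPSS output can be checked by hand.
The following Python sketch is not part of the original files, and its
names are illustrative. It pools the two sample variances, computes
the standard error of the difference, and forms t and the 95%
confidence interval. The critical value 2.0484 is an assumption taken
from a t table (two-tailed, alpha = .05, df = 28).

```python
import math
import statistics

cbt = [89, 81, 92, 94, 74, 56, 77, 85, 78, 69, 89, 88, 45, 83, 95]
lecture = [91, 57, 89, 83, 80, 83, 91, 84, 84, 94, 96, 88, 97, 91, 94]

def pooled_t_test(a, b, t_crit):
    """Independent-samples t-test assuming equal population variances."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    mean_diff = statistics.mean(a) - statistics.mean(b)
    # Pooled variance: the two sample variances weighted by their df.
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    margin = t_crit * se_diff
    return {
        "mean_diff": mean_diff,
        "se_diff": se_diff,
        "t": mean_diff / se_diff,
        "df": df,
        "ci": (mean_diff - margin, mean_diff + margin),
    }

T_CRIT_DF28 = 2.0484  # assumed table value: two-tailed, alpha = .05, df = 28
result = pooled_t_test(cbt, lecture, T_CRIT_DF28)
print({key: value for key, value in result.items()})
```

Because the confidence interval for the difference includes zero, the
interval and the t value lead to the same decision about the null
hypothesis.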
************
studen_t.con
************
Outcome: Computed t = |-1.61|
Criterion t = + or - 2.05 (alpha = .05, two-tailed, df = 28)
Computed t |-1.61| < Criterion t |2.05|
Note. The | and | characters are used to indicate
absolute value.
Therefore, the null hypothesis is not rejected. At
alpha = .05 there is no statistically significant
difference in final examination test scores between
students in a data structures course taught by Computer
Based Training and their counterparts who were taught by
the use of traditional lecture. The observed difference
of 7.13 points between the group means is small enough
that chance alone could plausibly produce it.
The p value is another way to view the difference in
final examination scores:
-- The calculated p value is .119.
-- The declared p value (alpha) is .05.
The calculated p value exceeds the declared p value,
so the difference between the two groups in terms of
scores on the final examination is not statistically
significant. Note that failing to reject the null
hypothesis does not prove that the two teaching methods
are equally effective; it means only that these data do
not provide enough evidence of a difference.
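The criterion t and the calculated p value come from Student's t
distribution with 28 degrees of freedom. As a rough check that uses
only the Python standard library, the sketch below integrates the t
density numerically (Simpson's rule) to get a two-tailed p value and
searches by bisection for the criterion t. The function names are
illustrative, and a statistics package would normally supply both
values directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return (math.exp(log_coef) / math.sqrt(df * math.pi)
            * (1 + x * x / df) ** (-(df + 1) / 2))

def two_tail_p(t, df, steps=2000):
    """Two-tailed p value: 1 minus the area between -|t| and +|t|."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Simpson's rule gives the area on [0, |t|]; double it by symmetry.
    return 1 - 2 * (total * h / 3)

def critical_t(df, alpha):
    """Bisection search for the |t| whose two-tailed p equals alpha."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if two_tail_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

ALPHA = 0.05
p_value = two_tail_p(-1.61, 28)
criterion = critical_t(28, ALPHA)
reject_null = p_value <= ALPHA
print(f"p = {p_value:.3f}, criterion t = {criterion:.3f}, reject Ho: {reject_null}")
```

Comparing the computed t with the criterion t and comparing the p
value with alpha are two views of the same decision rule, so they
always agree.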
************
studen_t.lis
************
% minitab
MTB > outfile 'studen_t.lis'
Collecting Minitab session in file: studen_t.lis
MTB > # MINITAB addendum to studen_t.dat
MTB > read 'studen_t.dat' c1 c2 c3
Entering data from file: studen_t.dat
30 rows read.
MTB > name c1 'Stu_Code' c2 'Method' c3 'Score'
MTB > print 'Stu_Code' 'Method' 'Score'
ROW Stu_Code Method Score
1 1 1 89
2 2 1 81
3 3 1 92
4 4 1 94
5 5 1 74
6 6 1 56
7 7 1 77
8 8 1 85
9 9 1 78
10 10 1 69
11 11 1 89
12 12 1 88
13 13 1 45
14 14 1 83
15 15 1 95
16 16 2 91
17 17 2 57
18 18 2 89
Continue? y
19 19 2 83
20 20 2 80
21 21 2 83
22 22 2 91
23 23 2 84
24 24 2 84
25 25 2 94
26 26 2 96
27 27 2 88
28 28 2 97
29 29 2 91
30 30 2 94
MTB > # With MINITAB, it is possible to conduct Student's t-Test
MTB > # with stacked and unstacked data.
MTB > #
MTB > # I will unstack the data in c3 and then conduct the
MTB > # t-Test using both methods.
MTB > #
MTB > unstack (c2-c3) into (c5-c6) (c7-c8);
SUBC> subscripts c2.
MTB > print c1-c8
ROW Stu_Code Method Score C5 C6 C7 C8
1 1 1 89 1 89 2 91
2 2 1 81 1 81 2 57
3 3 1 92 1 92 2 89
4 4 1 94 1 94 2 83
5 5 1 74 1 74 2 80
6 6 1 56 1 56 2 83
7 7 1 77 1 77 2 91
8 8 1 85 1 85 2 84
9 9 1 78 1 78 2 84
10 10 1 69 1 69 2 94
11 11 1 89 1 89 2 96
12 12 1 88 1 88 2 88
13 13 1 45 1 45 2 97
14 14 1 83 1 83 2 91
15 15 1 95 1 95 2 94
16 16 2 91
17 17 2 57
18 18 2 89
Continue? y
19 19 2 83
20 20 2 80
21 21 2 83
22 22 2 91
23 23 2 84
24 24 2 84
25 25 2 94
26 26 2 96
27 27 2 88
28 28 2 97
29 29 2 91
30 30 2 94
* NOTE * One or more variables are undefined.
MTB > histogram c6
Histogram of C6 N = 15
Midpoint Count
45 1 *
50 0
55 1 *
60 0
65 0
70 1 *
75 2 **
80 2 **
85 2 **
90 4 ****
95 2 **
MTB > histogram c8
Histogram of C8 N = 15
Midpoint Count
55 1 *
60 0
65 0
70 0
75 0
80 1 *
85 4 ****
90 5 *****
95 4 ****
MTB > describe c6 c8
N MEAN MEDIAN TRMEAN STDEV SEMEAN
C6 15 79.67 83.00 81.15 14.13 3.65
C8 15 86.80 89.00 88.31 9.75 2.52
MIN MAX Q1 Q3
C6 45.00 95.00 74.00 89.00
C8 57.00 97.00 83.00 94.00
MTB > #
MTB > # And now notice how I conduct the t-Test on stacked data.
MTB > #
MTB > twot data in c3 groups in c2
TWOSAMPLE T FOR Score
Method N MEAN STDEV SE MEAN
1 15 79.7 14.1 3.6
2 15 86.80 9.75 2.5
95 PCT CI FOR MU 1 - MU 2: ( -16.3, 2.0)
TTEST MU 1 = MU 2 (VS NE): T= -1.61 P=0.12 DF= 24
MTB > #
MTB > # And now notice how I conduct the t-Test on unstacked data.
MTB > #
MTB > twosamplet c6 c8
TWOSAMPLE T FOR C6 VS C8
N MEAN STDEV SE MEAN
C6 15 79.7 14.1 3.6
C8 15 86.80 9.75 2.5
95 PCT CI FOR MU C6 - MU C8: ( -16.3, 2.0)
TTEST MU C6 = MU C8 (VS NE): T= -1.61 P=0.12 DF= 24
MTB > stop
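The MINITAB results above report DF= 24, while the SPSS "Equal" row
reports df = 28 and the "Unequal" row reports df = 24.87. The MINITAB
figure is consistent with a test that does not pool the variances and
that truncates the Welch-Satterthwaite degrees of freedom to a whole
number. The Python sketch below, with illustrative names that are not
part of the original session, computes that approximation from the
Table 1 scores.

```python
import math
import statistics

cbt = [89, 81, 92, 94, 74, 56, 77, 85, 78, 69, 89, 88, 45, 83, 95]
lecture = [91, 57, 89, 83, 80, 83, 91, 84, 84, 94, 96, 88, 97, 91, 94]

def welch_test(a, b):
    """Unequal-variances t and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean.
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

welch_t, welch_df = welch_test(cbt, lecture)
print(f"t = {welch_t:.2f}, df = {welch_df:.2f}, truncated df = {math.floor(welch_df)}")
```

With equal group sizes the pooled and unpooled standard errors are
identical, which is why t is -1.61 in every version of the test and
only the degrees of freedom change.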
--------------------------
Disclaimer: All care was used to prepare the information in this
tutorial. Even so, the author does not and cannot guarantee the
accuracy of this information. The author disclaims any and all
injury that may come about from the use of this tutorial. As
always, students and all others should check with their advisor(s)
and/or other appropriate professionals for any and all assistance
on research design, analysis, selected levels of significance, and
interpretation of output file(s).
The author is entitled to exclusive distribution of this tutorial.
Readers have permission to print this tutorial for individual use,
provided that the copyright statement appears and that there is no
redistribution of this tutorial without permission.
Prepared 980316
Revised 980914
end-of-file 'studen_t.ssi'