Statistics Tutorial: Student's t-Test for Independent Samples

*************
studen_t.doc
*************
Background:  Student's t-test is a very common (and possibly
             overused) test for determining differences
             between two groups.  The t-test (published in
             1908 by W. S. Gosset, who worked for the Guinness
             brewery in Dublin and wrote under the pen name
             "Student") was developed for small samples,
             conventionally those with fewer than 30
             observations.  And recall:

             -- Student's t-test is still an appropriate test
                with more than 30 observations.

             -- With n >= 30 observations, t approximates z.
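
The claim that t approximates z can be checked numerically. The following is a minimal sketch in Python using only the standard library; the helper names (t_pdf, t_cdf, t_critical) are illustrative and are not part of the original tutorial. It integrates the t density with Simpson's rule and finds the two-tailed critical value by bisection, then prints it next to the corresponding z value.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(df, alpha=0.05):
    """Two-tailed critical value of t, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    target = 1 - alpha / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_critical = NormalDist().inv_cdf(0.975)
for df in (5, 10, 28, 30, 120):
    print(f"df = {df:3d}   t = {t_critical(df):.3f}   z = {z_critical:.3f}")
```

As df grows, the printed t value should move toward the z value, which is the sense in which the two tests converge for larger samples.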

             When using Student's t-test to determine if 
             the difference between two groups is indeed
             a true difference, or if the difference between
             the two groups is due only to chance:

             -- Scores in both groups should be approximately
                normally distributed.

             -- Random selection should ideally be used
                for all members of each of the two groups.


Scenario:    This study examines if there are differences
             in final examination test scores between two
             groups of students in a data structures programming 
             course:  

             -- One group of students was taught by Computer 
                Based Training.

             -- The other group of students was taught by 
                traditional lecture.

             Students were all from a university freshman-
             level data structures course (taught in C++)
             and were randomly assigned to one of two
             groups:  instruction by CBT (Computer Based
             Training) vs. instruction by traditional
             lecture.

             Because the teacher was confident that final
             examination scores represented interval data
             ( i.e., the difference between "89" and "90" is
             equal to the difference between "75" and "76"),
             a parametric test was suitable, and Student's
             t-Test for Independent Samples was judged to be
             the appropriate test for this analysis of
             differences between two groups.

             Test scores for both groups of students are 
             summarized in Table 1.


             Table 1

             Final Examination Test Scores in a Data Structures 
             Course by Student Group: Students Taught by Computer 
             Based Training and Students Taught by Traditional 
             Lecture
             ====================================================   
                               Teaching Method 
                               ===============
                               1 = CBT 
             Student Number    2 = Traditional  Final Score 
             ----------------------------------------------------    
                   01             1              089
                   02             1              081
                   03             1              092
                   04             1              094
                   05             1              074
                   06             1              056
                   07             1              077
                   08             1              085
                   09             1              078
                   10             1              069
                   11             1              089
                   12             1              088
                   13             1              045
                   14             1              083
                   15             1              095
                   16             2              091
                   17             2              057
                   18             2              089
                   19             2              083
                   20             2              080
                   21             2              083
                   22             2              091
                   23             2              084
                   24             2              084
                   25             2              094
                   26             2              096
                   27             2              088
                   28             2              097
                   29             2              091
                   30             2              094
             ----------------------------------------------------    
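
Before running any test, it can help to summarize each group in Table 1. This is a small sketch in Python (standard library only); the function name describe and the list names cbt and lecture are illustrative choices, not names used by SPSS or Minitab.

```python
from math import sqrt
from statistics import mean, stdev

# Final scores from Table 1, by teaching method.
cbt = [89, 81, 92, 94, 74, 56, 77, 85, 78, 69, 89, 88, 45, 83, 95]
lecture = [91, 57, 89, 83, 80, 83, 91, 84, 84, 94, 96, 88, 97, 91, 94]


def describe(scores):
    """Return n, mean, sample SD (n - 1 denominator), and SE of the mean."""
    n = len(scores)
    sd = stdev(scores)
    return {"n": n, "mean": mean(scores), "sd": sd, "se": sd / sqrt(n)}


for label, scores in (("CBT", cbt), ("Lecture", lecture)):
    d = describe(scores)
    print(f"{label:8s} n={d['n']}  mean={d['mean']:.4f}  "
          f"sd={d['sd']:.3f}  se={d['se']:.3f}")
```

These values can be compared with the Mean, SD, and SE of Mean columns in the SPSS output file.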


Ho:          Null Hypothesis:  There is no difference in final
             examination test scores between students in a data
             structures course taught by Computer Based Training
             and their counterparts who were taught by the use 
             of traditional lecture (p <= .05).


Files:       1.  studen_t.doc

             2.  studen_t.dat

             3.  studen_t.r01

             4.  studen_t.o01

             5.  studen_t.con

             6.  studen_t.lis


Command:     At the Unix prompt (%), key:

             %spss -m < studen_t.r01 > studen_t.o01


************
studen_t.dat
************
                   01             1              089
                   02             1              081
                   03             1              092
                   04             1              094
                   05             1              074
                   06             1              056
                   07             1              077
                   08             1              085
                   09             1              078
                   10             1              069
                   11             1              089
                   12             1              088
                   13             1              045
                   14             1              083
                   15             1              095
                   16             2              091
                   17             2              057
                   18             2              089
                   19             2              083
                   20             2              080
                   21             2              083
                   22             2              091
                   23             2              084
                   24             2              084
                   25             2              094
                   26             2              096
                   27             2              088
                   28             2              097
                   29             2              091
                   30             2              094
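
The data file is fixed-width: the command file reads the student code from columns 20-21, the method from column 35, and the score from columns 50-52. As a rough illustration of what reading by column position means, here is a Python sketch. Both helpers are hypothetical: make_line only rebuilds a record in the same layout so that the example is self-contained, and parse_line slices it apart.

```python
def make_line(code, method, score):
    """Rebuild one record in the studen_t.dat layout (for this demo only)."""
    return f"{'':19}{code:02d}{'':13}{method}{'':14}{score:03d}"


def parse_line(line):
    """Slice a record by column position; Python slices are 0-based."""
    return {
        "stu_code": int(line[19:21]),   # columns 20-21
        "method": int(line[34:35]),     # column 35
        "score": int(line[49:52]),      # columns 50-52
    }


sample = [make_line(1, 1, 89), make_line(13, 1, 45), make_line(30, 2, 94)]
records = [parse_line(line) for line in sample]
for rec in records:
    print(rec)
```

Because the positions are 1-based in the SPSS column specification and 0-based in Python, each slice starts one position lower than the column number.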


************
studen_t.r01
************
SET WIDTH      = 80
SET LENGTH     = NONE
SET CASE       = UPLOW
SET HEADER     = NO
TITLE          = Student's t-Test for Independent Samples
COMMENT        = This file examines if there are differences
                 in final examination test scores between two
                 groups of students in a data structures 
                 programming course:  one group of students 
                 was taught by Computer Based Training and the 
                 other group of students was taught by 
                 traditional lecture.

                 Students were all from a university freshman-
                 level data structures course (using C++ as the
                 programming language platform) who were assigned, 
                 through random selection, to placement into one
                 of two groups:  instruction by CBT (Computer 
                 Based Training) vs. instruction by traditional 
                 lecture.  Because the teacher was confident that 
                 final examination scores represented interval 
                 data ( i.e., the data are parametric, with the 
                 difference between "89" and "90" equal to the 
                 difference between "75" and "76"), Student's 
                 t-Test for Independent Samples was correctly 
                 judged to be the appropriate test for this 
                 analysis of differences between two groups. 
DATA LIST FILE = 'studen_t.dat' FIXED
     / Stu_Code  20-21
       Method       35
       Score     50-52 

Variable Labels
       Stu_Code   "Student Code"
     / Method     "Method:  CBT vs. Lecture"
     / Score      "Final Examination Score"

Value Labels
       Method     1 'CBT:  Computer Based Training'
                  2 'Traditional Lecture'

T-TEST GROUPS         = Method(1,2)
     / VARIABLES      = Score


************
studen_t.o01
************
   1  SET WIDTH      = 80
   2  SET LENGTH     = NONE
   3  SET CASE       = UPLOW
   4  SET HEADER     = NO
   5  TITLE          = Student's t-Test for Independent Samples
   6  COMMENT        = This file examines if there are differences
   7                   in final examination test scores between two
   8                   groups of students in a data structures
   9                   programming course:  one group of students
  10                   was taught by Computer Based Training and the
  11                   other group of students was taught by
  12                   traditional lecture.
  13
  14                   Students were all from a university freshman-
  15                   level data structures course (using C++ as the
  16                   programming language platform) who were assigned,
  17                   through random selection, to placement into one
  18                   of two groups:  instruction by CBT (Computer
  19                   Based Training) vs. instruction by traditional
  20                   lecture.  Because the teacher was confident that
  21                   final examination scores represented interval
  22                   data ( i.e., the data are parametric, with the
  23                   difference between "89" and "90" equal to the
  24                   difference between "75" and "76"), Student's
  25                   t-Test for Independent Samples was correctly
  26                   judged to be the appropriate test for this
  27                   analysis of differences between two groups.
  28  DATA LIST FILE = 'studen_t.dat' FIXED
  29       / Stu_Code  20-21
  30         Method       35
  31         Score     50-52
  32

This command will read 1 records from studen_t.dat

Variable   Rec   Start     End         Format

STU_CODE     1      20      21         F2.0
METHOD       1      35      35         F1.0
SCORE        1      50      52         F3.0

  33  Variable Labels
  34         Stu_Code   "Student Code"
  35       / Method     "Method:  CBT vs. Lecture"
  36       / Score      "Final Examination Score"
  37
  38  Value Labels
  39         Method     1 'CBT:  Computer Based Training'
  40                    2 'Traditional Lecture'
  41
  42  T-TEST GROUPS         = Method(1,2)
  43       / VARIABLES      = Score

T-TEST requires 72 bytes of workspace for execution.

t-tests for Independent Samples of METHOD    Method:  CBT vs. Lecture


                             Number
 Variable                   of Cases       Mean          SD   SE of Mean
 -----------------------------------------------------------------------
 SCORE  Final Examination Score

 CBT:  Computer Base          15        79.6667      14.130        3.648
 Traditional Lecture          15        86.8000       9.748        2.517
 -----------------------------------------------------------------------

          Mean Difference = -7.1333

          Levene's Test for Equality of Variances: F= 1.768  P= .194


       t-test for Equality of Means                                95%
 Variances   t-value       df    2-Tail Sig     SE of Diff     CI for Diff
-------------------------------------------------------------------------------
 Equal         -1.61       28          .119          4.432   (-16.213,1.946)
 Unequal       -1.61    24.87          .120          4.432   (-16.265,1.998)
-------------------------------------------------------------------------------
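
The Equal and Unequal rows above can be reproduced by hand. The sketch below, in Python with the standard library only, computes the pooled-variance t statistic and the separate-variance (Welch) t statistic with Welch-Satterthwaite degrees of freedom. The variable names are illustrative; this shows the textbook formulas and is not a description of how SPSS is implemented.

```python
from math import sqrt
from statistics import mean, variance

cbt = [89, 81, 92, 94, 74, 56, 77, 85, 78, 69, 89, 88, 45, 83, 95]
lecture = [91, 57, 89, 83, 80, 83, 91, 84, 84, 94, 96, 88, 97, 91, 94]

n1, n2 = len(cbt), len(lecture)
v1, v2 = variance(cbt), variance(lecture)   # sample variances (n - 1)
diff = mean(cbt) - mean(lecture)

# Equal variances assumed: pool the two variances, df = n1 + n2 - 2.
df_pooled = n1 + n2 - 2
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df_pooled
se_pooled = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled

# Equal variances not assumed (Welch): keep the variances separate and
# use the Welch-Satterthwaite approximation for the degrees of freedom.
a, b = v1 / n1, v2 / n2
se_welch = sqrt(a + b)
t_welch = diff / se_welch
df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(f"Mean difference = {diff:.4f}")
print(f"Equal:   t = {t_pooled:.2f}  df = {df_pooled}  SE = {se_pooled:.3f}")
print(f"Unequal: t = {t_welch:.2f}  df = {df_welch:.2f}  SE = {se_welch:.3f}")
```

With equal group sizes the two standard errors coincide, so the two rows differ only in their degrees of freedom. The Minitab addendum reports DF = 24, which appears to be the Unequal-row value with the fraction dropped.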


************
studen_t.con
************
Outcome:     Computed t  = | - 1.61 | = 1.61

             Criterion t = + or - 2.05 (alpha = .05, df = 28)

             Computed |t| = 1.61 < Criterion |t| = 2.05

             Note.  The | and | characters are used to indicate
                    absolute value.

             Therefore, the null hypothesis is not rejected:
             there is no statistically significant difference
             (alpha = .05) in final examination test scores
             between students in a data structures course
             taught by Computer Based Training and their
             counterparts who were taught by traditional
             lecture.  The observed difference of 7.13 points
             between the two group means is within the range
             that chance alone could plausibly produce.
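
The decision rule and the 95% confidence interval in the SPSS output express the same comparison. The sketch below, in Python with the standard library only, applies both. The constant T_CRIT is an assumption taken from a standard t table (two-tailed, alpha = .05, df = 28); the tutorial rounds it to 2.05. All other names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

cbt = [89, 81, 92, 94, 74, 56, 77, 85, 78, 69, 89, 88, 45, 83, 95]
lecture = [91, 57, 89, 83, 80, 83, 91, 84, 84, 94, 96, 88, 97, 91, 94]

T_CRIT = 2.0484  # assumed tabled value: two-tailed, alpha = .05, df = 28

n1, n2 = len(cbt), len(lecture)
diff = mean(cbt) - mean(lecture)
pooled_var = (((n1 - 1) * variance(cbt) + (n2 - 1) * variance(lecture))
              / (n1 + n2 - 2))
se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_value = diff / se_diff

# Reject the null hypothesis only if |t| exceeds the criterion value.
reject_null = abs(t_value) > T_CRIT

# The 95% confidence interval for the mean difference uses the same
# criterion value; it contains 0 exactly when the null is not rejected.
ci_low = diff - T_CRIT * se_diff
ci_high = diff + T_CRIT * se_diff

print(f"|t| = {abs(t_value):.2f}, criterion = {T_CRIT}, reject: {reject_null}")
print(f"95% CI for difference: ({ci_low:.3f}, {ci_high:.3f})")
```

An interval that includes zero is another way of saying that a true difference of zero is compatible with the data at this level.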

             The p value is another way to view the difference
             between the two groups:

             -- The calculated p value is .119.

             -- The declared p value (alpha) is .05.

             The calculated p value exceeds the declared p value,
             so the difference between the two groups in terms
             of scores on the final examination is not
             statistically significant.  At alpha = .05, the
             observed difference in test scores could plausibly
             be due to chance.
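
The calculated p value is the area in both tails of the t distribution beyond the computed t. As a rough check on the .119 reported by SPSS, the following Python sketch (standard library only) integrates the t density numerically with Simpson's rule. The function names are illustrative, and the input t = -1.61 is the rounded value from the output file, so the result should agree only to about three decimals.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_value, df, steps=2000):
    """P(|T| >= |t|): one minus the central area, via Simpson's rule."""
    x = abs(t_value)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3   # area between -|t| and +|t|
    return 1 - central_area


ALPHA = 0.05
p_value = two_tailed_p(-1.61, 28)
print(f"p = {p_value:.3f}; significant at alpha = {ALPHA}: {p_value <= ALPHA}")
```

Comparing p with alpha leads to the same decision as comparing the computed t with the criterion t.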


************
studen_t.lis
************
% minitab

 MTB > outfile 'studen_t.lis'
 Collecting Minitab session in file: studen_t.lis
 MTB > # MINITAB addendum to studen_t.dat
 MTB > read 'studen_t.dat' c1 c2 c3
 Entering data from file: studen_t.dat
      30 rows read.
 MTB > name c1 'Stu_Code' c2 'Method' c3 'Score'
 MTB > print 'Stu_Code' 'Method' 'Score'
 
 
  ROW  Stu_Code  Method  Score
 
    1         1       1     89
    2         2       1     81
    3         3       1     92
    4         4       1     94
    5         5       1     74
    6         6       1     56
    7         7       1     77
    8         8       1     85
    9         9       1     78
   10        10       1     69
   11        11       1     89
   12        12       1     88
   13        13       1     45
   14        14       1     83
   15        15       1     95
   16        16       2     91
   17        17       2     57
   18        18       2     89
 Continue? y
   19        19       2     83
   20        20       2     80
   21        21       2     83
   22        22       2     91
   23        23       2     84
   24        24       2     84
   25        25       2     94
   26        26       2     96
   27        27       2     88
   28        28       2     97
   29        29       2     91
   30        30       2     94
 
 MTB > # With MINITAB, it is possible to conduct Student's t-Test
 MTB > # with stacked and unstacked data.
 MTB > #
 MTB > # I will unstack the data in c3 and then conduct the
 MTB > # t-Test using both methods.
 MTB > #
 MTB > unstack (c2-c3) into (c5-c6) (c7-c8);
 SUBC> subscripts c2.
 MTB > print c1-c8
 
 
  ROW  Stu_Code  Method  Score   C5    C6   C7    C8
 
    1         1       1     89    1    89    2    91
    2         2       1     81    1    81    2    57
    3         3       1     92    1    92    2    89
    4         4       1     94    1    94    2    83
    5         5       1     74    1    74    2    80
    6         6       1     56    1    56    2    83
    7         7       1     77    1    77    2    91
    8         8       1     85    1    85    2    84
    9         9       1     78    1    78    2    84
   10        10       1     69    1    69    2    94
   11        11       1     89    1    89    2    96
   12        12       1     88    1    88    2    88
   13        13       1     45    1    45    2    97
   14        14       1     83    1    83    2    91
   15        15       1     95    1    95    2    94
   16        16       2     91                      
   17        17       2     57                      
   18        18       2     89                      
 Continue? y
   19        19       2     83                      
   20        20       2     80                      
   21        21       2     83                      
   22        22       2     91                      
   23        23       2     84                      
   24        24       2     84                      
   25        25       2     94                      
   26        26       2     96                      
   27        27       2     88                      
   28        28       2     97                      
   29        29       2     91                      
   30        30       2     94                      
 
 * NOTE  * One or more variables are undefined.
 
 MTB > histogram c6
 
 Histogram of C6   N = 15
 
 Midpoint   Count
       45       1  *
       50       0
       55       1  *
       60       0
       65       0
       70       1  *
       75       2  **
       80       2  **
       85       2  **
       90       4  ****
       95       2  **
 
 MTB > histogram c8
 
 Histogram of C8   N = 15
 
 Midpoint   Count
       55       1  *
       60       0
       65       0
       70       0
       75       0
       80       1  *
       85       4  ****
       90       5  *****
       95       4  ****
 
 MTB > describe c6 c8
 
                 N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
 C6             15    79.67    83.00    81.15    14.13     3.65
 C8             15    86.80    89.00    88.31     9.75     2.52
 
               MIN      MAX       Q1       Q3
 C6          45.00    95.00    74.00    89.00
 C8          57.00    97.00    83.00    94.00
 
 MTB > #
 MTB > # And now notice how I conduct the t-Test on stacked data.
 MTB > #
 MTB > twot data in c3 groups in c2
 
 TWOSAMPLE T FOR Score
 Method   N      MEAN     STDEV   SE MEAN
 1       15      79.7      14.1       3.6
 2       15     86.80      9.75       2.5
 
 95 PCT CI FOR MU 1 - MU 2: ( -16.3,  2.0)
 
 TTEST MU 1 = MU 2 (VS NE): T= -1.61  P=0.12  DF=  24
 
 MTB > #
 MTB > # And now notice how I conduct the t-Test on unstacked data.
 MTB > #
 MTB > twosamplet c6 c8
 
 TWOSAMPLE T FOR C6 VS C8
      N      MEAN     STDEV   SE MEAN
 C6  15      79.7      14.1       3.6
 C8  15     86.80      9.75       2.5
 
 95 PCT CI FOR MU C6 - MU C8: ( -16.3,  2.0)
 
 TTEST MU C6 = MU C8 (VS NE): T= -1.61  P=0.12  DF=  24
 
 MTB > stop
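
The difference between stacked and unstacked data in the Minitab session can be illustrated in a few lines of Python (standard library only). This is only a sketch of the idea: the function name unstack and the variable names c6 and c8 are borrowed from the session above for readability and are otherwise arbitrary.

```python
from collections import defaultdict

# Stacked layout: one column of scores and one column of group codes.
methods = [1] * 15 + [2] * 15
scores = [89, 81, 92, 94, 74, 56, 77, 85, 78, 69, 89, 88, 45, 83, 95,
          91, 57, 89, 83, 80, 83, 91, 84, 84, 94, 96, 88, 97, 91, 94]


def unstack(values, groups):
    """Split one column of values into one list per group code."""
    columns = defaultdict(list)
    for value, group in zip(values, groups):
        columns[group].append(value)
    return dict(columns)


# Unstacked layout: one column of scores per group.
unstacked = unstack(scores, methods)
c6, c8 = unstacked[1], unstacked[2]
print("CBT scores:    ", c6)
print("Lecture scores:", c8)
```

Either layout carries the same information, which is why both Minitab commands return the same t, p, and DF.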

--------------------------
Disclaimer:  All care was used to prepare the information in this 
tutorial.  Even so, the author does not and cannot guarantee the 
accuracy of this information.  The author disclaims any and all 
injury that may come about from the use of this tutorial.  As 
always, students and all others should check with their advisor(s) 
and/or other appropriate professionals for any and all assistance 
on research design, analysis, selected levels of significance, and 
interpretation of output file(s).

The author is entitled to exclusive distribution of this tutorial. 
Readers have permission to print this tutorial for individual use, 
provided that the copyright statement appears and that there is no 
redistribution of this tutorial without permission.

Prepared 980316
Revised  980914
end-of-file 'studen_t.ssi'
Please send comments or suggestions to Dr. Thomas W. MacFarland
