Kruskal-Wallis H-Test for Oneway ANOVA by Ranks
© 1998 by Dr. Thomas W. MacFarland -- All Rights Reserved
************ kruskalw.doc ************

Background:

The Kruskal-Wallis H-test is often viewed as the nonparametric
equivalent of the parametric One Way Analysis of Variance (Oneway
ANOVA), with both tests used to serve the same purpose of comparing
possible differences between various "groups." The Kruskal-Wallis
test is used when the data do not meet the rigor of interval data
associated with the parametric Oneway ANOVA test. It may help to
think of the Kruskal-Wallis H test as an ANOVA test by ranks.

Scenario:

This file examines possible differences in graded performance to
three separate activities (e.g., final examination score, composite
score for all homework problems, final project score) in a high
school Logo programming language class.

Because the teacher conducting this analysis has a concern that
homework scores and final project scores are ordinal data (data are
ordered, but not with the precision of interval data), it is best to
use the non-parametric K-W H test instead of the Oneway Analysis of
Variance (ANOVA), which is based on the use of interval data. A
summary of the study is presented in Table 1.
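
The idea of an "ANOVA by ranks" described in the Background can be sketched in a few lines of Python. This is a minimal illustration, not the routine used by SPSS or Minitab: the function names and the small score lists are hypothetical, and the formulas are the usual textbook ones, H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), with H divided by 1 - sum(t^3 - t) / (N^3 - N) to correct for ties.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, H corrected for ties) for a list of groups of scores."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)  # all scores are ranked together

    # Accumulate R_i^2 / n_i, where R_i is the rank sum of group i.
    term = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * term - 3 * (n_total + 1)

    # Tie correction: t is the number of scores sharing one value.
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_sum / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all scores are identical; H is undefined")
    return h, h / correction


# Hypothetical scores for three small groups (not the data in Table 1).
example_groups = [[85, 88, 91], [90, 92, 74], [75, 82, 55]]
h, h_adjusted = kruskal_wallis_h(example_groups)
print("H =", round(h, 4), " H (adj. for ties) =", round(h_adjusted, 4))
```

Because only the ranks enter the calculation, an extreme score (such as a single 000) moves H no more than any other lowest-ranked score would.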

Table 1

Summary Data of Final Examination Scores, Homework Problems, and
Final Project Scores in a High School Logo Programming Class

====================================================

                            Graded Activity
                            ===============
                            1 = Final Exam
                            2 = Homework
             Student Number 3 = Final Project  Score
----------------------------------------------------
                   01              1              085
                   01              2              090
                   01              3              075
                   02              1              088
                   02              2              092
                   02              3              082
                   03              1              091
                   03              2              074
                   03              3              055
                   04              1              088
                   04              2              093
                   04              3              067
                   05              1              086
                   05              2              083
                   05              3              077
                   06              1              072
                   06              2              091
                   06              3              062
                   07              1              057
                   07              2              059
                   07              3              066
                   08              1              093
                   08              2              079
                   08              3              072
                   09              1              082
                   09              2              088
                   09              3              081
                   10              1              077
                   10              2              071
                   10              3              063
                   11              1              085
                   11              2              082
                   11              3              093
                   12              1              072
                   12              2              073
                   12              3              062
                   13              1              074
                   13              2              094
                   13              3              090
                   14              1              057
                   14              2              075
                   14              3              048
                   15              1              079
                   15              2              082
                   15              3              063
                   16              1              083
                   16              2              093
                   16              3              098
                   17              1              068
                   17              2              078
                   17              3              082
                   18              1              078
                   18              2              082
                   18              3              088
                   19              1              074
                   19              2              078
                   19              3              000
                   20              1              085
                   20              2              089
                   20              3              094
                   21              1              094
                   21              2              093
                   21              3              083
                   22              1              088
                   22              2              081
                   22              3              073
                   23              1              083
                   23              2              081
                   23              3              071
                   24              1              089
                   24              2              092
                   24              3              080
                   25              1              088
                   25              2              079
                   25              3              068
                   26              1              081
                   26              2              088
                   26              3              092
                   27              1              091
                   27              2              094
                   27              3              083
                   28              1              095
                   28              2              092
                   28              3              084
                   29              1              095
                   29              2              093
                   29              3              100
                   30              1              095
                   30              2              096
                   30              3              098
                   31              1              073
                   31              2              079
                   31              3              064
                   32              1              081
                   32              2              085
                   32              3              091
                   33              1              087
                   33              2              081
                   33              3              081
                   34              1              069
                   34              2              075
                   34              3              076
                   35              1              077
                   35              2              088
                   35              3              083
----------------------------------------------------

Ho: Null Hypothesis:

There is no difference in graded performance to three separate
activities (e.g., final examination score, composite score for all
homework problems, final project score) between students in a high
school Logo programming language class (p = .05).

Files:

1. kruskalw.doc
2. kruskalw.dat
3. kruskalw.r01
4. kruskalw.o01
5. kruskalw.con
6. kruskalw.lis

Command:

At the Unix prompt (%), key:

%spss -m < kruskalw.r01 > kruskalw.o01

************ kruskalw.dat ************

                   01              1              085
                   01              2              090
                   01              3              075
                   02              1              088
                   02              2              092
                   02              3              082
                   03              1              091
                   03              2              074
                   03              3              055
                   04              1              088
                   04              2              093
                   04              3              067
                   05              1              086
                   05              2              083
                   05              3              077
                   06              1              072
                   06              2              091
                   06              3              062
                   07              1              057
                   07              2              059
                   07              3              066
                   08              1              093
                   08              2              079
                   08              3              072
                   09              1              082
                   09              2              088
                   09              3              081
                   10              1              077
                   10              2              071
                   10              3              063
                   11              1              085
                   11              2              082
                   11              3              093
                   12              1              072
                   12              2              073
                   12              3              062
                   13              1              074
                   13              2              094
                   13              3              090
                   14              1              057
                   14              2              075
                   14              3              048
                   15              1              079
                   15              2              082
                   15              3              063
                   16              1              083
                   16              2              093
                   16              3              098
                   17              1              068
                   17              2              078
                   17              3              082
                   18              1              078
                   18              2              082
                   18              3              088
                   19              1              074
                   19              2              078
                   19              3              000
                   20              1              085
                   20              2              089
                   20              3              094
                   21              1              094
                   21              2              093
                   21              3              083
                   22              1              088
                   22              2              081
                   22              3              073
                   23              1              083
                   23              2              081
                   23              3              071
                   24              1              089
                   24              2              092
                   24              3              080
                   25              1              088
                   25              2              079
                   25              3              068
                   26              1              081
                   26              2              088
                   26              3              092
                   27              1              091
                   27              2              094
                   27              3              083
                   28              1              095
                   28              2              092
                   28              3              084
                   29              1              095
                   29              2              093
                   29              3              100
                   30              1              095
                   30              2              096
                   30              3              098
                   31              1              073
                   31              2              079
                   31              3              064
                   32              1              081
                   32              2              085
                   32              3              091
                   33              1              087
                   33              2              081
                   33              3              081
                   34              1              069
                   34              2              075
                   34              3              076
                   35              1              077
                   35              2              088
                   35              3              083

************ kruskalw.r01 ************

SET WIDTH = 80
SET LENGTH = NONE
SET CASE = UPLOW
SET HEADER = NO
TITLE = Kruskal-Wallis Oneway Anova by Ranks
COMMENT = This file examines possible differences
          in graded performance to three separate
          activities (e.g., final examination score,
          composite score for all homework problems,
          final project score) in a high school Logo
          programming language class.

          Because the teacher conducting this analysis
          has a concern that homework scores and final
          project scores are ordinal data (data are ordered,
          but not with the precision of interval data), it
          is best to use the non-parametric K-W H test
          instead of the Oneway Analysis of Variance (ANOVA)
          based on the use of interval data.
DATA LIST FILE = 'kruskalw.dat' FIXED
     / Stu_Code 20-21
       Activity 36
       Score    51-53

Variable Labels
       Stu_Code "Student Code"
     / Activity "Graded Activity"
     / Score    "Score on Graded Activity"

Value Labels
       Activity 1 'Final Examination'
                2 'Homework Problems'
                3 'Final Project'

NPAR TESTS K-W = Score by Activity(1,3)

************ kruskalw.o01 ************

   1  SET WIDTH = 80
   2  SET LENGTH = NONE
   3  SET CASE = UPLOW
   4  SET HEADER = NO
   5  TITLE = Kruskal-Wallis Oneway Anova by Ranks
   6  COMMENT = This file examines possible differences
   7            in graded performance to three separate
   8            activities (e.g., final examination score,
   9            composite score for all homework problems,
  10            final project score) in a high school Logo
  11            programming language class.
  12
  13            Because the teacher conducting this analysis
  14            has a concern that homework scores and final
  15            project scores are ordinal data (data are ordered,
  16            but not with the precision of interval data), it
  17            is best to use the non-parametric K-W H test
  18            instead of the Oneway Analysis of Variance (ANOVA)
  19            based on the use of interval data.
  20  DATA LIST FILE = 'kruskalw.dat' FIXED
  21       / Stu_Code 20-21
  22         Activity 36
  23         Score    51-53
  24

This command will read 1 records from kruskalw.dat

Variable   Rec   Start     End         Format

STU_CODE     1      20      21         F2.0
ACTIVITY     1      36      36         F1.0
SCORE        1      51      53         F3.0

  25  Variable Labels
  26         Stu_Code "Student Code"
  27       / Activity "Graded Activity"
  28       / Score    "Score on Graded Activity"
  29
  30  Value Labels
  31         Activity 1 'Final Examination'
  32                  2 'Homework Problems'
  33                  3 'Final Project'
  34
  35  NPAR TESTS K-W = Score by Activity(1,3)

***** Workspace allows for 18724 cases for NPAR tests *****

- - - - -  Kruskal-Wallis 1-Way Anova

     SCORE      Score on Graded Activity
  by ACTIVITY   Graded Activity

  Mean Rank    Cases

      54.47       35   ACTIVITY = 1   Final Examination
      60.49       35   ACTIVITY = 2   Homework Problems
      44.04       35   ACTIVITY = 3   Final Project
                 ---
                 105   Total

                                        Corrected for ties
  Chi-Square   D.F.   Significance   Chi-Square   D.F.   Significance
      5.2238      2          .0734       5.2328      2          .0731

************ kruskalw.con ************

Outcome:

Computed H = 5.2238 (K-W H approximates Chi-Square)

df = k-1 = 3-1 = 2

Criterion H (alpha = .05, df = 2) = 5.9915

Computed H (5.2238) < Criterion H (5.9915)

Therefore, the null hypothesis is not rejected: these data do not
show a statistically significant difference in graded performance to
three separate activities (e.g., final examination score, composite
score for all homework problems, final project score) in a high
school Logo programming language class (p = .05).

The p value is another way to view differences in the three graded
activities:

-- The calculated p value is .0731 (corrected for ties).

-- The declared p value is .05.

The calculated p value exceeds the declared p value and there is,
accordingly, no statistically significant difference in scores of the
three graded activities at this level of significance (p = .05). The
observed differences in mean rankings of scores for the three graded
activities are within the range that chance alone could produce.

Note: Although the test statistic for Kruskal-Wallis analysis is "H,"
you will notice that chi-square values are used for data analysis.
H approximates the chi-square distribution.

Note: There is often disagreement in the profession about using
parametric analysis for data that simply do not follow a normal
distribution:

-- Grades on a standardized test, such as a well-constructed final
   examination, likely follow a normal distribution along the
   bell-shaped curve.

-- Grades awarded to homework assignments are a typical example of
   data that are not distributed along a normal curve, since students
   rarely turn in a continuum of "bad" to "good" homework
   assignments.

Should parametric analyses be used for nonparametric data? There is
no easy answer to this question. Yet, if the purpose of statistical
analysis is to offer possible answers to the otherwise unknown, then
expect to see parametric analyses for data that may not meet the
assumptions associated with tests that rely on normal distribution.
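
The criterion value and the significance levels reported in the Outcome can be checked by hand, because for df = 2 the upper tail of the chi-square distribution has the closed form P(X > x) = exp(-x / 2). The sketch below is only an illustration in Python: the function names are hypothetical, and the shortcut holds for df = 2 only (other degrees of freedom need a chi-square table or a statistics library).

```python
import math


def chi2_upper_tail_df2(x):
    """P(X > x) for a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def chi2_criterion_df2(alpha):
    """Chi-square value (df = 2) that is exceeded with probability alpha."""
    return -2.0 * math.log(alpha)


alpha = 0.05        # declared level of significance
h = 5.2238          # computed H from the SPSS output
h_ties = 5.2328     # computed H corrected for ties

criterion = chi2_criterion_df2(alpha)
print("Criterion H       =", round(criterion, 4))
print("p (H)             =", round(chi2_upper_tail_df2(h), 4))
print("p (H, adj. ties)  =", round(chi2_upper_tail_df2(h_ties), 4))

# Both decision rules agree: H is below the criterion value exactly when
# the calculated p value is above the declared level.
print("Reject Ho?", h_ties > criterion)
```

Either comparison (computed H against criterion H, or calculated p against declared p) leads to the same decision, since the two are the same test expressed on different scales.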
Even so, this problem provides a "typical" use of nonparametric data
in educational research. In turn, the Kruskal-Wallis H test was
indeed the appropriate test for this data set.

************ kruskalw.lis ************

% minitab

MTB > outfile 'kruskalw.lis'
Collecting Minitab session in file: kruskalw.lis
MTB > # MINITAB addendum to kruskalw.dat
MTB > read 'kruskalw.dat' c1 c2 c3
Entering data from file: kruskalw.dat
    105 rows read.
MTB > print c1 c2 c3

 ROW    C1    C2    C3

   1     1     1    85
   2     1     2    90
   3     1     3    75
   4     2     1    88
   5     2     2    92
   6     2     3    82
   7     3     1    91
   8     3     2    74
   9     3     3    55
  10     4     1    88
  11     4     2    93
  12     4     3    67
  13     5     1    86
  14     5     2    83
  15     5     3    77
  16     6     1    72
  17     6     2    91
  18     6     3    62
Continue? y
  19     7     1    57
  20     7     2    59
  21     7     3    66
  22     8     1    93
  23     8     2    79
  24     8     3    72
  25     9     1    82
  26     9     2    88
  27     9     3    81
  28    10     1    77
  29    10     2    71
  30    10     3    63
  31    11     1    85
  32    11     2    82
  33    11     3    93
  34    12     1    72
  35    12     2    73
  36    12     3    62
  37    13     1    74
  38    13     2    94
  39    13     3    90
  40    14     1    57
  41    14     2    75
Continue? y
  42    14     3    48
  43    15     1    79
  44    15     2    82
  45    15     3    63
  46    16     1    83
  47    16     2    93
  48    16     3    98
  49    17     1    68
  50    17     2    78
  51    17     3    82
  52    18     1    78
  53    18     2    82
  54    18     3    88
  55    19     1    74
  56    19     2    78
  57    19     3     0
  58    20     1    85
  59    20     2    89
  60    20     3    94
  61    21     1    94
  62    21     2    93
  63    21     3    83
  64    22     1    88
Continue? y
  65    22     2    81
  66    22     3    73
  67    23     1    83
  68    23     2    81
  69    23     3    71
  70    24     1    89
  71    24     2    92
  72    24     3    80
  73    25     1    88
  74    25     2    79
  75    25     3    68
  76    26     1    81
  77    26     2    88
  78    26     3    92
  79    27     1    91
  80    27     2    94
  81    27     3    83
  82    28     1    95
  83    28     2    92
  84    28     3    84
  85    29     1    95
  86    29     2    93
  87    29     3   100
Continue? y
  88    30     1    95
  89    30     2    96
  90    30     3    98
  91    31     1    73
  92    31     2    79
  93    31     3    64
  94    32     1    81
  95    32     2    85
  96    32     3    91
  97    33     1    87
  98    33     2    81
  99    33     3    81
 100    34     1    69
 101    34     2    75
 102    34     3    76
 103    35     1    77
 104    35     2    88
 105    35     3    83

MTB > name c1 'Stu_Code' c2 'Activity' c3 'Score'
MTB > kruskal-wallis test for data in c3, levels in c2

LEVEL     NOBS    MEDIAN   AVE. RANK   Z VALUE
    1       35     83.00        54.5      0.35
    2       35     83.00        60.5      1.78
    3       35     80.00        44.0     -2.13
OVERALL    105                  53.0

H = 5.22  d.f. = 2  p = 0.074
H = 5.23  d.f. = 2  p = 0.074 (adj. for ties)

MTB > stop

--------------------------

Disclaimer:

All care was used to prepare the information in this tutorial. Even
so, the author does not and cannot guarantee the accuracy of this
information. The author disclaims any and all injury that may come
about from the use of this tutorial. As always, students and all
others should check with their advisor(s) and/or other appropriate
professionals for any and all assistance on research design,
analysis, selected levels of significance, and interpretation of
output file(s).

The author is entitled to exclusive distribution of this tutorial.
Readers have permission to print this tutorial for individual use,
provided that the copyright statement appears and that there is no
redistribution of this tutorial without permission.

Prepared 980316
Revised 980914

end-of-file 'kruskalw.ssi'