
The Kruskal-Wallis test is an alternative to a one-way ANOVA when the latter's assumptions are violated. We'll show in a minute why that's the case with creatine.sav, the data we'll use in this tutorial. But let's first take a quick look at what's in the data anyway.
Quick Data Description

Our data contain the results of a small experiment regarding creatine, a supplement that's popular among bodybuilders. The participants were divided into 3 groups: some didn't take any creatine, others took it in the morning and still others took it in the evening. After a month on this regime, their weight gains were measured. The basic research question is
does the average weight gain depend on
the creatine condition to which people were assigned?
That is, we'll test if three means -each calculated on a different group of people- are equal. The obvious test for this scenario is a one-way ANOVA, but using it requires some assumptions. Some basic checks will tell us that these assumptions aren't satisfied by the data at hand.
Data Check 1 - Histogram
A very efficient data check is to run histograms on all metric variables. The fastest way for doing so is by running the syntax below.
* Quick data check: histogram of weight gain.
frequencies gain
/formats notable
/histogram.
Histogram Result

First, our histogram looks plausible: all weight gains lie between -1 and +5 kilos, which are reasonable outcomes over one month. However, our outcome variable is not normally distributed as required for ANOVA. This isn't an issue for larger sample sizes of, say, at least 30 people in each group. The reason for this is the central limit theorem. It basically states that for reasonable sample sizes, the sampling distributions of means and sums are approximately normal, regardless of a variable's original distribution. For our tiny sample at hand, however, this does pose a real problem.
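If you'd like a more formal check than eyeballing a histogram, EXAMINE can run normality tests on weight gain within each group. A minimal sketch, assuming the variables are named gain and group as in creatine.sav:
* Optional: normality tests (including Shapiro-Wilk) per group.
examine variables = gain by group
/plot npplot.
The resulting "Tests of Normality" table includes Shapiro-Wilk, which is usually preferred for small samples like ours.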
Data Check 2 - Descriptives per Group
Right. Now that we've made sure the weight gain results look credible, let's see if our 3 groups actually have different means. The fastest way to do so is with a simple MEANS command as shown below.
means gain by group.
SPSS MEANS Output

First, note that our evening creatine group (4 participants) gained an average of 961 grams as opposed to 120 grams for “no creatine”. This suggests that creatine does make a real difference.
But don't overlook the standard deviations for our groups: they are very different, whereas ANOVA requires them to be equal. This assumption of equal population standard deviations for all groups is known as homoscedasticity. So that's a second violation of the ANOVA assumptions.
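By the way, if you'd like a formal test to back up this judgment, Levene's test for equal variances is readily obtained from the ONEWAY command. Just a quick sketch, again assuming the variable names from creatine.sav:
* Optional: Levene's test for equal variances over groups.
oneway gain by group
/statistics homogeneity.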
Kruskal-Wallis Test
So what should we do now? We'd like to run an ANOVA but our data seriously violate its assumptions. Well, a test that was designed for precisely this situation is the Kruskal-Wallis test, which doesn't require these assumptions. It basically replaces the weight gain scores by their rank numbers and tests whether these are equal across groups. We'll run it by following the screenshots below.
Running a Kruskal-Wallis Test in SPSS

We use K Independent Samples (under Analyze, Nonparametric Tests, Legacy Dialogs) if we compare 3 or more groups of cases. They are "independent" because our groups don't overlap (each case belongs to only one creatine condition).
Depending on your license, your SPSS version may or may not have the option shown below. If it doesn't, it's fine to skip this step.

SPSS Kruskal-Wallis Test Syntax
Following the previous screenshots results in the syntax below. We'll run it and explain the output.
NPAR TESTS
/K-W=gain BY group(1 3)
/MISSING ANALYSIS.
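Note: if your version does include the exact tests option and you selected it in the previous step, the pasted syntax will contain an extra METHOD subcommand, roughly like so:
NPAR TESTS
/K-W=gain BY group(1 3)
/METHOD=EXACT TIMER(5)
/MISSING ANALYSIS.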
SPSS Kruskal-Wallis Test Output
We'll skip the “RANKS” table and head over to the “Test Statistics” shown below.

Our test statistic -incorrectly labeled as “Chi-Square” by SPSS- is known as Kruskal-Wallis H. A larger value indicates larger differences between the groups we're comparing. For our data it's roughly 3.87. In order to evaluate whether this is unusually large, we need to know the sampling distribution of H.
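For the record: in the absence of ties, H is computed from the rank sums as
H = 12 / (N * (N + 1)) * Σ(R_j² / n_j) - 3 * (N + 1),
where N denotes the total sample size, n_j the number of cases in group j and R_j the sum of the ranks in group j.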
Exact Sig. uses the exact (but very complex) sampling distribution of H. However, it turns out that if each group contains 4 or more cases, this exact sampling distribution is almost identical to the (much simpler) chi-square distribution.
We therefore usually approximate the p-value with a chi-square distribution. If we compare k groups, we have k - 1 degrees of freedom, denoted by df in our output.
Asymp. Sig. is the p-value based on our chi-square approximation. The value of 0.145 basically means there's a 14.5% chance of finding our sample results if creatine doesn't have any effect in the population at large. So if creatine does nothing whatsoever, we have a fair (14.5%) chance of finding such minor weight gain differences just because of random sampling. If p > 0.05, we usually conclude that our differences are not statistically significant.
Note that our exact p-value is 0.146 whereas the approximate p-value is 0.145. This supports the claim that H is almost perfectly chi-square distributed.
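If you'd like to verify this chi-square approximation yourself, a single COMPUTE does the trick. A minimal sketch, assuming some dataset is open so the command has cases to run over:
* Reproduce "Asymp. Sig." from H = 3.87 with df = 2.
compute p = sig.chisq(3.87, 2).
execute.
This returns roughly 0.145, matching the output up to rounding.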
Kruskal-Wallis Test - Reporting
The official way to report our test results includes our chi-square value, df and p as in
“this study did not demonstrate any effect from creatine,
H(2) = 3.87, p = 0.15.”
So that's it for now. I hope you found this tutorial helpful. Please let me know by leaving a comment below. Thanks!
THIS TUTORIAL HAS 72 COMMENTS:
By Jon Peck on August 29th, 2016
Hi Ruben,
Wondering why the tutorial doesn't use the newer nonparametric procedure, which includes some useful graphics as well as the statistics. The output is available as a Model Viewer or in traditional style.
Also, regarding the complaint about the labelling of the statistics, the statistic's significance is evaluated against a Chi-squared distribution, which is asymptotically correct. In the newer procedure, it is just labelled as "Test Statistic".
By Ruben Geert van den Berg on August 29th, 2016
Dear Jon, thanks for your feedback.
I generally prefer the legacy dialogs for nonparametric tests as well as charts. The basic reason is their simplicity. The new nonparametric interface bothers me with too much text and too many questions. I want my students to think about what they're doing and why and make their own, conscious choices. If I ask them why they chose a binomial test, then "SPSS selected it for me" is not the answer I want to hear.
I also feel "asymptotically correct" doesn't help much if I'm analyzing a small sample. Many nonparametric tests will always produce exact statistics for smaller samples (n <= 25 or n <= 40). Conversely, for huge samples, the central limit theorem will often (not always) render a nonparametric test redundant in the first place. Any standard text on introductory statistics will tell you that the test statistic here is K-W H. So why not just call it K-W H? I should add that statistical terminology is a disaster anyway as many nonparametric tests are not nonparametric but distribution free instead. I understand this may render naming conventions in SPSS difficult too sometimes -if everybody incorrectly calls it a "nonparametric test", it may be best to just conform to this.
By Jon on August 29th, 2016
Fair enough, Ruben. I usually choose "Customize Tests". The new dialog, at least, brings the choices together, which might make picking the appropriate items easier. Automatic, at least, is likely to head off errors for those who are unsure.
I noticed that the legacy K-W dialog does call the statistic H, so the output should use a similar label.
I'll report that, FWIW.
By HAUFLER on March 15th, 2017
Thanks! I still have a question: what is the syntax to run a post hoc test for Kruskal-Wallis (Dunn-Bonferroni)? I have to do it via the legacy dialogs because the new one doesn't work, but I don't know which syntax is needed. Thanks in advance!
By Ruben Geert van den Berg on March 15th, 2017
Hi Piah!
Short answer: you can't and you shouldn't. You use a KW test instead of ANOVA when you can't rely on distributional assumptions (small samples with non-normally distributed variables). The Dunnett post hoc test does make distributional assumptions. So if you really want to use it, use ANOVA instead of the KW test and you'll find it in the post hoc subdialog.
If you use KW because you really need a nonparametric/distribution free test, use Mann-Whitney tests as post hoc tests. For K groups, you can make K * (K - 1) * 0.5 pairwise comparisons. For 3 groups, that'll be 3 * (3 - 1) * 0.5 = 3. The Bonferroni adjustment requires that you multiply all p-values by the number of tests (3 in this case). So if you normally require p < 0.05, you now need p < 0.0167 (that is, 0.05 / 3) for "significance".
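In syntax, these three pairwise tests could look like the sketch below (assuming group is coded 1 through 3 as in creatine.sav):
* Pairwise Mann-Whitney tests as post hoc tests for KW.
npar tests /m-w = gain by group(1 2).
npar tests /m-w = gain by group(1 3).
npar tests /m-w = gain by group(2 3).
Then compare each resulting p-value against the Bonferroni-corrected threshold.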
Hope that helps!