The Mann-Whitney test is an alternative for the independent samples t-test when the assumptions required by the latter aren't met by the data. The most common scenario is testing a non normally distributed outcome variable in a small sample (say, n < 25).Non normality isn't a serious issue in larger samples due to the central limit theorem.

The Mann-Whitney test is also known as the **Wilcoxon test** for *independent* samples -which shouldn't be confused with the Wilcoxon signed-ranks test for *related* samples.

## Research Question

We'll use adratings.sav during this tutorial, a screenshot of which is shown above. These data contain the ratings of 3 car commercials by 18 respondents, balanced over gender and age category. Our research question is whether men and women judge our commercials similarly. For each commercial separately, our **null hypothesis** is:
“the mean ratings of men and women are equal.”

## Quick Data Check - Split Histograms

Before running any significance tests, let's first just inspect what our data look like in the first place. A great way for doing so is running some histograms. Since we're interested in differences between male and female respondents, let's split our histograms by gender. The screenshots below guide you through.

## Split Histograms - Syntax

Using the menu results in the first block of syntax below. We copy-paste it twice, replace the variable name and run it.

***Run split histograms to see if data look plausible.**

GRAPH

/HISTOGRAM=ad1

/PANEL COLVAR=gender COLOP=CROSS.

GRAPH

/HISTOGRAM=ad2

/PANEL COLVAR=gender COLOP=CROSS.

GRAPH

/HISTOGRAM=ad3

/PANEL COLVAR=gender COLOP=CROSS.

## Split Histograms - Results

Most importantly, **all results look plausible**; we don't see any unusual values or patterns. Second, our outcome variables don't seem to be normally distributed and we've a total sample size of only n = 18. This argues against using a t-test for these data.

Finally, by taking a good look at the split histograms, you can already see which commercials are rated more favorably by male versus female respondents. But even if they're rated perfectly similarly by large populations of men and women, we'll still see *some* differences in small samples. *Large* sample differences, however, are unlikely if the null hypothesis -equal population means- is really true. We'll now find out if our sample differences are large enough for refuting this hypothesis.

## SPSS Mann-Whitney Test - Menu

Depending on your SPSS license, you may or may not have the submenu available. If you don't have it, just skip the step below.

## SPSS Mann-Whitney Test - Syntax

Note: selecting

results in an extra line of syntax (omitted below).***Run Mann-Whitney test on 3 outcome variables at once.**

NPAR TESTS

/M-W= ad1 ad2 ad3 BY gender(0 1)

/MISSING ANALYSIS.

## SPSS Mann-Whitney Test - Output Descriptive Statistics

The Mann-Whitney test basically replaces all scores with their rank numbers: 1, 2, 3 through 18 for 18 cases. Higher scores get higher rank numbers. If our grouping variable (gender) doesn't affect our ratings, then the **mean ranks should be roughly equal** for men and women.

Our first commercial (“Family car”) shows the largest difference in mean ranks between male and female respondents: females seem much more enthusiastic about it. The reverse pattern -but much weaker- is observed for the other two commercials.

## SPSS Mann-Whitney Test - Output Significance Tests

Some of the output shown below may be absent depending on your SPSS license and the sample size: for n = 40 or fewer cases, you'll always get some exact results.

**Mann-Whitney U** and **Wilcoxon W** are our test statistics; they summarize the difference in mean rank numbers in a single number.Note that Wilcoxon W corresponds to the smallest sum of rank numbers from the previous table.

We prefer reporting **Exact Sig. (2-tailed)**: the exact p-value corrected for ties.

Second best is **Exact Sig. [2*(1-tailed Sig.)]**, the exact p-value but not corrected for ties.

For larger sample sizes, our test statistics are roughly normally distributed. An approximate (or “**Asymptotic**”) p-value is based on the standard normal distribution. The z-score and p-value reported by SPSS are calculated without applying the necessary continuity correction, resulting in some (minor) inaccuracy.

## SPSS Mann-Whitney Test - Conclusions

Like we just saw, SPSS Mann-Whitney test output may include up to 3 different 2-sided p-values. Fortunately, they all lead to the same conclusion if we follow the convention of rejecting the null hypothesis if p < 0.05:
Women rated the “Family Car” commercial more favorably than men (p = 0.001). The other two commercials didn't show a gender difference (p > 0.10).
The p-value of 0.001 indicates a probability of 1 in 1,000: if the *populations* of men and women rate this commercial similarly, then we've a 1 in 1,000 chance of finding the large difference we observe in our *sample*. Presumably, the populations of men and women don't rate it similarly after all.

So that's about it. Thanks for reading and feel free to leave a comment below!

## This tutorial has 22 comments

## By Jon Peck on September 21st, 2016

MW works for mean or median difference assuming that the distribution shapes are similar. But imagine two samples with the same mean but one skewed left and one skewed right. The test would still probably detect this.

## By Ruben Geert van den Berg on September 21st, 2016

Dear Jon, thanks for your suggestion. I've actually been thinking this over for a long time and there's two considerations:

1 - the Mann-Whitney test is sensitive only for different "central tendencies" of the distributions. That is: two distributions with similar means but sharply unequal standard deviations are obviously not similar but this won't be detected by the Mann-Whitney test. The same holds for different skewnesses and kurtoses.

2 - the Mann-Whitney test is applicable to ordinal variables for which means aren't defined.

These are the considerations that made me settle for "mean ranks". I hope this makes some sense to you. I'd like to add that the CSR tends to be sloppy regarding NPAR TESTS and I've some other worries but I won't bother my readers with those.

## By Jon Peck on September 21st, 2016

Strictly speaking, isn't the null hypothesis tested by Mann-Whitney that the two distributions are the same rather than that the means are the same?

## By Ruben Geert van den Berg on September 19th, 2016

Dear Jon, that's a great suggestion! I'll do just that with the next update of this tutorial. I'm a big fan of paneling but it's a bit of a pity that it sometimes requires VARSTOCASES -with which you'll lose the original variable labels by default.

## By Jon Peck on September 18th, 2016

For graphing, this would be a good place to use a population pyramid split by gender . With a little data manipulation the pyramids could also be paneled by the ad variables. Population pyramids with paneling are available in the Chart Builder as well as the legacy chart dialogs.