ANOVA - What if Levene’s Test is “Significant”?

SPSS ANOVA – Levene’s Test “Significant”

An assumption required for ANOVA is homogeneity of variances. We often run Levene’s test to check if this holds. But what if it doesn't? This tutorial walks you through your options.

SPSS ANOVA Dialogs I
Results I - Levene’s Test “Significant”
SPSS ANOVA Dialogs II
Results II - Welch and Games-Howell Tests
Plan B - Kruskal-Wallis Test

Example Data

All analyses in this tutorial use staff.sav, part of which is shown below. We encourage you to download these data and replicate our analyses.

Our data contain some details on a sample of N = 179 employees. The research question for today is: is salary associated with region? We'll try to support this claim by rejecting the null hypothesis that all regions have equal mean population salaries. A likely analysis for this is an ANOVA but this requires a couple of assumptions.

ANOVA Assumptions

An ANOVA requires 3 assumptions:

independent observations;
normality: the dependent variable must follow a normal distribution within each subpopulation;
homogeneity: the variance of the dependent variable must be equal over all subpopulations.

With regard to our data, independent observations seem plausible: each record represents a distinct person and people didn't interact in any way that's likely to affect their answers.

Second, normality is only needed for small sample sizes of, say, N < 25 per subgroup. We'll inspect if our data meet this requirement in a minute.

Last, homogeneity is only needed if sample sizes are sharply unequal. If so, we usually run Levene's test. This procedure tests the null hypothesis that 2+ population variances are all equal.
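Since the need for homogeneity hinges on the group sizes, a quick look at them can help. The syntax below is a minimal sketch, assuming the variable names salary and region used throughout this tutorial: it reports the sample size, mean, standard deviation and variance of salary for each region.

```spss
*Sketch: sample sizes, means, standard deviations and variances of salary per region.
*Assumes the variable names salary and region from staff.sav.

MEANS TABLES=salary BY region
/CELLS=COUNT MEAN STDDEV VARIANCE.
```

Sharply different counts combined with very different variances over regions suggest that the homogeneity assumption deserves a closer look.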

Quick Data Check

Before running our ANOVA, let's first see if the reported salaries are even plausible. The best way to do so is inspecting a histogram which we'll create by running the syntax below.

*Run basic histogram on salary.

frequencies salary
/format notable
/histogram.

Result

Note that our histogram reports N = 175 rather than our N = 179 respondents. This implies that salary contains 4 missing values.
The frequency distribution, however, looks plausible: there are no clear outliers or other abnormalities that should ring any alarm bells.
The distribution shows some positive skewness. However, this is typical for salary data and is no cause for concern.

Let's now proceed to the actual ANOVA.

SPSS ANOVA Dialogs I

After opening our data in SPSS, let's first navigate to Analyze - General Linear Model - Univariate as shown below.

Let's now fill in the dialog that opens as shown below.

Completing these steps results in the syntax below. Let's run it.

*ANOVA with descriptive statistics, Levene's test and effect size: (partial) eta squared.

UNIANOVA salary BY region
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/PRINT ETASQ DESCRIPTIVE HOMOGENEITY
/CRITERIA=ALPHA(.05)
/DESIGN=region.

Results I - Levene’s Test “Significant”

The very first thing we inspect are the sample sizes used for our ANOVA and Levene’s test as shown below.

[Figure: SPSS ANOVA output with a significant Levene’s test]

First off, note that our Descriptive Statistics table is based on N = 171 respondents (bottom row). This is due to some missing values in both region and salary.
Second, sample sizes for “North” and “East” are rather small. We may therefore need the normality assumption. For now, let's just assume it's met.
Next, our sample sizes are sharply unequal so we really need to meet the homogeneity of variances assumption.
However, Levene’s test is statistically significant because its p < 0.05: we reject its null hypothesis of equal population variances.

The combination of these last 2 points implies that we cannot interpret or report the F-test shown in the table below.

As discussed, we can't rely on this p-value for the usual F-test.

However, we can still interpret eta squared (often written as η²). This is a descriptive statistic that requires neither normality nor homogeneity. η² = 0.046 implies a small to medium effect size for our ANOVA.
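For a one-way design such as ours, (partial) eta squared is the proportion of the total variation in the dependent variable that is accounted for by the factor. As a sketch in standard ANOVA notation (the symbols are our own assumptions, not SPSS output labels: k groups, n_j cases in group j, group means \bar{y}_j and grand mean \bar{y}):

```latex
\[
\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}}
       = \frac{\sum_{j=1}^{k} n_j \,(\bar{y}_j - \bar{y})^2}
              {\sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y})^2}
\]
```

So η² = 0.046 means that region accounts for some 4.6% of the variance in salary in our sample.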

Now, if we can't interpret our F-test, then how can we know if our mean salaries differ? Two good alternatives are:

running an ANOVA with the Welch statistic or
running a Kruskal-Wallis test.

Let's start off with the Welch statistic.

SPSS ANOVA Dialogs II

For inspecting the Welch statistic, first navigate to Analyze - Compare Means - One-Way ANOVA as shown below.

[Figure: SPSS Analyze - Compare Means - One-Way ANOVA menu]

Next, we'll fill out the dialogs that open as shown below.

This results in the syntax below. Again, let's run it.

*ANOVA with Welch statistic and Games-Howell post hoc tests.

ONEWAY salary BY region
/STATISTICS HOMOGENEITY WELCH
/MISSING ANALYSIS
/POSTHOC=GH ALPHA(0.05).

Results II - Welch and Games-Howell Tests

As shown below, the Welch test rejects the null hypothesis of equal population means.

This table is labelled “Robust Tests...” because it's robust to a violation of the homogeneity assumption as indicated by Levene’s test. So we now conclude that mean salaries are not equal over all regions.
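Roughly, the Welch statistic weights each group by its sample size divided by its own sample variance, so it never pools variances over groups. The block below sketches the usual textbook formulation. The notation is our own assumption (k groups with sizes n_j, means \bar{y}_j and sample variances s_j^2), and SPSS may present intermediate results differently.

```latex
\[
w_j = \frac{n_j}{s_j^2}, \qquad
W = \sum_{j=1}^{k} w_j, \qquad
\bar{y}_w = \frac{1}{W}\sum_{j=1}^{k} w_j \bar{y}_j, \qquad
\Lambda = \sum_{j=1}^{k} \frac{(1 - w_j / W)^2}{n_j - 1}
\]
\[
F_{\text{Welch}} =
  \frac{\dfrac{1}{k-1} \sum_{j=1}^{k} w_j (\bar{y}_j - \bar{y}_w)^2}
       {1 + \dfrac{2(k-2)}{k^2-1}\,\Lambda},
\qquad
df_1 = k - 1, \qquad
df_2 = \frac{k^2 - 1}{3\Lambda}
\]
```

Because df_2 depends on the sample variances, it's usually not a whole number.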

But precisely which regions differ with regard to mean salaries? This is answered by inspecting post hoc tests. And if the homogeneity assumption is violated, we usually prefer Games-Howell as shown below.

[Figure: SPSS ANOVA Games-Howell post hoc tests output]

Note that each comparison is shown twice in this table. The only regions whose mean salaries differ “significantly” are North and Top 4 City.

Plan B - Kruskal-Wallis Test

So far, we overlooked one issue: some regions have sample sizes of n = 15 or n = 16. This implies that the normality assumption should be met as well. A terrible idea here is to run

a Kolmogorov-Smirnov test or
a Shapiro-Wilk test

for each region separately. Neither test rejects the null hypothesis of a normally distributed dependent variable but this is merely due to insufficient sample sizes: with so few cases, these tests have little power to detect non-normality.

A much better idea is running a Kruskal-Wallis test. You could do so with the syntax below.

*Kruskal-Wallis test from Analyze - Nonparametric Tests - Legacy Dialogs - K Independent Samples.

NPAR TESTS
/K-W=salary BY region(1 5)
/STATISTICS DESCRIPTIVES
/MISSING ANALYSIS.

Result

Sadly, our Kruskal-Wallis test doesn't detect any difference between mean salary ranks over regions, H(4) = 6.58, p = 0.16.
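For reference, the H statistic is computed from ranks rather than raw salaries. The sketch below gives the textbook formula without a correction for ties, which statistical packages usually apply as well. The notation is our own assumption: all N cases are ranked jointly and R_j is the sum of ranks in group j, which holds n_j cases.

```latex
\[
H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)
\]
```

Under the null hypothesis, H approximately follows a chi-square distribution with k - 1 degrees of freedom. With our k = 5 regions, that results in the 4 degrees of freedom reported above.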

In short, our analyses come up with inconclusive outcomes and it's unclear precisely why. If you've any suggestions, please throw us a comment below. Other than that,

Thanks for reading!

How to Run Levene’s Test in SPSS?

Levene’s test examines if 2+ populations all have equal variances on some variable.

Null Hypothesis
Assumptions
Levene’s test in SPSS
Output
Reporting

Levene’s Test - What Is It?

If we want to compare 2(+) groups on a quantitative variable, we usually want to know if they have equal mean scores. For finding out if that's the case, we often use

an independent samples t-test for comparing 2 groups or
a one-way ANOVA for comparing 3+ groups.

Both tests require the homogeneity (of variances) assumption: the population variances of the dependent variable must be equal within all groups. However, you don't always need this assumption:

you don't need to meet the homogeneity assumption if the groups you're comparing have roughly equal sample sizes;
you do need this assumption if your groups have sharply different sample sizes.

Now, we usually don't know our population variances but we do know our sample variances. And if these don't differ too much, then the population variances being equal seems credible.

But how do we know if our sample variances differ “too much”? Well, Levene’s test tells us precisely that.

Null Hypothesis

The null hypothesis for Levene’s test is that the groups we're comparing all have equal population variances. If this is true, we'll probably find slightly different variances in samples from these populations. However, very different sample variances suggest that the population variances weren't equal after all. In this case we'll reject the null hypothesis of equal population variances.

Levene’s Test - Assumptions

Levene’s test basically requires two assumptions:

independent observations and
the test variable is quantitative, that is, not nominal or ordinal.

Levene’s Test - Example

A fitness company wants to know if 2 supplements for stimulating body fat loss actually work. They test 2 supplements (a cortisol blocker and a thyroid booster) on 20 people each. An additional 40 people receive a placebo.

All 80 participants have body fat measurements at the start of the experiment (week 11) and in weeks 14, 17 and 20. This results in fatloss-unequal.sav, part of which is shown below.

[Figure: Levene's test in SPSS - example data in variable view]

One approach to these data is comparing body fat percentages over the 3 groups (placebo, thyroid, cortisol) for each week separately. This can be done with an ANOVA for each of the 4 body fat measurements. However, since we have unequal sample sizes, we first need to make sure that our supplement groups have equal variances.

Note: perhaps a better approach to these data is using a single mixed ANOVA. Weeks would be the within-subjects factor and supplement would be the between-subjects factor. For now, we'll leave it as an exercise to the reader to carry this out.

Running Levene’s test in SPSS

Several SPSS commands contain an option for running Levene’s test. The easiest way to go, especially for multiple variables, is the One-Way ANOVA dialog. (This dialog was greatly improved in SPSS version 27 and now includes measures of effect size such as (partial) eta squared.) So let's navigate to Analyze - Compare Means - One-Way ANOVA and fill out the dialog that pops up.

[Figure: Levene's test from the SPSS One-Way ANOVA menu]

As shown below, the Homogeneity of variance test under Options refers to Levene’s test.

[Figure: Levene's test from the SPSS One-Way ANOVA dialog]

Clicking Paste results in the syntax below. Let's run it.

SPSS Levene’s Test Syntax Example

*SPSS Levene's test syntax as pasted from Analyze - Compare Means - One-Way ANOVA.

ONEWAY fat11 fat14 fat17 fat20 BY condition
/STATISTICS DESCRIPTIVES HOMOGENEITY
/MISSING ANALYSIS.

Output for Levene’s test

On running our syntax, we get several tables. The second, shown below, is the Test of Homogeneity of Variances. This holds the results of Levene’s test.

[Figure: SPSS Levene's test output and interpretation]

As a rule of thumb, we conclude that population variances are not equal if “Sig.” or p < 0.05. For the first 2 variables, p > 0.05: for fat percentage in weeks 11 and 14 we don't reject the null hypothesis of equal population variances.

For the last 2 variables, p < 0.05: for fat percentages in weeks 17 and 20, we reject the null hypothesis of equal population variances. So these 2 variables violate the homogeneity of variance assumption needed for an ANOVA.

Descriptive Statistics Output

Remember that we don't need equal population variances if we have roughly equal sample sizes. A sound way for evaluating if this holds is inspecting the Descriptives table in our output.

[Figure: sample sizes used for ANOVA in the Descriptives table]

As we see, our ANOVA is based on sample sizes of 40, 20 and 20 for all 4 dependent variables. Because they're not (roughly) equal, we do need the homogeneity of variance assumption but it's not met by 2 variables.

In this case, we'll report alternative measures (Welch and Games-Howell) that don't require the homogeneity assumption. How to run and interpret these is covered in SPSS ANOVA - Levene’s Test “Significant”.
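A minimal sketch of what this could look like for our example data is shown below. It reuses the ONEWAY options from that tutorial and assumes the variable names fat17, fat20 and condition from the syntax above.

```spss
*Sketch: Welch statistic and Games-Howell post hoc tests for the 2 variables violating homogeneity.
*Assumes the variable names fat17, fat20 and condition used above.

ONEWAY fat17 fat20 BY condition
/STATISTICS DESCRIPTIVES WELCH
/MISSING ANALYSIS
/POSTHOC=GH ALPHA(0.05).
```

Fat percentages in weeks 11 and 14 didn't violate the homogeneity assumption, so the usual F-test can be used for those.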

Reporting Levene’s test

Perhaps surprisingly, Levene’s test is technically an ANOVA, as we'll explain below. We therefore report it just like a basic ANOVA. So we'll write something like “Levene’s test showed that the variances for body fat percentage in week 20 were not equal, F(2,77) = 4.58, p = .013.”

Levene’s Test - How Does It Work?

Levene’s test works very simply: a larger variance means that, on average, the data values are “further away” from their mean. The figure below illustrates this: watch the histograms become “wider” as the variances increase.

We therefore compute the absolute differences between all scores and their (group) means. The means of these absolute differences should be roughly equal over groups. So technically, Levene’s test is an ANOVA on the absolute difference scores. In other words: we run an ANOVA (on absolute differences) to find out if we can run an ANOVA (on our actual data).
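In formula form, this comes down to the sketch below. The notation is our own assumption: y_{ij} is the score of case i in group j, \bar{y}_j is the mean of group j, and there are k groups holding N cases in total.

```latex
\[
z_{ij} = \lvert y_{ij} - \bar{y}_j \rvert
\]
\[
F = \frac{\dfrac{1}{k-1} \sum_{j=1}^{k} n_j (\bar{z}_j - \bar{z})^2}
         {\dfrac{1}{N-k} \sum_{j=1}^{k} \sum_{i=1}^{n_j} (z_{ij} - \bar{z}_j)^2},
\qquad
df_1 = k - 1, \qquad
df_2 = N - k
\]
```

Here \bar{z}_j is the mean absolute difference in group j and \bar{z} is the mean over all absolute differences. With k = 3 groups and N = 80 participants, this yields the df = 2 and 77 that we reported earlier.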

If that confuses you, try running the syntax below. It does exactly what I just explained.

“Manual” Levene’s Test Syntax

*Add group means on fat20 to dataset.

aggregate outfile * mode addvariables
/break condition
/mfat20 = mean(fat20).

*Compute absolute differences between fat20 and group means.

compute adfat20 = abs(fat20 - mfat20).

*Run minimal ANOVA on absolute differences. F-test identical to previous Levene's test.

ONEWAY adfat20 BY condition.

Result

As we see, these ANOVA results are identical to Levene’s test in the previous output. I hope this clarifies why we report it as an ANOVA as well.

Thanks for reading!