ANOVA is a statistical technique for testing whether different groups have different means on some metric variable.
In short, ANOVA means **an**alysis **o**f **va**riance and it tests whether a number of means are equal. So what do means have to do with variance? That's a good question and the answer will become very clear in a few minutes.

## ANOVA - Example

There are 3 different schools in some area and each has exactly 1,000 students. We'd like to know whether the students from these schools score similarly on an IQ test. Because the tests are costly to administer, we sample 10 students from each school. Part of the data we collect are shown below.

## ANOVA - Basic Question

If the mean IQ scores are exactly equal for our 3 schools, then we may still find different mean IQ scores in our data. This is due to drawing small random samples. You can compare this sampling process to flipping a balanced coin 10 times: you probably won't throw *exactly* 5 heads and 5 tails on each try.
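To make the coin analogy concrete, here's a quick sketch in Python: the exact chance of throwing precisely 5 heads in 10 fair flips is only about 25%.

```python
from math import comb

# Probability of exactly 5 heads in 10 fair coin flips:
# C(10, 5) ways to place the heads, out of 2**10 equally likely sequences.
p_exact = comb(10, 5) / 2 ** 10
print(round(p_exact, 3))  # 0.246, so most tries will NOT land exactly 5-5
```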

The basic question ANOVA tries to answer is: **can different sample means be attributed merely to random sampling?** Given the differences between our sample means, is the hypothesis tenable that the population means are equal?

## ANOVA - Descriptive Statistics

Answering this question with an ANOVA requires **three pieces of information for each group**: the sample mean, the sample size and the sample variance. The table below is all we need for running an ANOVA.

A bit later on, we'll combine these outcomes into a single number, the F statistic, which indicates how plausible the hypothesis of equal population means is. For now, let's first see why these outcomes are relevant in the first place.
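As a sketch of what these three pieces of information look like, the snippet below computes them for some hypothetical IQ scores (illustrative values only, not the tutorial's actual data):

```python
from statistics import mean, variance

# Hypothetical IQ scores for 10 sampled students per school
# (illustrative values only - not the tutorial's actual data).
schools = {
    "A": [90, 87, 93, 97, 98, 88, 95, 100, 84, 98],
    "B": [95, 105, 110, 98, 112, 102, 99, 108, 104, 107],
    "C": [110, 118, 105, 120, 115, 108, 125, 112, 119, 98],
}

for name, scores in schools.items():
    # sample mean, sample size and sample variance (n - 1 denominator)
    print(name, mean(scores), len(scores), round(variance(scores), 1))
```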

## ANOVA - Sample Means

We see that our sample means fluctuate between 93 and 113 IQ points. Is that normal if our population means are equal? For answering that, we need to quantify the difference between these means. For 2 schools, we could simply take the difference between the two means like we do in an independent samples t test.
For 3 schools, however, we actually have 3 differences between means: there's a difference between Schools A and B, A and C and B and C. Now remember that there's a number that indicates how far a set of numbers lie apart: the variance. The **variance between the 3 sample means quantifies how different they are** in a single number; the larger this variance, the more they differ and the less plausible that the population means are really equal. In our case, this between groups variance is roughly 98.
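For instance, with hypothetical sample means of 93, 103 and 113 (chosen only to resemble the range in the text, not taken from the actual data), the between-groups variance works out as follows:

```python
from statistics import variance

# Hypothetical sample means for schools A, B and C (illustrative only).
sample_means = [93, 103, 113]

# The between-groups variance is simply the sample variance of these means.
between_variance = variance(sample_means)
print(between_variance)  # 100 for these illustrative means
```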

## ANOVA - Sample Variances

If population means are equal, then sample means may still differ somewhat. So how much variance between sample means can we expect? Well, that depends on the variance *within* each group; if IQ scores lie very far apart (say the variance is 500) for some school, then the mean IQ score in a sample of n = 10 may be quite different from the population mean. If a second school has a variance of only 100 between IQ scores, then the sample mean based on n = 10 will probably differ less from the population mean.

This is not the easiest thing to understand and it may need to sink in a bit. So let's consider three scenarios: three schools with 1,000 students have a variance of 50, 100 or 500 between IQ scores. The histograms below show the population distributions. Each of the 9 histograms is based on 1,000 students and has a mean IQ score of exactly 100 points.

First consider the first row (variance is 50). If we'd draw a sample of 10 students from each school, should we expect very different sample means? No. We shouldn't. Because individual IQ scores for each school don't vary much, there isn't a lot of room for the sample means to deviate from 100.

The opposite holds for the bottom row (variance = 500): because the IQ scores lie far apart, the sample means will probably differ somewhat more from their population counterparts of 100.
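A quick simulation illustrates this: draw many samples of n = 10 from a tight population (variance 50) and from a wide one (variance 500), then compare how much the sample means spread out. The population mean of 100 and the normal shape are assumptions of this sketch.

```python
import random
from statistics import mean, stdev

random.seed(1)  # reproducible sketch

def simulated_sample_means(pop_variance, n=10, draws=1000):
    """Draw many samples of size n from a normal population with mean 100
    and the given variance; return the resulting sample means."""
    sd = pop_variance ** 0.5
    return [mean(random.gauss(100, sd) for _ in range(n)) for _ in range(draws)]

low = simulated_sample_means(50)    # tight population: means stay near 100
high = simulated_sample_means(500)  # wide population: means wander further

print(round(stdev(low), 2), round(stdev(high), 2))
```

The spread of the sample means is clearly larger for the high-variance population, exactly as the histograms suggest.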

In short, if the **variances within** groups are smaller, then we expect a smaller **variance between** group means too. This is basically “analysis of variance” tests whether different groups have different means.

## ANOVA - Sample Sizes

Finally, ANOVA requires the sample sizes. As sample sizes increase, the sample means converge towards their population counterparts. In short, **larger samples provide stronger evidence** for rejecting or retaining the hypothesis of equal population means.
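The same kind of simulation shows the sample-size effect: with more students per sample, the sample means cluster more tightly around the population mean. The mean of 100 and SD of 15 (a conventional IQ scale) are assumptions of this sketch.

```python
import random
from statistics import mean, stdev

random.seed(2)  # reproducible sketch

def spread_of_means(n, draws=1000, pop_sd=15):
    """Spread (SD) of sample means over many samples of size n, drawn
    from a normal population with mean 100 and SD 15 (assumed IQ scale)."""
    return stdev(mean(random.gauss(100, pop_sd) for _ in range(n)) for _ in range(draws))

print(round(spread_of_means(10), 2), round(spread_of_means(100), 2))
```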

## ANOVA - The F Statistic

As we announced previously, we'll combine the variance between the sample means, the variances within our groups and the sample sizes into a single number: the F statistic. The flowchart below illustrates the basic idea.
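As a sketch of how these three ingredients combine, the snippet below computes F from summary statistics alone. The means and within-group variances are hypothetical values, chosen only to roughly reproduce the tutorial's outcome of F ≈ 6.2.

```python
from statistics import mean, variance

# Hypothetical summary statistics per school (illustrative values chosen
# to roughly match the tutorial's outcomes - not the actual data).
sample_means = [93, 103, 113]        # one sample mean per school
sample_variances = [120, 160, 197]   # one within-group variance per school
n = 10                               # students sampled per school

ms_between = n * variance(sample_means)  # 10 * 100 = 1000
ms_within = mean(sample_variances)       # 159
f_statistic = ms_between / ms_within

print(round(f_statistic, 1))  # 6.3 for these illustrative numbers
```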

## ANOVA - What is a Large F Value?

Larger F values provide stronger evidence that the population means are not equal. In our example, F = 6.2 as we'll see a bit later. So is that large enough for rejecting the hypothesis of equal population means?

One way to answer this question is a data simulation: from 3 populations with equal means, we drew 1,000 samples of n = 10 and calculated the F values. The result is shown below.

The chart estimates the probabilities for finding different F values when population means are equal. In our case, 95% of our 1,000 F values are smaller than 3.45. F values larger than 3.45 are rare. So if we don't know the population means and find F = 6.2, we'll probably conclude that the population means aren't equal after all.
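Such a simulation can be sketched in a few lines: draw 1,000 sets of three samples from normal populations with equal means, compute F for each, and inspect the 95th percentile. The normal shape and the common SD of 13 are assumptions of this sketch; only the equal means matter for the point being made.

```python
import random
from statistics import mean, variance

random.seed(3)  # reproducible sketch

def f_statistic(groups):
    """One-way ANOVA F statistic for equally sized groups."""
    n = len(groups[0])
    ms_between = n * variance([mean(g) for g in groups])
    ms_within = mean(variance(g) for g in groups)
    return ms_between / ms_within

# 1,000 simulated studies: three samples of n = 10 from populations
# with EQUAL means (100) and an arbitrary common SD of 13.
f_values = sorted(
    f_statistic([[random.gauss(100, 13) for _ in range(10)] for _ in range(3)])
    for _ in range(1000)
)

print(round(f_values[949], 2))  # ~95th percentile of the simulated F values
```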

## ANOVA - Degrees of Freedom

Now, instead of simulating probabilities for different F values, we can calculate them directly. The (complex) formula for doing so requires 3 numbers: an F value, df1 and df2. Df1 denotes the **degrees of freedom for the numerator** and is equal to (3 groups - 1 =) 2.

Df2 denotes the **degrees of freedom for the denominator** and is (30 respondents - 3 groups =) 27. We'll get to why they refer to a numerator and denominator in a minute. For now, the graph below shows the exact relation between F values and probabilities.

Note two things: first, our simulation histogram nicely follows this curve. Second, 95% of all F values are smaller than 3.35 rather than 3.45 as our simulation suggested.
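For df1 = 2, this curve even has a simple closed form: P(F > f) = (1 + 2f/df2)^(-df2/2). The helper below uses it (it holds only for df1 = 2) to check both the 5% cutoff of 3.35 and the p value for F = 6.2.

```python
def f_right_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with df1 = 2 (closed form valid
    only for df1 = 2; other df1 values need a stats library)."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

# Our example: F = 6.2 with df1 = 2 and df2 = 27 gives p of about 0.006 ...
print(round(f_right_tail_df1_2(6.2, 27), 3))   # 0.006

# ... and roughly 5% of F values exceed 3.35, matching the curve.
print(round(f_right_tail_df1_2(3.35, 27), 3))  # 0.05
```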

## ANOVA - Computer Output

Right, we'll now have our software (SPSS or some other package) run an ANOVA on our data. The main output is shown below. The only number that *really* matters is the p value, but we'll explain a bit more of the output.

**Mean Square Between** groups (= 978) is the variance between the 3 sample mean IQ scores, multiplied by 10 because each mean is based on 10 respondents.

**Mean Square Within** (= 159) is the average variance within each sample.

The **F statistic** = 6.2.

“Sig” denotes the **p value** (= 0.006). If population means are really equal, we have a 0.006 chance of finding an F value of 6.2 or larger. It seems that the population means aren't equal after all.

**Df1** (= 2) denotes the **degrees of freedom of the numerator**. This is because the F statistic is calculated as

$$F = \frac{Mean\;Square\;Between}{Mean\;Square\;Within}$$

and “Mean Square Between” is the numerator in this division.

The F formula confirms that F becomes larger as 1) the between groups variance becomes larger or 2) the variance within groups becomes smaller as we showed with a diagram.
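Plugging the output values into this formula indeed reproduces the reported statistic:

$$F = \frac{978}{159} \approx 6.15$$

which rounds to the reported 6.2.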

**Df2** (= 27) denotes **degrees of freedom of the denominator** because “Mean Square Within” is the denominator in the aforementioned division.

## ANOVA - Final Notes

This tutorial aimed to explain the basic idea behind ANOVA. The example we used illustrates the simplest case of ANOVA: one between subjects factor (school) with 3 levels (our 3 schools).

Next to between subjects factors, we may also have within subjects factors. The difference is beyond the scope of this tutorial (but see SPSS Repeated Measures ANOVA). In practice, we often have more than one factor, resulting in a factorial ANOVA. We'll soon write a tutorial on that as well.

We hope you found this tutorial useful!
