
# Simple Introduction to Confidence Intervals

A confidence interval is a range of values
that encloses a parameter with a given likelihood.
So let's say we have a sample of 200 people from a population of 100,000. Our sample data yield a correlation of 0.41 and indicate that the 95% confidence interval for this correlation runs from 0.29 to 0.52.
This means that

• the range of values (0.29 through 0.52)
• has a 95% likelihood
• of enclosing the parameter (the correlation for the entire population) that we'd like to know.

So basically, a confidence interval tells us how much our sample correlation is likely to differ from the population correlation we're after.
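The interval of 0.29 to 0.52 quoted above is consistent with the Fisher z-transformation, one common approximation for the confidence interval of a correlation. The tutorial doesn't state which method produced its interval, so the sketch below is an assumption rather than the author's procedure; the function name `correlation_ci` is hypothetical.

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """Approximate CI for a Pearson correlation via Fisher's z-transformation.

    r      : sample correlation
    n      : sample size
    z_crit : normal critical value (1.96 for a 95% interval)
    """
    z = math.atanh(r)                # move r to the (roughly normal) z scale
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error of z
    lo = z - z_crit * se
    hi = z + z_crit * se
    return math.tanh(lo), math.tanh(hi)   # back-transform to the r scale

lower, upper = correlation_ci(0.41, 200)
print(f"95% CI: {lower:.2f} to {upper:.2f}")   # 95% CI: 0.29 to 0.52
```

Note that the interval is not symmetric around 0.41: correlations are bounded at ±1, which the back-transformation accounts for.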

## Confidence Intervals - Example

El Hierro is the smallest Canary island and has 8,077 inhabitants of 18 years or over. A scientist wants to know their average yearly income. He asks a sample of N = 100. The table below presents his findings.

Based on these 100 people, he concludes that the average yearly income for all 8,077 inhabitants is probably between $25,630 and $32,052. So how does that work?

## Confidence Intervals - How Does it Work?

Let's say the tax authorities have access to the yearly incomes of all 8,077 inhabitants. The table below shows some descriptive statistics.

Now, a scientist who samples 100 of these people can compute a sample mean income. This sample mean probably differs somewhat from the $32,383 population mean. Another scientist could also sample 100 people and come up with a different mean. And so on: if we drew 100 different samples, we'd probably find 100 different means. In short, sample means fluctuate over samples.

So how much do they fluctuate? This is expressed by the standard deviation of sample means over samples, known as the standard error (SE) of the mean. SE is calculated as

$$SE = \frac{\sigma}{\sqrt{N}}$$

where σ is the population standard deviation and N is the sample size, so for our data that'll be

$$SE = \frac{22,874}{\sqrt{100}} = 2,287.$$

Right. Now, statisticians also figured out the exact frequency distribution of sample means: the sampling distribution of the mean. For our data, it's shown below.

Our graph tells us that 95% of all samples will come up with a mean between roughly $27,808 and $36,958. This is basically the mean ± 2SE:

• the lower bound is roughly $32,383 - 2 · $2,287 = $27,808 and
• the upper bound is roughly $32,383 + 2 · $2,287 = $36,958.
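The standard error and the mean ± 2SE range can be checked with a few lines of Python. The first part uses only the figures quoted in the text. The second part is a rough simulation on a synthetic population: the real incomes aren't available here, so it assumes gamma-distributed incomes with a similar mean and standard deviation, and every variable name is made up for illustration.

```python
import math
import random
import statistics

# Figures quoted in the text
POP_MEAN = 32_383   # population mean income
POP_SD = 22_874     # population standard deviation (sigma)
N = 100             # sample size

se = POP_SD / math.sqrt(N)        # standard error of the mean
lower = POP_MEAN - 2 * se
upper = POP_MEAN + 2 * se
print(f"SE = {se:,.0f}")                             # SE = 2,287
print(f"mean ± 2SE: {lower:,.0f} to {upper:,.0f}")   # 27,808 to 36,958

# Simulation on a SYNTHETIC population (assumption: gamma-distributed
# incomes matched to the mean and SD above; this is not the real data).
random.seed(1)
shape = (POP_MEAN / POP_SD) ** 2  # gamma shape implied by mean and SD
scale = POP_SD ** 2 / POP_MEAN    # gamma scale implied by mean and SD
population = [random.gammavariate(shape, scale) for _ in range(8_077)]

mu = statistics.fmean(population)       # mean of the synthetic population
sigma = statistics.pstdev(population)   # its population standard deviation
sim_se = sigma / math.sqrt(N)

# Draw many samples of 100 and count how often the sample mean
# lands within 2 standard errors of the population mean.
n_samples = 2_000
hits = 0
for _ in range(n_samples):
    sample_mean = statistics.fmean(random.sample(population, N))
    if abs(sample_mean - mu) <= 2 * sim_se:
        hits += 1
share_within = hits / n_samples
print(f"share of sample means within 2 SE: {share_within:.3f}")
```

If the reasoning in the text holds, the printed share should land close to 0.95, even though the synthetic incomes themselves are clearly skewed.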

• ### By Ruben Geert van den Berg on September 14th, 2021

Hi Kostiantyn!

This is because we had the bad luck to sample some people having much lower incomes than average.

A 95% CI not enclosing the population mean happens in 5% of all samples.

This is basically just the uncertainty that inferential statistics is attempting to manage...

Hope that helps!

SPSS tutorials

• ### By Kostiantyn on September 14th, 2021

Thank you Ruben!
You made a great tutorial

• ### By jenny som on September 24th, 2021

I love this text. Simple and complete. Thank you!!!