Normal Distribution – Quick Introduction
- Normal Distribution - General Formula
- Standard Normal Distribution
- Normal Distribution - Basic Properties
- Finding Probabilities from a Normal Distribution
- Finding Critical Values from an Inverse Normal Distribution
- Are my Variables Normally Distributed?
Definition
The normal distribution is the probability distribution whose density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\cdot e^{-\dfrac{(x - \mu)^2}{2\sigma^2}}$$
This results in a symmetrical curve like the one shown below.
The surface areas under this curve give us the percentages -or probabilities- for any interval of values. For example, assuming that IQ scores are normally distributed with a population mean of 100 and a standard deviation of 15 points:
- 34.1% of all people score between 85 and 100 points;
- 15.9% of all people score 115 points or more;
- a random person has a 50% (or 0.50) probability of scoring 100 points or lower.
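These percentages are easy to verify yourself. Here's a minimal sketch in Python (our own addition, using scipy's normal distribution functions rather than the Googlesheets formulas shown later on):

```python
from scipy.stats import norm

# IQ scores: normally distributed with mu = 100 and sigma = 15
iq = norm(loc=100, scale=15)

print(iq.cdf(100) - iq.cdf(85))  # P(85 < X < 100) = 0.3413... -> 34.1%
print(1 - iq.cdf(115))           # P(X >= 115)     = 0.1586... -> 15.9%
print(iq.cdf(100))               # P(X <= 100)     = 0.5       -> 50%
```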
In statistics, the normal distribution plays 2 important roles:
- a frequency distribution (values over observations): for example, IQ scores are roughly normally distributed over a population of people.
- a sampling distribution (statistic over samples): proportions and means are roughly normally distributed over samples. From this normal distribution, we can look up the probability for any observed sample mean or proportion. Strictly, we always look up probabilities for ranges of outcomes rather than separate outcomes. This is the basis of statistical significance testing.
Normal Distribution - General Formula
The general formula for the normal distribution is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\cdot e^{-\dfrac{(x - \mu)^2}{2\sigma^2}}$$
where
\(\sigma\) (“sigma”) is a population standard deviation;
\(\mu\) (“mu”) is a population mean;
\(x\) is a value or test statistic;
\(e\) is a mathematical constant of roughly 2.72;
\(\pi\) (“pi”) is a mathematical constant of roughly 3.14.
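To make the formula concrete, here's a direct Python translation (a sketch of our own; the function name normal_pdf is ours), checked against scipy's built-in density:

```python
import math
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """The general normal density formula, written out term by term."""
    return (1 / (sigma * math.sqrt(2 * math.pi))
            * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)))

# Density at x = 115 for IQ scores (mu = 100, sigma = 15) ...
print(normal_pdf(115, 100, 15))          # 0.01613...
# ... matches scipy's built-in implementation
print(norm.pdf(115, loc=100, scale=15))  # 0.01613...
```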
The “normal curve” results from plotting \(f(x)\) -probability density- for a number of \(x\) values. Its horizontal position is set by \(\mu\), its width and height by \(\sigma\). The figure below gives some examples.
As with all probability density functions, the formula does not return probabilities. In order to find these, we need to find the surface areas for ranges of \(x\) values as shown below.
So how to find the probability for any range of values? Well, you could manually compute it from an integral over the normal distribution formula. An easier option, however, is to look it up in Googlesheets as we'll show later on.
Standard Normal Distribution
The standard normal distribution is a normal distribution
with μ = 0 and σ = 1.
Filling these numbers into the general formula simplifies it to
$$f(x) = \frac{1}{\sqrt{2\pi}}\cdot e^{-\dfrac{x^2}{2}}$$
The standard normal distribution is the only normal distribution we really need. Why? Well, we can use a normal distribution to look up a probability for \(x\) if
- \(x\) is normally distributed and
- we know its population mean μ and
- we know its population standard deviation σ.
With these 3 numbers we could also compute a z-score:
$$z = \frac{x - \mu}{\sigma}$$
The result is that \(z\) always has μ = 0 and σ = 1. So if \(x\) follows a normal distribution, then \(z\) follows a standard normal distribution.
Converting \(x\) into \(z\) may seem theoretical. However, this is exactly what happens when we run a t-test or a z-test. Keep in mind that computing \(z\) -or standardizing values- does not “normalize” them in any way: \(z\) only follows a standard normal distribution if \(x\) is normally distributed.
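To see this equivalence in action, here's a minimal sketch in Python (our own addition; the IQ numbers are the example values from earlier): looking up \(x\) under its own normal distribution gives the same probability as looking up its z-score under the standard normal distribution.

```python
from scipy.stats import norm

x, mu, sigma = 115, 100, 15
z = (x - mu) / sigma  # z = 1.0

# Both lookups return the same left-tail probability
print(norm.cdf(x, loc=mu, scale=sigma))  # 0.8413...
print(norm.cdf(z))                       # 0.8413... (standard normal)
```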
Normal Distribution - Basic Properties
Before we look up some probabilities in Googlesheets, there's a couple of things we should know:
- the normal distribution always runs from \(-\infty\) to \(\infty\);
- the total surface area (= probability) of a normal distribution is always exactly 1;
- the normal distribution is exactly symmetrical around its mean \(\mu\) and therefore has zero skewness;
- due to its symmetry, the median is always equal to the mean for a normal distribution;
- the normal distribution always has a kurtosis of zero (that is, zero excess kurtosis, which is how most statistical software reports it; the raw kurtosis is 3).
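These properties are easy to check numerically. A minimal sketch in Python (our own addition; note that scipy also reports kurtosis in the excess convention):

```python
from scipy.integrate import quad
from scipy.stats import norm

# The total surface area (= probability) integrates to 1
area, _ = quad(norm.pdf, -float('inf'), float('inf'))
print(area)  # 1.0

# Mean, variance, skewness and (excess) kurtosis of a standard normal
print(norm.stats(moments='mvsk'))  # (0.0, 1.0, 0.0, 0.0)

# Due to symmetry, the median equals the mean
print(norm.median(loc=100, scale=15))  # 100.0
```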
Finding Probabilities from a Normal Distribution
This Googlesheet (read-only) shows how to find probabilities from a normal distribution.
Simply type =norm.dist(a,b,c,true) into some cell and
- replace a by some x or z-value;
- replace b by the population mean μ;
- replace c by the population standard deviation σ.
This results in a left tail probability. Like so, the highlighted example tells us that there's a 0.159 -roughly 16%- probability that z < -1 if z is normally distributed with μ = 0 and σ = 1.
Because the surface area -or total probability- is always 1, we can find any right tail probability with
\(p(X \gt x) = 1 - p(X \lt x)\)
Like so, the probability that z > -1 is (1 - 0.159 =) 0.841.
And what about the probability that x is between -2 and -1? Or -formally- p(-2 < X < -1)? Well,
\(p(x_a \lt X \lt x_b) = p(X \lt x_b) - p(X \lt x_a)\)
so that'll be (0.159 - 0.023 =) 0.136 or 13.6% as shown below.
If you're not sure you master this, try and compute each of the percentages shown above for yourself in an empty Googlesheet.
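If you'd like to double-check your answers outside Googlesheets, here's the same set of lookups in Python (a sketch of our own; scipy's norm.cdf plays the role of =norm.dist(...,true)):

```python
from scipy.stats import norm

# Left tail: p(Z < -1), as in =norm.dist(-1,0,1,true)
print(norm.cdf(-1))                 # 0.1586... -> roughly 0.159

# Right tail: p(Z > -1) = 1 - p(Z < -1)
print(1 - norm.cdf(-1))             # 0.8413...

# Interval: p(-2 < Z < -1) = p(Z < -1) - p(Z < -2)
print(norm.cdf(-1) - norm.cdf(-2))  # 0.1359... -> roughly 0.136
```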
Finding Critical Values from an Inverse Normal Distribution
- The normal distribution tells us probabilities for ranges of values. These are needed for testing null hypotheses.
- The inverse normal distribution tells us ranges of values for probabilities. These are needed for computing confidence intervals.
This Googlesheet (read-only) illustrates how to find critical values for a normally distributed variable.
Simply type =norminv(a,b,c) into some cell and
- replace a by the left tail probability;
- replace b by the population mean μ (usually 0);
- replace c by the population standard deviation σ (usually 1).
Keep in mind that the probability of a confidence interval not enclosing the population parameter is evenly divided over both tails. It is 0.05 for a 95% confidence interval, divided into a left tail of 0.025 and a right tail of 0.025.
For a standard normal distribution, this results in -1.96 < Z < 1.96. The figure below illustrates how this works.
The exact critical values shown here are all computed in this Googlesheet (read-only).
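The same critical values can be reproduced in Python (our own sketch; norm.ppf is the inverse-distribution counterpart of =norminv):

```python
from scipy.stats import norm

# 95% confidence: alpha = 0.05, split into two tails of 0.025 each
print(norm.ppf(0.025))  # -1.959... -> lower critical z
print(norm.ppf(0.975))  #  1.959... -> upper critical z

# Non-standard normal: pass mu and sigma, as in =norminv(0.975,100,15)
print(norm.ppf(0.975, loc=100, scale=15))  # 129.39...
```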
Are my Variables Normally Distributed?
Many statistical procedures such as ANOVA, t-tests, regression and others require the normality assumption: variables must be normally distributed in the population. This assumption is only needed for small sample sizes of, say, N < 25 or so. For larger samples, the central limit theorem renders most tests robust to violations of normality -but let's discuss that some other day.
Anyway. If a variable is normally distributed in some population, then it should be roughly normally distributed in some sample as well. A first check -simple and solid- is inspecting its frequency distribution from a histogram.
In SPSS, we can very easily add normal curves to histograms. This normal curve is given the same mean and SD as the observed scores. It quickly shows how (much) the observed distribution deviates from a normal distribution.
A second check is inspecting descriptive statistics, notably skewness and kurtosis. Recall from the basic properties that a normal distribution has zero skewness and zero (excess) kurtosis.
If this is true in some population, then observed variables should probably not have large (absolute) skewnesses or kurtoses. The example table below highlights some striking deviations from this. They suggest that reaction times 2, 3 and 5 are probably not normally distributed in some population.
Last, there's 2 normality tests: statistical tests for evaluating population normality. These are the Kolmogorov-Smirnov test and the Shapiro-Wilk test.
Both tests serve the exact same purpose: they test the null hypothesis that a variable is normally distributed in some population.
Sadly, both tests have low power in small sample sizes -precisely when normality is really needed. This means they may not reject normality even if it doesn't hold. Like so, they may create a false sense of security and we therefore don't recommend them.
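For completeness, this is how both tests look in Python (a sketch of our own, using scipy; as argued above, take their verdicts with a grain of salt):

```python
import numpy as np
from scipy.stats import kstest, shapiro

rng = np.random.default_rng(seed=1)
x = rng.normal(loc=100, scale=15, size=20)  # a small, truly normal sample

# Shapiro-Wilk: tests H0 that x is normally distributed
print(shapiro(x))  # a large p-value means normality is not rejected

# Kolmogorov-Smirnov against a normal with mu and sigma estimated from x
# (strictly, estimating the parameters calls for the Lilliefors correction)
print(kstest(x, 'norm', args=(x.mean(), x.std(ddof=1))))
```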
Thanks for reading!
Null Hypothesis – Simple Introduction
A null hypothesis is a precise statement about a population
that we try to reject with sample data.
We don't usually believe our null hypothesis (or H0) to be true. However, we need some exact statement as a starting point for statistical significance testing.
Null Hypothesis Examples
Often -but not always- the null hypothesis states there is no association or difference between variables or subpopulations. Like so, some typical null hypotheses are:
- the correlation between frustration and aggression is zero (correlation analysis);
- the average income for men is equal to that for women (independent samples t-test);
- nationality is (perfectly) unrelated to music preference (chi-square independence test);
- the average population income was equal over 2012 through 2016 (repeated measures ANOVA).
“Null” Does Not Mean “Zero”
A common misunderstanding is that “null” implies “zero”. This is often but not always the case. For example, a null hypothesis may also state that
the correlation between frustration and aggression is 0.5.
No zero involved here and -although somewhat unusual- perfectly valid.
The “null” in “null hypothesis” derives from “nullify” [5]: the null hypothesis is the statement that we try to refute, regardless of whether it specifies a zero effect.
Null Hypothesis Testing -How Does It Work?
I want to know if happiness is related to wealth among Dutch people. One approach to find this out is to formulate a null hypothesis. Since “related to” is not precise, we choose the opposite statement as our null hypothesis:
the correlation between wealth and happiness is zero among all Dutch people.
We'll now try to refute this hypothesis in order to demonstrate that happiness and wealth are related all right.
Now, we can't reasonably ask all 17,142,066 Dutch people how happy they generally feel.
So we'll ask a sample (say, 100 people) about their wealth and their happiness. The correlation between happiness and wealth turns out to be 0.25 in our sample. Now we have one problem: sample outcomes tend to differ somewhat from population outcomes. So if the correlation really is zero in our population, we may still find a nonzero correlation in our sample. To illustrate this important point, take a look at the scatterplot below. It visualizes a zero correlation between happiness and wealth for an entire population of N = 200.
Now we draw a random sample of N = 20 from this population (the red dots in our previous scatterplot). Even though our population correlation is zero, we found a staggering 0.82 correlation in our sample. The figure below illustrates this by omitting all non-sampled units from our previous scatterplot.
This raises the question of how we can ever say anything about our population if we only have a tiny sample from it. The basic answer: we can rarely say anything with 100% certainty. However, we can say a lot with 99%, 95% or 90% certainty.
Probability
So how does that work? Well, basically, some sample outcomes are highly unlikely given our null hypothesis. Like so, the figure below shows the probabilities for different sample correlations (N = 100) if the population correlation really is zero.
A computer will readily compute these probabilities. However, doing so requires a sample size (100 in our case) and a presumed population correlation ρ (0 in our case). So that's why we need a null hypothesis.
If we look at this sampling distribution carefully, we see that sample correlations around 0 are most likely: there's a 0.68 probability of finding a correlation between -0.1 and 0.1. What does that mean? Well, remember that probabilities can be seen as relative frequencies. So imagine we'd draw 1,000 samples instead of the one we have. This would result in 1,000 correlation coefficients and some 680 of those -a relative frequency of 0.68- would be in the range -0.1 to 0.1. Likewise, there's a 0.95 (or 95%) probability of finding a sample correlation between -0.2 and 0.2.
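These probabilities are easy to reproduce with a small simulation. A sketch in Python (our own addition, assuming two independent normal variables so that the population correlation is exactly zero):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_samples, n = 1000, 100

# Draw 1,000 samples of N = 100 from a population where rho = 0:
# two independently drawn variables are uncorrelated by construction
rs = np.array([
    np.corrcoef(rng.standard_normal(n), rng.standard_normal(n))[0, 1]
    for _ in range(n_samples)
])

# Relative frequencies of sample correlations within given ranges
print(np.mean(np.abs(rs) < 0.1))  # roughly 0.68
print(np.mean(np.abs(rs) < 0.2))  # roughly 0.95
```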
P-Values
We found a sample correlation of 0.25. How likely is that if the population correlation is zero? The answer is known as the p-value (short for probability value): A p-value is the probability of finding some sample outcome or a more extreme one if the null hypothesis is true. Given our 0.25 correlation, “more extreme” usually means larger than 0.25 or smaller than -0.25. We can't tell from our graph but the underlying table tells us that p ≈ 0.012. If the null hypothesis is true, there's a 1.2% probability of finding our sample correlation.
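For the curious, p ≈ 0.012 can be reproduced with the usual t-test for a correlation (a sketch of our own; the tutorial doesn't spell out the underlying test statistic):

```python
import math
from scipy.stats import t

r, n = 0.25, 100

# t statistic for testing H0: rho = 0
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-tailed p-value with n - 2 degrees of freedom
p = 2 * t.sf(t_stat, df=n - 2)
print(round(p, 3))  # 0.012
```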
Conclusion?
If our population correlation really is zero, we can still find a sample correlation of 0.25 in a sample of N = 100. However, the probability of this happening is only 0.012, so it's very unlikely. A reasonable conclusion is that our population correlation wasn't zero after all.
Conclusion: we reject the null hypothesis. Given our sample outcome, we no longer believe that happiness and wealth are unrelated. However, we still can't state this with certainty.
Null Hypothesis - Limitations
Thus far, we only concluded that the population correlation is probably not zero. That's the only conclusion from our null hypothesis approach and it's not really that interesting.
What we really want to know is the population correlation. Our sample correlation of 0.25 seems a reasonable estimate. We call such a single number a point estimate.
Now, a new sample may come up with a different correlation. An interesting question is how much our sample correlations would fluctuate over samples if we'd draw many of them. The figure below shows precisely that, assuming our sample size of N = 100 and our (point) estimate of 0.25 for the population correlation.
Confidence Intervals
Our sample outcome suggests that some 95% of many samples should come up with a correlation between 0.06 and 0.43. This range is known as a confidence interval. Although not precisely correct, it's most easily thought of as the bandwidth that's likely to enclose the population correlation.
One thing to note is that the confidence interval is quite wide. It almost contains a zero correlation, exactly the null hypothesis we rejected earlier.
Another thing to note is that our sampling distribution and confidence interval are slightly asymmetrical. They are symmetrical for most other statistics (such as means or beta coefficients) but not correlations.
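The 0.06 to 0.43 interval -and its slight asymmetry- can be reproduced with the Fisher z-transformation (a sketch of our own; we're assuming this common method, since the tutorial doesn't state how its interval was computed):

```python
import math
from scipy.stats import norm

r, n = 0.25, 100

# Fisher z-transform: atanh(r) is roughly normal with SE = 1 / sqrt(n - 3)
z = math.atanh(r)
se = 1 / math.sqrt(n - 3)
z_crit = norm.ppf(0.975)  # 1.96 for 95% confidence

# The z-interval is symmetric; transforming back makes it asymmetric in r
lower = math.tanh(z - z_crit * se)
upper = math.tanh(z + z_crit * se)
print(round(lower, 2), round(upper, 2))  # 0.06 0.43
```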
References
1. Agresti, A. & Franklin, C. (2014). Statistics: The Art & Science of Learning from Data. Essex: Pearson Education Limited.
2. Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
3. Field, A. (2013). Discovering Statistics with IBM SPSS. Newbury Park, CA: Sage.
4. Howell, D.C. (2002). Statistical Methods for Psychology (5th ed.). Pacific Grove, CA: Duxbury.
5. Van den Brink, W.P. & Koele, P. (2002). Statistiek, deel 3 [Statistics, part 3]. Amsterdam: Boom.