SPSS Tutorials


Pearson Correlation Coefficient – Simple Tutorial

“Correlation coefficient”, unless otherwise specified, normally refers to the “Pearson correlation” (officially the product moment correlation coefficient, or PMCC). The Pearson correlation is a number between -1 and +1 that indicates the extent to which two variables are linearly related. Pearson correlations are suitable only for metric variables. However, see Assumption of Equal Intervals.

Correlation Coefficient - Example

We asked 40 freelancers for their yearly incomes over 2010 through 2014. Part of the raw data are shown below.

Correlation Coefficient - Data View

Question: is there any relation between income over 2010 and income over 2011? A good way to find out is to inspect a scatterplot for these two variables: we'll represent each freelancer by a dot. The horizontal and vertical positions of each dot indicate a freelancer’s income over 2010 and 2011 respectively. The result is shown below.

Correlation Coefficient - Scatterplot

Our scatterplot shows a strong relation between income over 2010 and 2011: freelancers who had a low income over 2010 (leftmost dots) typically had a low income over 2011 as well (lower dots) and vice versa. Furthermore, this relation is roughly linear; the main pattern in the dots is a straight line.
The extent to which our dots lie on a straight line indicates the strength of the relation. The Pearson correlation is a number that indicates the exact strength of this relation.

Correlation Coefficients and Scatterplots

A correlation coefficient indicates the extent to which dots in a scatterplot lie on a straight line. This implies that we can usually estimate correlations pretty accurately from nothing more than scatterplots. (Exceptions are scatterplots with many ties, which we'll discuss later.) The figure below, in which the correlation coefficient is denoted by “r”, nicely illustrates this point.

Correlation Coefficient - Multiple Scatterplots
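
To make the link between scatterplot shape and “r” concrete, here is a minimal Python sketch. It is not part of the original tutorial: the function name pearson_r and the three small data sets are made up for illustration, and the computation follows the Pearson formula given in the formula section. It compares dots that lie exactly on a rising line, exactly on a falling line, and on a curve.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical illustration data (not the freelancer incomes)
x = [-3, -2, -1, 0, 1, 2, 3]
rising = [2 * v + 1 for v in x]   # dots exactly on a rising straight line
falling = [-v for v in x]         # dots exactly on a falling straight line
curved = [v ** 2 for v in x]      # perfectly related to x, but not linearly

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_curved = pearson_r(x, curved)
print(round(r_rising, 3), round(r_falling, 3), round(r_curved, 3))
```

The curved example is the one to remember: the dots follow a perfect pattern, yet the correlation is zero because that pattern is not a straight line.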

Correlation Coefficient - Basics

Some basic points regarding correlation coefficients are nicely illustrated by the previous figure. The least you should know is that

Correlation Coefficient - Perfect Linear Relations

Correlation Coefficient - Interpretation Caveats

When interpreting correlations, you should keep some things in mind. An elaborate discussion deserves a separate tutorial but we'll briefly mention two main points.

Correlation Coefficient - Software

Most spreadsheet editors such as MS Excel, Google Sheets and OpenOffice can compute correlations for you. The illustration below shows an example in Google Sheets.

Correlation Coefficient in Google Sheet

Correlation Coefficient - Correlation Matrix

Keep in mind that correlations apply to pairs of variables. If you're interested in more than 2 variables, you'll probably want to take a look at the correlations between all different variable pairs. These correlations are usually shown in a square table known as a correlation matrix. Statistical software packages such as SPSS create correlation matrices in the blink of an eye. An example is shown below.

Correlation Coefficient - SPSS Correlation Matrix

Note that the diagonal elements (in red) are the correlations between each variable and itself. This is why they are always 1.
Also note that the correlations beneath the diagonal (in grey) are redundant because they're identical to the correlations above the diagonal. Technically, we say that this is a symmetrical matrix.
Finally, note that the pattern of correlations makes perfect sense: correlations between yearly incomes become lower insofar as these years lie further apart.
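
The first two properties, ones on the diagonal and symmetry around it, can be checked with a small Python sketch. This is an illustration only: the income figures are invented rather than taken from the tutorial's data file, and the variable names (income_2010 and so on) and the helper pearson_r are assumptions.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# Made-up yearly incomes for five hypothetical freelancers
incomes = {
    "income_2010": [21, 35, 48, 52, 70],
    "income_2011": [24, 33, 51, 49, 75],
    "income_2012": [30, 31, 45, 60, 68],
}

names = list(incomes)
# One row and one column per variable: a square correlation matrix
matrix = [[pearson_r(incomes[a], incomes[b]) for b in names] for a in names]

for name, row in zip(names, matrix):
    print(name, [round(r, 3) for r in row])
```

Each printed row holds the correlations of one variable with all variables, so only the values on one side of the diagonal carry unique information.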

Pearson Correlation - Formula

If we want to inspect correlations, we'll have a computer calculate them for us. You'll rarely (probably never) need the actual formula. However, for the sake of completeness, a Pearson correlation between variables X and Y is calculated by
$$r_{XY} = \frac{\sum_{i=1}^n(X_i - \overline{X})(Y_i - \overline{Y})}{\sqrt{\sum_{i=1}^n(X_i - \overline{X})^2}\sqrt{\sum_{i=1}^n(Y_i - \overline{Y})^2}}$$
The formula basically comes down to dividing the covariance by the product of the standard deviations. Since a coefficient is a number divided by some other number, our formula shows why we speak of a correlation coefficient.
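
As a rough check on that statement, the Python sketch below computes r twice for two made-up variables: once directly from the formula and once as the covariance divided by the product of the standard deviations. The data and variable names are assumptions for illustration. The sample covariance and both sample standard deviations use n - 1 as divisor, so that term cancels and the two routes should agree.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up values for two variables X and Y
x = [21, 35, 48, 52, 70]
y = [24, 33, 51, 49, 75]
n = len(x)
mean_x, mean_y = mean(x), mean(y)

# Route 1: the formula as written, numerator over denominator
numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
denominator = (sqrt(sum((xi - mean_x) ** 2 for xi in x))
               * sqrt(sum((yi - mean_y) ** 2 for yi in y)))
r_formula = numerator / denominator

# Route 2: sample covariance divided by the product of the
# sample standard deviations (stdev divides by n - 1 as well)
covariance = numerator / (n - 1)
r_from_cov = covariance / (stdev(x) * stdev(y))

print(round(r_formula, 4), round(r_from_cov, 4))
```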
