Power (Statistics) – The Ultimate Beginner's Guide
In statistics, power is the probability of rejecting
a false null hypothesis.
- Power Calculation Example
- Power & Alpha Level
- Power & Effect Size
- Power & Sample Size
- 3 Main Reasons for Power Calculations
- Software for Power Calculations - G*Power
Power - Minimal Example
- In some country, IQ and salary have a population correlation ρ = .10.
- A scientist examines a sample of N = 10 people and finds a sample correlation r = .15.
- He tests the (false) null hypothesis H0 that ρ = 0. This test yields a significance level of p = .68.
- Since p > .05, his chosen alpha level, he does not reject his (false) null hypothesis that ρ = 0.
Now, given a sample size of N = 10 and a population correlation ρ = 0.10, what's the probability of correctly rejecting the null hypothesis? This probability is known as power and denoted as (1 - β) in statistics. For the aforementioned example, (1 - β) is only .058 (roughly 6%) as shown below.
If a population correlation ρ = .10 and
we sample N = 10 respondents, then
we need to find an absolute sample correlation of | r | > .63 for rejecting H0 at α = .05.
The probability of finding this is only .058.
So even though H0 is false, we're unlikely to actually reject it. Not rejecting a false H0 is known as committing a type II error.
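For the technically inclined: these numbers are easy to replicate outside G*Power. The sketch below assumes Python with scipy (our choice, not the article's) and approximates the power via Fisher's z transformation, so it lands close to -but not exactly on- the .058 found by exact methods.

```python
# Power for testing H0: rho = 0 when actually rho = .10 and N = 10.
# Sketch only: Fisher's z is an approximation; exact methods give .058.
import numpy as np
from scipy import stats

N, rho, alpha = 10, 0.10, 0.05
df = N - 2

# Critical t-value, converted to the critical sample correlation |r|
t_crit = stats.t.ppf(1 - alpha / 2, df)
r_crit = t_crit / np.sqrt(t_crit**2 + df)      # ~0.632: the |r| > .63 mentioned above

# Fisher z: atanh(r) is roughly normal with mean atanh(rho) and SD 1 / sqrt(N - 3)
se = 1 / np.sqrt(N - 3)
power = (stats.norm.sf(np.arctanh(r_crit), np.arctanh(rho), se)
         + stats.norm.cdf(np.arctanh(-r_crit), np.arctanh(rho), se))
print(f'critical |r| = {r_crit:.3f}, power = {power:.3f}')   # ~0.63 and ~0.06
```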
Type I and Type II Errors
Any null hypothesis may be true or false and we may or may not reject it. This results in the 4 scenarios outlined below.
| | Reality: H0 is true | Reality: H0 is false |
|---|---|---|
| Decision: reject H0 | Type I error, probability = α | Correct decision, probability = (1 - β) = power |
| Decision: retain H0 | Correct decision, probability = (1 - α) | Type II error, probability = β |
As you probably guessed, we usually want the power for our tests to be as high as possible. But before looking at the factors affecting power, let's first try to understand how a power calculation actually works.
Power Calculation Example
A pharmaceutical company wants to demonstrate that their medicine against high blood pressure actually works. They expect the following:
- the average blood pressure in some untreated population is 160 mmHg;
- they expect their medicine to lower this to roughly 154 mmHg;
- the standard deviation should be around 8 mmHg (both populations);
- they plan to use an independent samples t-test at α = 0.05 with N = 20 for either subsample.
Given these considerations, what's the power for this study? Or -alternatively- what's the probability of rejecting H0 that the mean blood pressure is equal between treated and untreated populations?
Obviously, nobody knows the outcomes for this study until it's finished. However, we do know the most likely outcomes: they're our population estimates. So let's for a moment pretend that we'll find exactly these and enter them into a t-test calculator.
Compute t-test for expected sample sizes, means and SD's in Excel
We expect p = 0.023 so we expect to reject H0.
This is based on a t-distribution with df = 38 degrees of freedom (the total sample size N = 40 minus 2).
We expect to find t = 2.37 if the population mean difference is 6 mmHg (160 - 154).
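These expected outcomes are easy to replicate outside Excel too. A minimal sketch, assuming Python with scipy:

```python
# Expected t-test outcome for the blood pressure study (a sketch, not the
# article's own tool -the article uses an Excel t-test calculator).
import numpy as np
from scipy import stats

m1, m2, sd, n1, n2 = 160, 154, 8, 20, 20
se = sd * np.sqrt(1 / n1 + 1 / n2)      # standard error of the mean difference
t = (m1 - m2) / se                      # expected t under HA: ~2.37
df = n1 + n2 - 2
p = 2 * stats.t.sf(t, df)               # two-tailed p: ~0.023
print(f't = {t:.2f}, df = {df}, p = {p:.3f}')
```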
Now, this expected (or average) t = 2.37 under the alternative hypothesis Ha is known as a noncentrality parameter or NCP. The NCP tells us how t is distributed under some exact alternative hypothesis and thus allows us to estimate the power for some test. The figure below illustrates how this works.
- First off, our H0 is tested using a central t-distribution with df = 38;
- If we test at α = 0.05 (2-tailed), we'll reject H0 if t < -2.02 (left critical value) or if t > 2.02 (right critical value);
- If our alternative hypothesis HA is exactly true, t follows a noncentral t-distribution with df = 38 and NCP = 2.37;
- Under this noncentral t-distribution, the probability of finding t > 2.02 ≈ 0.637. So this is roughly the probability of rejecting H0 -or the power (1 - β)- for our first scenario.
A minor note here is that we'd also reject H0 if t < -2.02 but this probability is almost zero for our first scenario. The exact calculation can be replicated from the SPSS syntax below.
data list free/alpha ncp.
begin data
0.05 2.37
end data.
*Compute left (lct) and right (rct) critical t-values and power.
compute lct = idf.t(0.5 * alpha,38).
compute rct = idf.t(1 - (0.5 * alpha),38).
compute lprob = ncdf.t(lct,38,ncp).
compute rprob = 1 - ncdf.t(rct,38,ncp).
compute power = lprob + rprob.
execute.
*Show 3 decimal places for all values.
formats all (f8.3).
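For readers outside SPSS, the same calculation can be sketched in Python (scipy assumed; this simply mirrors the syntax above):

```python
# Power from the noncentral t-distribution, mirroring the SPSS syntax above.
from scipy import stats

alpha, ncp, df = 0.05, 2.37, 38
lct = stats.t.ppf(alpha / 2, df)        # left critical t: ~-2.02
rct = stats.t.ppf(1 - alpha / 2, df)    # right critical t: ~2.02

lprob = stats.nct.cdf(lct, df, ncp)     # ~0: rejecting on the wrong side
rprob = stats.nct.sf(rct, df, ncp)      # ~0.637: rejecting on the right side
print(f'power = {lprob + rprob:.3f}')   # ~0.637
```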
Power and Effect Size
As we just saw, estimating power requires specifying
- an exact null hypothesis and
- an exact alternative hypothesis.
In the previous example, our scientists had an exact alternative hypothesis because they had very specific ideas regarding population means and standard deviations. In most applied studies, however, we're pretty clueless about such population parameters. This raises the question: how do we get an exact alternative hypothesis?
For most tests, the alternative hypothesis can be specified as an effect size measure: a single number combining several means, variances and/or frequencies. Like so, we proceed from requiring a bunch of unknown parameters to a single unknown parameter.
What's even better: widely agreed upon rules of thumb are available for effect size measures. An overview is presented in this Googlesheet, partly shown below.
In applied studies, we often use G*Power for estimating power. The screenshot below replicates our power calculation example for the blood pressure medicine study.
G*Power computes both effect size and power from two means and SD's
Note that estimating power in G*Power only requires
- a single estimated effect size measure -optionally, G*Power computes it for you, given your sample means and SD's;
- the alpha level -often 0.05- used for testing the null hypothesis and
- one or more sample sizes.
Let's now take a look at how these 3 factors relate to power.
Factors Affecting Power
The figure below gives a quick overview of how 3 factors relate to power.
Let's now take a closer look at each of them.
Power & Alpha Level
Everything else equal, increasing alpha increases power. For our example calculation, power increases from 0.637 to 0.753 if we test at α = 0.10 instead of 0.05.
A higher alpha level results in smaller (absolute) critical values: we already reject H0 if t > 1.69 instead of t > 2.02. So the light blue area, indicating (1 - β), increases. We basically require a smaller deviation from H0 for statistical significance.
However, increasing alpha comes at a cost: it increases the probability of committing a type I error (rejecting H0 when it's actually true). Therefore, testing at α > 0.05 is generally frowned upon. In short, increasing alpha basically just decreases one problem by increasing another one.
Power & Effect Size
Everything else equal, a larger effect size results in higher power. For our example, power increases from 0.637 to 0.869 if we believe that Cohen’s D = 1.0 rather than the 0.75 implied by our scenario ((160 - 154) / 8).
A larger effect size results in a larger noncentrality parameter (NCP). Therefore, the distributions under H0 and HA lie further apart. This increases the light blue area, indicating the power for this test.
Keep in mind, though, that we can estimate but not choose some population effect size. If we overestimate this effect size, we'll overestimate the power for our test accordingly. Therefore, we can't usually increase power by increasing an effect size.
An arguable exception is increasing an effect size by modifying a research design or analysis. For example, (partial) eta squared for a treatment effect in ANOVA may increase by adding a covariate to the analysis.
Power & Sample Size
Everything else equal, larger sample size(s) result in higher power. For our example, increasing the total sample size from N = 40 to N = 80 increases power from 0.637 to 0.912.
The increase in power stems from our distributions lying further apart. This reflects an increased noncentrality parameter (NCP). But why does the NCP increase with larger sample sizes?
Well, recall that for a t-distribution, the NCP is the expected t-value under HA. Now, t is computed as
$$t = \frac{\overline{X_1} - \overline{X_2}}{SE}$$
where \(SE\) denotes the standard error of the mean difference. In turn, \(SE\) is computed as
$$SE = S_w\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where \(S_w\) denotes the estimated population SD of the outcome variable. This formula shows that as sample sizes increase, \(SE\) decreases and therefore t (and hence the NCP) increases.
On top of this, degrees of freedom increase (from df = 38 to df = 78 for our example). This results in slightly smaller (absolute) critical t-values but this effect is very modest.
In short, increasing sample size(s) is a sound way to increase the power for some test.
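All three comparisons above (alpha, effect size and sample size) are easy to verify with a short sketch. Python with scipy is assumed here -not something the article itself uses- and the NCP follows directly from the t and SE formulas just shown: for equal group sizes, NCP = Cohen's D · √(n/2).

```python
# Verifying how alpha, effect size and sample size move the power of an
# independent-samples t-test (a sketch assuming scipy).
import numpy as np
from scipy import stats

def power(d, n_per_group, alpha=0.05):
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2)        # NCP for equal group sizes
    rct = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(rct, df, ncp) + stats.nct.cdf(-rct, df, ncp)

print(f'{power(0.75, 20):.3f}')               # baseline:        ~0.637
print(f'{power(0.75, 20, alpha=0.10):.3f}')   # higher alpha:    ~0.753
print(f'{power(1.00, 20):.3f}')               # larger effect:   ~0.869
print(f'{power(0.75, 40):.3f}')               # larger samples:  ~0.912
```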
Power & Research Design
Apart from sample size, effect size & α, research design may also affect power. Although there are no exact formulas, some general guidelines are that
- everything else equal, within-subjects designs tend to have more power than between-subjects designs;
- for ANCOVA, including one or two covariates tends to increase power for demonstrating a treatment effect;
- for multiple regression, power for each separate predictor tends to decrease as more predictors are added to the model.
3 Main Reasons for Power Calculations
Power calculations in applied research serve 3 main purposes:
- compute the required sample size prior to data collection. This involves estimating an effect size and choosing α (usually 0.05) and the desired power (1 - β), often 0.80, as the sketch below illustrates;
- estimate power before collecting data for some planned analyses. This requires specifying the intended sample size, choosing an α and estimating which effect sizes are expected. If the estimated power is low, the planned study may be cancelled or proceed with a larger sample size;
- estimate power after data have been collected and analyzed. This calculation is based on the actual sample size, α used for testing and observed effect size.
Different types of power analysis are made simple by G*Power
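The first purpose -computing a required sample size- can also be sketched in a few lines. Python with scipy is our assumption again; G*Power does the same via its "a priori" analysis type.

```python
# Smallest per-group n reaching 80% power for Cohen's D = 0.75 at alpha = 0.05,
# re-using the logic of the previous sketch (scipy assumed).
import numpy as np
from scipy import stats

def power(d, n, alpha=0.05):
    df, ncp = 2 * n - 2, d * np.sqrt(n / 2)
    rct = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(rct, df, ncp) + stats.nct.cdf(-rct, df, ncp)

n = 2
while power(0.75, n) < 0.80:
    n += 1
print(n, round(power(0.75, n), 3))      # lands near n = 29 per group
```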
Software for Power Calculations - G*Power
G*Power is freely downloadable software for running the aforementioned and many other power calculations. Among its features are
- computing effect sizes from descriptive statistics (mostly sample means and standard deviations);
- computing power, required sample sizes, required effect sizes and more;
- creating plots that visualize how power, effect size and sample size relate for many different statistical procedures. The figure below shows an example for multiple linear regression.
Required sample sizes for multiple linear regression, given desired power, chosen α and 3 estimated effect sizes
Altogether, we think G*Power is amazing software and we highly recommend using it. The only disadvantage we can think of is that it requires rather unusual effect size measures. Some examples are
- Cohen’s f for ANOVA and
- Cohen’s W for a chi-square test.
This is awkward because the APA and (perhaps therefore) most journal articles typically recommend reporting
- (partial) eta-squared for ANOVA and
- the contingency coefficient or (better) Cramér’s V for a chi-square test.
These are also the measures we typically obtain from statistical packages such as SPSS or JASP. Fortunately, G*Power converts some measures and/or computes them from descriptive statistics like we saw in this screenshot.
Software for Power Calculations - SPSS
In SPSS, observed power can be obtained from the GLM, UNIANOVA and (deprecated) MANOVA procedures. Keep in mind that GLM -short for General Linear Model- is very general indeed: it can be used for a wide variety of analyses including
- (multiple) linear regression;
- t-tests;
- ANCOVA (analysis of covariance);
- repeated measures ANOVA.
Select Observed power from Analyze - General Linear Model - Univariate - Options
Other power calculations (required sample sizes or estimating power prior to data collection) were added to SPSS version 27, released in 2020.
Power Analysis as found in SPSS version 27 onwards
In my opinion, SPSS power analysis is a pathetic attempt to compete with G*Power. If you don't believe me, just try running a couple of power analyses in both programs simultaneously. If you do believe me, ignore SPSS power analysis and just go for G*Power.
Thanks for reading.
Probability Density Functions – Simple Tutorial
A probability density function is a function from which
probabilities for ranges of outcomes can be obtained.
- Probability Density Functions - Basic Rules
- Cumulative Probability Density Functions
- Inverse Cumulative Probability Density Functions
- Differences Between Probability Density and Probability Distributions
- Probability Density Functions in Applied Statistics
Example
The birth weights of mice follow a normal distribution, which is a probability density function. The population mean μ = 1 gram and the standard deviation σ = 0.25 grams.
What's the probability that a newly born mouse
has a birth weight between 1.0 and 1.2 grams?
The figure below shows how to obtain an approximate answer, using only the probability density curve we just described.
The probability is the surface area under the curve between 1.0 and 1.2 grams. It has a width of 0.2 grams and its average height -the probability density for this weight interval- is roughly 1.45. Therefore, the probability that a newborn mouse weighs between 1.0 and 1.2 grams is 1.45 · 0.2 = 0.29 -some 29%.
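If you'd rather have the exact probability than an eyeballed one, it's simply the difference between two cumulative probabilities (discussed further below). A one-line sketch, assuming Python with scipy:

```python
# Exact probability of a birth weight between 1.0 and 1.2 grams
# (a sketch; the article itself reads this off the curve).
from scipy import stats

p = stats.norm.cdf(1.2, loc=1, scale=0.25) - stats.norm.cdf(1.0, loc=1, scale=0.25)
print(f'{p:.3f}')   # ~0.288, close to the 0.29 read off the curve
```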
So What is Probability Density?
Probability density is probability per measurement unit.
Our probability density of 1.45 means that the probability is 1.45 per gram -the measurement unit- over the interval between 1.0 and 1.2 grams. In contrast to probability, probability density can exceed 1 but only over an interval smaller than 1 measurement unit.
Compare this to population density: a population density of 100 inhabitants per square kilometer for some village doesn't imply that it has 100 inhabitants. If this village has a surface area of only 0.5 square kilometers, then it has (100 · 0.5 =) 50 inhabitants.
The screenshot below shows how to get a probability density from Excel or Google sheets.
Simply typing =NORM.DIST(1.1,1,0.25,FALSE) into some cell returns the probability density at x = 1.1, which is 1.473. The last argument, cumulative, refers to the cumulative probability density function which we'll discuss in a minute.
Anyway. In applied statistics, we're usually after probabilities instead of probability densities. So
what good is a curve showing probability densities?
Well, just like a histogram, it shows which ranges of values occur how often. Like so, it predicts what a histogram will look like if we actually draw a (reasonably large) sample.
The figure below illustrates just this: it shows a histogram for a sample of 10,000 mice with the assumed normal curve (in red) superimposed on it.
The normal curve (in red) predicts the shape of this histogram fairly precisely.
This curve -just a simple function- gives us a ton of information about our variable, such as its mean, standard deviation and skewness.
Probability Density Functions - Basic Rules
The mathematical definition of a probability density function is any function
- whose surface area is 1 and
- which doesn't return values < 0.
Furthermore,
- probability density functions only apply to continuous variables and
- the probability for any single outcome is defined as zero. Only ranges of outcomes have non zero probabilities.
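Both rules are easy to check numerically for our example curve. The sketch below assumes Python with scipy; note that the density itself may exceed 1 even though the total surface area can't.

```python
# Numerical check of both rules for the normal curve with mu = 1, sigma = 0.25.
import numpy as np
from scipy import stats, integrate

area, _ = integrate.quad(lambda x: stats.norm.pdf(x, 1, 0.25), -np.inf, np.inf)
print(f'total surface area = {area:.3f}')                          # 1.000
print(f'density at the mean = {stats.norm.pdf(1, 1, 0.25):.3f}')   # 1.596: > 1 is fine
```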
So how do we usually obtain such probabilities in applied research? The easy way is using a cumulative probability density function.
Cumulative Probability Density Functions
A cumulative probability density function returns the probability
that an outcome is smaller than some value x.
Such a probability -denoted as \(P(X \lt x)\)- is known as a cumulative probability.
Example: the birth weights of mice are normally distributed with μ = 1 and σ = 0.25 grams. What's the probability that a random mouse is born with a weight less than 0.75 grams?
The figure below shows that this probability corresponds to the surface area left of 0.75 grams, which is 0.159 or 15.9%.
So how did we find this exact surface area? Well, the surface area left of any value can be computed with an integral:
$$F_{cpd}(x) = \int_{-\infty}^x F_{pd}(t)\,dt = P(X \lt x)$$
where
- \(F_{cpd}(x)\) denotes the cumulative probability density function;
- \(F_{pd}(x)\) denotes a probability density function and
- \(P(X \lt x)\) is the probability that an outcome \(X \lt x\).
The figure below shows what a cumulative normal density function looks like.
Note that we can readily look up probabilities from this curve. However, we can't easily estimate this variable's mean, standard deviation or skewness from it. The main exception is the median: the value where the curve reaches a cumulative probability of 0.50, which is 1.0 grams here.
Last but not least, the screenshot below shows how to obtain cumulative probabilities in Excel or Google Sheets.
If a variable is normally distributed with μ = 1 and σ = 0.25, then typing =NORM.DIST(0.75,1,0.25,TRUE) into some cell returns the probability that X < 0.75, which is 0.159.
Inverse Cumulative Probability Density Functions
An inverse cumulative probability density function returns
the value x for a given cumulative probability.
Example: the birth weights of mice are normally distributed with μ = 1 and σ = 0.25 grams. Which birth weight separates the 10% lowest from the 90% highest birth weights? The figure below shows how to find this value in Excel: a birth weight less than 0.680 grams has a 0.1 or 10% probability of occurring.
Looking up this value from the inverse cumulative density in Excel is done by typing =NORM.INV(0.1,1,0.25) which returns a value (birth weight in this example) of 0.680.
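Both lookups -the cumulative probability from the previous section and this inverse lookup- have one-line equivalents in Python with scipy (our assumption, mirroring NORM.DIST and NORM.INV):

```python
# norm.cdf mirrors NORM.DIST(...,TRUE); norm.ppf mirrors NORM.INV.
from scipy import stats

print(f'{stats.norm.cdf(0.75, 1, 0.25):.3f}')   # P(X < 0.75) = 0.159
print(f'{stats.norm.ppf(0.10, 1, 0.25):.3f}')   # 10th percentile = 0.680
```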
Differences Between Probability Density and Probability Distributions
Probability density functions are often incorrectly referred to as “probability distributions”. This is confusing because they really are 2 different things:
- probability density functions apply to continuous variables whereas probability distributions apply to discrete variables;
- probability density functions return probability densities whereas probability distribution functions return probabilities;
- by definition, separate outcomes have zero probabilities for probability density functions. For probability distributions, separate outcomes may have non zero probabilities.
A textbook illustration of a true probability distribution is shown below: the outcome of a roll with a balanced die.
Sadly, the SPSS manual abbreviates both density and distribution functions to “PDF” as shown below. Also note that the Bernoulli distribution -a probability distribution- is wrongly listed under probability density functions.
Interestingly, cumulative probability density functions are comparable to cumulative probability functions. Both return cumulative probabilities: the probability that some outcome is equal to or smaller than some value x denoted as \(P(X \le x)\).
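The contrast is easy to demonstrate in code. A sketch assuming Python with scipy: a fair die -a true probability distribution- has nonzero probabilities for single outcomes, whereas our mouse weights don't.

```python
# Discrete distribution versus continuous density (a sketch assuming scipy).
from scipy import stats

die = stats.randint(1, 7)                 # discrete uniform over 1..6
print(die.pmf(3))                         # P(X = 3) = 1/6: a real probability

weight = stats.norm(1, 0.25)              # mouse birth weights
print(weight.pdf(1.1))                    # 1.473: a density, not a probability
print(weight.cdf(1.2) - weight.cdf(1.0))  # 0.288: only ranges have probabilities
```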
Probability Density Functions in Applied Statistics
The big 4 probability density functions in applied statistics are
- the normal distribution (normality assumption and z-tests);
- the t-distribution (t-tests and regression coefficients);
- the χ2-distribution (chi-square test and loglinear analysis);
- the F-distribution (ANOVA, Levene's test).
These functions are used in different forms that serve different purposes:
1. Cumulative probability density functions return probabilities for ranges of outcomes. Two such types of probabilities are
- statistical significance and
- (1 - β) or power.
2. Inverse cumulative probability density functions return ranges of outcomes for (chosen) probabilities. Like so, they're used for constructing confidence intervals: ranges of values that enclose some parameter with a given likelihood, often 95%. Example: “the 95% confidence interval for the mean monthly salary runs from $2,300 through $2,450” -see the sketch after this list.
3. Probability density functions are sometimes used to inspect statistical assumptions. Like so, the normality assumption can be evaluated by superimposing a normal curve over a histogram of observed values like we saw here. Alternatives for testing for normality are
- the Shapiro-Wilk test and
- the Kolmogorov-Smirnov test.
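As promised, a sketch of use 2: building a 95% confidence interval from an inverse cumulative t-distribution. Python with scipy is assumed and all numbers are made up to mirror the salary example.

```python
# 95% confidence interval for a mean via the inverse cumulative t-distribution.
from scipy import stats

mean, se, df = 2375, 38, 99   # sample mean, standard error, N - 1 (all made up)
t = stats.t.ppf(0.975, df)    # inverse cdf: t-value enclosing the middle 95%
print(f'95% CI: {mean - t * se:.0f} through {mean + t * se:.0f}')   # ~2300 through ~2450
```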
Right. I guess that's basically it regarding probability density functions. Let us know if you found this tutorial helpful by throwing a comment below.
Thanks for reading!
Percentiles – Quick Introduction & Examples
The nth percentile is the value that separates
the lowest n% of values from the other values.
Example: the 10th percentile for body weight is 60 kilos. This means that 10% of all people weigh less than 60 kilos and 90% of people weigh more.
- Percentiles - Simple Example
- Percentiles - Interpolation Formula
- PERCENTILE.EXC or PERCENTILE.INC?
- Percentiles in SPSS
- Quartiles, Median & Boxplots
Percentiles - Simple Example
Some fishermen catch and measure 100 trout. The data thus obtained are in this Googlesheet, partly shown below.
So what's the 10th percentile for the length of these trout? For our 100 observations, this is super easy. We simply
- sort our lengths ascendingly;
- rank our lengths while ignoring ties (values that occur more than once);
- find the length between observations 10 (10% of 100 observations) and 11 (the next observation).
As shown in the screenshot above, observations 10 and 11 both have a length of 31 centimeters. This is the 10th percentile for length as either Excel or SPSS will readily confirm.
Sadly, things are rarely that simple with real-life data. For example, how do we find the 15th percentile for N = 141 observations?
In this case, we'd better use one or two simple formulas. We'll demonstrate them by finding the 15th percentile for length.
Percentiles - Rank Formula
Percentile \(pct\) is the value that has \(Rank_{pct}\) defined as
$$Rank_{pct} = \frac{pct}{100} \cdot (N + 1)$$
where
- \(Rank_{pct}\) denotes the rank for some percentile \(pct\) and
- \(N\) denotes the sample size or population size.
So the 15th percentile for 100 observations is the observation with rank
$$Rank_{15} = \frac{15}{100} \cdot (100 + 1) = 15.15$$
Sadly, there is no observation with rank 15.15. So we look at the nearest ranks, 15 and 16 in our Googlesheet.
Note that
- observation 15 has a length of 31 centimeters and
- observation 16 has a length of 32 centimeters.
If both values would have been equal -as between ranks 10 and 11, both 31 centimeters- we would have reported this value. However, the 15th percentile is some value between 31 centimeters (rank 15) and 32 centimeters (rank 16).
It may be tempting to simply report the average, 31.5 centimeters. However, rank 15.15 is closer to rank 15 than to rank 16. This is usually taken into account by linear interpolation.
Percentiles - Interpolation Formula
For non integer ranks, exact percentiles are usually computed with
$$Pct = X_{tr} + (X_{tr + 1} - X_{tr}) \cdot ({r - tr})$$
where
- \(Pct\) denotes the desired percentile;
- \(r\) denotes the decimal rank for the desired percentile;
- \(tr\) denotes the truncated rank for the desired percentile;
- \(X_{tr}\) denotes the score for the truncated rank;
- \(X_{tr + 1}\) denotes the score for the truncated rank + 1.
For our example, this results in
$$Pct = 31 + (32 - 31) \cdot ({15.15 - 15}) = 31.15$$
Our Googlesheet shows how to implement this formula and its outcome.
Note that we replicated this outcome with the built-in function for percentiles, which is
=PERCENTILE.EXC(B2:B101,0.15)
in Googlesheets as well as Excel. As we'll see in a minute, SPSS yields the same outcome.
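The same two formulas are easy to implement in Python (numpy assumed; a sketch, not the article's own approach). Applied to the 100 sorted trout lengths, this function reproduces the 31.15 found above; the tiny example data below are made up.

```python
# Percentile via the rank + linear interpolation formulas above.
import numpy as np

def percentile_exc(values, pct):
    x = np.sort(np.asarray(values, dtype=float))
    rank = pct / 100 * (len(x) + 1)   # decimal rank, e.g. 15.15 for pct = 15
    tr = int(rank)                    # truncated rank
    if tr < 1 or tr >= len(x):        # extremes are undefined, as in PERCENTILE.EXC
        raise ValueError('percentile undefined for these data')
    return x[tr - 1] + (x[tr] - x[tr - 1]) * (rank - tr)

lengths = [29, 30, 31, 31, 32, 33]    # hypothetical lengths, not the trout data
print(percentile_exc(lengths, 50))    # 31.0

# Recent numpy versions offer the same (N + 1)-based definition directly
# (an assumption worth verifying): np.percentile(lengths, 50, method='weibull')
```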
PERCENTILE.EXC or PERCENTILE.INC?
You may have noticed that Excel and Googlesheets contain 2 different percentile formulas:
- PERCENTILE.EXC excludes percentiles 0 and 100. That is, these are undefined.
- PERCENTILE.INC defines percentile 0 as the minimum and percentile 100 as the maximum.
So which one is best?
My personal opinion is that PERCENTILE.EXC makes more sense given our definition:
the nth percentile is the value that separates
the lowest n% of values from the other values.
This implies that the zeroth percentile would be the value that separates the lowest 0% (?!?!) of all values from the others.
This -and therefore PERCENTILE.INC- doesn't make a lot of sense to me. But if you disagree, I'll be happy to hear from you.
Percentiles in SPSS
SPSS users may first download and open trout.sav. Now, the simplest way to find percentiles is from Analyze - Descriptive Statistics - Frequencies and filling out the dialogs as shown below.
A much faster option is to use SPSS syntax like the one shown below.
frequencies length
/percentiles 5 10 15.
Completing these steps confirms once more 31.15 centimeters as the 15th percentile for the lengths of our trout.
Quartiles, Median & Boxplots
The percentiles that are most often reported are
- the 25th percentile, also known as quartile 1;
- the 50th percentile, also known as quartile 2 or the median;
- the 75th percentile, also known as quartile 3.
These percentiles are often reported in boxplots such as the one shown below.
Percentiles - Conceptual Issues
Last but not least, I'd like to point out 2 conceptual issues with percentiles that few textbooks mention.
First off, in case of ties, percentiles may not exactly separate the lowest n% of observations from the others. Regarding our first example,
- 9.0% of trout have a length smaller than 31 centimeters;
- 6.0% of trout have a length equal to 31 centimeters;
- 85.0% of trout have a length greater than 31 centimeters.
Note that there is no single value here that exactly separates the lowest 10% from all other observations.
The second conceptual issue is the opposite: in some cases, an infinite number of values exactly separate the lowest n% of values. This holds for our second example, which came up with a rank of 15.15.
Remember that ranks 15 and 16 corresponded to 31 and 32 centimeters. Our interpolation formula came up with 31.15 centimeters but
- 31.0000001 centimeters also exactly separates the lowest 15%,
- 31.0000002 centimeters also exactly separates the lowest 15%,
- and so on...
Fortunately, these conceptual issues rarely plague real-world data analysis.
Right, so that'll do. If you've any questions or remarks, please throw me a comment below. Other than that,
Thanks for reading!
Pearson Correlations – Quick Introduction
A Pearson correlation is a number between -1 and +1 that indicates
to which extent 2 variables are linearly related.
The Pearson correlation is also known as the “product moment correlation coefficient” (PMCC) or simply “correlation”.
Pearson correlations are only suitable for quantitative variables (including dichotomous variables).
- For ordinal variables, use the Spearman correlation or Kendall’s tau and
- for nominal variables, use Cramér’s V.
Correlation Coefficient - Example
We asked 40 freelancers for their yearly incomes over 2010 through 2014. Part of the raw data are shown below.
Today’s question is:
is there any relation between income over 2010
and income over 2011?
Well, a splendid way for finding out is inspecting a scatterplot for these two variables: we'll represent each freelancer by a dot. The horizontal and vertical positions of each dot indicate a freelancer’s income over 2010 and 2011. The result is shown below.
Our scatterplot shows a strong relation between income over 2010 and 2011: freelancers who had a low income over 2010 (leftmost dots) typically had a low income over 2011 as well (lower dots) and vice versa. Furthermore, this relation is roughly linear; the main pattern in the dots is a straight line.
The extent to which our dots lie on a straight line indicates the strength of the relation. The Pearson correlation is a number that indicates the exact strength of this relation.
Correlation Coefficients and Scatterplots
A correlation coefficient indicates the extent to which dots in a scatterplot lie on a straight line. This implies that we can usually estimate correlations pretty accurately from nothing more than scatterplots. The figure below nicely illustrates this point.
Correlation Coefficient - Basics
Some basic points regarding correlation coefficients are nicely illustrated by the previous figure. The least you should know is that
- Correlations are never lower than -1. A correlation of -1 indicates that the data points in a scatter plot lie exactly on a straight descending line; the two variables are perfectly negatively linearly related.
- A correlation of 0 means that two variables don't have any linear relation whatsoever. However, some non linear relation may exist between the two variables.
- Correlation coefficients are never higher than 1. A correlation coefficient of 1 means that two variables are perfectly positively linearly related; the dots in a scatter plot lie exactly on a straight ascending line.
Correlation Coefficient - Interpretation Caveats
When interpreting correlations, you should keep some things in mind. An elaborate discussion deserves a separate tutorial but we'll briefly mention two main points.
- Correlations may or may not indicate causal relations. Reversely, causal relations from some variable to another variable may or may not result in a correlation between the two variables.
- Correlations are very sensitive to outliers; a single unusual observation may have a huge impact on a correlation. Such outliers are easily detected by a quick inspection of a scatterplot.
Correlation Coefficient - Software
Most spreadsheet editors such as Excel, Google sheets and OpenOffice can compute correlations for you. The illustration below shows an example in Googlesheets.
Correlation Coefficient - Correlation Matrix
Keep in mind that correlations apply to pairs of variables. If you're interested in more than 2 variables, you'll probably want to take a look at the correlations between all different variable pairs. These correlations are usually shown in a square table known as a correlation matrix. Statistical software packages such as SPSS create correlation matrices before you can blink your eyes. An example is shown below.
Note that the diagonal elements (in red) are the correlations between each variable and itself. This is why they are always 1.
Also note that the correlations beneath the diagonal (in grey) are redundant because they're identical to the correlations above the diagonal. Technically, we say that this is a symmetrical matrix.
Finally, note that the pattern of correlations makes perfect sense: correlations between yearly incomes become lower insofar as these years lie further apart.
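Outside SPSS, a correlation matrix takes one line. The sketch below assumes Python with numpy and invents some income data (the real data sit in the screenshot above) that follow the same pattern: adjacent years correlate more strongly.

```python
# Correlation matrix for made-up income data (numpy assumed).
import numpy as np

rng = np.random.default_rng(1)
income_2010 = rng.normal(50000, 10000, 40)            # hypothetical incomes
income_2011 = income_2010 + rng.normal(0, 4000, 40)   # strongly related to 2010
income_2012 = income_2011 + rng.normal(0, 4000, 40)   # less related to 2010

R = np.corrcoef([income_2010, income_2011, income_2012])
print(np.round(R, 2))   # symmetric, with 1's on the diagonal
```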
Pearson Correlation - Formula
If we want to inspect correlations, we'll have a computer calculate them for us. You'll rarely (probably never) need the actual formula. However, for the sake of completeness, a Pearson correlation between variables X and Y is calculated by
$$r_{XY} = \frac{\sum_{i=1}^n(X_i - \overline{X})(Y_i - \overline{Y})}{\sqrt{\sum_{i=1}^n(X_i - \overline{X})^2}\sqrt{\sum_{i=1}^n(Y_i - \overline{Y})^2}}$$
The formula basically comes down to dividing the covariance by the product of the standard deviations. Since a coefficient is a number divided by some other number, our formula shows why we speak of a correlation coefficient.
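Still, spelling the formula out in code makes it concrete. A sketch assuming Python with numpy, checked against the built-in result (the data are made up):

```python
# Pearson correlation computed straight from the formula above.
import numpy as np

def pearson_r(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # tiny made-up example
y = [1.2, 1.9, 3.3, 4.1, 4.8]
print(pearson_r(x, y), np.corrcoef(x, y)[0, 1])   # identical, ~0.99
```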
Correlation - Statistical Significance
The data we have available are often -but not always- a small sample from a much larger population. If so,
we may find a non zero correlation in our sample
even if it's zero in the population. The figure below illustrates how this could happen.
If we ignore the colors for a second, all 1,000 dots in this scatterplot visualize some population. The population correlation -denoted by ρ- is zero between test 1 and test 2.
Now, we could draw a sample of N = 20 from this population for which the correlation r = 0.95.
Reversely, this means that a sample correlation of 0.95 doesn't prove with certainty that there's a non zero correlation in the entire population. However, finding r = 0.95 with N = 20 is extremely unlikely if ρ = 0. But precisely how unlikely? And how do we know?
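A quick simulation makes this concrete. The sketch below (Python with numpy assumed) draws 10,000 samples of N = 20 from a population where ρ = 0 and looks at the sample correlations that turn up by chance alone.

```python
# Sample correlations from a population where rho = 0 (numpy assumed).
import numpy as np

rng = np.random.default_rng(0)
rs = [np.corrcoef(rng.normal(size=20), rng.normal(size=20))[0, 1]
      for _ in range(10_000)]
print(np.percentile(np.abs(rs), 95))   # ~0.44: 95% of |r| stay below this
print(np.max(np.abs(rs)))              # the most extreme r found by chance
```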
Correlation - Test Statistic
If ρ -a population correlation- is zero, then the probability for a given sample correlation -its statistical significance- depends on the sample size. We therefore combine the sample size and r into a single number, our test statistic t:
$$T = R\sqrt{\frac{(n - 2)}{(1 - R^2)}}$$
Now, T itself is not interesting. However, we need it for finding the significance level for some correlation. T follows a t-distribution with ν = n - 2 degrees of freedom, but only if some assumptions are met.
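Applying this to our sample correlation of 0.95 with N = 20 (a sketch, assuming Python with scipy):

```python
# Test statistic and two-tailed p-value for r = 0.95, N = 20.
import numpy as np
from scipy import stats

r, n = 0.95, 20
t = r * np.sqrt((n - 2) / (1 - r**2))   # ~12.9
p = 2 * stats.t.sf(t, n - 2)            # ~1.55e-10, as reported further below
print(f't = {t:.2f}, p = {p:.2e}')
```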
Correlation Test - Assumptions
The statistical significance test for a Pearson correlation requires 3 assumptions:
- independent observations;
- the population correlation, ρ = 0;
- normality: the 2 variables involved are bivariately normally distributed in the population. However, this is not needed for a reasonable sample size -say, N ≥ 20 or so. The reason for this lies in the central limit theorem.
Pearson Correlation - Sampling Distribution
In our example, the sample size N was 20. So if we meet our assumptions, T follows a t-distribution with df = 18 as shown below.
This distribution tells us that there's a 95% probability that -2.1 < t < 2.1, corresponding to -0.44 < r < 0.44. Conclusion:
if N = 20, there's a 95% probability of finding -0.44 < r < 0.44.
There's only a 5% probability of finding a correlation outside this range. That is, such correlations are statistically significant at α = 0.05 or lower: they are (highly) unlikely and thus refute the null hypothesis of a zero population correlation.
Last, our sample correlation of 0.95 has a p-value of 1.55e-10 -roughly 1 in 6,467,334,654. We can safely conclude there's a non zero correlation in our entire population.
Thanks for reading!