Power (Statistics) - The Ultimate Beginners Guide

In statistics, power is the probability of rejecting
a false null hypothesis.

Power - Minimal Example

Now, given a sample size of N = 10 and a population correlation ρ = 0.10, what's the probability of correctly rejecting the null hypothesis? This probability is known as power and denoted as (1 - β) in statistics. For the aforementioned example, (1 - β) is only .058 (roughly 6%) as shown below.

Gpower Example Single Correlation

If a population correlation ρ = .10 and
we sample N = 10 respondents, then
we need to find an absolute sample correlation of | r | > .63 for rejecting H0 at α = .05.
The probability of finding this is only .058.

So even though H0 is false, we're unlikely to actually reject it. Not rejecting a false H0 is known as committing a type II error.

Type I and Type II Errors

Any null hypothesis may be true or false and we may or may not reject it. This results in the 4 scenarios outlined below.

| | Reality: H0 is true | Reality: H0 is false |
|---|---|---|
| Decision: reject H0 | Type I error; probability = α | Correct decision; probability = (1 - β) = power |
| Decision: retain H0 | Correct decision; probability = (1 - α) | Type II error; probability = β |

As you probably guess, we usually want the power for our tests to be as high as possible. But before taking a look at factors affecting power, let's first try and understand how a power calculation actually works.

Power Calculation Example

A pharmaceutical company wants to demonstrate that their medicine against high blood pressure actually works. They expect the following:

Given these considerations, what's the power for this study? Or -alternatively- what's the probability of rejecting H0 that the mean blood pressure is equal between treated and untreated populations?

Obviously, nobody knows the outcomes for this study until it's finished. However, we do know the most likely outcomes: they're our population estimates. So let's for a moment pretend that we'll find exactly these and enter them into a t-test calculator.

Power For T-Test Excel Example: compute t-test for expected sample sizes, means and SD's in Excel

We expect p = 0.023 so we expect to reject H0.
This is based on a t-distribution with df = 38 degrees of freedom (total sample size N = 40 - 2).
We expect to find t = 2.37 if the population mean difference is 6 mmHg (160 - 154).

Now, this expected (or average) t = 2.37 under the alternative hypothesis Ha is known as a noncentrality parameter or NCP. The NCP tells us how t is distributed under some exact alternative hypothesis and thus allows us to estimate the power for some test. The figure below illustrates how this works.

Central Noncentral T-Distribution For Power

A minor note here is that we'd also reject H0 if t < -2.02 but this probability is almost zero for our first scenario. The exact calculation can be replicated from the SPSS syntax below.

*Enter chosen alpha and expected NCP as raw data.
data list free/alpha ncp.
begin data
0.05 2.37
end data.

*Compute left (lct) and right (rct) critical t-values and power.
compute lct = idf.t(0.5 * alpha,38).
compute rct = idf.t(1 - (0.5 * alpha),38).
compute lprob = ncdf.t(lct,38,ncp).
compute rprob = 1 - ncdf.t(rct,38,ncp).
compute power = lprob + rprob.
execute.

*Show 3 decimal places for all values.
formats all (f8.3).
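
For readers without SPSS, the same calculation can be sketched in plain Python. This is a minimal sketch rather than a reference implementation: the helper names (`chi2_pdf`, `nct_cdf`, `t_quantile`, `power_two_sided_t`) are made up for this illustration, and the noncentral t-distribution is evaluated by simple numerical integration instead of a statistics library. It assumes a fairly large df (such as our 38), where this integration is well behaved. The result should land close to the power of 0.637 used in the remainder of this tutorial.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def chi2_pdf(v, df):
    """Density of a chi-square variable with df degrees of freedom at v > 0."""
    log_pdf = ((df / 2 - 1) * math.log(v) - v / 2
               - (df / 2) * math.log(2) - math.lgamma(df / 2))
    return math.exp(log_pdf)


def nct_cdf(t, df, ncp=0.0, steps=2000):
    """P(T <= t) for a noncentral t variable, via midpoint integration.

    Uses T = (Z + ncp) / sqrt(V / df) with Z standard normal and V chi-square,
    so P(T <= t) is the average of Phi(t * sqrt(V / df) - ncp) over V.
    With ncp = 0 this is the ordinary (central) t-distribution.
    """
    upper = df + 15 * math.sqrt(2 * df) + 50  # chi-square mass beyond this is negligible
    width = upper / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * width
        total += STD_NORMAL.cdf(t * math.sqrt(v / df) - ncp) * chi2_pdf(v, df)
    return total * width


def t_quantile(p, df):
    """Inverse of the central t CDF, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if nct_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def power_two_sided_t(alpha, df, ncp):
    """Power of a two-sided t-test, mirroring the SPSS syntax step by step."""
    lct = t_quantile(alpha / 2, df)        # left critical t-value
    rct = t_quantile(1 - alpha / 2, df)    # right critical t-value
    lprob = nct_cdf(lct, df, ncp)          # P(t < lct) under Ha
    rprob = 1 - nct_cdf(rct, df, ncp)      # P(t > rct) under Ha
    return lprob + rprob


power = power_two_sided_t(alpha=0.05, df=38, ncp=2.37)
print(f"power = {power:.3f}")
```

Nearly all of the power comes from the right tail here: the left tail probability is practically zero, as noted above.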

Power and Effect Size

Like we just saw, estimating power requires specifying

In the previous example, our scientists had an exact alternative hypothesis because they had very specific ideas regarding population means and standard deviations. In most applied studies, however, we're pretty clueless about such population parameters. This raises the question: how do we get an exact alternative hypothesis?

For most tests, the alternative hypothesis can be specified as an effect size measure: a single number combining several means, variances and/or frequencies. Like so, we proceed from requiring a bunch of unknown parameters to a single unknown parameter.

What's even better: widely agreed upon rules of thumb are available for effect size measures. An overview is presented in this Googlesheet, partly shown below.

Effect Size Rules Of Thumb

In applied studies, we often use G*Power for estimating power. The screenshot below replicates our power calculation example for the blood pressure medicine study.

Gpower Example Independent Samples T-Test: G*Power computes both effect size and power from two means and SD's

Note that estimating power in G*Power only requires

a single estimated effect size measure. Optionally, G*Power computes it for you, given your sample means and SD's.
the alpha level -often 0.05- used for testing the null hypothesis &
one or more sample sizes

Let's now take a look at how these 3 factors relate to power.

Factors Affecting Power

The figure below gives a quick overview of how these 3 factors relate to power.

Factors Affecting Power In Statistics

Let's now take a closer look at each of them.

Power & Alpha Level

Everything else equal, increasing alpha increases power. For our example calculation, power increases from 0.637 to 0.753 if we test at α = 0.10 instead of 0.05.

Sampling Distributions Power Versus Alpha

A higher alpha level results in smaller (absolute) critical values: we already reject H0 if t > 1.69 instead of t > 2.02. So the light blue area, indicating (1 - β), increases. We basically require a smaller deviation from H0 for statistical significance.

However, increasing alpha comes at a cost: it increases the probability of committing a type I error (rejecting H0 when it's actually true). Therefore, testing at α > 0.05 is generally frowned upon. In short, increasing alpha basically just decreases one problem by increasing another one.

Power & Effect Size

Everything else equal, a larger effect size results in higher power. For our example, power increases from 0.637 to 0.869 if we believe that Cohen’s D = 1.0 rather than 0.8.

Power Versus Effect Size Sampling Distributions

A larger effect size results in a larger noncentrality parameter (NCP). Therefore, the distributions under H0 and HA lie further apart. This increases the light blue area, indicating the power for this test.

Keep in mind, though, that we can estimate but not choose some population effect size. If we overestimate this effect size, we'll overestimate the power for our test accordingly. Therefore, we can't usually increase power by increasing an effect size.

An arguable exception is increasing an effect size by modifying a research design or analysis. For example, (partial) eta squared for a treatment effect in ANOVA may increase by adding a covariate to the analysis.

Power & Sample Size

Everything else equal, larger sample size(s) result in higher power. For our example, increasing the total sample size from N = 40 to N = 80 increases power from 0.637 to 0.912.

Power Versus Sample Size Sampling Distributions

The increase in power stems from our distributions lying further apart. This reflects an increased noncentrality parameter (NCP). But why does the NCP increase with larger sample sizes?

Well, recall that for a t-distribution, the NCP is the expected t-value under HA. Now, t is computed as

$$t = \frac{\overline{X_1} - \overline{X_2}}{SE}$$

where \(SE\) denotes the standard error of the mean difference. In turn, \(SE\) is computed as

$$SE = S_w\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where \(S_w\) denotes the estimated population SD of the outcome variable. This formula shows that as sample sizes increase, \(SE\) decreases and therefore t (and hence the NCP) increases.

On top of this, degrees of freedom increase (from df = 38 to df = 78 for our example). This results in slightly smaller (absolute) critical t-values but this effect is very modest.

In short, increasing sample size(s) is a sound way to increase the power for some test.
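
The effect of sample size on the standard error and the NCP can be checked with a few lines of Python. This is only a sketch: the pooled SD of 8 mmHg is an assumption, chosen so that the expected mean difference of 6 mmHg reproduces t = 2.37 for 2 groups of 20 respondents. The function name `standard_error` is made up for this example.

```python
import math


def standard_error(pooled_sd, n1, n2):
    """Standard error of the mean difference: Sw * sqrt(1/n1 + 1/n2)."""
    return pooled_sd * math.sqrt(1 / n1 + 1 / n2)


mean_difference = 160 - 154   # expected difference in mmHg
assumed_pooled_sd = 8.0       # ASSUMPTION: chosen to reproduce t = 2.37

ncp_by_group_size = {}
for n_per_group in (20, 40):
    se = standard_error(assumed_pooled_sd, n_per_group, n_per_group)
    ncp = mean_difference / se   # expected t under Ha
    ncp_by_group_size[n_per_group] = ncp
    print(f"n per group = {n_per_group}: SE = {se:.3f}, NCP = {ncp:.3f}")
```

Doubling both sample sizes divides SE by the square root of 2, so the NCP is multiplied by that same factor.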

Power & Research Design

Apart from sample size, effect size & α, research design may also affect power. Although there are no exact formulas, some general guidelines are that

3 Main Reasons for Power Calculations

Power calculations in applied research serve 3 main purposes:

Gpower Types Of Power Analyses: different types of power analysis are made simple by G*Power

Software for Power Calculations - G*Power

G*Power is freely downloadable software for running the aforementioned and many other power calculations. Among its features are

Linear Regression Power Sample Size Plot: required sample sizes for multiple linear regression, given desired power, chosen α and 3 estimated effect sizes

Altogether, we think G*Power is amazing software and we highly recommend using it. The only disadvantage we can think of is that it requires rather unusual effect size measures. Some examples are

This is awkward because the APA and (perhaps therefore) most journal articles typically recommend reporting

These are also the measures we typically obtain from statistical packages such as SPSS or JASP. Fortunately, G*Power converts some measures and/or computes them from descriptive statistics like we saw in this screenshot.

Software for Power Calculations - SPSS

In SPSS, observed power can be obtained from the GLM, UNIANOVA and (deprecated) MANOVA procedures. Keep in mind that GLM - short for General Linear Model- is very general indeed: it can be used for a wide variety of analyses including

Observed Power In SPSS Glm: select Observed power from Analyze - General Linear Model - Univariate - Options

Other power calculations (required sample sizes or estimating power prior to data collection) were added to SPSS version 27, released in 2020.

Power Analysis In SPSS 27: power analysis as found in SPSS version 27 onwards

In my opinion, SPSS power analysis is a pathetic attempt to compete with G*Power. If you don't believe me, just try running a couple of power analyses in both programs simultaneously. If you do believe me, ignore SPSS power analysis and just go for G*Power.

Thanks for reading.

Probability Density Functions – Simple Tutorial

A probability density function is a function from which
probabilities for ranges of outcomes can be obtained.

Example

The birth weights of mice follow a normal distribution, which is a probability density function. The population mean μ = 1 gram and the standard deviation σ = 0.25 grams. What's the probability that a newly born mouse
has a birth weight between 1.0 and 1.2 grams?
The figure below shows how to obtain an approximate answer, using only the probability density curve we just described.

Normal Distribution Mice 10 12

The probability is the surface area under the curve between 1.0 and 1.2 grams. It has a width of 0.2 grams and its average height -the probability density for this weight interval- is roughly 1.45. Therefore, the probability that a newborn mouse weighs between 1.0 and 1.2 grams is 1.45 · 0.2 = 0.29 -some 29%.
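
The rectangle approximation can be compared with the exact surface area in a few lines of Python. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are just illustrative.

```python
from statistics import NormalDist

# Birth weights of mice: normal with mean 1 gram and SD 0.25 grams.
birth_weight = NormalDist(mu=1.0, sigma=0.25)

interval_width = 1.2 - 1.0

# Rough answer from the text: average height times width.
rectangle_area = 1.45 * interval_width

# Exact answer: surface area under the curve between 1.0 and 1.2 grams.
exact_area = birth_weight.cdf(1.2) - birth_weight.cdf(1.0)

# The average density over the interval follows from the exact area.
average_height = exact_area / interval_width

# Density at the midpoint of the interval, for comparison.
density_at_midpoint = birth_weight.pdf(1.1)

print(f"rectangle area      = {rectangle_area:.3f}")
print(f"exact area          = {exact_area:.3f}")
print(f"average height      = {average_height:.3f}")
print(f"density at x = 1.1  = {density_at_midpoint:.3f}")
```

The eyeballed height of 1.45 is very close to the true average height, which is why the rough answer of 29% holds up.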

So What is Probability Density?

Probability density is probability per measurement unit. Our probability density of 1.45 means that the probability is 1.45 per gram -the measurement unit- over the interval between 1.0 and 1.2 grams. In contrast to probability, probability density can exceed 1 but only over an interval smaller than 1 measurement unit.

Compare this to population density: a population density of 100 inhabitants per square kilometer for some village doesn't imply that it has 100 inhabitants. If this village has a surface area of only 0.5 square kilometers, then it has (100 · 0.5 =) 50 inhabitants.

The screenshot below shows how to get a probability density from Excel or Google sheets.

Probability Density Function Excel

Simply typing =NORM.DIST(1.1,1,0.25,FALSE) into some cell returns the probability density at x = 1.1, which is 1.473. The last argument, cumulative, refers to the cumulative density function which we'll discuss in a minute.

Anyway. In applied statistics, we're usually after probabilities instead of probability densities. So what good is a curve showing probability densities? Well, just like a histogram, it shows which ranges of values occur how often. Like so, it predicts what a histogram will look like if we actually draw a (reasonably large) sample.
The figure below illustrates just this: it shows a histogram for a sample of 10,000 mice with the assumed normal curve (in red) superimposed on it.

Histogram With Normal Curve Birth Weights: the normal curve (in red) predicts the shape of this histogram fairly precisely.

This curve -just a simple function- gives us a ton of information about our variable such as its

Probability Density Functions - Basic Rules

The mathematical definition of a probability density function is any function

Furthermore,

So how do we usually obtain such probabilities in applied research? The easy way is using a cumulative probability density function.

Cumulative Probability Density Functions

A cumulative probability density function returns the probability
that an outcome is smaller than some value x.
Such a probability -denoted as \(P(X \lt x)\)- is known as a cumulative probability.

Example: the birth weights of mice are normally distributed with μ = 1 and σ = 0.25 grams. What's the probability that a random mouse is born with a weight less than 0.75 grams? The figure below shows that this probability corresponds to the surface area left of 0.75 grams, which is 0.159 or 15.9%.

Probability Density Function Left Tail

So how did we find this exact surface area? Well, the surface area left of any value can be computed with an integral:

$$F_{cpd}(x) = \int_{-\infty}^x F_{pd}(x)dx = P(X \lt x)$$

where

The figure below shows what a cumulative normal density function looks like.

Cumulative Probability Density Function Example Curve

Note that we can readily look up probabilities from this curve. However, we can't easily estimate this variable's mean, standard deviation or skewness from this curve. The main exception to this is its median of 1.0 grams.
Last but not least, the screenshot below shows how to obtain cumulative probabilities in Excel or Google Sheets.

Cumulative Probability Density Function Excel

If a variable is normally distributed with μ = 1 and σ = 0.25, then typing =NORM.DIST(0.75,1,0.25,TRUE) into some cell returns the probability that X < 0.75, which is 0.159.
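
To see that a cumulative probability really is the surface area under the density curve, we can add up that area numerically and compare it with a built-in cumulative function. The sketch below does so in Python. The helper `integrate_pdf` is made up for this example and uses a lower bound of 10 standard deviations below the mean as a stand-in for minus infinity.

```python
from statistics import NormalDist

birth_weight = NormalDist(mu=1.0, sigma=0.25)


def integrate_pdf(dist, lower, upper, steps=10_000):
    """Approximate the area under dist.pdf between lower and upper (midpoint rule)."""
    width = (upper - lower) / steps
    heights = (dist.pdf(lower + (i + 0.5) * width) for i in range(steps))
    return sum(heights) * width


# ASSUMPTION: 10 SDs below the mean is "far enough left" to act as minus infinity.
far_left = birth_weight.mean - 10 * birth_weight.stdev

area_left_of_075 = integrate_pdf(birth_weight, far_left, 0.75)
cumulative_probability = birth_weight.cdf(0.75)

print(f"integrated area     = {area_left_of_075:.3f}")
print(f"cdf(0.75)           = {cumulative_probability:.3f}")
```

Both numbers should agree with the 0.159 that NORM.DIST returns with its last argument set to TRUE.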

Inverse Cumulative Probability Density Functions

An inverse cumulative probability density function returns
the value x for a given cumulative probability.
Example: the birth weights of mice are normally distributed with μ = 1 and σ = 0.25 grams. Which birth weight separates the 10% lowest from the 90% highest birth weights? The figure below shows how to find this value in Excel: a birth weight less than 0.680 grams has a 0.1 or 10% probability of occurring.

Inverse Cumulative Probability Density Function Excel

Looking up this value from the inverse cumulative density in Excel is done by typing =NORM.INV(0.1,1,0.25) which returns a value (birth weight in this example) of 0.680.
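
The same lookup can be sketched in Python, where `NormalDist.inv_cdf` plays the role of NORM.INV. Feeding the result back into the cumulative function should return the 10% we started from.

```python
from statistics import NormalDist

birth_weight = NormalDist(mu=1.0, sigma=0.25)

# Which birth weight separates the lowest 10% from the highest 90%?
cutoff = birth_weight.inv_cdf(0.10)

# Round trip: the cumulative probability at the cutoff should be 0.10 again.
round_trip = birth_weight.cdf(cutoff)

print(f"10% cutoff = {cutoff:.3f} grams")
print(f"P(X < cutoff) = {round_trip:.3f}")
```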

Differences Probability Density and Probability Distributions

Probability density functions are often misreferred to as “probability-distributions”. This is confusing because they really are 2 different things:

A textbook illustration of a true probability distribution is shown below: the outcome of a roll with a balanced die.

Uniform Probability Distribution Outcome Die Roll

Sadly, the SPSS manual abbreviates both density and distribution functions to “PDF” as shown below. Also note that the Bernoulli distribution -a probability distribution- is wrongfully listed under probability density functions.

Probability Densities In SPSS Manual

Interestingly, cumulative probability density functions are comparable to cumulative probability functions. Both return cumulative probabilities: the probability that some outcome is equal to or smaller than some value x denoted as \(P(X \le x)\).

Probability Density Functions in Applied Statistics

The big 4 probability density functions in applied statistics are

These functions are used in different forms that serve different purposes:

1. Cumulative probability density functions return probabilities for ranges of outcomes. Two such types of probabilities are

2. Inverse cumulative probability density functions return ranges of outcomes for (chosen) probabilities. Like so, they're used for constructing confidence intervals: ranges of values that enclose some parameter with a given likelihood, often 95%. Example: “the 95% confidence interval for the mean monthly salary runs from $2,300 through $2,450”.

3. Probability density functions are sometimes used to inspect statistical assumptions. Like so, the normality assumption can be evaluated by superimposing a normal curve over a histogram of observed values like we saw here. Alternatives for testing for normality are

Right. I guess that's basically it regarding probability density functions. Let us know if you found this tutorial helpful by throwing a comment below.

Thanks for reading!

Percentiles – Quick Introduction & Examples

The nth percentile is the value that separates
the lowest n% of values from the other values.

Example: the 10th percentile for body weight is 60 kilos. This means that 10% of all people weigh less than 60 kilos and 90% of people weigh more.

Percentiles - Simple Example

Some fishermen catch and measure 100 trout. The data thus obtained are in this Googlesheet, partly shown below.

Percentiles Simple Example

So what's the 10th percentile for the length of these trout? For our 100 observations, this is super easy. We simply

As shown in the screenshot above, observations 10 and 11 both have a length of 31 centimeters. This is the 10th percentile for length as either Excel or SPSS will readily confirm.

Sadly, things are rarely that simple with real life data. For example, how to find the 15th percentile from N = 141 observations?

In this case, we'd better use one or two simple formulas. We'll demonstrate them in order to find the 15th percentile for length.

Percentiles - Rank Formula

Percentile \(pct\) is the value that has \(Rank_{pct}\) defined as

$$Rank_{pct} = \frac{pct}{100} \cdot (N + 1)$$

where

So the 15th percentile for 100 observations is the observation with rank

$$Rank_{15} = \frac{15}{100} \cdot (100 + 1) = 15.15$$

Sadly, there is no observation with rank 15.15. So we look at the nearest ranks, 15 and 16 in our Googlesheet.

Percentile Non Integer Ranks

Note that

If both values would have been equal -as between ranks 10 and 11, both 31 centimeters- we would have reported this value. However, the 15th percentile is some value between 31 centimeters (rank 15) and 32 centimeters (rank 16).

It may be tempting to simply report the average, 31.5 centimeters. However, 15.15 is closer to rank 15 than rank 16. This is usually taken into account by linear interpolation.

Percentiles - Interpolation Formula

For non integer ranks, exact percentiles are usually computed with

$$Pct = X_{tr} + (X_{tr + 1} - X_{tr}) \cdot ({r - tr})$$

where

For our example, this results in

$$Pct = 31 + (32 - 31) \cdot ({15.15 - 15}) = 31.15$$

Our Googlesheet shows how to implement this formula and its outcome.

Percentiles Interpolation Formula

Note that we replicated this outcome with the built-in function for percentiles, which is =PERCENTILE.EXC(B2:B101,0.15) in Googlesheets as well as Excel. As we'll see in a minute, SPSS yields the same outcome.
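
The rank and interpolation formulas can be combined into one small Python function. This is a sketch: the function name `percentile_exclusive` is made up, and the `lengths` list is hypothetical data (not the actual trout measurements), constructed so that ranks 10 through 15 are 31 centimeters and rank 16 is 32 centimeters.

```python
import math


def percentile_exclusive(values, pct):
    """Percentile via Rank = pct / 100 * (N + 1) plus linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    rank = pct / 100 * (n + 1)
    if rank < 1 or rank > n:
        raise ValueError("this percentile is not defined for this sample size")
    truncated_rank = math.floor(rank)        # tr in the formula
    lower_value = ordered[truncated_rank - 1]  # ranks start at 1, indices at 0
    if truncated_rank == n:
        return float(lower_value)
    upper_value = ordered[truncated_rank]
    return lower_value + (upper_value - lower_value) * (rank - truncated_rank)


# HYPOTHETICAL lengths in centimeters, not the real trout data.
lengths = [30] * 9 + [31] * 6 + [32] * 85

pct_10 = percentile_exclusive(lengths, 10)
pct_15 = percentile_exclusive(lengths, 15)

print(f"10th percentile = {pct_10:.2f}")
print(f"15th percentile = {pct_15:.2f}")
```

For the 10th percentile, both neighboring ranks hold the same value, so the interpolation term drops out. For the 15th percentile, the function moves 0.15 of the way from 31 to 32 centimeters.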

PERCENTILE.EXC or PERCENTILE.INC?

You may have noticed that Excel and Googlesheets contain 2 different percentile formulas:

So which one is best?

My personal opinion is that PERCENTILE.EXC makes more sense given our definition: the nth percentile is the value that separates
the lowest n% of values from the other values.
This implies that the zeroth percentile would be the value that separates the lowest 0% (?!?!) of all values from the others.

This -and therefore PERCENTILE.INC- doesn't make a lot of sense to me. But if you disagree, I'll be happy to hear from you.
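
If you'd like to see how much the 2 approaches differ, Python's standard `statistics.quantiles` function offers an "exclusive" and an "inclusive" method. As far as I know, these correspond to PERCENTILE.EXC, with rank pct/100 · (N + 1), and PERCENTILE.INC, with rank pct/100 · (N - 1) + 1, but treat that mapping as an assumption. The `lengths` list is hypothetical data, not the actual trout measurements.

```python
from statistics import quantiles

# HYPOTHETICAL lengths in centimeters: rank 15 is 31 and rank 16 is 32.
lengths = [30] * 9 + [31] * 6 + [32] * 85

# n=100 returns the 99 cut points for percentiles 1 through 99,
# so index 14 holds the 15th percentile.
exclusive_cuts = quantiles(lengths, n=100, method="exclusive")
inclusive_cuts = quantiles(lengths, n=100, method="inclusive")

pct_15_exclusive = exclusive_cuts[14]   # rank 0.15 * 101 = 15.15
pct_15_inclusive = inclusive_cuts[14]   # rank 0.15 * 99 + 1 = 15.85

print(f"exclusive: {pct_15_exclusive:.2f}")
print(f"inclusive: {pct_15_inclusive:.2f}")
```

Both methods interpolate between the same 2 observations but at different ranks, so they end up at different points between 31 and 32 centimeters.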

Percentiles in SPSS

SPSS users may first download and open trout.sav. Now, the simplest way to find percentiles is to navigate to Analyze - Descriptive Statistics - Frequencies and fill out the dialogs as shown below.

Percentiles In SPSS Frequencies

A much faster option is to use SPSS syntax like the one shown below.

*Find percentiles 5, 10 and 15 for length.

frequencies length
/percentiles 5 10 15.

Completing these steps confirms once more 31.15 centimeters as the 15th percentile for the lengths of our trout.

Percentiles In SPSS Output

Quartiles, Median & Boxplots

The percentiles that are most often reported are

These percentiles are often reported in boxplots such as the one shown below.

Boxplot Example With Interpretation

Percentiles - Conceptual Issues

Last but not least, I'd like to point out 2 conceptual issues with percentiles that are mentioned by few textbooks.

First off, in case of ties, percentiles may not exactly separate the lowest n% of observations from the others. Regarding our first example,

Note that there is no single value here that exactly separates the lowest 10% from all other observations.

The second conceptual issue is the opposite: in some cases, an infinite number of values exactly separate the lowest n% of values. This holds for our second example, which came up with a rank of 15.15.

Remember that ranks 15 and 16 corresponded to 31 and 32 centimeters. Our interpolation formula came up with 31.15 centimeters but

Fortunately, these conceptual issues rarely plague real-world data analysis.

Right, so that'll do. If you've any questions or remarks, please throw me a comment below. Other than that,

Thanks for reading!

Pearson Correlations – Quick Introduction

A Pearson correlation is a number between -1 and +1 that indicates
to which extent 2 variables are linearly related.
The Pearson correlation is also known as the “product moment correlation coefficient” (PMCC) or simply “correlation”.

Pearson correlations are only suitable for quantitative variables (including dichotomous variables).

Correlation Coefficient - Example

We asked 40 freelancers for their yearly incomes over 2010 through 2014. Part of the raw data are shown below.

Correlation Coefficient - Data View

Today’s question is: is there any relation between income over 2010
and income over 2011?
Well, a splendid way for finding out is inspecting a scatterplot for these two variables: we'll represent each freelancer by a dot. The horizontal and vertical positions of each dot indicate a freelancer’s income over 2010 and 2011. The result is shown below.

Pearson Correlation Coefficient - Scatterplot Incomes

Our scatterplot shows a strong relation between income over 2010 and 2011: freelancers who had a low income over 2010 (leftmost dots) typically had a low income over 2011 as well (lower dots) and vice versa. Furthermore, this relation is roughly linear; the main pattern in the dots is a straight line.
The extent to which our dots lie on a straight line indicates the strength of the relation. The Pearson correlation is a number that indicates the exact strength of this relation.

Correlation Coefficients and Scatterplots

A correlation coefficient indicates the extent to which dots in a scatterplot lie on a straight line. This implies that we can usually estimate correlations pretty accurately from nothing more than scatterplots. The figure below nicely illustrates this point.

Pearson Correlation Coefficient - Multiple Scatterplots

Correlation Coefficient - Basics

Some basic points regarding correlation coefficients are nicely illustrated by the previous figure. The least you should know is that

Correlation Coefficient - Perfect Linear Relations

Correlation Coefficient - Interpretation Caveats

When interpreting correlations, you should keep some things in mind. An elaborate discussion deserves a separate tutorial but we'll briefly mention two main points.

Correlation Coefficient - Software

Most spreadsheet editors such as Excel, Google sheets and OpenOffice can compute correlations for you. The illustration below shows an example in Googlesheets.

Correlation Coefficient in Google Sheet

Correlation Coefficient - Correlation Matrix

Keep in mind that correlations apply to pairs of variables. If you're interested in more than 2 variables, you'll probably want to take a look at the correlations between all different variable pairs. These correlations are usually shown in a square table known as a correlation matrix. Statistical software packages such as SPSS create correlation matrices before you can blink your eyes. An example is shown below.

Correlation Coefficient - SPSS Correlation Matrix

Note that the diagonal elements (in red) are the correlations between each variable and itself. This is why they are always 1.
Also note that the correlations beneath the diagonal (in grey) are redundant because they're identical to the correlations above the diagonal. Technically, we say that this is a symmetrical matrix.
Finally, note that the pattern of correlations makes perfect sense: correlations between yearly incomes become lower insofar as these years lie further apart.

Pearson Correlation - Formula

If we want to inspect correlations, we'll have a computer calculate them for us. You'll rarely (probably never) need the actual formula. However, for the sake of completeness, a Pearson correlation between variables X and Y is calculated by
$$r_{XY} = \frac{\sum_{i=1}^n(X_i - \overline{X})(Y_i - \overline{Y})}{\sqrt{\sum_{i=1}^n(X_i - \overline{X})^2}\sqrt{\sum_{i=1}^n(Y_i - \overline{Y})^2}}$$
The formula basically comes down to dividing the covariance by the product of the standard deviations. Since a coefficient is a number divided by some other number our formula shows why we speak of a correlation coefficient.
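
For those who'd like to see the formula at work, here's a minimal Python sketch that follows it term by term. The function name `pearson_r` and the 2 short income lists are hypothetical. They are not the freelancer data from our example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation, computed exactly as in the formula above."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of cross products of deviations from the means.
    cross_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square roots of the sums of squared deviations.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross_products / (math.sqrt(ss_x) * math.sqrt(ss_y))


# HYPOTHETICAL yearly incomes (in thousands) for 5 freelancers.
income_2010 = [20, 35, 50, 65, 80]
income_2011 = [25, 30, 55, 60, 85]

r = pearson_r(income_2010, income_2011)
print(f"r = {r:.3f}")
```

Dividing both sums in the formula by n - 1 would turn them into a covariance and 2 variances, but those factors cancel out, so the sketch skips them.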

Correlation - Statistical Significance

The data we have available are often -but not always- a small sample from a much larger population. If so, we may find a non zero correlation in our sample
even if it's zero in the population.
The figure below illustrates how this could happen.

Scatterplot Showing Sample Correlation if Population Correlation is Zero

If we ignore the colors for a second, all 1,000 dots in this scatterplot visualize some population. The population correlation -denoted by ρ- is zero between test 1 and test 2.
Now, we could draw a sample of N = 20 from this population for which the correlation r = 0.95. Conversely, this means that a sample correlation of 0.95 doesn't prove with certainty that there's a non zero correlation in the entire population. However, finding r = 0.95 with N = 20 is extremely unlikely if ρ = 0. But precisely how unlikely? And how do we know?

Correlation - Test Statistic

If ρ -a population correlation- is zero, then the probability for a given sample correlation -its statistical significance- depends on the sample size. We therefore combine the sample size n and the sample correlation R into a single number, our test statistic T:

$$T = R\sqrt{\frac{(n - 2)}{(1 - R^2)}}$$

Now, T itself is not interesting. However, we need it for finding the significance level for some correlation. T follows a t distribution with ν = n - 2 degrees of freedom but only if some assumptions are met.
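
As a quick illustration, the test statistic for a sample correlation of 0.95 with N = 20 can be computed as sketched below. The function name `t_from_r` is made up for this example.

```python
import math


def t_from_r(r, n):
    """Test statistic T = R * sqrt((n - 2) / (1 - R^2)) for a sample correlation."""
    if not -1 < r < 1:
        raise ValueError("T is undefined for a perfect correlation")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


t_statistic = t_from_r(r=0.95, n=20)
degrees_of_freedom = 20 - 2

print(f"T = {t_statistic:.2f} with df = {degrees_of_freedom}")
```

Note how the same r results in a larger T as n grows: a given sample correlation becomes less likely under a zero population correlation when it's based on more observations.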

Correlation Test - Assumptions

The statistical significance test for a Pearson correlation requires 3 assumptions:

Pearson Correlation - Sampling Distribution

In our example, the sample size N was 20. So if we meet our assumptions, T follows a t-distribution with df = 18 as shown below.

Pearson Correlation - T-Distribution with DF = 18

This distribution tells us that there's a 95% probability that -2.1 < t < 2.1, corresponding to -0.44 < r < 0.44. Conclusion: if N = 20, there's a 95% probability of finding -0.44 < r < 0.44. There's only a 5% probability of finding a correlation outside this range. That is, such correlations are statistically significant at α = 0.05 or lower: they are (highly) unlikely and thus refute the null hypothesis of a zero population correlation.
Last, our sample correlation of 0.95 has a p-value of 1.55e-10 -one to 6,467,334,654. We can safely conclude there's a non zero correlation in our entire population.
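
The step from critical t-values to critical correlations follows from solving the test statistic formula for r, which gives r = t / √(t² + df). The Python sketch below applies this to the rounded critical value of 2.1 mentioned above. The function name `r_from_t` is made up for this example.

```python
import math


def r_from_t(t, df):
    """Correlation that corresponds to a given t-value: r = t / sqrt(t^2 + df)."""
    return t / math.sqrt(t ** 2 + df)


sample_size = 20
df = sample_size - 2
critical_t = 2.1   # rounded critical value for df = 18 at alpha = 0.05, taken from the text

critical_r = r_from_t(critical_t, df)

# Sanity check: plugging critical_r back into T = r * sqrt(df / (1 - r^2))
# should return the critical t-value we started from.
t_check = critical_r * math.sqrt(df / (1 - critical_r ** 2))

print(f"critical |r| = {critical_r:.2f}")
print(f"back to t    = {t_check:.2f}")
```

So with N = 20, any sample correlation further from zero than roughly 0.44 is statistically significant at α = 0.05.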

Thanks for reading!