
SPSS tutorials


SPSS Moderation Regression Tutorial

A sports doctor routinely measures the muscle percentages of his clients. He also asks them how many hours per week they typically spend on training. Our doctor suspects that clients who train more are also more muscled. Furthermore, he thinks that the effect of training on muscularity declines with age. In multiple regression analysis, this is known as a moderation interaction effect. The figure below illustrates it.

Moderation Interaction In Regression Diagram

So how do we test for such a moderation effect? We usually do so in 3 steps:

  1. if both predictors are quantitative, we usually mean center them first;
  2. we then multiply the centered predictors into an interaction predictor variable;
  3. finally, we enter both mean centered predictors and the interaction predictor into a regression analysis.
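To make these 3 steps concrete outside SPSS, here's a minimal Python sketch using only the standard library. The toy ages, training hours and outcome values are made up and are not the tutorial's data. The helper names (`center`, `ols`, `moderation_regression`) are hypothetical. The least squares fit simply solves the normal equations, which is fine for a toy example with a few predictors.

```python
# Minimal sketch of the 3-step moderation recipe (standard library only).
# All data below are made up for illustration.

def mean(xs):
    return sum(xs) / len(xs)


def center(xs):
    """Step 1: mean center a predictor."""
    m = mean(xs)
    return [x - m for x in xs]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, *predictors):
    """Least squares with an intercept: solve X'X b = X'y.
    Returns [intercept, b1, b2, ...] in the order the predictors were given."""
    rows = [[1.0, *vals] for vals in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def moderation_regression(y, age, thours):
    cent_age = center(age)                                  # step 1
    cent_thours = center(thours)                            # step 1
    int_1 = [a * t for a, t in zip(cent_age, cent_thours)]  # step 2
    return ols(y, cent_age, cent_thours, int_1)             # step 3


# Toy data: y is built exactly from known coefficients, so the fit
# should recover roughly [30, -0.1, 0.9, -0.02].
ages = [22, 25, 31, 38, 44, 52, 57, 63, 70]
hours = [2, 6, 4, 8, 3, 7, 1, 5, 9]
ca, ct = center(ages), center(hours)
y = [30 - 0.1 * a + 0.9 * t - 0.02 * a * t for a, t in zip(ca, ct)]
print([round(b, 4) for b in moderation_regression(y, ages, hours)])
```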

SPSS Moderation Regression - Example Data

These 3 predictors are all present in muscle-percent-males-interaction.sav, part of which is shown below.

SPSS Moderation Regression Variable View

We did the mean centering with a simple tool, which can be downloaded from SPSS Mean Centering and Interaction Tool.
Alternatively, mean centering manually is not too hard either; it's covered in How to Mean Center Predictors in SPSS?

SPSS Moderation Regression - Dialogs

Our moderation regression is not different from any other multiple linear regression analysis: we navigate to Analyze > Regression > Linear and fill out the dialogs as shown below.

SPSS Regression With Moderation Interaction Dialogs

Clicking Paste results in the following syntax. Let's run it.

*Regression with mean centered predictors and interaction predictor.

REGRESSION
/STATISTICS COEFF OUTS R ANOVA ZPP
/DEPENDENT mperc
/METHOD=ENTER cent_age cent_thours int_1.

SPSS Moderation Regression - Coefficients Output

SPSS Moderation Regression Coefficients Output

Age is negatively related to muscle percentage: for clients with average training hours, muscle percentage is some 0.072 percentage points lower per year of age.
Training hours are positively related to muscle percentage: for clients of average age, muscle percentage is some 0.9 percentage points higher for each hour they work out per week.
The negative B-coefficient for the interaction predictor indicates that the training effect becomes more negative (or less positive) with increasing age.
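Because both predictors were mean centered, the training coefficient is the training effect at the mean age, and the interaction coefficient tells how much that effect changes per year of age. The sketch below computes this implied simple slope for a few ages. Only b_thours = 0.9 comes from the output above; the mean age of 40 and the interaction coefficient of -0.02 are hypothetical placeholders, and `training_slope` is a hypothetical helper.

```python
def training_slope(age, mean_age, b_thours, b_int):
    """Simple slope of muscle percentage on training hours at a given age.

    From mperc = b0 + b_age*cent_age + b_thours*cent_thours
                 + b_int*cent_age*cent_thours,
    the slope for training hours is b_thours + b_int*cent_age.
    """
    return b_thours + b_int * (age - mean_age)


# b_thours = 0.9 mirrors the output; mean_age = 40 and b_int = -0.02
# are hypothetical values for illustration only.
for age in (30, 40, 50, 60):
    print(age, round(training_slope(age, 40, 0.9, -0.02), 2))
# prints one pair per line: 30 1.1, 40 0.9, 50 0.7, 60 0.5
```

With a negative interaction coefficient, the training slope shrinks as age increases, which is exactly the pattern the simple slopes analysis below examines.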

Now, for an effect to be of any importance, it must be statistically significant and have a reasonable effect size.

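At p = 0.000 (that is, p < 0.0005), all 3 effects are highly statistically significant. As effect size measures, we could use the semipartial correlations (denoted as “Part”) where
r = 0.10 indicates a small effect;
r = 0.30 indicates a medium effect;
r = 0.50 indicates a large effect.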

The training effect is almost large, and the age effect and the age by training interaction are almost medium. Regardless of statistical significance, I think the interaction may be ignored if its part correlation r < 0.10 or so, but that's clearly not the case here. We'll therefore examine the interaction in depth by means of a simple slopes analysis.
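The “Part” column and the t-values in a coefficients table are linked by a standard identity, sr² = t²(1 - R²) / (n - k - 1), so one can be recovered from the other. Below is a minimal Python sketch of that identity plus a labeling function based on Cohen's rules of thumb (r = 0.10 small, 0.30 medium, 0.50 large). The function names and the example numbers are hypothetical and not taken from the output above.

```python
import math


def part_from_t(t, r_squared, n, k):
    """Part (semipartial) correlation of one predictor from its t-value.

    Uses the identity sr^2 = t^2 * (1 - R^2) / (n - k - 1), where R^2 is
    the model's R square, n the sample size and k the number of predictors.
    The sign is taken from t.
    """
    return math.copysign(math.sqrt(t * t * (1 - r_squared) / (n - k - 1)), t)


def effect_size_label(r):
    """Cohen's rules of thumb for |r|: .10 small, .30 medium, .50 large."""
    r = abs(r)
    if r < 0.10:
        return "negligible"
    if r < 0.30:
        return "small"
    if r < 0.50:
        return "medium"
    return "large"


# Hypothetical numbers: a predictor with t = -4.0 in a 3-predictor model
# with R^2 = 0.40, fitted on n = 100 cases.
sr = part_from_t(-4.0, 0.40, 100, 3)
print(round(sr, 3), effect_size_label(sr))  # prints -0.316 medium
```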

With regard to the residual plots (not shown here), note that

Creating Age Groups

Our simple slopes analysis starts with creating age groups. I'll go for tertile groups: the youngest, intermediate and oldest 33.3% of the clients will make up my groups. This is an arbitrary choice: we may just as well create 2, 3, 4 or whatever number of groups. Equal group sizes are not mandatory either and perhaps even somewhat unusual. In any case, the syntax below creates the age tertile groups as a new variable in our data.

*Create age tertile groups.

rank age
/ntiles(3) into agecat3.

*Label new variable and values.

variable labels agecat3 'Age Tertile Group'.
value labels agecat3 1 'Youngest Ages' 2 'Intermediary Ages' 3 'Highest Ages'.

*Check descriptive statistics age per age group.

means age by agecat3
/cells count min max mean stddev.
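For readers who want to see roughly what RANK /NTILES(3) does, here's a minimal Python sketch that sorts cases by age and cuts them into k groups of (nearly) equal size, followed by per-group descriptives comparable to the MEANS command. It's an approximation: ties are broken by case order here, whereas SPSS's RANK command applies its own tie rule, so tied ages may be grouped differently. The ages are made up and the helper names are hypothetical.

```python
from statistics import mean, stdev


def ntile_groups(values, k=3):
    """Assign cases to k rank-based groups of (nearly) equal size.

    Ties are broken by case order, which is a simplification; SPSS RANK
    /NTILES applies its own tie rule.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [0] * n
    for position, i in enumerate(order):
        groups[i] = position * k // n + 1  # 1 = youngest ... k = oldest
    return groups


def describe_by_group(values, groups):
    """Count, min, max, mean and standard deviation per group."""
    table = {}
    for g in sorted(set(groups)):
        xs = [v for v, grp in zip(values, groups) if grp == g]
        table[g] = (len(xs), min(xs), max(xs), round(mean(xs), 2), round(stdev(xs), 2))
    return table


ages = [23, 61, 35, 28, 47, 72, 30, 39, 55]  # made-up ages
agecat3 = ntile_groups(ages, 3)
print(agecat3)  # [1, 3, 2, 1, 2, 3, 1, 2, 3]
for g, row in describe_by_group(ages, agecat3).items():
    print(g, row)
```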


Descriptive Statistics By Age Group

Some basic conclusions from this table are that

  1. our age groups have precisely equal sample sizes of n = 81;
  2. the group mean ages are unevenly spaced: the difference between the youngest and intermediary groups (some 6 years) is much smaller than between the intermediary and highest groups (some 13 years);
  3. the highest age group has a much larger standard deviation than the other 2 groups.

Points 2 and 3 are caused by the skewness in age and argue against using tertile groups. However, I think that having equal group sizes easily outweighs both disadvantages.

Simple Slopes Analysis I - Fit Lines

Let's now visualize the moderation interaction between age and training. We'll start off by creating a scatterplot as shown below.

SPSS Scatterplot Menu

SPSS Scatterplot Simple Slopes Analysis

Clicking Paste results in the syntax below.

*Create scatterplot muscle percentage by uncentered training hours by age group.

GRAPH
/SCATTERPLOT(BIVAR)=thours WITH mperc BY agecat3
/TITLE='Muscle Percentage by Training Hours by Age Group'.

*After running chart, add separate fit lines manually.

Adding Separate Fit Lines to Scatterplot

After creating our scatterplot, we'll edit it by double-clicking it. In the Chart Editor window that opens, we click the icon labeled Add Fit Line at Subgroups.

SPSS Add Fit Line At Subgroups In Chart Editor

After adding the fit lines, we'll simply close the chart editor. Minor note: scatterplots with (separate) fit lines can be created in one go from the Chart Builder in SPSS version 25+, but we'll cover that some other time.


SPSS Scatterplot Separate Fit Lines For Groups

Our fit lines nicely explain the nature of our age by training interaction effect:

Again, the similarity between the 2 youngest groups may be due to the skewness in ages: the mean ages for these groups aren't too different from each other but are very different from that of the highest age group.

Simple Slopes Analysis II - Coefficients

After visualizing our interaction effect, let's now test it: we'll run a simple linear regression of muscle percentage on training hours for our 3 age groups separately. A nice way to do so in SPSS is by using SPLIT FILE.

The REGRESSION syntax was created from the menu as previously but with (uncentered) training as the only predictor.

*Split file by age group.

sort cases by agecat3.
split file layered by agecat3.

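*Run simple linear regression of muscle percentage on uncentered training hours.

REGRESSION
/STATISTICS COEFF OUTS R ANOVA ZPP
/DEPENDENT mperc
/METHOD=ENTER thours
/SCATTERPLOT=(*ZRESID ,*ZPRED)
/RESIDUALS HISTOGRAM(ZRESID).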

*Split file off.

split file off.
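The SPLIT FILE approach simply fits one simple regression per age group. Here's a minimal Python sketch of that logic; the data are made up and the helper names are hypothetical. With a single predictor, the part correlation SPSS reports equals the ordinary Pearson correlation, which is why the sketch returns r.

```python
from statistics import mean


def simple_regression(x, y):
    """Intercept, slope and Pearson r for y regressed on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    r = sxy / (sxx * syy) ** 0.5  # with 1 predictor, part r equals this r
    return my - slope * mx, slope, r


def slopes_by_group(x, y, groups):
    """Fit one simple regression per group, much like SPLIT FILE + REGRESSION."""
    fits = {}
    for g in sorted(set(groups)):
        xs = [a for a, grp in zip(x, groups) if grp == g]
        ys = [b for b, grp in zip(y, groups) if grp == g]
        fits[g] = simple_regression(xs, ys)
    return fits


# Made-up data: 3 age groups whose training slopes shrink with age.
thours = [1, 3, 5, 7, 2, 4, 6, 8, 1, 4, 6, 9]
mperc = [30.8, 33.1, 34.9, 37.2, 31.5, 32.9, 35.1, 36.4, 28.2, 27.6, 28.5, 28.1]
agecat3 = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
for g, (b0, b1, r) in slopes_by_group(thours, mperc, agecat3).items():
    print(f"group {g}: intercept {b0:.2f}, slope {b1:.3f}, r {r:.2f}")
```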


SPSS Simple Slopes Analysis Output Table

The coefficients table confirms our previous results:

for the youngest age group, the training effect is statistically significant at p = 0.000. Moreover, its part correlation of r = 0.59 indicates a large effect;
the results for the intermediary age group are roughly similar to the youngest group;
for the highest age group, the part correlation of r = 0.077 is not substantial. We wouldn't take it seriously even if it had been statistically significant (which it isn't, at p = 0.49).
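In a simple regression the part correlation equals the Pearson correlation, so its t-test can be reproduced from r and n alone via t = r * sqrt((n - 2) / (1 - r^2)). A minimal sketch, assuming n = 81 per group (as in the descriptives table) and no missing values: it gives t of about 6.5 for the youngest group and about 0.69 for the highest group, both on 79 degrees of freedom, which is in line with the reported p-values. The helper name `t_from_r` is hypothetical.

```python
import math


def t_from_r(r, n):
    """t statistic for a correlation (or a simple regression slope), df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Part correlations from the output above; n = 81 per group is assumed.
for group, r in (("youngest", 0.59), ("highest", 0.077)):
    print(group, round(t_from_r(r, 81), 2))
```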

Last, the residual histograms (not shown here) don't show anything unusual. The residual scatterplot for the oldest age group looks curvilinear except for some outliers. We should perhaps take a closer look at this analysis, but we'll leave that for another day.

Thanks for reading!
