
SPSS Stepwise Regression Tutorial II

A large bank wants to gain insight into their employees’ job satisfaction. They carried out a survey, the results of which are in bank_clean.sav. The survey included some statements regarding job satisfaction, some of which are shown below.

SPSS Variable View Bank

Research Question

The main research question for today is: which factors contribute (most) to overall job satisfaction, as measured by overall (“I'm happy with my job”)? The usual approach for answering this is predicting job satisfaction from these factors with multiple linear regression analysis.2,6 This tutorial will explain and demonstrate each step involved and we encourage you to run these steps yourself by downloading the data file.

Data Check 1 - Coding

One of the best SPSS practices is making sure you have an idea of what's in your data before running any analyses on them. Our analysis will use overall through q9 and their variable labels tell us what they mean. Now, if we look at these variables in data view, we see they contain values 1 through 11.
So what do these values mean and -importantly- is this the same for all variables? A great way to find out is running the syntax below.

*Check coding: higher values indicate positive or negative sentiment?.

display dictionary
/variables overall to q9.

Result

SPSS Display Dictionary Consistent Coding

If we quickly inspect these tables, we see two important things:

  1. all variables share the same answer scale: values 1 through 10, with 11 (“No answer”) set as a user missing value;
  2. for every variable, higher values indicate a more positive sentiment towards the statement.

Taking these findings together, we expect positive (rather than negative) correlations among all these variables. We'll see in a minute that our data confirm this.

Data Check 2 - Distributions

Our previous table suggests that all variables hold values 1 through 11 and 11 (“No answer”) has already been set as a user missing value. Now let's see if the distributions for these variables make sense by running some histograms over them.

*Check distributions.

frequencies overall to q9
/format notable
/histogram.

Data Check 3 - Missing Values

First and foremost, the distributions of all variables show values 1 through 10 and they look plausible. However, we have 464 cases in total but our histograms show slightly lower sample sizes. This is due to missing values. To get a quick idea of the extent to which values are missing, we'll run a simple DESCRIPTIVES table over these variables.

*1. Show only variable labels in output.

set tvars labels.

*2. Check for missings and listwise valid n.

descriptives overall to q9.

Result

SPSS Stepwise Regression Descriptives Missing Values

For now, we mostly look at N, the number of valid values for each variable. We see two important things:

  1. each separate variable has relatively few missing values: every N is fairly close to our total sample size of 464;
  2. however, the listwise valid N -the number of cases without missing values on any of these variables- is only 297. That is, roughly 36% of our cases have at least one missing value.
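If you'd like to see how these missing values are spread over respondents, a quick sketch is counting them per case with the NMISS function; n_miss is simply a name we make up here.

*Sketch: count missing values per respondent over the analysis variables.

compute n_miss = nmiss(overall to q9).
frequencies n_miss.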

Correlations

We'll now inspect the correlations among our variables as shown below.

SPSS Correlation Menu Bivariate

In the next dialog, we select all relevant variables and leave everything else as-is. We then click Paste, resulting in the syntax below.

SPSS Stepwise Regression Correlations Flag Significance

*Inspect correlations.

CORRELATIONS
/VARIABLES=overall q1 q2 q3 q4 q5 q6 q7 q8 q9
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.

Importantly, note the last line -/MISSING=PAIRWISE.- here.

Result

SPSS Correlation Matrix Pairwise Deletion

Note that all correlations are positive -like we expected. Most correlations -even small ones- are statistically significant with p-values close to 0.000. This means that the probability of finding these sample correlations is virtually zero if the population correlations are zero.
Second, each correlation has been calculated on all cases with valid values on the 2 variables involved, which is why each correlation has a different N. This is known as pairwise exclusion of missing values, the default for CORRELATIONS.
The alternative, listwise exclusion of missing values, would only use our 297 cases that don't have missing values on any of the variables involved. Like so, pairwise exclusion uses way more data values than listwise exclusion; with listwise exclusion we'd “lose” almost 36% of the data we collected.
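If you'd like to see the listwise-deleted correlations for comparison, changing only the MISSING subcommand should do it; apart from that line, this is the same syntax as before.

*Same correlations but with listwise exclusion of missing values (for comparison).

CORRELATIONS
/VARIABLES=overall q1 q2 q3 q4 q5 q6 q7 q8 q9
/PRINT=TWOTAIL NOSIG
/MISSING=LISTWISE.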

Multiple Linear Regression - Assumptions

Simply “regression” usually refers to (univariate) multiple linear regression analysis and it requires some assumptions:1,4

  1. the prediction errors are independent over cases;
  2. the prediction errors follow a normal distribution;
  3. the prediction errors have a constant variance (homoscedasticity);
  4. all relations among variables are linear and additive.

We usually check our assumptions before running an analysis. However, the regression assumptions are mostly evaluated by inspecting some charts that are created when running the analysis.3 So we first run our regression and then look for any violations of the aforementioned assumptions.

Regression

Now that we're sure our data make perfect sense, we're ready for the actual regression analysis. We'll generate the syntax by following the screenshots below.

SPSS Regression Menu Empty SPSS Stepwise Regression Dialog 1

(We'll explain why we choose Stepwise when discussing our output.)

SPSS Stepwise Regression Statistics Plots Dialogs

- Here we select some charts for evaluating the regression assumptions.
- By default, SPSS uses only our 297 complete cases for regression. By choosing Exclude cases pairwise under Options, our regression will be based on the correlation matrix we saw earlier and thus use more of our data.
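Completing these dialogs and clicking Paste should result in syntax along the lines of the sketch below; the exact subcommands depend on the statistics and plots you selected.

*Stepwise regression sketch. Exact subcommands may differ, depending on dialog choices.

REGRESSION
/MISSING PAIRWISE
/STATISTICS COEFF OUTS R ANOVA CHANGE
/CRITERIA=PIN(.05) POUT(.10)
/DEPENDENT overall
/METHOD=STEPWISE q1 q2 q3 q4 q5 q6 q7 q8 q9
/SCATTERPLOT=(*ZRESID ,*ZPRED)
/RESIDUALS HISTOGRAM(ZRESID).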

“Stepwise” - What Does That Mean?

When we select the stepwise method, SPSS will include only “significant” predictors in our regression model: although we selected 9 predictors, those that don't contribute uniquely to predicting job satisfaction will not enter our regression equation. In doing so, it iterates through the following steps:

  1. find the predictor that contributes most to predicting the outcome variable and add it to the regression model if its p-value is below a certain threshold (usually 0.05).
  2. inspect the p-values of all predictors in the model. Remove predictors from the model if their p-values are above a certain threshold (usually 0.10);
  3. repeat this process until 1) all “significant” predictors are in the model and 2) no “non significant” predictors are in the model.
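These entry and removal thresholds map onto REGRESSION's CRITERIA subcommand: PIN is the entry p-value and POUT the removal p-value. As a sketch, stricter values than the defaults could be requested like so:

*Sketch: stricter entry/removal p-values for the stepwise procedure.

REGRESSION
/MISSING PAIRWISE
/CRITERIA=PIN(.01) POUT(.05)
/DEPENDENT overall
/METHOD=STEPWISE q1 q2 q3 q4 q5 q6 q7 q8 q9.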

Regression Results - Coefficients Table

SPSS Stepwise Regression Coefficients Table

Our coefficients table tells us that SPSS performed 4 steps, adding one predictor in each. We usually report only the final model.
Our unstandardized coefficients and the constant allow us to predict job satisfaction. Precisely, Y' = 3.233 + 0.232 * x1 + 0.157 * x2 + 0.102 * x3 + 0.083 * x4 where Y' is predicted job satisfaction, x1 is meaningfulness and so on. This means that respondents who score 1 point higher on meaningfulness will -on average- score 0.23 points higher on job satisfaction.
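If you'd like these predicted values as a new variable, a minimal sketch is the COMPUTE statement below. Note that meaningfulness, colleagues, support and predictor4 are hypothetical placeholder names: substitute the q variables that actually entered your final model, or simply have REGRESSION save the predicted values via its /SAVE PRED subcommand.

*Sketch: predicted job satisfaction from the final model.
*The four predictor names are placeholders; replace them by the variables in your coefficients table.

compute pred_overall = 3.233 + 0.232 * meaningfulness + 0.157 * colleagues + 0.102 * support + 0.083 * predictor4.
execute.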
Importantly, all predictors contribute positively (rather than negatively) to job satisfaction. This makes sense because they are all positive work aspects.
If our predictors have different scales -not really the case here- we can compare their relative strengths through the beta coefficients: the coefficients we'd obtain if all variables were standardized. Like so, we see that meaningfulness (.460) contributes clearly more than colleagues (.290) and roughly twice as much as support (.242).
All predictors are highly statistically significant (p = 0.000), which is not surprising considering our large sample size and the stepwise method we used.

Regression Results - Model Summary

SPSS Stepwise Regression Model Summary Table

Each predictor added by our stepwise procedure results in better predictive accuracy.
R is simply the Pearson correlation between the actual and predicted values for job satisfaction;
R square -the squared correlation- is the proportion of variance in job satisfaction accounted for by the predicted values;
We typically see that our regression equation performs better in the sample on which it's based than in our population. Adjusted R square tries to estimate the predictive accuracy in our population and is therefore slightly lower than R square.
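For the record, the usual formula behind this adjustment is adjusted R square = 1 - (1 - R square) * (n - 1) / (n - k - 1), where n is the sample size and k the number of predictors in the model.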
We'll probably settle for -and report on- our final model: the coefficients look good and it predicts job satisfaction best.

Regression Results - Residual Histogram

SPSS Stepwise Regression Residual Histogram

Remember that one of our regression assumptions is that the residuals (prediction errors) are normally distributed. Our histogram suggests that this more or less holds, although it's a little skewed to the left.

Regression Results - Residual Plot

SPSS Stepwise Regression Residual Scatterplot

We also created a scatterplot with predicted values on the x-axis and residuals on the y-axis. This chart does not show violations of the independence, homoscedasticity and linearity assumptions but it's not very clear.
We mostly see a striking pattern of descending straight lines. This is because our dependent variable only holds the whole values 1 through 10: each predicted value and its residual therefore add up to one of these values, resulting in one line per observed value. Standardizing both variables may change the scales of our scatterplot but not its shape.

Stepwise Regression - Reporting

There's no full consensus on how to report a stepwise regression analysis.5,7 As a basic guideline, include

  1. a table with descriptive statistics;
  2. the correlation matrix of the dependent variable and all (candidate) predictors;
  3. the model summary table with R square and change in R square for each model;
  4. the coefficients table with at least the B and β coefficients and their p-values.

Regarding the correlations, we'd like to have statistically significant correlations flagged but we don't need their sample sizes or p-values. Since you can't prevent SPSS from including the latter, try SPSS Correlations in APA Format.
You can quickly edit the result further in an OpenOffice or Excel spreadsheet by right clicking the table and selecting Copy special, then Excel Worksheet.

SPSS Stepwise Regression - Correlation Matrix in Excel

I guess that's about it. I hope you found this tutorial helpful. Thanks for reading!

References

  1. Stevens, J. (2002). Applied multivariate statistics for the social sciences. Mahwah, NJ: Lawrence Erlbaum Associates.
  2. Agresti, A. & Franklin, C. (2014). Statistics. The Art & Science of Learning from Data. Essex: Pearson Education Limited.
  3. Hair, J.F., Black, W.C., Babin, B.J. et al. (2006). Multivariate Data Analysis. New Jersey: Pearson Prentice Hall.
  4. Berry, W.D. (1993). Understanding Regression Assumptions. Newbury Park, CA: Sage.
  5. Field, A. (2013). Discovering Statistics with IBM SPSS. Newbury Park, CA: Sage.
  6. Howell, D.C. (2002). Statistical Methods for Psychology (5th ed.). Pacific Grove CA: Duxbury.
  7. Nicol, A.M. & Pexman, P.M. (2010). Presenting Your Findings. A Practical Guide for Creating Tables. Washington: APA.
