
SPSS Scatterplot Tutorial

A large bank wants to gain insight into their employees’ job satisfaction. They carried out a survey, the results of which are in bank_clean.sav. The survey included the number of hours people work per week and their gross monthly salaries.

SPSS Scatterplot Data View Bank Clean

Research Question

It seems obvious that working hours are related to monthly salaries: employees who work more hours earn more money. But we'd like to know more about this relationship, so our research question is: how (strongly) is monthly salary related to working hours? Since we already inspected this data file (and set missing values), we can simply run correlations whours salary. and see that the correlation is 0.648, quite a strong linear relation. Some would leave it at that. However, a scatterplot will show that there's way more to this relation.
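For reference, the one-line correlations command can also be written out in the longer form that the bivariate correlations dialog typically pastes. This is a sketch rather than the exact syntax used for this tutorial: the PRINT and MISSING settings shown are assumptions that mirror the usual dialog defaults, with NOSIG asking SPSS to flag significant correlations.

```spss
*Longer form of the correlation between working hours and salary.

CORRELATIONS
/VARIABLES=whours salary
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.
```

With only two variables involved, pairwise and listwise exclusion of missing values use the same cases.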

SPSS Scatterplot Creation

We'll first run our scatterplot the way most users find easiest: by following the screenshots below.

SPSS Scatterplot Menu

SPSS Scatterplot Dialog Minimal

The aforementioned steps result in the syntax below. Running it creates our first basic scatterplot.

SPSS Scatterplot Syntax

*Minimal scatterplot syntax from legacy dialogs.

GRAPH
/SCATTERPLOT(BIVAR)=whours WITH salary
/MISSING=LISTWISE.

Note: you'll get the exact same result by running graph/scatter whours with salary. You probably prefer this second version if you want to create multiple scatterplots by copy-paste-editing the syntax. If you want to create a huge number of scatterplots, see SPSS with Python - Looping over Scatterplots.

Result

SPSS Scatterplot Bank Unstyled

As we see, this is not a simple linear relation. First, we see that our dots become more dispersed as our respondents work more hours; the more hours people work, the greater the standard deviation of monthly salary. This is a textbook example of heteroscedasticity: the opposite of homoscedasticity, which is an important assumption for regression.
Second, we see the pattern of dots “bend upwards” towards the right side of our chart. This is a clear indication of nonlinearity, which also violates the regression assumptions.
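If you'd like to look at both issues from the regression side, one common check is to plot standardized residuals against standardized predicted values. The syntax below is a minimal sketch that goes beyond what this tutorial actually runs; it assumes you're willing to fit a simple linear regression of salary on whours just for diagnostic purposes.

```spss
*Sketch: residual plot for a regression of salary on working hours.

REGRESSION
/DEPENDENT salary
/METHOD=ENTER whours
/SCATTERPLOT=(*ZRESID ,*ZPRED).
```

In the resulting chart, residuals that fan out towards the right point to heteroscedasticity and a curved band of dots points to nonlinearity.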
So why do we see heteroscedasticity and nonlinearity in our scatterplot? Well, perhaps the higher hourly wages are only available for those in the higher level jobs, which also require more hours per week. Interestingly, we have “job type” in our data, which comes somewhat close to job levels. Let's now add it to our scatterplot by following the screenshot below. Tip: use the dialog recall button for quick access to the scatter dialog.

SPSS Scatterplot with Legend

SPSS Scatterplot Dialog Markers

Set Markers by: uses different colors for our dots, based on some variable. We'll enter jtype (job type).
Label Cases by: should label each dot with the value of a (unique identifier) variable but it doesn't work. You probably want to use this only for very small samples anyway. If you really need it: it does work in the chart builder but we'll skip it for now. We'll leave it empty.
Optionally, let's add a nice title to our chart.

SPSS Scatterplot with Legend Syntax

*Scatterplot with different colors for different job types.

GRAPH
/SCATTERPLOT(BIVAR)=whours WITH salary BY jtype
/MISSING=LISTWISE
/TITLE "Monthly Salary by Weekly Hours | n = 464".

Result

SPSS Scatterplot Legend Styled Bank

And there we have it. The cause of the heteroscedasticity and nonlinearity is that middle and upper managers have (very) high hourly wages and typically work more hours than the other employees too.
This plot also suggests that we should perhaps not lump together all job types: for sales employees (red dots), the relation between hours and salary looks very linear, presumably because their hourly wages are rather fixed. The precise opposite holds for upper management (black dots). We'll now confirm this by inspecting the correlation for each group separately.

Correlations for Job Types Separately

*Sort cases needed for split file.

sort cases by jtype.

*Split file.

split file by jtype.

*Separate correlations for job types.

correlations salary with whours.

split file off.

Result

SPSS Separate Correlations for Groups

Indeed, the correlation between hours and salary is 0.79 for sales employees and 0.21 for upper management. We'll leave it as an exercise to the reader to create scatterplots for separate job type groups.
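As a starting point for that exercise, one possible approach is to reuse the split file trick from the correlations. This is a sketch, assuming the GRAPH command honors SPLIT FILE in your SPSS version, in which case it should produce one scatterplot per job type.

```spss
*Sketch: separate scatterplots for job types via split file.

sort cases by jtype.

split file by jtype.

graph/scatter whours with salary.

split file off.
```

Don't forget the final split file off, or all subsequent output will be split by job type as well.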

Final Notes

Our first finding on these data was simply a correlation of 0.65 between working hours and salary. However, a scatterplot suggested that it wasn't quite as simple as that. I hope this gave you an idea of how to create scatterplots easily in SPSS and why they can be very useful indeed.
Thanks for reading!
