SPSS Tutorials


Association between Metric Variables


This tutorial will investigate the association between metric variables with nice tables and charts. As an example, we'll use income_2010 and income_2011 from freelancers.sav.

SPSS Metric Variable in Data View

Quick Data Check

Before jumping into analyses, let's first inspect whether both variables have plausible values. A fast way to do so is generating histograms by running FREQUENCIES. The syntax below does just that.

*Run histograms (but no tables) for income_2010 and income_2011.

frequencies income_2010 income_2011
/format notable
/histogram.

Histogram for income_2011 does not look plausible due to an extreme value.

Finding and Specifying User Missing Values

Conclusion: although the histogram for income_2010 looks fine, income_2011 seems to contain one or more extremely large values that can't be real yearly incomes.
One way to track these down is running FREQUENCIES with the table sorted by descending value (syntax below, step 1); the unlikely values then appear at the top of the frequencies table. This shows that income_2011 contains 99999997, which we'll specify as a user missing value (step 2).
We'll also hide the decimals of the values in both variables (step 3). This suppresses excessive decimal places in output tables. Note that a nice tool for doing so after running tables is available from SPSS Set Decimals Output Tables.

SPSS Find and Set Missing Values Syntax

*1. Income_2011 has extreme values. Check out which.

frequencies income_2011
/format dvalue.

*2. Specify 99999997 as user missing.

missing values income_2011(99999997).

*3. Hide dollar cents for more space on x axis.

formats income_2010 income_2011(dollar9).

*4. Quick check.

frequencies income_2010 income_2011
/format notable
/histogram.
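
The logic of steps 1 and 2 can also be illustrated outside SPSS. The Python sketch below is only an illustration under stated assumptions: the income values are made up (they are not taken from freelancers.sav) and `USER_MISSING` is a hypothetical name standing in for what MISSING VALUES does.

```python
# Hypothetical yearly incomes; 99999997 mimics the implausible value in income_2011.
income_2011 = [41000.0, 52500.0, 99999997.0, 38750.0, 61200.0]

# Hypothetical stand-in for the user missing value set with MISSING VALUES.
USER_MISSING = {99999997.0}

# Step 1: list distinct values from high to low, similar to /FORMAT DVALUE.
descending = sorted(set(income_2011), reverse=True)

# Step 2: keep only valid (non user-missing) values for later analyses.
valid = [value for value in income_2011 if value not in USER_MISSING]

print(descending[0])  # the extreme value surfaces at the top of the list
print(len(valid))     # number of cases that remain valid
```

Sorting in descending order puts suspicious outliers first, which is why the frequencies table makes them easy to spot.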


A univariate DESCRIPTIVES table doesn't say anything about the association between two metric variables. However, as it's commonly included in reports, we'll run one too.
Optionally, styling can be applied by using an SPSS table template (.stt file). We used one to hide the table title. “Valid N (listwise)” can't be hidden with a table template, but we used a Python script for doing so. Our final result is shown in the next screenshot.

*Standard descriptive statistics table.

descriptives income_2010 income_2011.

SPSS DESCRIPTIVES Table with Styling Applied
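
For readers who want to see what the table summarizes, here is a minimal Python sketch of the statistics that DESCRIPTIVES reports by default (N, minimum, maximum, mean and standard deviation). The function name `describe` and the income values are hypothetical and serve only as an illustration.

```python
import statistics

# Hypothetical yearly incomes, not taken from freelancers.sav.
income_2010 = [39000.0, 50500.0, 47250.0, 36000.0, 58800.0]


def describe(values):
    """Return the statistics that a default DESCRIPTIVES table shows."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        # Sample standard deviation (n - 1 in the denominator).
        "Std. Deviation": statistics.stdev(values),
    }


for name, value in describe(income_2010).items():
    print(name, value)
```

Note that each variable is summarized separately here, which is exactly why such a table says nothing about the association between the two.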

Creating SPSS Scatterplots

A great way for visualizing the association -if any- between metric variables is running a scatterplot. The screenshots below show how to do so.
For creating multiple scatterplots, copy-paste the syntax a couple of times and replace the variable names. For creating many scatterplots, have Python loop over the variable names and run the syntax for you.
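
As a rough sketch of the looping idea, the Python code below builds one GRAPH command per pair of variables. It only generates the syntax as text. The helper `scatterplot_syntax` and the variable income_2012 are hypothetical, and we assume that inside SPSS each command would be run from a Python program block (for instance by passing it to `spss.Submit`).

```python
from itertools import combinations


def scatterplot_syntax(x_var, y_var):
    """Build SPSS syntax for one scatterplot (hypothetical helper)."""
    return (
        "GRAPH\n"
        f"/SCATTERPLOT(BIVAR)={x_var} WITH {y_var}.\n"
    )


# income_2012 is a hypothetical extra variable, added for illustration only.
variables = ["income_2010", "income_2011", "income_2012"]

# One command for every unique pair of variables.
commands = [scatterplot_syntax(x, y) for x, y in combinations(variables, 2)]

for command in commands:
    # Inside SPSS, this is where the command would be submitted instead of printed.
    print(command)
```

Generating the syntax as plain strings first makes it easy to inspect the commands before actually running them.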

SPSS Scatterplot Creation SPSS Scatterplot Creation 2

SPSS Scatterplot Syntax

Note: the graph resulting from running the syntax can be styled by applying an SPSS chart template (.sgt file). The screenshot below shows our final result after doing so.

*Create scatterplot with income_2010 (x axis) and income_2011 (y axis).

GRAPH
/SCATTERPLOT(BIVAR)=income_2010 WITH income_2011
/TITLE='All Respondents (n = 39)'.

SPSS Scatterplot Example

SPSS Scatterplot Styled

Conclusion: income_2010 is very strongly related to income_2011. This relation is roughly linear.

Note: honestly, the relation didn't look entirely linear to us. However, a quick CURVEFIT showed us that alternative models deviate from a linear relation to a negligible extent only. That is, the relation turned out to be much more linear than it seemed at first glance.


We already saw from our scatterplot that our two variables are strongly related in a linear fashion. The strength of this relation can be quantified by calculating a Pearson correlation with CORRELATIONS.
The resulting table always contains p-values, but these are meaningless if their statistical assumptions haven't been met. We'll therefore hide them with a simple tool available from SPSS Correlations without Significance.

*Correlation and N between income_2010 and income_2011.

correlations income_2010 with income_2011.

Conclusion: the correlation of .904 confirms that there is a very strong linear relation between income_2010 and income_2011 indeed.
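
To make explicit what a Pearson correlation quantifies, the sketch below computes it by hand in Python. It is only an illustration: the function `pearson_r`, its `missing` parameter and the example values are hypothetical, and we assume that a case is dropped when either of its two values is a user missing value.

```python
import math


def pearson_r(xs, ys, missing=frozenset()):
    """Return (r, n) using only cases that are valid on both variables."""
    pairs = [
        (x, y) for x, y in zip(xs, ys)
        if x not in missing and y not in missing
    ]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical incomes; the last case holds the user missing value.
income_2010 = [39000.0, 50500.0, 47250.0, 36000.0, 58800.0]
income_2011 = [41000.0, 52500.0, 46000.0, 38750.0, 99999997.0]

r, n = pearson_r(income_2010, income_2011, missing={99999997.0})
print(round(r, 3), n)
```

Leaving the extreme value in would distort the correlation badly, which is one more reason to specify it as user missing first.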
