# Association between Categorical Variables

This tutorial walks through creating clean tables and charts for investigating the association between categorical or dichotomous variables. If statistical assumptions are met, these may be followed up by a chi-square test.
As an example, we'll see whether sector_2010 and sector_2011 in freelancers.sav are associated in any way.

## SPSS Quick Data Check

Before doing anything else, let's first just take a quick look at both variables separately. In the syntax below, we first ensure we'll see both values and value labels in our output tables (step 1). Next, we run a basic FREQUENCIES command.

*1. Set both values and value labels for output tables.

set tnumbers both.

*2. Run frequencies.

frequencies sector_2010 sector_2011.

## RECODE System Missing Values

Both variables contain values from 1 through 5 plus system missing values. Since both variables are nominal, we may include these system missings as just another category. This keeps the N constant across analyses and results in cleaner tables. (For nicer tables still, you may remove “Valid” with a Python script and apply styling with an SPSS table template, an .stt file.) The syntax below handles the system missings with RECODE.

*1. Recode system missing into value that's not present in variables yet (here: 6).

recode sector_2010 sector_2011 (sysmis = 6).

*2. Explain what formerly missing value means.

add value labels sector_2010 sector_2011 6 '(Unknown)'.

*3. Show only value labels in output.

set tnumbers labels.

*4. Run clean frequency tables.

frequencies sector_2010 sector_2011.
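
The table template mentioned above can be applied from syntax with SET TLOOK. The sketch below is only an illustration: the folder and the file name my_template.stt are hypothetical, so substitute the path to a template you've actually built.

```spss
*Assumption: the path and file name below are hypothetical.

set tlook = 'C:\templates\my_template.stt'.

frequencies sector_2010 sector_2011.

*Switch back to the default table look afterwards.

set tlook = none.
```

Resetting TLOOK at the end keeps the template from silently styling every later table in the session.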

## SPSS CROSSTABS for Both Variables

So far, we've only looked at both variables separately. To see how they're associated, we'll inspect their contingency table obtained from CROSSTABS. Displaying column percentages without frequencies is our preferred option here.

*Run contingency table with (only) column percentages.

crosstabs sector_2011 by sector_2010
/cells column.

Conclusion: the variables are strongly related. Most people who worked in a sector in 2010 stayed in the same sector in 2011. For example, 60% of respondents who worked in industry in 2010 stayed in industry. Another 20% moved to finance and the final 20% moved to “other”. Note that we're only describing the data at hand; we're not making any attempt to generalize these results to any larger population.
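
Generalizing to a population would be the job of the chi-square test mentioned in the introduction. A minimal sketch of how that could be requested is shown below. It is not part of the original analysis, and with only 40 cases spread over this many cells the test's assumptions may well not hold, so treat it as an illustration of the syntax rather than a recommended step.

```spss
*Sketch: contingency table with counts, expected counts, column percentages and adjusted standardized residuals.

crosstabs sector_2011 by sector_2010
/cells count expected column asresid
/statistics chisq.
```

EXPECTED adds the expected counts against which the test's assumptions are usually checked, and ASRESID adds adjusted standardized residuals for seeing which cells deviate most from independence.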

## SPSS Clustered Bar Chart Creation

We'll now visualize the contents of the previous table. An option here is a split bar chart but we'll go for a clustered bar chart instead. The screenshots below walk you through the process.

## SPSS Clustered Bar Chart Syntax

*Run clustered bar chart for sector_2011 by sector_2010.

GRAPH
/BAR(GROUPED)=COUNT BY sector_2011 BY sector_2010
/TITLE='Sector in 2010 by sector in 2011 (N = 40)'.

## SPSS Clustered Bar Chart Styling

Although our chart is technically correct, it looks appalling. Its default color scheme basically just looks like a bad joke from the software developers. A fast way to prettify this and similar charts is building and applying an SPSS chart template (.sgt file). Our final result after doing so is shown in the last screenshot.
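
Once a chart template exists, it can be applied directly from syntax with the TEMPLATE subcommand of GRAPH. The sketch below assumes a hypothetical file clustered_bar.sgt in a hypothetical folder; the styling itself lives entirely in that file, so the result depends on the template you built.

```spss
*Assumption: the path and file name below are hypothetical.

GRAPH
/BAR(GROUPED)=COUNT BY sector_2011 BY sector_2010
/TITLE='Sector in 2010 by sector in 2011 (N = 40)'
/TEMPLATE='C:\templates\clustered_bar.sgt'.
```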

## SPSS Clustered Bar Chart Example

Conclusion: as with the contingency table, we don't see much of a clear pattern here except for people tending to stay in the same sector as the previous year.

# THIS TUTORIAL HAS 14 COMMENTS:

• ### By Abass Alhassan on July 4th, 2015

The tutorial has been helpful to me. thanks.

• ### By Garabasa on December 10th, 2015

This is very useful, Many thanks to the Authors.

• ### By Wendy on March 7th, 2017

Hi Ruben
Thanks for a great tutorial! My question is about post hoc testing. I've run a Chi-squared on 3 age groups and 3 body composition groups. It's come up as significant (0.005), but how do I work out in which specific group combinations the differences are? Apologies if you've answered this question previously. Thanks!

• ### By Ruben Geert van den Berg on March 8th, 2017

Hi Wendy, thanks for the compliment!

I'm actually working on new tutorials on popular tests (including chi-square independence) and the answer to your question will be included: add adjusted standardized residuals (ASRESID in your syntax) to your CROSSTABS. For reasonable sample sizes, they roughly follow a standard normal distribution. This means that any cell with an adjusted standardized residual outside the range -2 through 2 is statistically significantly different from what should be expected from mere sampling fluctuation and thus signifies a "real" difference.

Does that make any sense?

Best,

Ruben