Association between Categorical Variables

This tutorial walks through creating clean tables and charts for investigating the association between categorical or dichotomous variables. If the statistical assumptions are met, these may be followed up by a chi-square test.
As an example, we'll see whether sector_2010 and sector_2011 in freelancers.sav are associated in any way.
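To follow along, the data file needs to be open in SPSS first. A minimal sketch, assuming freelancers.sav sits in a hypothetical folder d:\data (the folder is our assumption, not part of the original example):

```spss
*Sketch: open the data file. The folder is an assumption, so adjust it to your own location.

get file 'd:\data\freelancers.sav'.
```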

SPSS Categorical Variables in Data View

SPSS Quick Data Check

Before doing anything else, let's first just take a quick look at both variables separately. In the syntax below, we first ensure we'll see both values and value labels in our output tables (step 1). Next, we run a basic FREQUENCIES command.

*1. Set both values and value labels for output tables.

set tnumbers both.

*2. Run frequencies.

frequencies sector_2010 sector_2011.

RECODE System Missing Values

Both variables contain values from 1 through 5 plus system missing values. Since both variables are nominal, we may include these system missings as just another category. This keeps the N constant over analyses and results in cleaner tables. The syntax below shows how to do so with RECODE. For even nicer tables, you may remove “Valid” with a Python script and apply styling with an SPSS table template (.stt file).

*1. Recode system missing into value that's not present in variables yet (here: 6).

recode sector_2010 sector_2011 (sysmis = 6).

*2. Explain what formerly missing value means.

add value labels sector_2010 sector_2011 6 '(Unknown)'.

*3. Show only value labels in output.

set tnumbers labels.

*4. Run clean frequency tables.

frequencies sector_2010 sector_2011.
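The table template (.stt file) mentioned above can be applied from syntax as well. The sketch below assumes a template was saved earlier under a hypothetical path. SET TLOOK then applies it to all tables that follow until it is switched off again.

```spss
*Sketch: apply a table template to all subsequent tables. The file path is hypothetical.

set tlook 'd:\templates\clean-table.stt'.

frequencies sector_2010 sector_2011.

*Switch back to the default table look.

set tlook none.
```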

SPSS CROSSTABS for Both Variables

So far, we've only looked at both variables separately. To see how they're associated, we'll inspect their contingency table obtained from CROSSTABS. Displaying column percentages without frequencies is our preferred option here.

*Run contingency table with (only) column percentages.

crosstabs sector_2011 by sector_2010
/cells column.

Conclusion: the variables are strongly related. Roughly, most people who worked in a sector in 2010 stayed in the same sector in 2011. For example, 60% of respondents who worked in industry in 2010 stayed in industry. Another 20% moved to finance and the final 20% moved to “other”. Note that we're only describing the data at hand: we're not making any attempt to generalize these results to any larger population.
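If you'd like a single number for the strength of this association, CROSSTABS can add Cramér's V, and it can run the chi-square test mentioned in the introduction. The sketch below is one way to request both, not part of the original analysis. Keep in mind that with N = 40 spread over a 6 by 6 table, the average expected frequency is only about 1.1, so the chi-square assumptions are very unlikely to hold here. Requesting expected frequencies lets you check this yourself.

```spss
*Sketch: contingency table with counts, expected frequencies and column percentages.
*CHISQ requests the chi-square test and PHI requests phi and Cramer's V.

crosstabs sector_2011 by sector_2010
/cells count expected column
/statistics chisq phi.
```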

SPSS Clustered Bar Chart Creation

We'll now visualize the contents of the previous table. An option here is a split bar chart but we'll go for a clustered bar chart instead. The screenshots below walk you through the process.

SPSS Create Clustered Bar Chart

SPSS Create Clustered Bar Chart 2

SPSS Clustered Bar Chart Syntax

*Run clustered bar chart for sector_2011 by sector_2010.

GRAPH
/BAR(GROUPED)=COUNT BY sector_2011 BY sector_2010
/TITLE='Sector in 2010 by sector in 2011 (N = 40)'.

SPSS Clustered Bar Chart

SPSS Clustered Bar Chart Styling

Although our chart is technically correct, it looks appalling. Its default color scheme basically just looks like a bad joke from the software developers. A fast way to prettify this and similar charts is building and applying an SPSS chart template (.sgt file). Our final result after doing so is shown in the last screenshot.

SPSS Clustered Bar Chart Example

SPSS Clustered Bar Chart Styled
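Once a chart template has been saved, it can be applied straight from syntax rather than by hand. The sketch below reruns the clustered bar chart with a TEMPLATE subcommand. The .sgt path is hypothetical and should point to wherever you saved your own template.

```spss
*Sketch: rerun the clustered bar chart with a chart template applied. The file path is hypothetical.

graph
/bar(grouped)=count by sector_2011 by sector_2010
/title='Sector in 2010 by sector in 2011 (N = 40)'
/template='d:\templates\clustered-bar.sgt'.
```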

Conclusion: as with the contingency table, the only clear pattern here is that people tend to stay in the same sector as in the previous year.
