Association between Categorical Variables

This tutorial walks through creating clean tables and charts for investigating the association between categorical or dichotomous variables. If its statistical assumptions are met, a chi-square test may follow these up.
As an example, we'll see whether sector_2010 and sector_2011 in freelancers.sav are associated in any way.

SPSS Quick Data Check

Before doing anything else, let's take a quick look at both variables separately. In the syntax below, we first ensure we'll see both values and value labels in our output tables (step 1). Next, we run a basic FREQUENCIES command (step 2).

*1. Set both values and value labels for output tables.

set tnumbers both.

*2. Run frequencies.

frequencies sector_2010 sector_2011.

RECODE System Missing Values

Both variables contain values from 1 through 5 plus system missing values. Since both variables are nominal, we may include these system missings as just another category. This keeps the N nice and constant over analyses and results in cleaner tables. The syntax below shows how to do so with RECODE. For even nicer tables, you may remove “Valid” with a Python script and apply styling with an SPSS table template (.stt file).

*1. Recode system missing into value that's not present in variables yet (here: 6).

recode sector_2010 sector_2011 (sysmis = 6).

*2. Explain what formerly missing value means.

add value labels sector_2010 sector_2011 6 '(Unknown)'.

*3. Show only value labels in output.

set tnumbers labels.

*4. Run clean frequency tables.

frequencies sector_2010 sector_2011.
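
If you'd rather leave the original variables untouched, a possible variation is to RECODE into new variables instead of recoding in place. The sketch below is an assumption on our part rather than what the tutorial does: the names sec10_r and sec11_r are hypothetical, and because RECODE ... INTO does not carry over value labels, they are copied with APPLY DICTIONARY before the new label is added. It should be run instead of the in-place RECODE above, not after it.

```spss
*1. Recode into new (hypothetical) variables, copying all other values as they are.

recode sector_2010 sector_2011 (sysmis = 6)(else = copy) into sec10_r sec11_r.

*2. Copy dictionary information (including value labels) from the original variables.

apply dictionary from *
/source variables = sector_2010
/target variables = sec10_r.

apply dictionary from *
/source variables = sector_2011
/target variables = sec11_r.

*3. Label the new category and inspect the result.

add value labels sec10_r sec11_r 6 '(Unknown)'.

frequencies sec10_r sec11_r.
```

The rest of the tutorial would then use sec10_r and sec11_r wherever sector_2010 and sector_2011 appear.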

SPSS CROSSTABS for Both Variables

Thus far, we only had a look at both variables separately. In order to see how they're associated, we'll inspect their contingency table obtained from CROSSTABS. Displaying column percentages without frequencies is our preferred option here.

*Run contingency table with (only) column percentages.

crosstabs sector_2011 by sector_2010
/cells column.

Conclusion: the variables are strongly related. Roughly, most people who worked in a sector in 2010 stayed in the same sector in 2011. For example, 60% of respondents who worked in industry in 2010 stayed in industry. Another 20% moved to finance and the final 20% moved to “other”. Note that we're only describing the data at hand. We're not making any attempt to generalize these results to any larger population.
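
If you'd like to see the frequencies behind these percentages, or try the chi-square test mentioned in the introduction, the sketch below adds them to the same CROSSTABS command. This goes beyond the analysis shown here and is only a suggestion: with a small sample spread over up to 36 cells (6 by 6 categories after the RECODE above), expected frequencies will probably be too low for the test's assumptions to hold, so the output is best treated as exploratory.

```spss
*Contingency table with counts, column percentages, expected counts and adjusted standardized residuals.

crosstabs sector_2011 by sector_2010
/cells count column expected asresid
/statistics chisq.
```

The EXPECTED counts help you judge whether the chi-square assumptions are met; the adjusted standardized residuals (ASRESID) indicate which cells deviate most from independence.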

SPSS Clustered Bar Chart Creation

We'll now visualize the contents of the previous table. An option here is a split bar chart but we'll go for a clustered bar chart instead. The screenshots below walk you through the process.

SPSS Clustered Bar Chart Syntax

*Run clustered bar chart for sector_2011 by sector_2010.

GRAPH
/BAR(GROUPED)=COUNT BY sector_2011 BY sector_2010
/TITLE='Sector in 2010 by sector in 2011 (N = 40)'.

SPSS Clustered Bar Chart Styling

Although our chart is technically correct, it looks appalling. Its default color scheme basically just looks like a bad joke from the software developers. A fast way to prettify this and similar charts is building and applying an SPSS chart template (.sgt file). Our final result after doing so is shown in the last screenshot.
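
Once a chart template has been saved, it can be applied from syntax by adding a TEMPLATE subcommand to the GRAPH command. A minimal sketch, assuming a template already exists; the file path below is a placeholder and should point to your own .sgt file:

```spss
*Run clustered bar chart with a chart template applied (file path is a placeholder).

GRAPH
/BAR(GROUPED)=COUNT BY sector_2011 BY sector_2010
/TITLE='Sector in 2010 by sector in 2011 (N = 40)'
/TEMPLATE='C:\templates\clustered-bar.sgt'.
```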

SPSS Clustered Bar Chart Example

Conclusion: as with the contingency table, the only clear pattern here is that people tend to stay in the same sector as in the previous year.

Tell us what you think!

THIS TUTORIAL HAS 14 COMMENTS:

By Abass Alhassan on July 4th, 2015

The tutorial has been helpful to me. Thanks.

By Garabasa on December 10th, 2015

This is very useful. Many thanks to the authors.

By Alishan on January 5th, 2016

Very nice tutorial, we want to learn more. Many thanks.

By Wendy on March 7th, 2017

Hi Ruben
Thanks for a great tutorial! My question is about post hoc testing. I've run a Chi-squared on 3 age groups and 3 body composition groups. It's come up as significant (0.005), but how do I work out in which specific group combinations the differences are? Apologies if you've answered this question previously. Thanks!

By Ruben Geert van den Berg on March 8th, 2017

Hi Wendy, thanks for the compliment!

I'm actually working on new tutorials on popular tests (including chi-square independence) and the answer to your question will be included: add adjusted standardized residuals (ASRESID in your syntax) to your CROSSTABS command. For reasonable sample sizes, they roughly follow a standard normal distribution. This means that any cell with an adjusted standardized residual outside the range -2 through 2 differs statistically significantly from what would be expected from mere sampling fluctuation and thus signals a "real" difference.

Does that make any sense?

Best,

Ruben
