
Comparing Dichotomous or Categorical Variables

Summary

This tutorial shows how to create nice tables and charts for comparing multiple dichotomous or categorical variables. We recommend following along by downloading and opening freelancers.sav.

SPSS Data View Categorical Variables

The question we'll answer is in which sectors our respondents have been working and to what extent this has changed over the years 2010 through 2014. Variables sector_2010 through sector_2014 contain the necessary information.

SPSS Frequency Tables

A simple and straightforward way to answer our question is to run basic FREQUENCIES tables over the relevant variables. The syntax below shows how to do so. The next screenshot shows the first of the five tables it creates.

*1. Display values and value labels in output tables.

set tnumbers both.

*2. Inspect frequency tables.

frequencies sector_2010 to sector_2014.

SPSS FREQUENCIES Output

SPSS FREQUENCIES Output Tables

Right, with some effort we can see from these tables in which sectors our respondents have been working over the years. However, five separate tables don't provide a clear overview. Therefore, we'll next create a single overview table for our five variables.
The table we'll create requires that all variables have identical value labels. Inspecting the five frequencies tables shows that all variables have values from 1 through 5 and these are identically labeled. A final preparation before creating our overview table is handling the system missing values that we see in some frequency tables.

Including System Missing Values

Since we're dealing with nominal variables, we may include system missing values as if they were valid. This keeps the N consistent over analyses. Since the valid values run from 1 through 5, we'll RECODE system missing values into 6.

*1. Recode system missings into value that's not present in variables yet (here: 6).

recode sector_2010 to sector_2014 (sysmis = 6).

*2. Apply description to former system missing values.

add value labels sector_2010 to sector_2014 6 '(Unknown)'.
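
For readers who'd like to check the logic of this step outside SPSS, here is a minimal Python sketch of the same idea. The three respondents and the names `respondents`, `UNKNOWN` and `recode_sysmis` are made up for illustration; `None` stands in for a system missing value.

```python
# Hypothetical mini data set: one dict per respondent, None = system missing.
respondents = [
    {"sector_2010": 1, "sector_2011": None},
    {"sector_2010": None, "sector_2011": 3},
    {"sector_2010": 2, "sector_2011": 2},
]

UNKNOWN = 6  # a code not used by any valid value (valid codes run 1 through 5)

# Only the label for the new code comes from the tutorial; others are omitted.
value_labels = {UNKNOWN: "(Unknown)"}


def recode_sysmis(rows, new_code):
    """Return copies of rows with every missing (None) value set to new_code."""
    return [
        {var: (new_code if value is None else value) for var, value in row.items()}
        for row in rows
    ]


recoded = recode_sysmis(respondents, UNKNOWN)
for row in recoded:
    print(row)
```

After recoding, every respondent has a valid code on every variable, so each variable is based on the same number of cases.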

SPSS TABLES Command

We'll now run a single table containing the percentages over categories for all 5 variables. One way to do so is by using TABLES as shown below. Using TABLES is rather challenging because it's not available from the menu and has been removed from the command syntax reference. We'll therefore propose an alternative way of creating this exact same table a bit later on.

*1. Display value labels but no values in output tables.

set tnumbers labels.

*2. Frequency table for multiple variables.

tables
/ftotal = total
/table = sector_2010 + sector_2011 + sector_2012 + sector_2013 + sector_2014 by (labels) + total
/statistics = cpct((pct4)'')
/title = "Primary Sectors over 2010-2014."
/caption "N = 40 respondents.".

SPSS TABLES Output Table

SPSS TABLES Command Categorical Variables
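
What this overview table computes can be sketched in a few lines of Python: for each variable, the percentage of respondents in each category. The four respondents and the function name `column_percentages` are hypothetical and only serve to illustrate the calculation; this is not how TABLES works internally.

```python
from collections import Counter

# Hypothetical wide data: one row per respondent, one column per year.
rows = [
    {"sector_2010": 1, "sector_2011": 1},
    {"sector_2010": 1, "sector_2011": 2},
    {"sector_2010": 2, "sector_2011": 2},
    {"sector_2010": 6, "sector_2011": 2},
]


def column_percentages(rows, variables):
    """For each variable, map every observed category to its percentage of all rows."""
    table = {}
    for var in variables:
        counts = Counter(row[var] for row in rows)
        table[var] = {cat: 100 * n / len(rows) for cat, n in sorted(counts.items())}
    return table


overview = column_percentages(rows, ["sector_2010", "sector_2011"])
for var, percentages in overview.items():
    print(var, percentages)
```

Because missing values were recoded into a valid category, the percentages within each variable add up to 100.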

SPSS VARSTOCASES Command

At this point, we'd like to visualize the previous table as a chart. A single graph containing separate bar charts for different years would be nice here. However, SPSS can't generate this graph given our current data structure.
The solution is to restructure our data: we'll put our five variables (sectors for five years) on top of each other in a single variable. A second variable will indicate the year for each sector.
The syntax below shows how to do so with VARSTOCASES. Since we'll focus on sectors and years exclusively, we'll drop all other variables from the original data.

SPSS VARSTOCASES Syntax Example

*Put sector_2010 through sector_2014 on top of each other and create new variable "Year" to indicate years.

VARSTOCASES
/make sector from sector_2010 to sector_2014
/index Year (sector)
/drop all.

Result

SPSS VARSTOCASES Command Explained
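
The restructuring itself is easy to picture with a small Python sketch. It assumes two made-up respondents and a hypothetical helper `vars_to_cases`; it only mimics the effect of the syntax above and is not how SPSS implements VARSTOCASES.

```python
# Hypothetical wide data: one row per respondent.
wide = [
    {"id": 1, "sector_2010": 1, "sector_2011": 2},
    {"id": 2, "sector_2010": 3, "sector_2011": 3},
]


def vars_to_cases(rows, variables, make="sector", index="year"):
    """Stack the given variables into one column, one output row per value.

    The index column holds the name of the source variable. All other
    columns (such as id) are left out of the result.
    """
    long_rows = []
    for row in rows:
        for var in variables:
            long_rows.append({index: var, make: row[var]})
    return long_rows


long_data = vars_to_cases(wide, ["sector_2010", "sector_2011"])
for row in long_data:
    print(row)
```

Two respondents with two variables each yield four rows. In the same way, 40 respondents with five variables yield 200 rows in our restructured data.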

Additional Data Tweaks

Note that the variable label for sector is no longer correct after running VARSTOCASES; it's no longer limited to 2010. The first step in the syntax below fixes this.
Also, note that year is a string variable representing years. We may chop off “sector_” from all values by using SUBSTR in order to clean it up a bit. This will make subsequent tables and charts look much nicer.

*1. Correct variable label for sector.

variable labels sector "Primary Sector".

*2. Chop off "sector_" from year.

compute year = char.substr(year,index(year,'_') + 1).
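
The string operation in step 2 keeps everything after the first underscore. A rough Python equivalent, with the hypothetical helper name `chop_prefix`, looks as follows. Note that SPSS counts characters from 1 whereas Python counts from 0.

```python
def chop_prefix(value, separator="_"):
    """Keep everything after the first separator, e.g. 'sector_2010' -> '2010'.

    str.find returns -1 when the separator is absent, so the slice then
    starts at 0 and the value is returned unchanged.
    """
    return value[value.find(separator) + 1:]


years = [chop_prefix(v) for v in ["sector_2010", "sector_2014"]]
print(years)
```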

SPSS CROSSTABS Table

Since we restructured our data, the main question has now become whether there's an association between sector and year. Although year is metric, we'll treat both variables as categorical.
A contingency table generated with CROSSTABS now sheds some light on this association. Note that the results are identical to the TABLES and FREQUENCIES results we ran previously.

*Investigate association between primary sector and year.

crosstabs sector by year/cells column.

SPSS CROSSTABS Output

SPSS CROSSTABS Output Table
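
Column percentages in a contingency table are simply cell counts divided by the total of their column. The Python sketch below shows this on made-up long format data; the rows and the function name `crosstab_column_pct` are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical long data: one row per respondent per year.
long_rows = [
    {"year": "2010", "sector": 1},
    {"year": "2010", "sector": 1},
    {"year": "2010", "sector": 2},
    {"year": "2010", "sector": 6},
    {"year": "2011", "sector": 1},
    {"year": "2011", "sector": 2},
    {"year": "2011", "sector": 2},
    {"year": "2011", "sector": 2},
]


def crosstab_column_pct(rows, row_var, col_var):
    """Map each observed (row value, column value) cell to its column percentage."""
    cell_counts = Counter((r[row_var], r[col_var]) for r in rows)
    col_totals = Counter(r[col_var] for r in rows)
    return {
        (row_val, col_val): 100 * n / col_totals[col_val]
        for (row_val, col_val), n in cell_counts.items()
    }


table = crosstab_column_pct(long_rows, "sector", "year")
for cell, pct in sorted(table.items()):
    print(cell, pct)
```

Since each year is now a column of the table, the percentages within a year are the same ones we'd get from a frequency table for that year's original variable.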

SPSS Split Bar Chart

Restructuring our data allows us to run a split bar chart: we'll make bar charts displaying frequencies for sector for each of our five years separately in a single chart. The screenshot below walks you through.

When running the syntax for this chart, the variable label of year will be shown above the chart. We don't want this but there's no easy way to circumvent it. The solution here is to change the variable label into a title for our chart, and we do so by adding step 2 to our chart syntax below. Preceding it with TEMPORARY (step 1) circumvents the need to change back the variable label later on.

SPSS Split Bar Chart Syntax

*1. Indicate that following data modifications must be reversed later on.

temporary.

*2. Abuse variable label as title for chart.

variable labels Year "Primary Sectors by Year (N = 40)".

*3. Run chart and change back variable label.

GRAPH
/BAR(SIMPLE)=COUNT BY sector
/PANEL COLVAR=Year COLOP=CROSS.

SPSS Split Bar Chart

SPSS Split Bar Chart Colvar

Conclusion

Our chart visualizes the sectors our respondents have been working in over the years. However, the chart doesn't look very pretty and its layout is far from optimal. Creating an SPSS chart template for it can do some real magic here but this is beyond our scope now.
To round off with a bit of an anticlimax: we don't observe any clear association between primary sector and year.
