# SPSS Tutorials


# Association between Metric and Categorical Variable

## Summary

This tutorial shows how to create clear tables and charts for investigating the association between a metric and a categorical variable. If the statistical assumptions are met, these may be followed up by an ANOVA.
As an example, we'll use freelancers.sav and see whether (and how) sector_2010 is related to income_2010.

## Data Inspection and FILTER

We'll first inspect the frequency distribution of sector_2010 with FREQUENCIES by running the syntax below (step 1). At first, the table is rather messy due to system missing values (screenshot beneath syntax).
Next, we'll FILTER out cases with system missing values, which results in a cleaner table. For an even cleaner table, we can hide “Valid” with a Python script and style the table with an SPSS table template (.stt file). Filtering also keeps N constant across analyses.

## SPSS FREQUENCIES and FILTER Syntax

```
*1. Inspect frequency distribution for sector_2010.

frequencies sector_2010.

*2. Create filter variable for excluding cases with system missings on sector_2010.

compute filt_1 = not(sysmis(sector_2010)).

*3. Apply label to filter variable.

variable labels filt_1 "Filter that excludes cases having sysmis on sector_2010".

*4. Switch filter variable on.

filter by filt_1.

*5. Rerun frequencies table for cleaner result.

frequencies sector_2010.
```
First FREQUENCIES Table from Running Syntax Above
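For readers who want to see what `filter by filt_1` does to the frequency table, here is a plain-Python sketch of the same idea. It is not how SPSS implements it. It assumes system-missing values are represented as `None`, and the sector values are hypothetical toy data rather than freelancers.sav. The helper name `frequency_table` is made up for this illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return (rows, n): rows are (value, count, percent); None (our stand-in for sysmis) is skipped."""
    valid = [v for v in values if v is not None]  # comparable to FILTER BY filt_1
    counts = Counter(valid)
    n = len(valid)
    rows = [(value, count, 100.0 * count / n) for value, count in sorted(counts.items())]
    return rows, n

# Hypothetical toy data: None marks a system-missing sector.
sectors = ["IT", None, "Health care", "IT", None, "Retail"]
rows, n = frequency_table(sectors)
print(f"N = {n}")
for value, count, pct in rows:
    print(f"{value:<12} {count:>3} {pct:6.1f}%")
```

Because the missing cases are dropped before counting, the percentages are based on valid cases only. This also keeps N the same for every analysis run while the filter is on.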

## Histogram and Custom Currency Format

We'll inspect the histogram for income_2010 to see whether it holds any unusual values. It doesn't, but the chart is somewhat cluttered because the income values are large numbers.
One way to deal with this is dividing all income values by 1,000 as shown in the syntax below (step 2). To make clear that income is now in thousands of dollars, we'll suffix all values with “K” (short for “kilo”, or 1,000) by defining a custom currency format in step 3.
We'll assign this format to income_2010 with FORMATS (step 4), after which the charts become more readable.

## SPSS Histogram and Custom Currency Format Syntax

```
*1. Run basic histogram for income_2010. No unusual values in chart.

frequencies income_2010
/format notable
/histogram.

*2. Divide all incomes by 1,000.

compute income_2010 = income_2010 / 1000.

*3. Set custom currency A (= cca) format with "K" suffix.

set cca '-,$,K,K'.

*4. Use newly defined cca format for income_2010.

formats income_2010 (cca12).

*5. Rerun histogram; shorter values leave more space on x-axis.

frequencies income_2010
/format notable
/histogram.
```
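Steps 2 through 4 change how a single income is displayed. A plain-Python sketch can mimic that effect. The helper `format_k` is hypothetical. It only covers non-negative values shown without decimals (as `cca12` does) and does not try to reproduce every rule of SPSS custom currency formats.

```python
def format_k(income_dollars: float) -> str:
    """Show an income in thousands of dollars with a "$" prefix and a "K" suffix.

    Mirrors step 2 (divide by 1,000) plus the "$" prefix and "K" suffix of the
    custom currency format. Non-negative values only, no decimals.
    """
    if income_dollars < 0:
        raise ValueError("this sketch only handles non-negative incomes")
    thousands = income_dollars / 1000  # step 2: compute income_2010 / 1000
    return f"${thousands:,.0f}K"       # "$" prefix, thousands grouping, "K" suffix

print(format_k(55342))
print(format_k(1234567))
```

Step 2 changes the stored values themselves, not only their display. The custom currency format only adds the "$" and "K" around the already shortened numbers.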

## SPSS MEANS Table

Now that we've made sure there's nothing unusual about our variables of interest, let's see whether they are associated. We'll first do so by running a basic MEANS table as shown in the syntax below (step 3).
Optionally: because we don't like the default title (“Report”), we'll make it invisible with an SPSS table template (.stt file) and display a variable label as if it were the title instead. We'll therefore change the variable label for income_2010 (step 2). Preceding this change with TEMPORARY (step 1) means it only applies to the next procedure, so we don't need to reverse it ourselves. The result is shown in the next screenshot.

## SPSS MEANS Table Syntax

```
*1. Indicate that next command must be reversed later on.

temporary.

*2. Set variable label to desired title for means table.

variable labels income_2010 "Mean income by sector over 2010.".

*3. Run means table. Also indicates end of temporary and reverses previous command.

means income_2010 by sector_2010
/cells count mean stddev.
```

Conclusion: income_2010 and sector_2010 seem strongly associated. Roughly, respondents in IT and health care had mean incomes around $55,000. All other sectors showed mean incomes around $35,000.
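To check the arithmetic behind `/cells count mean stddev`, the three statistics per category can be sketched in plain Python. The incomes below (in thousands of dollars) are hypothetical toy values, not the freelancers.sav data. The helper name `means_table` is made up for this illustration. Like SPSS MEANS, the sketch reports the sample standard deviation, which divides by n - 1.

```python
from statistics import mean, stdev

def means_table(incomes, groups):
    """Return {group: (count, mean, sample SD)}, like MEANS ... /cells count mean stddev."""
    by_group = {}
    for income, group in zip(incomes, groups):
        by_group.setdefault(group, []).append(income)
    table = {}
    for group, values in sorted(by_group.items()):
        # Sample SD (n - 1 in the denominator); undefined for a single case.
        sd = stdev(values) if len(values) > 1 else float("nan")
        table[group] = (len(values), mean(values), sd)
    return table

# Hypothetical incomes in thousands of dollars.
incomes = [50, 60, 30, 40, 35]
groups = ["IT", "IT", "Retail", "Retail", "Retail"]
for group, (n, m, sd) in means_table(incomes, groups).items():
    print(f"{group:<8} N={n:<3} mean={m:6.2f} SD={sd:6.2f}")
```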

## SPSS Bar Chart for Independent Means

Next, we'll visualize the previous table as a bar chart. The screenshots below walk you through the steps.

## SPSS Bar Chart for Independent Means Syntax

Following the steps outlined by the screenshots results in the syntax below. Run it in order to generate the chart shown in the screenshot.

```
*Create bar chart for independent means.

GRAPH /BAR(SIMPLE)=MEAN(income_2010) BY sector_2010
/title "Mean income by sector over 2010 (N = 37)".
```
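`GRAPH /BAR(SIMPLE)=MEAN(income_2010) BY sector_2010` draws one bar per sector, with each bar's length proportional to that sector's mean income. A rough text-mode sketch of that idea in plain Python follows. The group means are hypothetical, the helper name `text_bar_chart` is made up, and the scaling assumes all means are positive.

```python
def text_bar_chart(group_means, width=20):
    """Return one text line per group; bar length is proportional to the group mean."""
    largest = max(group_means.values())
    label_width = max(len(label) for label in group_means)
    lines = []
    for label, value in group_means.items():
        bar = "#" * round(value / largest * width)  # the largest mean gets `width` marks
        lines.append(f"{label:<{label_width}} | {bar} {value:.1f}")
    return lines

# Hypothetical mean incomes in thousands of dollars.
chart = text_bar_chart({"IT": 55, "Health care": 55, "Retail": 35})
print("\n".join(chart))
```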

## SPSS Bar Chart Styling

The bar chart clearly shows the pattern we saw in the means table. However, its default styling isn't very attractive, so we'll improve it by building and applying an SPSS chart template (.sgt file). Our final result is shown below.

## SPSS Create Split Histogram

Optionally, we can look a bit further into the differences between the sectors' mean incomes by running a split histogram: a single chart holding a separate histogram of income_2010 for each sector. The screenshot below walks you through the steps.

## SPSS Split Histogram Styling

The syntax generated by following the previous screenshot is shown below. As before, we'll use a variable label as if it were the chart title, and we'll apply some styling with a chart template. Our final result is shown in the following screenshot.

## SPSS Split Histogram Syntax

```
*1. Indicate that following command must be reversed later on.

temporary.

*2. Set variable label to chart title.

variable labels sector_2010 "Income by sector over 2010 (N = 37)".

*3. Run chart, end temporary, reverse previous command.

GRAPH
/HISTOGRAM=income_2010
/PANEL COLVAR=sector_2010.
```
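Conceptually, each panel of the split histogram counts how many incomes in that sector fall into each bin. A plain-Python sketch of that counting follows. It assumes fixed-width bins starting at a chosen lower bound; SPSS chooses its own bin boundaries, which this sketch does not replicate. The incomes and the helper name `split_histogram` are hypothetical.

```python
from collections import defaultdict

def split_histogram(incomes, groups, bin_width, start=0.0):
    """Count values per group and bin; bin k covers [start + k*w, start + (k+1)*w)."""
    counts = defaultdict(lambda: defaultdict(int))
    for income, group in zip(incomes, groups):
        k = int((income - start) // bin_width)  # index of the bin this income falls in
        counts[group][k] += 1
    # Convert to plain dicts keyed by each bin's lower bound.
    return {
        group: {start + k * bin_width: n for k, n in sorted(bins.items())}
        for group, bins in counts.items()
    }

# Hypothetical incomes in thousands of dollars.
incomes = [31, 33, 78, 82, 85, 31, 36, 39]
groups = ["IT"] * 5 + ["Retail"] * 3
h = split_histogram(incomes, groups, bin_width=10)
for group, bins in h.items():
    print(group, bins)
```

With these toy values the "IT" counts have two separate clusters, one near 30 and one near 80. That is the kind of two-peaked shape a large standard deviation can hide behind a single mean.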

Conclusion: the histogram for Health Care has a shape similar to the others, except that its incomes are some $20,000 higher than those of all other sectors apart from IT. The IT histogram shows a peak around $80,000 but also a second peak around $30,000: the mean income for IT is high, but so is its standard deviation.
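The summary notes that these tables and charts may be followed up by an ANOVA; in SPSS this is typically done with the ONEWAY command. The core of a one-way ANOVA is the F statistic, which compares between-group variance to within-group variance. A plain-Python sketch follows, using hypothetical data and a made-up helper name `one_way_anova_f`. It omits the p-value and any assumption checks. The large standard deviation for IT is a reminder to check the equal-variances assumption before relying on the F test.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of observations per category.
    Needs at least two groups and more observations than groups.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Between groups: group size times squared distance of group mean to grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: squared distances of observations to their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical incomes (thousands of dollars) for two sectors.
f, df_b, df_w = one_way_anova_f([[50, 55, 60], [30, 35, 40]])
print(f"F({df_b}, {df_w}) = {f:.2f}")
```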


# This tutorial has 4 comments

### By Ruben Geert van den Berg on February 7th, 2016

Hi Tom!

I don't have a Mac (I'd love to have one in order to troubleshoot issues such as the one you're reporting right now).

- Anyway, exactly what kind of ANOVA are you looking for? Only between-subjects factors or a repeated measures ANOVA? The latter case requires the GLM command which you'll only have if you have the "Advanced Statistics" add-on module installed (unfortunately, it's not free). Also see our overview of all SPSS commands.

- Do you have one or multiple factors?

- Could you send me screenshots of the menu under both "Analyze => compare means" and "Analyze => General linear model", if present?

- Could you run `show license.` in the syntax editor and send me the result?

This may identify the problem and suggest a workaround for it.

### By Thomas Vorbach on February 6th, 2016

Sorry, this is not helping me. I have a Mac and there is no option to run an ANOVA??? I went to Analyze, Compare... everything else is there but the ANOVA!

### By Ruben Geert van den Berg on December 4th, 2015

Thanks for the compliment! We don't have anything on regression yet but it's pretty much at the top of our list. We hope to publish one or more regression tutorials in maybe two weeks or so. Keep an eye on our home page, it introduces at least one new tutorial every Monday so you can't miss them!

### By Shane Graber on December 3rd, 2015

This is a pretty great site. Do you have tutorials on multiple regression?