This tutorial shows how to create clean tables and charts for investigating the association between a metric variable and a categorical variable. If the statistical assumptions are met, these may be followed up by an ANOVA.
As an example, we'll use freelancers.sav and see whether (and how) sector_2010 is related to income_2010.
Data Inspection and FILTER
We'll first inspect FREQUENCIES for sector_2010 by running the syntax below (step 1). At first, the table is rather messy due to system missing values (screenshot beneath syntax).
Next, we'll FILTER out cases with system missings. This results in a cleaner table and also keeps N constant over all analyses that follow. For an even cleaner table, we can hide “Valid” with a Python script and style the table with an SPSS table template (.stt file).
SPSS FREQUENCIES and FILTER Syntax
*2. Create filter variable for excluding cases with system missings on sector_2010.
compute filt_1 = not(sysmis(sector_2010)).
*3. Apply label to filter variable.
variable labels filt_1 "Filter that excludes cases having sysmis on sector_2010".
*4. Switch filter variable on.
filter by filt_1.
*5. Rerun frequencies table for cleaner result.
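The listing above does not show the commands for steps 1 and 5. A minimal sketch, assuming both are a plain FREQUENCIES call on sector_2010 (run once before the filter is created and once after it has been switched on):

```spss
*Assumed command for steps 1 and 5: frequency table for sector_2010.
frequencies sector_2010.
```

With filt_1 active, the second run should show only the cases that have a valid sector.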
Histogram and Custom Currency Format
We'll inspect the histogram for income_2010 to see whether it holds any unusual values. It doesn't, but the large numbers representing income clutter the chart's x-axis somewhat.
One way to deal with this is dividing all income values by 1,000 as shown in the syntax below (step 2). In order to make clear we now have income in thousands of dollars, we'll suffix all values with “K” (short for “Kilo” or 1,000) by defining a custom currency format in step 3.
We'll specify this as the format for income_2010 with FORMATS (step 4) after which we obtain more readable charts.
SPSS Histogram and Custom Currency Format Syntax
*2. Divide all incomes by 1,000.
compute income_2010 = income_2010 / 1000.
*3. Set custom currency A (= cca) format with "K" suffix.
set cca '-,$,K,K'.
*4. Use newly defined cca format for income_2010.
formats income_2010 (cca12).
*5. More space on x-axis of chart.
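The histogram command itself is not shown in the listing above. As a sketch, assuming the legacy GRAPH command is used, the chart can be run before and after rescaling to compare the axis labels. The comments also spell out the four comma-separated fields that a custom currency specification expects.

```spss
*Fields in SET CCA are: negative prefix, prefix, suffix, negative suffix.
*Assumed command: histogram for income_2010.
graph /histogram=income_2010.
```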
SPSS MEANS Table
Now that we made sure there's nothing awkward regarding our variables of interest, let's see whether they are associated. We'll first do so by running a basic MEANS table as shown in the syntax below (step 3).
Optionally, because we don't like the default title (“Report”), we'll make it invisible with an SPSS table template (.stt file) and display a variable label as if it were the title instead. We'll therefore change the variable label for income_2010 (step 2). Preceding this by TEMPORARY means we don't have to reverse it afterwards: the change lasts only until the next procedure has run. The result is shown in the next screenshot.
SPSS MEANS Table Syntax
*2. Set variable label to desired title for means table.
variable label income_2010 "Mean income by sector over 2010.".
*3. Run means table. Also indicates end of temporary and reverses previous command.
means income_2010 by sector_2010/cells count mean stddev.
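The listing starts at step 2, so the TEMPORARY command mentioned in the text is missing. A sketch of the full sequence, assuming step 1 is simply TEMPORARY:

```spss
*1. Assumed step 1: make the label change below temporary.
temporary.
*2. Set variable label to desired title for means table.
variable labels income_2010 "Mean income by sector over 2010.".
*3. Run means table. This procedure also ends the temporary state.
means income_2010 by sector_2010 /cells count mean stddev.
```

After MEANS has run, income_2010 should carry its original variable label again.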
Conclusion: income_2010 and sector_2010 seem strongly associated. Roughly, respondents in IT and health care had mean incomes around $55,000. All other sectors showed mean incomes around $35,000.
SPSS Bar Chart for Independent Means
Next, we'll visualize the previous table as a bar chart. The screenshots below walk you through.
SPSS Bar Chart for Independent Means Syntax
Following the steps outlined by the screenshots results in the syntax below. Run it in order to generate the chart shown in the screenshot.
GRAPH /BAR(SIMPLE)=MEAN(income_2010) BY sector_2010
/title "Mean income by sector over 2010 (N = 37)".
SPSS Bar Chart Styling
The previous bar chart very clearly visualizes the pattern we saw in the means table. However, it doesn't look very pretty. We'll prettify it somewhat by building and setting an SPSS chart template (.sgt file). Our final result is shown below.
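Once a chart template has been saved, it can be applied from syntax as well. A minimal sketch, assuming the template was saved under the hypothetical file name bar_chart_style.sgt in the current working directory, and using the TEMPLATE subcommand of GRAPH:

```spss
*Hypothetical template file: replace with the path to your own .sgt file.
graph /bar(simple)=mean(income_2010) by sector_2010
 /title "Mean income by sector over 2010 (N = 37)"
 /template "bar_chart_style.sgt".
```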
SPSS Create Split Histogram
Optionally, we can look a bit further into the differences between the mean incomes for different sectors by running a split histogram: we'll create a chart with histograms for income_2010 for different sectors separately. The screenshot below walks you through.
SPSS Split Histogram Styling
The syntax generated by following the previous screenshot is shown below. As we did previously, we'll use a variable label as if it were the chart title. We'll apply some styling with a chart template. Our final result is shown in the following screenshot.
SPSS Split Histogram Syntax
*2. Set variable label to chart title.
variable labels sector_2010 "Income by sector over 2010 (N = 37)".
*3. Run chart, end temporary, reverse previous command.
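The chart command for step 3 and the TEMPORARY command for step 1 are not shown above. One way to get a comparable chart is a paneled histogram from the legacy GRAPH command. This is a sketch under that assumption and need not match the syntax the original dialog pastes.

```spss
*1. Assumed step 1: make the label change below temporary.
temporary.
*2. Set variable label to chart title.
variable labels sector_2010 "Income by sector over 2010 (N = 37)".
*3. Assumed chart command: one histogram panel per sector, stacked in rows.
graph /histogram=income_2010
 /panel rowvar=sector_2010 rowop=cross.
```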
Conclusion: the histogram for Health Care resembles those for the other sectors, except that its incomes are some $20,000 higher than in every other sector but IT. For IT, we see one peak around $80,000 and a second peak around $30,000. Its mean income is high, but its standard deviation is large as well.
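The introduction mentions ANOVA as a possible follow-up if its assumptions are met. A minimal sketch, assuming a one-way ANOVA through ONEWAY with descriptives and Levene's test for homogeneity of variances requested:

```spss
*Sketch only: one-way ANOVA of income by sector, with the filter still on.
oneway income_2010 by sector_2010
 /statistics descriptives homogeneity.
```

Given the two peaks seen for IT, the assumptions deserve a careful look before relying on the F-test.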