APA Reporting SPSS Factor Analysis
- Introduction
- Creating APA Tables - the Easy Way
- Table I - Factor Loadings & Communalities
- Table II - Total Variance Explained
- Table III - Factor Correlations
Introduction
Creating APA style tables from SPSS factor analysis output can be cumbersome. This tutorial therefore points out some tips, tricks & pitfalls. We'll use the results of SPSS Factor Analysis - Intermediate Tutorial.
All analyses are based on 20-career-ambitions-pca.sav (partly shown below).

Note that some items were reversed and therefore had “(R)” appended to their variable labels;
We'll FILTER out cases with 10 or more missing values.
After opening these data, you can replicate the final analyses by running the SPSS syntax below.
filter by filt01.
*PCA VI - AS PREVIOUS BUT REMOVE TOU04.
FACTOR
/VARIABLES Car01 Car02 Car03 Car04 Car05 Car06 Car07 Car08 Conf01 Conf02 Conf03 Conf05 Conf06
Comp01 Comp02 Comp03 Tou01 Tou02 Tou05 Succ01 Succ02 Succ03 Succ04 Succ05 Succ06
Succ07
/MISSING PAIRWISE
/PRINT INITIAL EXTRACTION ROTATION
/FORMAT SORT BLANK(.3)
/CRITERIA FACTORS(5) ITERATE(25)
/EXTRACTION PC
/ROTATION PROMAX
/METHOD=CORRELATION.
Creating APA Tables - the Easy Way
For a wide variety of analyses, the easiest way to create APA style tables from SPSS output is usually to
- adjust your analyses in SPSS so the output is as close as possible to the desired end results. Changing table layouts (which variables/statistics go into which rows/columns?) is also best done here.
- copy-paste one or more tables into Excel or Googlesheets. This is the easiest way to set decimal places, fonts, alignment, borders and more;
- copy-paste your table(s) from Excel into WORD. Perhaps adjust the table widths with “autofit”, and you'll often have a perfect end result.

Table I - Factor Loadings & Communalities
The figure below shows an APA style table combining factor loadings and communalities for our example analysis.

If you take a good look at the SPSS output, you'll see that you cannot simply copy-paste these tables for combining them in Excel. This is because the factor loadings (pattern matrix) table follows a different variable order than the communalities table. Since the latter follows the variable order as specified in your syntax, the easiest fix for this is to
- make sure that only variable names (not labels) are shown in the output;
- copy-paste the correctly sorted pattern matrix into Excel;
- copy-paste the variable names into the FACTOR syntax and rerun it.
Tip: try and replace the line breaks between variable names by spaces as shown below.

Also, you probably want to see only variable labels (not names) from now on. And -finally- we no longer want to blank out small absolute factor loadings as the BLANK(.3) option in our previous syntax did.

The syntax below does all that and thus creates output that is ideal for creating APA style tables.
set tvars labels.
*RERUN PREVIOUS ANALYSIS WITH VARIABLE ORDER AS IN PATTERN MATRIX TABLE.
FACTOR
/VARIABLES Car02 Car04 Car05 Car03 Car01 Car08 Car07 Car06 Succ01 Succ02 Succ03 Succ05 Succ07 Succ04 Succ06
Conf03 Conf01 Conf05 Conf02 Conf06 Tou02 Tou05 Tou01 Comp02 Comp03 Comp01
/MISSING PAIRWISE
/PRINT INITIAL EXTRACTION ROTATION
/FORMAT SORT
/CRITERIA FACTORS(5) ITERATE(25)
/EXTRACTION PC
/ROTATION PROMAX
/METHOD=CORRELATION.
You can now safely combine the communalities and pattern matrix tables and make some final adjustments. The end result is shown in this Googlesheet, partly shown below.

Since decimal places, fonts, alignment and borders have all been set, this table is now perfect for its final copy-paste into WORD.
Table II - Total Variance Explained
The screenshot below shows how to report the Eigenvalues table in APA style.

The corresponding SPSS output table comes fairly close to this. However, one annoying problem is the missing percent signs.

If we copy-paste into Excel and set a percentage format, 34.57 is converted into 3,457%. This is because Excel interprets these numbers as proportions rather than percentage points as SPSS does. The easiest fix is setting a percent format for these columns in SPSS before copy-pasting into Excel.
The OUTPUT MODIFY example below does just that for all Eigenvalues tables in the output window.
output modify
/select tables
/tablecells select = ["% of Variance"] format = 'pct6.2'
/tablecells select = ["Cumulative %"] format = 'pct6.2'.
After this tiny fix, you can copy-paste this table from SPSS into Excel. We can now easily make some final adjustments (including the removal of some rows and columns) and copy-paste this table into WORD.
Table III - Factor Correlations
If you used an oblique factor rotation, you'll probably want to report the correlations among your factors. The figure below shows an APA style factor correlations table.

The corresponding SPSS output table (shown below) is pretty different from what we need.

Adjusting this table manually is pretty doable. However, I personally prefer to use an SPSS Python script for doing so.
You can download my script from LAST-FACTOR-CORRELATION-TABLE-TO-APA.sps. This script is best run from an INSERT command as shown below.
insert file = 'D:\DOWNLOADS\LAST-FACTOR-CORRELATION-TABLE-TO-APA.sps'.
I highly recommend trying this script but it does make some assumptions:
- the above syntax assumes the script is located in D:\DOWNLOADS so you probably need to change that;
- the script assumes that you've the SPSS Python3.x essentials properly installed (usually the case for recent SPSS versions);
- the script assumes that no SPLIT FILE is in effect.
If you've any trouble or requests regarding my script, feel free to contact me and I'll see what I can do.
Final Notes
Right, so these are the basic routines I follow for creating APA style factor analysis tables. I hope you'll find them helpful.
If you've any feedback, please throw me a comment below.
Thanks for reading!
Creating Boxplots in SPSS – Quick Guide
Introduction & Practice Data File
There's 3 ways to create boxplots in SPSS.
The first approach is the simplest but it also has fewer options than the others. This tutorial walks you through all 3 approaches while creating different types of boxplots.
- Boxplot for 1 Variable - 1 Group of Cases
- Boxplot for Multiple Variables - 1 Group of Cases
- Boxplot for 1 Variable - Multiple Groups of Cases
- Tip 1 - Remove Outliers for Single Group
- Tip 2 - Show Outlier Values in Boxplot
- Tip 3 - Adding Titles to Boxplots
Example Data
All examples in this tutorial use driving-test.sav, partly shown below.

Our data file contains a sample of N = 238 people who were examined in a driving simulator. Participants were presented with 5 dangerous situations to which they had to respond as fast as possible. The data hold their reaction times and some other variables.
Boxplot for 1 Variable - 1 Group of Cases
We'll first run a boxplot for the reaction times on trial 1 for all cases. One option is
which opens the dialogs shown below.

Completing these steps results in the syntax below.
EXAMINE VARIABLES=r01
/COMPARE VARIABLE
/PLOT=BOXPLOT
/STATISTICS=NONE
/NOTOTAL
/ID=id
/MISSING=LISTWISE.
Result

Our boxplot shows some potential outliers as well as extreme values. Interpreting these -and all other boxplot elements- is discussed in Boxplots - Beginners Tutorial. Also note that our boxplot doesn't have a title yet. Options for adding it are discussed in Tip 3 - Adding Titles to Boxplots.
Boxplot for Multiple Variables - 1 Group of Cases
We'll now create a single boxplot for our 5 reaction time variables for all participants. We navigate to
and fill out the dialogs as shown below.

“Dependents together” means that all dependent variables are shown together in each boxplot. If you enter a factor -say, sex- you'll get a separate boxplot for each factor level -female and male respondents. “Factor levels together” creates a separate boxplot for each dependent variable, showing all factor levels together in each boxplot.
“Exclude cases pairwise” means that the results for each variable are based on all cases that don't have a missing value for that variable. “Exclude cases listwise” uses only cases without any missing values on all variables.
A minor note here is that many SPSS users select “Normality plots and tests” in this dialog for running a normality test.
Anyway. Completing these steps results in the syntax below. Let's run it.
EXAMINE VARIABLES=r01 r02 r03 r04 r05
/COMPARE VARIABLE
/PLOT=BOXPLOT
/STATISTICS=NONE
/NOTOTAL
/ID=id
/MISSING=PAIRWISE /* IMPORTANT! */.
Result
Now, before inspecting our boxplot, take a close look at the Case Processing Summary table first.

The first column tells how many cases were used for each variable. Note that trial 5 has 205 -or 86.1%- missing values. Remember that “Exclude cases listwise” was the default in the Explore dialog. If we hadn't changed that, then none of our variables would have used more than N = 33 cases. The actual boxplot, however, wouldn't show anything wrong. This really is a major pitfall. Please avoid it.
Anyway, the figure below shows our actual boxplot.

Note that we already saw the first boxplot bar in our previous example. Second, trials 2 and 4 seem strongly positively skewed. Both variables look odd. We'd better inspect their histograms to see what's really going on.
Boxplot for 1 Variable - Multiple Groups of Cases
We'll now run a boxplot for trial 3 for age groups separately. We first navigate to
and fill out the dialogs as shown below.

Select “Point ID Label” in this tab and then drag & drop r03 into the ID box on the canvas. Doing so will show actual outlier values in the final boxplot.
Completing these steps results in the syntax below.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=agegroup r03 MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: agegroup=col(source(s), name("agegroup"), unit.category())
DATA: r03=col(source(s), name("r03"))
GUIDE: axis(dim(1), label("Age Group"))
GUIDE: axis(dim(2), label("Reaction time trial 3"))
GUIDE: text.title(label("I CAN TYPE MY AMAZING TITLE RIGHT HERE!"))
SCALE: cat(dim(1), include("1", "2", "3"))
SCALE: linear(dim(2), include(0))
ELEMENT: schema(position(bin.quantile.letter(agegroup*r03)), label(r03))
END GPL.
Result

This boxplot shows that both the medians and the spread of the reaction times increase over the age groups. Note that our boxplot also shows outlier values. In this example, these are reaction times of 1,441 and 1,455 milliseconds but for the youngest age group only.
Tip 1 - Remove Outliers for Single Group
If you'd like to remove outliers based on boxplot results, you'd normally set them as user missing values. For example, MISSING VALUES r03 (1441 THRU HI). sets values of 1441 and higher as missing for r03. In our example, however, this won't work: the aforementioned values are potential outliers only for the youngest age group. For the other age groups, they're within a normal range.
A solution is converting these values into different values for the youngest age group only. One option is combining DO IF with RECODE. The syntax below, however, shows a shorter option based on IF.
*Create checktable 1: inspect N, minimum and maximum per age group.
means r03 by agegroup
/cells count min max mean stddev.
*Recode potential outliers into 999999998 but only for agegroup 1.
if(agegroup = 1 and r03 >= 1441) r03 = 999999998.
*Set recoded outliers as user missing values.
missing values r03 (999999998).
*Apply value label to recoded outliers.
add value labels r03 999999998 'Value removed because outlier'.
*Rerun checktable.
means r03 by agegroup
/cells count min max mean stddev.
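For comparison, the DO IF plus RECODE route mentioned above could look like the sketch below. It uses the same cutoff of 1,441 milliseconds and the same replacement code of 999999998; only the restriction to age group 1 is handled by DO IF instead of IF.
*Alternative sketch: recode potential outliers for age group 1 only via DO IF and RECODE.
do if agegroup = 1.
recode r03 (1441 thru highest = 999999998).
end if.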
Tip 2 - Show Outlier Values in Boxplot
You can show data values for potential outliers and extreme values in boxplots. This only works if each boxplot involves a single dependent variable. Simply use this dependent variable as the ID variable too.
The only dialog that supports this is the Chart Builder. If you prefer the other dialogs, modifying the /ID subcommand in the syntax also does the trick.
EXAMINE VARIABLES=r03 BY agegroup
/PLOT=BOXPLOT
/STATISTICS=NONE
/NOTOTAL
/ID=r03. /*Label outliers with actual data values.
Tip 3 - Adding Titles to Boxplots
There's 3 options for showing titles in SPSS boxplots:
- create your boxplot via the Chart Builder as in example 3;
- use a chart template that has a fixed title and/or subtitle;
- add a title manually after creating your boxplot.
For this last option, open a Chart Editor window by double-clicking your chart. You can now add a title from the
menu.
Note that you can adjust your title after adding it.
Final Notes
There's many more variations on boxplots, especially clustered boxplots. However, I think you'll get them done fairly easily after studying this tutorial.
If you've any questions or remarks, please throw me a comment below.
Thanks for reading!
Boxplots – Beginners Tutorial
A boxplot is a chart showing quartiles, outliers and
the minimum and maximum scores for 1+ variables.
Example
A sample of N = 233 people completed a speedtask. The chart below shows a boxplot of their reaction times.

Some rough conclusions from this chart are that
- all 233 reaction times lie between 0 and 3,000 milliseconds;
- 4 scores are high extreme values. These are reaction times between 2,551 and 2,905 milliseconds;
- there's 1 high potential outlier of 1,749 milliseconds;
- the maximum reaction time (excluding potential outliers and extreme values) is around 1,650 milliseconds;
- 75% of all respondents score lower than some 1,150 milliseconds. This is the 75th percentile or quartile 3;
- 50% of all respondents score lower than some 975 milliseconds. This is the 50th percentile (the median) or quartile 2;
- 25% of all respondents score lower than some 800 milliseconds. This is the 25th percentile or quartile 1;
- the minimum reaction time (excluding potential outliers and extreme values) is around 350 milliseconds;
- there's 1 low potential outlier of 239 milliseconds;
- there aren't any low extreme values.
So what are quartiles? And how to obtain them? And how are potential outliers and extreme values defined?
We'll show you all you need to know in this Googlesheet, part of which is shown below.

Quartile 1
Quartile 1 is the 25th percentile: it is the score that separates the lowest 25% from the highest 75% of scores. In Googlesheets and Excel, =PERCENTILE.EXC(A2:A234,0.25) returns quartile 1 for the scores in cells A2 through A234 (our 233 reaction times). The result is 811.5. This means that 25% of our scores are lower than 811.5 milliseconds. Or -reversely- 75% are higher.
A minor complication here is that 25% of N = 233 scores results in 58.25 scores. As there's no such thing as “0.25 scores”, we can't precisely separate the lowest 25% from the highest 75%.
There's no real solution to this problem but a technique known as linear interpolation probably comes closest. This is how Excel, Googlesheets and SPSS all come up with 811.5 as quartile 1 for our 233 scores.
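As a concrete sketch of how this interpolation works: PERCENTILE.EXC computes a (fractional) rank as
$$r = p \cdot (N + 1)$$
so for quartile 1 and our N = 233 scores,
$$r = 0.25 \cdot 234 = 58.5$$
which places quartile 1 halfway between the 58th and 59th smallest reaction times.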
Quartile 2
Quartile 2 -also known as the median- is the 50th percentile: the score that separates the lowest 50% from the highest 50% of scores. In Googlesheets, =PERCENTILE.EXC(A2:A234,0.50) returns quartile 2 for the scores in cells A2 through A234. For these data, that'll be 954 milliseconds.
This median is a measure of central tendency: it tells us that people typically had a reaction time of 954 milliseconds. Common measures of central tendency are
- the mean;
- the median;
- the mode.

Quartile 3
Quartile 3 is the 75th percentile: the score that separates the lowest 75% from the highest 25% of scores. In Googlesheets, =PERCENTILE.EXC(A2:A234,0.75) returns quartile 3 for the scores in cells A2 through A234. For our 233 reaction times, that'll be 1,164 milliseconds.
The screenshot below shows that SPSS comes up with the exact same quartiles as Excel and Googlesheets. We'll now use quartiles 1 and 3 (811.5 and 1,164 milliseconds) for computing the interquartile range or IQR.

Interquartile Range - IQR
The interquartile range or IQR is computed as
$$IQR = quartile\;3 - quartile\;1$$
so for our data, that'll be
$$IQR = 1,164 - 811.5 = 352.5$$
The IQR is a measure of dispersion: it tells how far data points typically lie apart. Common measures of dispersion are
- the standard deviation
- the variance;
- the IQR;
- the range.

Potential Outliers
In boxplots, potential outliers are defined as follows:
- low potential outlier: score is more than 1.5 IQR but at most 3 IQR below quartile 1;
- high potential outlier: score is more than 1.5 IQR but at most 3 IQR above quartile 3.
For our data at hand, quartile 1 = 811.5 and the IQR = 352.5. Therefore, the thresholds for low potential outliers are
- upper bound: 811.5 - 1.5 * 352.5 = 282.8;
- lower bound: 811.5 - 3 * 352.5 = -246.0.
Scores that are smaller than this lower bound are considered low extreme values: these are scores even more than 3 IQR below quartile 1.

Thresholds for high potential outliers are computed in a similar fashion, using quartile 3 and the IQR. To sum things up: for our data at hand, thresholds for potential outliers are
- low potential outlier: -246 ≤ reaction time < 282.8 (milliseconds);
- high potential outlier: 1,692.8 < reaction time ≤ 2,221.5 (milliseconds).
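Filling in quartile 3 = 1,164 and IQR = 352.5, these high thresholds follow from
$$1,164 + 1.5 \cdot 352.5 = 1,692.75$$
$$1,164 + 3 \cdot 352.5 = 2,221.5$$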
As shown in our boxplot example, potential outliers are typically shown as circles. These either lie below the minimum or above the maximum (both excluding outliers).
A final note here is that these definitions apply only to boxplots. In other contexts, z-scores are often used to define outliers.
Extreme Values
For boxplots, extreme values are defined as follows:
- low extreme value: score is more than 3 IQR below quartile 1;
- high extreme value: score is more than 3 IQR above quartile 3.
For our 233 reaction times, this implies
- low extreme value: reaction time < -246 (milliseconds);
- high extreme value: reaction time > 2,221.5 (milliseconds).
In boxplots, extreme values are usually indicated by asterisks (*). Note that our example boxplot shows 4 high extreme values but no low extreme values.
Boxplots - Purposes
Basic purposes of boxplots are
- quick and simple data screening, especially for outliers and extreme values;
- comparing 2+ variables for 1 sample (within-subjects test);
- comparing 2+ samples on 1 variable (between-subjects test).
The figure below shows a quick boxplot comparison among 3 samples (age groups) on 1 variable (reaction time trial 3).

The youngest age group has 2 potential outliers. However, they don't look too bad as they'd fall in the normal range for the other age groups.
The young age group has the shortest “box”. This indicates that these respondents have the smallest IQR. Since the IQR ignores the bottom and top 25% of scores, this group does not necessarily have the smallest standard deviation too.
The median lies roughly midway between quartiles 1 and 3. This suggests a roughly symmetrical frequency distribution.
The oldest age group has the highest median reaction time and the youngest the lowest. Respondents thus seem to get slower with increasing age.
Reaction times for the oldest respondents also have the largest range: scores seem to lie further apart as respondents get older.
Boxplots or Histograms?
Histograms.
The figure below illustrates why I always prefer histograms over boxplots. It's based on the exact same data as our last boxplot example.

So what did the boxplot tell us that this histogram doesn't? Well, nothing really. Does it? Conversely, however, the histogram tells us that
- reaction times seem to follow a bimodal distribution for the intermediate age group;
- this distribution is therefore flattened (platykurtic) relative to a normal distribution. To some extent, this also holds for the other 2 age groups;
- means as well as standard deviations seem to increase with increasing age.
Our histograms make these points much clearer than our boxplot: in boxplots, we can't see how scores are distributed within the “box” or between the “whiskers”.
A histogram, however, allows us to roughly reconstruct our original data values. A chart simply doesn't get any more informative than that.
Agree? Disagree? Throw me a comment below and let me know what you think.
Thanks for reading!
Clustered Bar Chart over Multiple Variables
- Example Data
- VARSTOCASES without VARSTOCASES
- Restructuring the Data
- SPSS Chart Builder - Basic Steps
- Final Result
This tutorial shows how to create the clustered bar chart shown below in SPSS. As this requires restructuring our data, we'll first do so with a seriously cool trick.

Example Data
A sample of N = 259 respondents were asked “which means of transportation do you use on a regular basis?” Respondents could select one or more means of transportation for both work and leisure related travelling. The data thus obtained are in transportation.sav, partly shown below.

Note that our data consist of 2 sets (work and leisure) of 8 dichotomous variables (transportation options). For creating our chart, we need to stack these 2 sets vertically. The usual way to do so is with VARSTOCASES as illustrated below.

Now, we could use VARSTOCASES for our data but we find this rather tedious for 8 variables. We'll replace it with a trick that creates nicer results and requires less syntax too.
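For reference, the conventional VARSTOCASES route might look something like the sketch below -which also shows why we find it tedious for 8 pairs of variables. The stacked variable names (trans_1 through trans_8) and the index name are just placeholders for illustration.
*Standard approach (not used here): stack each work/leisure pair with VARSTOCASES.
varstocases
/make trans_1 from work_1 leis_1
/make trans_2 from work_2 leis_2
/make trans_3 from work_3 leis_3
/make trans_4 from work_4 leis_4
/make trans_5 from work_5 leis_5
/make trans_6 from work_6 leis_6
/make trans_7 from work_7 leis_7
/make trans_8 from work_8 leis_8
/index = Purpose.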
VARSTOCASES without VARSTOCASES
The screenshots below illustrate how to mimic VARSTOCASES without needing its syntax.

This method saves more effort the more variables you stack: placing the IF command from this step inside a DO REPEAT loop does the trick.
Restructuring the Data
Let's now run these steps on transportation.sav with the syntax below. If you're not sure about some command, you can inspect its result in data view if you run EXECUTE right after it.
*Create checktable 1.
descriptives all.
*Add id to dataset.
compute id = $casenum.
*Vertically stack dataset onto itself.
add files file */file *.
*Identify copied cases.
compute Purpose = ($casenum > id).
value labels Purpose 0 'Work' 1 'Leisure'.
*Shift values from leisure into work variables.
do repeat #target = work_1 to work_8 / #source = leis_1 to leis_8.
if(purpose) #target = #source.
end repeat.
*Create checktable 2.
means work_1 to work_8 by Purpose.
If everything went right, the 2 checktables will show the exact same information.
Final Data Adjustments
Our data now have the structure required for running our chart. However, a couple of adjustments are still desirable:
- since we want to see percentages instead of proportions, we'll RECODE our 0-1 variables into 0-100;
- next, FORMATS adds percent signs to the recoded values;
- the Chart Builder only computes means over quantitative variables. We'll therefore set all measurement levels to scale;
- we'll remove a couple of variables that have become redundant.
recode work_1 to work_8 (1 = 100).
*Set formats to percentages.
formats work_1 to work_8 (pct8).
*Set measurement levels (needed for SPSS chart builder & Custom Tables).
variable level work_1 to work_8 (scale).
*Remove (now) redundant variables.
add files file */drop leis_1 to id.
SPSS Chart Builder - Basic Steps
The screenshot below sketches some basic steps that'll result in our chart.

- drag and drop the clustered bar chart onto the canvas;
- select, drag and drop all outcome variables in one go into the y-axis box. Click “Ok” in the dialog that pops up;
- drag “Purpose” (leisure or work) into the Color box;
- go through these tabs, select “Transpose” and choose some title and subtitle for the chart;
- clicking “Paste” results in the syntax below.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=MEAN(work_1) MEAN(work_2) MEAN(work_3) MEAN(work_4)
MEAN(work_5) MEAN(work_6) MEAN(work_7) MEAN(work_8) Purpose MISSING=LISTWISE REPORTMISSING=NO
TRANSFORM=VARSTOCASES(SUMMARY="#SUMMARY" INDEX="#INDEX")
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: SUMMARY=col(source(s), name("#SUMMARY"))
DATA: INDEX=col(source(s), name("#INDEX"), unit.category())
DATA: Purpose=col(source(s), name("Purpose"), unit.category())
COORD: rect(dim(1,2), transpose(), cluster(3,0))
GUIDE: axis(dim(2), label("Mean"))
GUIDE: legend(aesthetic(aesthetic.color.interior), label("Purpose"))
GUIDE: text.title(label("Transportation Methods Used on Regular Basis"))
GUIDE: text.subsubtitle(label("All Respondents | N = 259"))
SCALE: cat(dim(3), reverse(), include("0", "1", "2", "3", "4", "5", "6", "7"))
SCALE: linear(dim(2), include(0))
SCALE: cat(aesthetic(aesthetic.color.interior), reverse(), include("0.00", "1.00"))
SCALE: cat(dim(1), include("0.00", "1.00"))
ELEMENT: interval(position(Purpose*SUMMARY*INDEX), color.interior(Purpose),
shape.interior(shape.square))
END GPL.
Initial Result

Whoomp! There it is. We created the chart we were looking for. Sadly, it doesn't look too great:
- despite setting all variable formats to PCT8 (percentages without decimal places), percent signs are missing from the x-axis;
- the x-axis is labeled “Mean” instead of “Percentage”;
- the chart doesn't show gridlines.
All such issues can be fixed in the Chart Editor which opens if we double-click our chart. A nicer option, though, is developing and applying a chart template. Our final result after doing so is shown below.
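A minimal sketch of that template route is shown below; the .sgt filename is just a placeholder for whatever template you developed.
*Apply a chart template, then rerun the GGRAPH command shown above.
set ctemplate 'clustered-bar-chart.sgt'.
*Switch the template off again when you're done.
set ctemplate none.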
Final Result

Right. This tutorial has been pretty heavy on syntax but I hope you got the hang of it. Were you (not) able to recreate our chart? Do you have any other feedback? Please throw us a comment below.
Thanks for reading!
Creating Bar Charts with Means by Category
One of the most common research questions is do different groups have different mean scores on some variable? This question is best answered in 3 steps:
- create a table showing mean scores per group -you'll probably want to include the frequencies and standard deviations as well;
- create a chart showing mean scores per group;
- run some statistical test -ANOVA in this case. However, this is only meaningful if your data are (roughly) a simple random sample from your target population.
We'll show the first 2 steps using an employee survey whose data are in bank-clean.sav. The screenshot below shows what these data basically look like.

Creating a Means Table
For creating a table showing means per category, we could mess around with the menu but it's not worth the effort as the syntax is as simple as it gets. So let's just run it and inspect the result.
means q1 by jtype
/cells count mean stddev.
Note that you could easily add more statistics -such as MEDIAN- to the CELLS subcommand.
Result

Basically, our table tells us that the mean employee care ratings increase with higher job levels except for “upper management”. A proper descriptives table -always recommended- gives nicely detailed information. However, it's not very visual. So let's now run our chart.
SPSS Bar Chart Menu & Dialogs
As a rule of thumb, I create all charts from the legacy dialogs and avoid the Chart Builder whenever I can -which is pretty much always unless I need a stacked bar chart with percentages.
In the dialog shown below, selecting “Other statistic (e.g., mean)” enables you to enter a dependent variable.

Basic Bar Chart Means by Category Syntax
GRAPH
/BAR(SIMPLE)=MEAN(q1) BY jtype
/TITLE='Mean Employee Care Rating by Job Type'
/SUBTITLE='N = 423'.
Result

We now have our basic chart but it doesn't look too good yet. From SPSS version 25 onwards, it will look somewhat better -see New Charts in SPSS 25 - How Good Are They Really? We'll fix this by setting a chart template.
Note that a chart template can also sort this bar chart -for instance by descending means- but we'll skip that for now.
Bar Chart Means by Category Syntax II
set ctemplate "bar-chart-means-trans-720-1.sgt".
*Rerun chart with template set.
GRAPH
/BAR(SIMPLE)=MEAN(q1) BY jtype
/TITLE='Mean Employee Care Rating by Job Type'
/SUBTITLE='N = 423'.
Result

Right, so that's about it. If you need a couple of similar charts, you could copy-paste-edit the last GRAPH command (no need to repeat the other commands). If you need many similar charts, you could loop over the GRAPH command with Python for SPSS.
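A minimal sketch of such a loop is shown below. It assumes the Python 3 essentials are installed and that rating variables q2 through q5 exist; only q1 is actually used in this tutorial, so the variable list is just for illustration.
*Loop a GRAPH command over several rating variables with Python for SPSS.
begin program python3.
import spss
for var in ['q1', 'q2', 'q3', 'q4', 'q5']:
    spss.Submit("""
GRAPH
/BAR(SIMPLE)=MEAN(%s) BY jtype
/TITLE='Mean %s Rating by Job Type'.
""" % (var, var))
end program.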
If you're going to run other types of charts, don't forget to set a different chart template. Or switch it off altogether by running set ctemplate none.
Thanks for reading!
Which Statistical Test Should I Use?
- Univariate Tests
- Within-Subjects Tests
- Between-Subjects Tests
- Association Measures
- Prediction Analyses
- Classification Analyses
Summary
Finding the appropriate statistical test is easy if you're aware of
- the basic type of test you're looking for and
- the measurement levels of the variables involved.
For each type and measurement level, this tutorial immediately points out the right statistical test. We'll also briefly define the 6 basic types of tests and illustrate them with simple examples.
1. Overview Univariate Tests
MEASUREMENT LEVEL | NULL HYPOTHESIS | TEST |
---|---|---|
Dichotomous | Population proportion = x? | Binomial test / Z-test for 1 proportion |
Categorical | Population distribution = f(x)? | Chi-square goodness-of-fit test |
Quantitative | Population mean = x? | One-sample t-test |
Quantitative | Population median = x? | Sign test for 1 median |
Quantitative | Population distribution = f(x)? | Kolmogorov-Smirnov test / Shapiro-Wilk test |
Univariate Tests - Quick Definition
Univariate tests are tests that involve only 1 variable. Univariate tests either test if
- some population parameter -usually a mean or median- is equal to some hypothesized value or
- some population distribution is equal to some function, often the normal distribution.
A textbook example is a one sample t-test: it tests if a population mean -a parameter- is equal to some value x. This test involves only 1 variable (even if there's many more in your data file).
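In SPSS, such a test takes a single command. A minimal sketch -the variable name reaction_time and the test value of 1,000 are just assumptions for illustration:
*One-sample t-test: does the population mean of reaction_time equal 1,000?
t-test
/testval = 1000
/variables = reaction_time.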

2. Overview Within-Subjects Tests
MEASUREMENT LEVEL | 2 VARIABLES | 3+ VARIABLES |
---|---|---|
DICHOTOMOUS | McNemar test / Z-test for dependent proportions | Cochran Q test |
NOMINAL | Marginal homogeneity test | (Not available) |
ORDINAL | Wilcoxon signed-ranks test / Sign test for 2 related medians | Friedman test |
QUANTITATIVE | Paired samples t-test | Repeated measures ANOVA |
Within-Subjects Tests - Quick Definition
Within-subjects tests compare 2+ variables
measured on the same subjects (often people).
An example is repeated measures ANOVA: it tests if 3+ variables measured on the same subjects have equal population means.

Within-subjects tests are also known as
- paired samples tests (as in a paired samples t-test) or
- related samples tests.
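A minimal SPSS sketch of such a paired samples test -the variable names pretest and posttest are just assumptions for illustration:
*Paired samples t-test: do pretest and posttest have equal population means?
t-test pairs = pretest with posttest (paired).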

3. Overview Between-Subjects Tests
OUTCOME VARIABLE | 2 SUBPOPULATIONS | 3+ SUBPOPULATIONS |
---|---|---|
Dichotomous | Z-test for 2 independent proportions | Chi-square independence test |
Nominal | Chi-square independence test | Chi-square independence test |
Ordinal | Mann-Whitney test (mean ranks) / Median test for 2+ independent medians | Kruskal-Wallis test (mean ranks) / Median test for 2+ independent medians |
Quantitative | Independent samples t-test (means) / Levene's test (variances) | One-way ANOVA (means) / Levene's test (variances) |
Between-Subjects Tests - Quick Definition
Between-subjects tests examine if 2+ subpopulations
are identical with regard to
- a parameter (population mean, standard deviation or proportion) or
- a distribution.
The best known example is a one-way ANOVA as illustrated below. Note that the subpopulations are represented by subsamples -groups of observations indicated by some categorical variable.

“Between-subjects” tests are also known as “independent samples” tests, such as the independent samples t-test. “Independent samples” means that subsamples don't overlap: each observation belongs to only 1 subsample.
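A minimal SPSS sketch for comparing groups -the variable names score and group are just assumptions for illustration:
*One-way ANOVA: do 3+ subpopulations have equal mean scores?
oneway score by group.
*Independent samples t-test for comparing only groups 1 and 2.
t-test groups = group(1 2)
/variables = score.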
4. Overview Association Measures
(VARIABLES ARE) | QUANTITATIVE | ORDINAL | NOMINAL | DICHOTOMOUS |
---|---|---|---|---|
QUANTITATIVE | Pearson correlation | | | |
ORDINAL | Spearman correlation / Kendall’s tau / Polychoric correlation | Spearman correlation / Kendall’s tau / Polychoric correlation | | |
NOMINAL | Eta squared | Cramér’s V | Cramér’s V | |
DICHOTOMOUS | Point-biserial correlation / Biserial correlation | Spearman correlation / Kendall’s tau / Polychoric correlation | Cramér’s V | Phi-coefficient / Tetrachoric correlation |
Association Measures - Quick Definition
Association measures are numbers that indicate
to what extent 2 variables are associated.
The best known association measure is the Pearson correlation: a number that tells us to what extent 2 quantitative variables are linearly related. The illustration below visualizes correlations as scatterplots.
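A minimal SPSS sketch -assuming 2 quantitative variables named whours and salary:
*Pearson correlation between 2 quantitative variables.
correlations whours salary.
*Spearman rank correlation as an ordinal alternative.
nonpar corr variables = whours salary
/print = spearman.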

5. Overview Prediction Analyses
OUTCOME VARIABLE | ANALYSIS |
---|---|
Quantitative | (Multiple) linear regression analysis |
Ordinal | Discriminant analysis or ordinal regression analysis |
Nominal | Discriminant analysis or nominal regression analysis |
Dichotomous | Logistic regression |
Prediction Analyses - Quick Definition
Prediction tests examine how and to what extent
a variable can be predicted from 1+ other variables.
The simplest example is simple linear regression as illustrated below.

Prediction analyses sometimes quietly assume causality: whatever predicts some variable is often thought to affect this variable. Depending on the contents of an analysis, causality may or may not be plausible. Keep in mind, however, that the analyses listed above don't prove causality.
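A minimal SPSS sketch of a simple linear regression -again, the variable names are just assumptions for illustration:
*Simple linear regression: predict salary from weekly working hours.
regression
/dependent salary
/method = enter whours.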
6. Classification Analyses
Classification analyses attempt to identify and
describe groups of observations or variables.
The 2 main types of classification analysis are
- factor analysis for finding groups of variables (“factors”) and
- cluster analysis for finding groups of observations (“clusters”).
Factor analysis is based on correlations or covariances. Groups of variables that correlate strongly are assumed to measure similar underlying factors -sometimes called “constructs”. The basic idea is illustrated below.

Cluster analysis is based on distances among observations -often people. Groups of observations with small distances among them are assumed to represent clusters such as market segments.

Right. So that'll do for a basic overview. Hope you found this guide helpful! And last but not least,
thanks for reading!
New Charts in SPSS 25: How Good Are They Really?
SPSS hasn't exactly been known for generating nice charts, although you really can do so by using your own chart templates. SPSS 25, however, allows you to “create great looking charts by default!”

This made me wonder: are the new charts really that great? The remainder of this article walks through the old and new looks for the big 5 charts in data analysis. All charts are based on bank_clean.sav and I'll include all syntax as well.
Overview “Big 5” Charts
Chart | Purpose |
---|---|
Histogram | Summarize Single Metric Variable |
Bar Chart Percentages | Summarize Single Categorical Variable |
Scatterplot | Show Association Between 2 Metric Variables |
Bar Chart Means by Category | Show Association Between Metric and Categorical Variable |
Stacked Bar Chart Percentages | Show Association Between 2 Categorical Variables |
1. Histogram

I'm not a big fan of the old histogram: the colors depress me and there is insufficient room between the title (“Histogram”) and the top of the figure.
The font sizes of the tick labels and statistics summary are too small. Unfortunately, you can't omit the statistics summary by syntax but you can style it or hide it altogether with a chart template.

The new histogram has handy grid lines by default and much nicer colors. I still think some font sizes are too small but altogether, it's a huge step forward.
Histogram Syntax Example
frequencies whours
/format notable
/histogram.
2. Bar Chart Percentages

What really puzzles me with SPSS bar charts is that they are not transposed by default. With few categories, the bars become awkwardly wide. With many categories, there's insufficient room for the tick labels.

The new bar chart has much nicer styling. However, the bars are now even wider. I feel the transposed bar chart below works much better. Technically, it's the exact same chart as the previous examples; the only difference is that I applied a chart template. With this chart, I'll just increase or decrease its height, depending on the number of categories included.

Bar Chart Percentages Syntax Example
GRAPH
/BAR(SIMPLE)=PCT BY gender
/TITLE "Frequency Distribution over Gender".
3. Scatterplot

The old scatterplot has a rather depressing grey background and no gridlines. However, I do think circles tend to clutter less than dots and thus were a good choice.

The new scatterplot has gridlines by default and a much nicer white background. However, I also feel the dots tend to clutter more than the old circles and it lacks color. If I need 50 shades of grey, I'll go and find myself a book shop.
Scatterplot Syntax Example
GRAPH
/SCATTERPLOT(BIVAR)=whours WITH salary
/MISSING=LISTWISE
/TITLE='Scatterplot Working Hours versus Salary'.
4. Bar Chart Means by Category

Apart from its dull styling: not transposing bar charts may leave little room for tick labels -in this case education levels. SPSS seems to “solve” the problem by using a very tiny font size that doesn't look nice and is hard to read.

The new bar chart has a much nicer styling: it uses a reasonable font size for the x-axis categories. However, a much smaller font size is used for the y-axis. I think this looks rather awkward.
Perhaps even more awkward is the large white space beneath the chart. It doesn't attract a lot of attention in the output viewer window. However, after copy-pasting the chart into a report, it becomes clear that this can't be intentional -or at least I hope it isn't.
Bar Chart Means by Category Syntax Example
GRAPH
/BAR(SIMPLE)=MEAN(salary) BY educ
/TITLE='Mean Salaries by Education Levels'.
5. Stacked Bar Chart Percentages

Regarding the old stacked bar chart, I can't help but wonder: You gotta be kidding me, right? Wrong. No kidding. This chart survived through SPSS 24. I'm not even going to discuss the looks. What I do want to point out, is that the y-axis label says “Count” instead of “Percent”. To add insult to injury, the percent suffix is also missing from the tick labels.

The new version looks way better -especially the colors! A minor change to the chart builder dialog is that it adds a title by default to the syntax. Unfortunately, the title is wrong: the chart shows “job type by education”, not “education by job type”.
Note that the y-axis is appropriately labeled “Percent” in the new version but the percent suffix is still missing from the tick labels.
And once again, not transposing the chart leaves insufficient space for the tick labels, which now have to be rotated and push down the x-axis label. Surely it's a matter of personal preference but I'd rather go for the layout shown below.

Stacked Bar Chart Percentages Syntax Example
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=educ COUNT()[name="COUNT"] jtype MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: educ=col(source(s), name("educ"), unit.category())
DATA: COUNT=col(source(s), name("COUNT"))
DATA: jtype=col(source(s), name("jtype"), unit.category())
GUIDE: axis(dim(1), label("Highest completed education level"))
GUIDE: axis(dim(2), label("Count"))
GUIDE: legend(aesthetic(aesthetic.color.interior), label("Current job type"))
SCALE: cat(dim(1), include("1", "2", "3", "4", "5", "6"))
SCALE: linear(dim(2), include(0))
SCALE: cat(aesthetic(aesthetic.color.interior), include("1", "2", "3", "4", "5"))
ELEMENT: interval.stack(position(summary.percent(educ*COUNT, base.coordinate(dim(1)))),
color.interior(jtype), shape.interior(shape.square))
END GPL.
*Stacked Bar Chart Percentages - Pasted from Chart Builder SPSS 25.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=educ COUNT()[name="COUNT"] jtype MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: educ=col(source(s), name("educ"), unit.category())
DATA: COUNT=col(source(s), name("COUNT"))
DATA: jtype=col(source(s), name("jtype"), unit.category())
GUIDE: axis(dim(1), label("Highest completed education level"))
GUIDE: axis(dim(2), label("Percent"))
GUIDE: legend(aesthetic(aesthetic.color.interior), label("Current job type"))
GUIDE: text.title(label("Stacked Bar Percent of Highest completed education level by Current ",
"job type"))
SCALE: cat(dim(1), include("1", "2", "3", "4", "5", "6"))
SCALE: linear(dim(2), include(0))
SCALE: cat(aesthetic(aesthetic.color.interior), include("1", "2", "3", "4", "5"))
ELEMENT: interval.stack(position(summary.percent(educ*COUNT, base.coordinate(dim(1)))),
color.interior(jtype), shape.interior(shape.square))
END GPL.
Subtitles and SPLIT FILE
Creating charts with SPLIT FILE on adds a subtitle to them. Since titles are centered by default, the logical place for the subtitle is right below the main title as shown below (“gender: female”).
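For instance, a chart like the one below could come about with something like the syntax sketched here (a histogram is just used as an example).
*SPLIT FILE adds a 'gender: ...' subtitle to every chart created while it's on.
sort cases by gender.
split file by gender.
frequencies whours
/format notable
/histogram.
split file off.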

Alternatively, one could left align the main title and right align the subtitle as shown below. It looks nicer and makes better use of the available space.

Conclusion
Most charts have seriously improved in SPSS 25. I like the gridlines and the new color cycles -except for the new scatterplots which are entirely grey. Nevertheless, I can't help but feel it was a bit of a hasty job with sometimes insufficient eye for detail.
Next, there's the question: what if I liked the old looks better? Or what if my client does not want these changes? I had a quick look around but I couldn't find any way for returning to the old looks. If they're really really gone, then this violates SPSS’ backwards compatibility.
The same goes for the new charts being wider (854 by 504 pixels) than the old charts (629 by 504 pixels). I sure hope the larger width still fits into the layout of any reports that need to be delivered.
So what do you think about the old and new charts? Please let me know by dropping a comment below.
Thanks!
What Is a Histogram?

A histogram is a chart that shows frequencies for
intervals of values of a metric variable.
Such intervals are known as “bins” and they all have the same width. The example above uses $25 as its bin width. So it shows how many people make between $800 and $825, $825 and $850 and so on.
Note that the mode of this frequency distribution is between $900 and $925, which occurs some 150 times.
Histogram - Example
A company wants to know how monthly salaries are distributed over 1,110 employees having operational, middle or higher management level jobs. The screenshot below shows what their raw data look like.

Since these salaries are partly based on commissions, basically every employee has a slightly different salary. Now how can we gain some insight into the salary distribution?
Histogram Versus Bar Chart
We first try and run a bar chart of monthly salaries. The result is shown below.

Our bar chart is pretty worthless. The only thing we learn from it is that most salaries occur just once and some twice. The main problem here is that a bar chart shows the frequency with which each distinct value occurs in the data.
Importantly, note that the first interval is ($832 - $802 =) $30 wide. The last interval represents ($1206 - $1119 =) $87. But both are equally wide in millimeters on your screen. This tells us that
the x-axis doesn't have a linear scale
which renders this chart unsuitable for a metric variable such as monthly salary.
Histogram - Basic Example
Since our bar chart wasn't any good, we now try and run a histogram on our data. The result is shown below.

This chart looks much more useful but how was it generated? Well, we assigned each employee’s salary to a $25 interval ($800 - $825, $825 - $850 and so on). Next, we looked up the number of employees that fall within each such interval. We visualize these frequencies by bars in a chart.
Importantly, the x-axis of our chart has a linear scale: each $25 interval corresponds to the same width in millimeters even if it contains zero employees. The chart we end up with is known as a histogram and -as we'll see in a minute- it's a very useful one.
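In SPSS, such a histogram takes very little syntax. A minimal sketch -assuming the monthly salaries are held by a variable named salary:
*Run a histogram (without a frequency table) over monthly salaries.
frequencies salary
/format notable
/histogram.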
Histogram - Bin Width
The bin width is the width of the intervals
whose frequencies we visualize in a histogram.
Our first example used a bin width of $25; the first bar represents the number of salaries between $800 and $825 and so on. This bin width of $25 is a rather arbitrary choice. The figure below shows histograms over the exact same data, using different bin widths.

Although different bin widths seem reasonable, we feel $10 is rather narrow and $100 is rather wide for the data at hand. Either $25 or $50 seems more suitable.
Histograms - Why Are They So Useful?
Why are histograms so useful? Well, first of all, charts are much more visual than tables; after looking at a chart for 10 seconds, you can tell much more about your data than after inspecting the corresponding table for 10 seconds. Generally, charts convey information about our data faster than tables -albeit less accurately.
On top of that, histograms also give us much more complete information about our data. Keep in mind that you can reasonably estimate a variable’s mean, standard deviation, skewness and kurtosis from a histogram. However, you can't estimate a variable’s histogram from the aforementioned statistics. We'll illustrate this with an example.
Histogram Versus Descriptive Statistics
Let's say we find two age variables in our data and we're not sure which one we should use. We compare some basic descriptive statistics for both variables and they look almost identical.

So can we conclude that both age variables have roughly similar distributions? If you think so, take a look at their histograms shown below.

Split Histogram - Frequencies
Each of the 1,110 employees in our data has a job level: operational, middle management or higher management. If we want to compare the salary distributions between these three groups, we may inspect a split histogram: we create a separate histogram for each job level and these three histograms have identical axes. The result is shown below.

Our split histogram totally sucks. The problem is that the group sizes are very unequal and these relate linearly to the surface areas of our histograms. The result is that the surface area for higher management (n = 10) is only 1% of the surface area for “operational” (n = 1,000). The histogram for higher management is so small that it's no longer visible.
Split Histogram - Percentages
We just saw how a split histogram with frequencies is useless for the data at hand. Does this mean that we can't compare salary distributions over job levels? Nope. If we choose percentages within job level groups, then each histogram will have the same surface area of 100%. The result is shown below.

Histogram - Final Notes
This tutorial aimed at explaining what histograms are and how they differ from bar charts. In our opinion, histograms are among the most useful charts for metric variables. With the right software (such as SPSS), you can create and inspect histograms very fast and doing so is an excellent way for getting to know your data.
SPSS Scatterplot Tutorial
A large bank wants to gain insight into their employees’ job satisfaction. They carried out a survey, the results of which are in bank_clean.sav. The survey included the number of hours people work per week and their gross monthly salaries.

Research Question
It seems obvious that working hours are related to monthly salaries: employees who work more hours earn more money. But we'd like to know more about this relationship so our research question is how (strongly) is monthly salary related to working hours? Since we already inspected this data file (and set missing values) we can simply run correlations whours salary. and see that the correlation is 0.648, quite a strong linear relation. Some would leave it at that. However, a scatterplot will show that there's way more to this relation.
SPSS Scatterplot Creation
We'll first run our scatterplot the way most users find easiest: by following the screenshots below.


The aforementioned steps result in the syntax below. Running it creates our first basic scatterplot.
SPSS Scatterplot Syntax
GRAPH
/SCATTERPLOT(BIVAR)=whours WITH salary
/MISSING=LISTWISE.
Note: you'll get the exact same result by running graph/scatter whours with salary. You probably prefer this second version if you want to create multiple scatterplots by copy-paste-editing the syntax. If you want to create a huge number of scatterplots, see SPSS with Python - Looping over Scatterplots.
Result

As we see, this is not a simple linear relation. First, we see that our dots become more dispersed as our respondents work more hours; the more hours people work, the greater the standard deviation of monthly salary. This is a textbook example of heteroscedasticity, the opposite of homoscedasticity, an important assumption for regression.
Second, we see the pattern of dots “bend upwards” towards the right side of our chart. This is a clear indication of nonlinearity, which also violates the regression assumptions.
So why do we see heteroscedasticity and nonlinearity in our scatterplot? Well, perhaps the higher hourly wages are only available for those in the more high level jobs which also require more hours per week. Interestingly, we have “job type” in our data, which comes somewhat close to job levels. Let's now add it to our scatterplot by following the screenshot below. Tip: use the dialog recall button for quick access to the scatter dialog.
SPSS Scatterplot with Legend

uses different colors for our dots, based on some variable. We'll enter jtype (job type).
should label each dot with the value of a (unique identifier) variable but it doesn't work. You probably want to use this only for very small samples anyway. If you really need it: it does work in the chart builder but we'll skip it for now. We'll leave it empty.
Optionally, let's add some nice title to our chart.
SPSS Scatterplot with Legend Syntax
GRAPH
/SCATTERPLOT(BIVAR)=whours WITH salary BY jtype
/MISSING=LISTWISE
/TITLE "Monthly Salary by Weekly Hours | n = 464".
Result

And there we have it. The cause for the heteroscedasticity and nonlinearity is that middle and upper managers have (very) high hourly wages and typically work more hours too than the other employees.
This plot also suggests that we should perhaps not lump together all job types: for sales employees (red dots), the relation between hours and salary looks very linear -presumably because their hourly wages are rather fixed. The precise opposite holds for upper management (black dots). We'll now confirm this by inspecting the correlation for each group separately.
Correlations for Job Types Separately
sort cases by jtype.
*Split file.
split file by jtype.
*Separate correlations for job types.
correlations salary with whours.
split file off.
Result

Indeed, the correlation between hours and salary is 0.79 for sales employees and 0.21 for upper management. We'll leave it as an exercise to the reader to create scatterplots for separate job type groups.
Final Notes
Our first finding on these data was simply a correlation of 0.65 between working hours and salary. However, a scatterplot suggested that it wasn't quite as simple as that. I hope we gave you an idea how to create scatterplots easily in SPSS and why they can be very useful indeed.
Thanks for reading!
Creating APA Style Contingency Tables in SPSS
Running simple contingency tables in SPSS is easy enough. However, the default format is inconvenient and doesn't meet APA standards. This tutorial walks you through 3 options for creating the desired tables:
- CROSSTABS is easy but requires some (manual) editing.
- CTABLES is faster but a bit harder and requires a custom tables license.
- TABLES is fast but rather challenging.
We'll use bank_clean.sav -partly shown below- for all examples. You can try them yourself after downloading the data.

So What Are Contingency Tables Anyway?
Contingency tables show frequencies for all
combinations of values of 2(+) variables.
For example, the table below shows a contingency table -or “crosstab”- for 2 variables: marital status and education level.

- Only 1 respondent had middle school and never married;
- A total of 123 respondents never married;
- Only 4 respondents have middle school or lower;
- The table shows all combinations of marital status and education for 442 respondents.
Hopefully, contingency tables give some insight into the association -if any- between these variables. So how does that work? For a very simple explanation, see
Running Simple Contingency Tables in SPSS
The fastest way to create the table we just saw is running one line of SPSS syntax:
crosstabs educ by marit.
The categories of the first variable -educ or education- become rows in the table. The values of the second variable -marit or marital status- become the columns. As a rule of thumb,
the columns hold the groups you want to compare
on whatever goes into the rows. In this case, we're comparing marital status groups on education level.
If we had wanted to do the reverse -compare education level groups on marital status- we'd swap the rows and columns and run
crosstabs marit by educ.
CROSSTABS with Column Percentages
Right. So how do our groups compare on education level? It's hard to see from our first table because each marital status group has a different n or sample size. We may see more of a pattern if we add column percentages. The syntax below does just that.
set
tvars labels
tnumbers labels.
*Create contingency table with frequencies and column percentages.
crosstabs educ by marit
/cells count column.
Result

If we study the entire table, it seems that widowed respondents are somewhat higher educated than the others: no less than 56.8% hold a master's degree and another 13.5% even a PhD. The overall pattern is not very clear though and we'd perhaps better visualize it as a stacked bar chart.
In any case, the table format -percentages in rows- doesn't really help either and doesn't meet APA guidelines. So let's fix it.
Converting CROSSTABS to APA Tables
One solution is to right-click our table and select
In the pivot editor that opens, tick
as shown below.

Now drag and drop statistics right underneath Marital status and just close the window.

Let's now make 2 text replacements:
- use n instead of “Count”
- use % instead of “% within Marital status”
Using the Ctrl + H shortcut in the output viewer should work. Or -much nicer- use the OUTPUT MODIFY syntax below if you're on SPSS version 22 or higher.
output modify
/select tables
/tablecells select = ['% within Marital status'] replace = '%' applyto = columnheader
/tablecells select = ['Count'] replace = 'n' applyto = columnheader.
Result

So that's the easiest way to create an APA style contingency table in SPSS. But -personally- I find it too much work, especially for several tables. So let's consider 2 alternatives.
APA Contingency Tables from CTABLES
The table we just created can be run in one go with CTABLES. However, this only works if you've a license for the custom tables module. You can check this by running show license. If the resulting table includes “Custom Tables”, try the syntax below.
ctables
/table educ by marit [count 'n' colpct '%']
/categories variables = educ marit total = yes.
Result

CTABLES works great for this table but -unfortunately- you'll need a separate command for each table. The syntax is also somewhat challenging. I think you could paste it from
but I somehow never get the results I want from the menu.
A better way perhaps is to copy-paste-edit the syntax example for several tables. Or create and run the commands with Python for SPSS. This is faster but also harder. But if you like it fast and hard -I surely do- then go for it.
APA Contingency Tables from TABLES
So what if you don't have a license for CTABLES? Well, then there's TABLES. TABLES was replaced by CTABLES long ago and removed from all documentation. But -as very very few SPSS users know- TABLES still works in all recent SPSS versions.
Like so, the syntax below is an alternative for the CTABLES approach.
tables
/format zero
/ftotal = ftot 'Total'
/table educ + ftot by marit > (statistics) + ftot
/statistics count((f3) 'n') cpct((pct5.1) '%':marit).
Result

This contingency table is almost identical to the CTABLES result. Unfortunately, the syntax is even harder and -worse- not documented. We might build a tool for creating the TABLES command from the menu but it won't come up any time soon.
Final Notes
We showed 3 ways for creating APA style contingency tables in SPSS:
- CROSSTABS is easiest. You can create several tables in one go but they require quite some (manual) editing.
- CTABLES runs the desired table straight away and could be run from the menu. However, it creates one table at a time and requires an additional license.
- TABLES also comes up with the right table straight away. However, the syntax is difficult and there's no menu.
Admittedly, none of these options is ideal. So we might build a tool some time that runs many tables in one go from the menu.
Thanks for reading!