By Ruben Geert van den Berg on November 30, 2016 under SPSS Looping Tutorials.

# SPSS Python Loop Examples

What's the right way to run tables, charts or statistical tests in an iterative fashion in SPSS? LOOP or DO REPEAT won't help because they'll only loop over transformation commands. An SPSS macro can do the job but macros are hopelessly inefficient because they've no clue about your data; they can't look up any variable names, labels, data values or anything else that you typically need in your syntax.
The right way to loop over tables, charts and other procedures in SPSS is with Python. We'll show how to do so with some real-world examples. We'll use alcotest.sav throughout, part of which is shown below.
Note that you need to have the SPSS Python Essentials properly installed for running these examples on your own computer.

## Example 1: Simple Loop over Bar Charts

We'd like to visualize how mean reaction times are related to the order in which people went through the 3 alcohol conditions. We'll start by generating the syntax for the first chart from the menu as shown below.

As a rule of thumb, try to use Legacy Dialogs for generating charts. The interface and resulting syntax are wonderfully simple and often result in the exact same charts as the much more complex Chart Builder.

We'll remove all line breaks from the pasted syntax, resulting in `GRAPH /BAR(SIMPLE)=MEAN(no_1) BY order.` Running this line results in the first desired bar chart. For running similar charts over different reaction times, we could copy-paste the line and replace `no_1` by `no_2` and so on. However, a cleaner way to go is with the Python syntax below.

## SPSS Python Loop Syntax 1

*Specify variable names manually as Python list object and just print it.

begin program.
import spss
varList = ['no_1','no_2','no_3','no_4','no_5']
print(varList)
end program.

*If variable list ok, loop over it.

begin program.
for var in varList:
    spss.Submit('''
GRAPH /BAR(SIMPLE)=MEAN(%s) BY order.
'''%(var))
end program.

## Note

You'll probably recognize the bar chart syntax near the end of the second block. The only difference is that the variable name has been replaced by `%s`. This is a Python string placeholder and it'll be replaced by a different variable name in each iteration.
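
To see what the placeholder does without producing any charts, a minimal sketch in plain Python (no SPSS required) can build the syntax strings and print them instead of passing them to `spss.Submit`. The list name `commands` is just an illustrative choice and not part of the original syntax.

```python
# Plain Python sketch: build the chart syntax without submitting it to SPSS.
varList = ['no_1', 'no_2', 'no_3', 'no_4', 'no_5']

commands = []  # illustrative name, not part of the original syntax
for var in varList:
    # %s is replaced by the current variable name in each iteration
    commands.append('GRAPH /BAR(SIMPLE)=MEAN(%s) BY order.' % var)

print('\n'.join(commands))
```

Printing the generated syntax first is a cheap way to check it before handing it to `spss.Submit`.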

## Example 2: Look Up Variable Names from Data

One thing we don't like about the first example is spelling out the variable names. Python can retrieve them from your data in many ways. An approach that always works is specifying variable names with the SPSS TO and ALL keywords. As shown below, the specification can be expanded into a Python list over which you can loop as desired.

*Retrieve variable names from data and print for inspection.

begin program.
import spss,spssaux
varSpec = "no_1 to hi_5" #Specify variables with SPSS TO or ALL keywords
varDict = spssaux.VariableDict(caseless = True)
varList = varDict.expand(varSpec)
varList.sort(key = lambda x: varDict.VariableIndex(x))
print(varList)
end program.

*If variable list ok, loop over it.

begin program.
for var in varList:
    spss.Submit('''
GRAPH /BAR(SIMPLE)=MEAN(%s) BY order.
'''%(var))
end program.
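
For readers who want to see what a TO specification boils down to, here is a rough sketch in plain Python. It is not how `spssaux` implements `expand`; the function `expandTo` and the variable order in `fileOrder` are made up for illustration and may not match alcotest.sav.

```python
# Plain Python sketch of what "no_1 to hi_5" means; NOT the spssaux implementation.
# fileOrder is a hypothetical, simplified variable order.
fileOrder = ['id', 'order',
             'no_1', 'no_2', 'no_3', 'no_4', 'no_5',
             'med_1', 'med_2', 'med_3', 'med_4', 'med_5',
             'hi_1', 'hi_2', 'hi_3', 'hi_4', 'hi_5']

def expandTo(first, last, names):
    """Return all names from first through last, in file order (hypothetical helper)."""
    start, end = names.index(first), names.index(last)
    return names[start:end + 1]

varList = expandTo('no_1', 'hi_5', fileOrder)

# Sorting on file position mimics sorting on VariableIndex in the syntax above.
shuffled = ['hi_5', 'no_1', 'med_3']
shuffled.sort(key=fileOrder.index)

print(varList)
print(shuffled)
```

The point of the sort is that the resulting list follows the order of the variables in the data, regardless of the order in which the names came back.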

## Example 3: Parallel Looping

We'd now like to inspect scatterplots of reaction times of no alcohol versus medium alcohol over each of the 5 trials. As before, we'll first generate syntax for just one scatterplot as shown below.

After removing all line breaks, these steps result in GRAPH /SCATTERPLOT(BIVAR)=med_1 WITH no_1 /MISSING=LISTWISE.

## Retrieving Variable Names by Pattern

The syntax below sets up two empty Python lists and loops over all variable names in our data. Variable names starting with “no_” are added to one list and those that start with “med_” go into the other. Finally, we'll loop over both lists in parallel for generating our scatterplots.

*Retrieve variable names by pattern in name and print them.

begin program.
import spss
noVars,medVars = [],[] #set up two empty lists
for varInd in range(spss.GetVariableCount()): #loop over all variable indices
    varName = spss.GetVariableName(varInd)
    if varName.startswith('no_'): #if pattern in variable name...
        noVars.append(varName) #...add name to the first list
    elif varName.startswith('med_'):
        medVars.append(varName)
print(noVars)
print(medVars)
end program.
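
The selection logic itself can be tried out without SPSS. The sketch below does the same prefix matching with list comprehensions; `allVars` is a hypothetical stand-in for the names that SPSS would return.

```python
# Plain Python sketch: select names by prefix with list comprehensions.
# allVars is a hypothetical stand-in for the variable names in the data.
allVars = ['id', 'order', 'no_1', 'no_2', 'med_1', 'med_2', 'hi_1', 'hi_2']

noVars = [name for name in allVars if name.startswith('no_')]
medVars = [name for name in allVars if name.startswith('med_')]

print(noVars)
print(medVars)
```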

*If variable lists ok, run parallel loop over them.

begin program.
for listInd in range(len(noVars)):
    spss.Submit('''
GRAPH /SCATTERPLOT(BIVAR)= %s WITH %s /MISSING=LISTWISE.
'''%(noVars[listInd],medVars[listInd]))
end program.

## Note

The second block loops over list indices (“listInd”) that refer to the first, second, ... element in either list. Python then retrieves the first, second, ... variable name from either list with `noVars[listInd]`.
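
As an alternative to looping over indices, Python's built-in `zip` pairs up the elements of both lists by position. The sketch below runs in plain Python and only builds the syntax strings; the two lists are typed out by hand here, and `commands` is an illustrative name.

```python
# Plain Python sketch: zip pairs up the elements of both lists by position.
noVars = ['no_1', 'no_2', 'no_3', 'no_4', 'no_5']
medVars = ['med_1', 'med_2', 'med_3', 'med_4', 'med_5']

commands = []  # illustrative name
for noVar, medVar in zip(noVars, medVars):
    commands.append('GRAPH /SCATTERPLOT(BIVAR)=%s WITH %s /MISSING=LISTWISE.' % (noVar, medVar))

print('\n'.join(commands))
```

Keep in mind that `zip` stops at the end of the shorter list, so it's worth checking that both lists have the same length first.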

## Example 4: Create Variable Names with Concatenation

We'll now show an easier option for our scatterplots that'll work if variable names end in simple numeric suffixes. We'll simply loop over a list holding numbers 1 through 5 (generated by `range(1,6)`) and concatenate these numbers to the variable name roots.

*Generate variable names by concatenating variable name root with numeric suffix.

begin program.
import spss
for varSuffix in range(1,6): #range(1,6) covers 1, 2, 3, 4, 5
    spss.Submit('''
GRAPH /SCATTERPLOT(BIVAR)=no_%(varSuffix)d WITH med_%(varSuffix)d /MISSING=LISTWISE.
'''%locals())
end program.

## Note

In Python, `%d` is a general integer placeholder. It's replaced by some integer number that's specified later.
Alternatively, `%(varSuffix)d` is replaced by the integer number in `varSuffix` if `%locals()` is specified at the end. Using `%locals()` makes your code more readable and shorter, especially with multiple (text or number) placeholders.
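
The two placeholder styles can be compared side by side in plain Python. In this sketch an explicit dictionary takes the place of `locals()`, which likewise returns a dictionary mapping variable names to their values; the names `positional` and `named` are just for illustration.

```python
# Plain Python sketch comparing positional and named placeholders.
varSuffix = 3

# Positional: values are filled in from the tuple, left to right.
positional = 'no_%d WITH med_%d' % (varSuffix, varSuffix)

# Named: each %(key)d looks up its value in a dictionary.
named = 'no_%(varSuffix)d WITH med_%(varSuffix)d' % {'varSuffix': varSuffix}

print(positional)
print(named)
```

Both lines print the same text, but the named version needs each value only once, no matter how often it's used in the string.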

## Example 5: Lower Triangular Loop

Our final example creates all possible different scatterplots among a set of variables. That is, if we'd run a correlation matrix of these variables, each cell underneath the main diagonal (hence “lower triangle”) is visualized in a scatterplot. This time we'll look up the variable names by their indices under variable view as shown below.

## Syntax

*Retrieve variable names by indices.

begin program.
import spss,spssaux
noVars = spssaux.GetVariableNamesList()[4:9] #variables 5 through 9 in SPSS variable view
print(noVars)
end program.

*Lower triangular loop.

begin program.
for i in range(len(noVars)):
    for j in range(len(noVars)):
        if i < j:
            spss.Submit('''
GRAPH /SCATTERPLOT(BIVAR)=%s WITH %s /MISSING=LISTWISE.
'''%(noVars[i],noVars[j]))
end program.
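
The nested loop with `if i < j` visits each pair of different variables exactly once. A possible shortcut is `itertools.combinations` from the Python standard library, sketched below in plain Python. The contents of `noVars` are assumed here (typed out by hand rather than looked up from the data).

```python
from itertools import combinations

# Assumed variable names, typed out by hand for illustration.
noVars = ['no_1', 'no_2', 'no_3', 'no_4', 'no_5']

# combinations yields each unordered pair of different variables exactly once.
pairs = list(combinations(noVars, 2))

commands = ['GRAPH /SCATTERPLOT(BIVAR)=%s WITH %s /MISSING=LISTWISE.' % pair
            for pair in pairs]

print(len(pairs))
print(commands[0])
```

With 5 variables this gives 5 * 4 / 2 = 10 scatterplots, the same set of pairs that the nested loop produces.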

## Final Note

Explaining every single line of Python code was way beyond the scope of this tutorial. However, with a bit of trial and error (and Google), you can adapt and reuse these examples in your own projects. Or so we hope anyway. Give it a shot. You'll get there.


# This Tutorial has 6 Comments

• ### By Jon Peck on December 2nd, 2016

Continuing...

If you just pass a list of names with no TO or ALL, you get back that list in the order you gave. You can have multiple TO's in the list.

If you see a case where the file order is not preserved, let me know.

• ### By Jon Peck on December 2nd, 2016

You can get simple expansion of lists with spssaux.VariableDict in file order like this.
import spssaux

print spssaux.VariableDict().expand(["jobtime", "educ"])
print spssaux.VariableDict(variableType="numeric").expand(["id","to", "salary"])
print spssaux.VariableDict().expand(["all"])

If you just pass a list of names with no TO or ALL,

• ### By Ruben Geert van den Berg on December 2nd, 2016

Hi Jon, thanks for your feedback!

I'm happy that installing Python with SPSS is so much easier now. I should add that quite some users choose not to install Python and then later have difficulties to add it to an installed SPSS version. That's a question I got quite a few times.

Honestly, I hardly ever use regexes for selecting variables. Most of the time, vDict.expand() works best for me because we're usually after ranges of variables rather than groups whose name matches some regex.

It's a pity that vDict.expand() requires so much code, though. It would be great if there were some shortcut for it that's also foolproof regarding the variable order. Why not simply spssaux.ExpandVariables("id to agegroup")? IMHO, that would be a very welcome addition to this module.

• ### By jon peck on December 2nd, 2016

Since recent versions of Statistics install Python and the Python Essentials by default, using Python with Statistics is easier than ever.

Along those lines, I want to point out that the spssaux.VariableDict class can select variables using a pattern via the pattern argument, making this even easier.

• ### By Ruben Geert van den Berg on December 1st, 2016

Hi Jon! I know what you're going to say: there's no need to sort the varList after `varDict.expand` because the original variable order is retained.

However, I've come across cases where the variable order was distorted even with `caseless = True`. This was a long time ago so it may take some time to look up a working example but the only fix to this that always worked was sorting the varList on `VariableIndex`.