SPSS Tutorials

SPSS – Process Variables Based on Names

We held a reaction time experiment in which people had to solve 20 puzzles as fast as possible. The 20 answers and reaction times are in an SPSS data file which we named trials.sav. Part of it is shown below.

[Figure: Data View of trials.sav]

To begin with, we'd just like to inspect the histograms of our reaction time variables. The easiest way is running FREQUENCIES as shown below, but specifying the right variable names is cumbersome even for this very tiny data file.

*Required command may look something like...

frequencies Trial_1_Reaction_Time_Milliseconds Trial_2_Reaction_Time_Milliseconds Trial_3_Reaction_Time_Milliseconds /*and so on through 20.
/format notable
/histogram.

1. Retrieve Variable Names by Index

A very easy option here is to include only those variables in our command that have “Time” in their names. Filtering variables by (parts of) their names, labels, measurement levels or variable types is straightforward if we have Python do it for us.
Our data hold 43 variables, but Python starts counting from 0. So our first and last variables should be indexed 0 and 42 by Python. Let's see if that's right by running the syntax below.

*Retrieve names of first and last variable by Python index.

begin program.
import spss
print(spss.GetVariableName(0))
print(spss.GetVariableName(42))
end program.
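
The zero-based counting can also be checked outside SPSS. The sketch below uses a plain Python list as a stand-in for the data; the names in `variable_names` are hypothetical placeholders, not the actual contents of trials.sav.

```python
# Hypothetical stand-in for a dataset with 43 variables; inside SPSS the
# names would come from spss.GetVariableName instead.
variable_names = ['var_%d' % (i + 1) for i in range(43)]

first_index = 0
last_index = len(variable_names) - 1  # 43 variables, so the last index is 42

print(first_index, variable_names[first_index])
print(last_index, variable_names[last_index])
```

Asking for index 43 on this list would raise an IndexError, which is the same off-by-one mistake to watch out for when passing indices to SPSS.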

2. Retrieve All Variable Indices

So if we can retrieve the first and last variable names by their Python indices (0 and 42), then we can retrieve all of them if we have all indices. A standard way of getting those is Python's built-in range function. As shown below, it generates all indices, which we print as a Python list.

*Create Python list of all variable indices.

begin program.
import spss
print(spss.GetVariableCount())
print(list(range(spss.GetVariableCount())))
end program.

Result

[Figure: Python list holding SPSS variable indices in the output window]
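
One caveat, offered as a side note: in Python 3, range returns a lazy range object rather than a list, so printing it directly shows `range(0, 43)` instead of the indices. Wrapping it in list() makes the indices visible under either Python version. In the minimal sketch below, the count of 43 is hard-coded as a stand-in for the value of spss.GetVariableCount().

```python
# 43 stands in for the value returned by spss.GetVariableCount().
variable_count = 43

# An explicit list behaves the same in Python 2 and Python 3.
indices = list(range(variable_count))

print(indices[0], indices[-1], len(indices))
```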

3. Retrieve All Variable Names

We'll now run a very simple Python loop over our variable indices. In each iteration, we'll retrieve one variable name, resulting in the names of all variables in our data. We'll select our target variables in the next step.

*Retrieve all variable names.

begin program.
import spss
for ind in range(spss.GetVariableCount()):
    print(spss.GetVariableName(ind))
end program.

4. Filter Variable Names

Before we create and run the required syntax, we still need to select only those variables having “Time” in their names. We'll use a very simple Python if statement for doing so. The syntax below retrieves exactly our target variables.

*Retrieve all variable names holding "Time".

begin program.
import spss
for ind in range(spss.GetVariableCount()):
    varNam = spss.GetVariableName(ind)
    if 'Time' in varNam:
        print(varNam)
end program.
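
The same filter can be written as a list comprehension. The sketch below runs outside SPSS on a hypothetical list of names (the `ID` and `..._Answer` names are made up for illustration), and it also shows a case-insensitive variant, since `'Time' in name` is case-sensitive.

```python
# Hypothetical variable names standing in for the names SPSS would return.
variable_names = [
    'ID',
    'Trial_1_Answer',
    'Trial_1_Reaction_Time_Milliseconds',
    'Trial_2_Answer',
    'Trial_2_Reaction_Time_Milliseconds',
]

# Keep only names containing "Time"; this test is case-sensitive.
time_vars = [name for name in variable_names if 'Time' in name]

# Case-insensitive variant, in case names are capitalized inconsistently.
time_vars_any_case = [name for name in variable_names if 'time' in name.lower()]

for name in time_vars:
    print(name)
```

A substring test this loose would also pick up an unrelated variable such as a hypothetical `Timestamp`, so it is worth printing the selection before using it.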

5. Create Python String Holding Target Variables

We'd like to create some SPSS syntax containing the variables we selected in the previous syntax. To do so, we'll collect their names in a Python string we'll call timeVars: we create it as an empty string and then append each variable name followed by a space.

*Create Python string holding all reaction time variables.

begin program.
import spss
timeVars = ''
for ind in range(spss.GetVariableCount()):
    varNam = spss.GetVariableName(ind)
    if 'Time' in varNam:
        timeVars += varNam + ' '
print(timeVars)
end program.
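
An alternative to repeated concatenation is to collect the names in a list and join them once, which also avoids the trailing space. This is a minimal sketch on three names written out by hand; inside SPSS the list would be filled from spss.GetVariableName.

```python
# Names written out by hand for illustration; in SPSS they would be
# collected in the loop over variable indices.
time_var_names = [
    'Trial_1_Reaction_Time_Milliseconds',
    'Trial_2_Reaction_Time_Milliseconds',
    'Trial_3_Reaction_Time_Milliseconds',
]

# Join with single spaces; no trailing space is added.
time_vars = ' '.join(time_var_names)

print(time_vars)
```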

6. Create Required SPSS Syntax

We're almost there now. The main step left is to create the basic FREQUENCIES command we're after and insert the string holding variable names into it. We'll do so with a simple Python string substitution: the %s placeholder in the command is replaced by the contents of timeVars.

*Create entire SPSS command for histograms.

begin program.
import spss
timeVars = ''
for ind in range(spss.GetVariableCount()):
    varNam = spss.GetVariableName(ind)
    if 'Time' in varNam:
        timeVars += varNam + ' '
spssSyntax = '''
FREQUENCIES %s
/FORMAT NOTABLE
/HISTOGRAM.
'''%timeVars
print(spssSyntax)
end program.

Result

[Figure: SPSS FREQUENCIES syntax created with Python]

7. Create and Run Desired Syntax

Let's take a close look at the syntax we just created. Is it exactly what we need? Sure? Then we'll comment out the print command and run our syntax with spss.Submit instead.

*Create and run required command.

begin program.
import spss
timeVars = ''
for ind in range(spss.GetVariableCount()):
    varNam = spss.GetVariableName(ind)
    if 'Time' in varNam:
        timeVars += varNam + ' '
spssSyntax = '''
FREQUENCIES %s
/FORMAT NOTABLE
/HISTOGRAM.
'''%timeVars
#print(spssSyntax)
spss.Submit(spssSyntax)
end program.
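
If no variable name contains “Time”, the generated command has an empty variable list and SPSS will reject it. One way to guard against that is to build the syntax in a small function that returns nothing when the selection is empty. The function name `build_frequencies_syntax` is our own invention for this sketch and not part of the spss module.

```python
def build_frequencies_syntax(variable_names):
    """Return a FREQUENCIES command for the given names, or None if empty."""
    if not variable_names:
        return None
    template = '''
FREQUENCIES %s
/FORMAT NOTABLE
/HISTOGRAM.
'''
    return template % ' '.join(variable_names)


syntax = build_frequencies_syntax([
    'Trial_1_Reaction_Time_Milliseconds',
    'Trial_2_Reaction_Time_Milliseconds',
])

if syntax is None:
    print('No matching variables; nothing to run.')
else:
    # Inside SPSS, this is where spss.Submit(syntax) would go.
    print(syntax)
```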

Final Notes

While developing a solution, we quietly assumed we weren't allowed to make any changes to the data. In our opinion, the very first thing that should be done here is to set the crazy long variable names as variable labels and use short variable names instead.

Also, we'd rather reorder our variables (first all reaction times, then all answers). The basic way for doing so is shown in SPSS - Reorder Variables with Syntax. That way, we can use TO in a short FREQUENCIES command.
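
As a rough sketch of the renaming idea, Python can generate the RENAME VARIABLES and VARIABLE LABELS commands from the long names. The short names (`rt_1`, `rt_2`, ...) and the regular expression are our own assumptions; the pattern only covers names of the form Trial_N_Reaction_Time_Milliseconds and leaves everything else untouched.

```python
import re

# Long names written out by hand; in SPSS they would come from the data.
long_names = [
    'Trial_1_Reaction_Time_Milliseconds',
    'Trial_2_Reaction_Time_Milliseconds',
]

# Assumed naming pattern; the trial number is captured for the short name.
pattern = re.compile(r'^Trial_(\d+)_Reaction_Time_Milliseconds$')

rename_pairs = []
label_lines = []
for long_name in long_names:
    match = pattern.match(long_name)
    if match is None:
        continue  # skip names that do not fit the assumed pattern
    short_name = 'rt_%s' % match.group(1)  # hypothetical short name
    rename_pairs.append('(%s = %s)' % (long_name, short_name))
    # Use the old name, with underscores turned into spaces, as the label.
    label_lines.append("%s '%s'" % (short_name, long_name.replace('_', ' ')))

rename_syntax = 'RENAME VARIABLES ' + ' '.join(rename_pairs) + '.'
label_syntax = 'VARIABLE LABELS ' + '\n /'.join(label_lines) + '.'

print(rename_syntax)
print(label_syntax)
```

Both strings are only printed here. Inside SPSS they could be inspected first and then passed to spss.Submit, renaming before labeling so that the labels attach to the new names.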

Thanks for reading!
