SPSS - How to Sort Variables Using Python?

We held a reaction time experiment in which people had to solve 20 puzzles as fast as possible. The 20 answers and reaction times are in an SPSS data file which we named trials.sav. Part of it is shown below.

SPSS Python Example - Data View Trials.sav

At first, we'd just like to inspect the histograms of our reaction time variables. The easiest way is running FREQUENCIES as shown below, but specifying the right variable names is cumbersome even for this very small data file.

*Required command may look something like...

frequencies Trial_1_Reaction_Time_Milliseconds Trial_2_Reaction_Time_Milliseconds Trial_3_Reaction_Time_Milliseconds /*and so on through 20.
/format notable
/histogram.

If our reaction times were adjacent, we could address the entire block with the TO keyword. We'll therefore reorder our variables with ADD FILES as shown in SPSS - Reorder Variables with Syntax. The easiest way to get the job done is ADD FILES FILE */KEEP id to agegroup [reaction time variables here] ALL. Here, ALL refers to all variables in our data that we haven't specified yet, in this case all answer variables. This command still requires spelling out all reaction time variables unless we have Python do that for us. Let's first just look up all variable names and proceed from there.

1. Retrieve Variable Names by Index

The spss module allows us to retrieve variable names by index. Now, we've 43 variables in our data but Python starts counting from 0. So our first and last variables should be indexed 0 and 42 by Python. Let's see if that's right by running the syntax below.

*Retrieve names of first and last variable by Python index.

begin program python3.
import spss
print(spss.GetVariableName(0))
print(spss.GetVariableName(42))
end program.

2. Retrieve All Variable Indices

So if we can retrieve the first and last variable names by their Python indices (0 and 42), then we can retrieve all of them if we have all indices. A standard way of doing just that is using range.

Technically, range is an iterable object which means that we can loop over it. The syntax below shows how that works.

*Print integers from 0 through 9.

begin program python3.
for ind in range(10):
    print(ind)
end program.


*Print all variable indices.

begin program python3.
import spss
for ind in range(spss.GetVariableCount()):
    print(ind)
end program.

Result

SPSS Python Variable Indices In Output

3. Retrieve All Variable Names

We'll now run a very simple Python for loop over our variable indices. In each iteration, we'll retrieve one variable name, resulting in the names of all variables in our data. We'll filter out our target variables in the next step.

*Retrieve all variable names.

begin program python3.
import spss
for ind in range(spss.GetVariableCount()):
    print(spss.GetVariableName(ind))
end program.

4. Filter Variable Names

Before we create and run the required syntax, we still need to filter out only those variables having “Time” in their variable names. We'll use a very simple Python if statement for doing so. The syntax below retrieves exactly our target variables.

*Retrieve all variable names holding "Time".

begin program python3.
import spss
for ind in range(spss.GetVariableCount()):
    varNam = spss.GetVariableName(ind)
    if 'Time' in varNam:
        print(varNam)
end program.
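
The filter itself is plain Python and can be tried outside SPSS. Below is a minimal sketch, assuming a hypothetical list var_names that stands in for the names spss.GetVariableName would return. It uses a list comprehension rather than the for loop shown above.

```python
# Hypothetical stand-in for the names spss.GetVariableName() would return.
var_names = [
    "id",
    "Trial_1_Answer",
    "Trial_1_Reaction_Time_Milliseconds",
    "Trial_2_Answer",
    "Trial_2_Reaction_Time_Milliseconds",
]

# Keep only the names containing "Time"; note that `in` is case sensitive.
time_names = [name for name in var_names if "Time" in name]
print(time_names)
```

Because the test is case sensitive, a name holding only "time" in lowercase would not be selected.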

5. Create Python String Holding Target Variables

We'd like to create some SPSS syntax containing the variables we selected in the previous syntax. We'll first pass the names into a Python string we'll call timeVars. We first create it as an empty string and then concatenate each variable name and a space to it.

*Create Python string holding all reaction time variables.

begin program python3.
import spss
timeVars = ''
for ind in range(spss.GetVariableCount()):
    varNam = spss.GetVariableName(ind)
    if 'Time' in varNam:
        timeVars += varNam + ' '
print(timeVars)
end program.

Minor note: building a string by repeated concatenation is often considered bad practice because Python strings are immutable, so each += creates a new string. Our reason for doing so anyway is that it keeps things simple. It gets the job done just fine unless we're processing a truly massive amount of text, which we basically never do in SPSS.
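
For readers who prefer to avoid repeated concatenation, a common alternative is collecting the names in a list and joining them once. The sketch below assumes a hypothetical list time_names holding the filtered names; it is not part of the original syntax.

```python
# Hypothetical stand-in for the filtered reaction time variable names.
time_names = [
    "Trial_1_Reaction_Time_Milliseconds",
    "Trial_2_Reaction_Time_Milliseconds",
]

# Join all names with single spaces in one go instead of using += in a loop.
timeVars = " ".join(time_names)
print(timeVars)
```

Unlike the concatenation approach, join leaves no trailing space, although SPSS doesn't mind either way.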

6. Create Required SPSS Syntax

We're almost there. We'll now create our basic ADD FILES command as a Python string. In the syntax below, %s is a placeholder that we'll replace with our time variable names. For more details on this technique, please consult SPSS Python Text Replacement Tutorial.

*Create required ADD FILES command and inspect it.

begin program python3.
import spss
timeVars = ''
for ind in range(spss.GetVariableCount()):
    varNam = spss.GetVariableName(ind)
    if 'Time' in varNam:
        timeVars += varNam + ' '
spssSyntax = "ADD FILES FILE */KEEP id to agegroup %s ALL."%timeVars
print(spssSyntax)
end program.
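
The text replacement can also be tried in isolation. The sketch below uses hypothetical, shortened variable names and a made-up label. It shows one placeholder filled from a single string and two placeholders filled from a tuple.

```python
# Single placeholder: the value after % replaces %s.
time_vars = "time_1 time_2"  # hypothetical, shortened variable names
single = "ADD FILES FILE */KEEP id to agegroup %s ALL." % time_vars
print(single)

# Several placeholders: pass a tuple holding the values in order.
double = "VARIABLE LABELS %s '%s'." % ("time_1", "Reaction time trial 1")
print(double)
```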

7. Create and Run Desired Syntax

Let's take a close look at the syntax we just created. Is it exactly what we need? Sure? Then we'll comment out the print command and run our syntax with spss.Submit instead.

*Create syntax and have Python run it in SPSS.

begin program python3.
import spss
timeVars = ''
for ind in range(spss.GetVariableCount()):
    varNam = spss.GetVariableName(ind)
    if 'Time' in varNam:
        timeVars += varNam + ' '
spssSyntax = "ADD FILES FILE */KEEP id to agegroup %s ALL."%timeVars
#print(spssSyntax)
spss.Submit(spssSyntax)
end program.

execute.

8. Run Histograms

Now that we nicely sorted our variables, running the desired histograms is easily done with the TO keyword as shown below.

*Run histograms over all reaction times.

frequencies Trial_1_Reaction_Time_Milliseconds to Trial_20_Reaction_Time_Milliseconds
/format notable
/histogram.

Although we got our first job done, we're rather dissatisfied with the crazy long variable names. In our humble opinion, variable names should be short and simple. More elaborate descriptions of what variables mean should go into their variable labels.

We're going to do just that in our next lesson.

Let's move on.

Set SPSS Variable Names as Labels with Python

Our previous lesson started out with a rather problematic data file. We reordered our variables with Python and saved our data as trials-ordered.sav, the starting point for this lesson. The screenshot below shows part of the data.

We find the long variable names problematic for two reasons: first, typing them into a syntax window is too much work and results in overly long, unmanageable syntax. More importantly, the underscores don't look nice in our output but variable names can't hold spaces instead.
We'll therefore set our variable names as variable labels and replace the underscores by spaces. Finally, we'll replace the long names by nice and short ones.

1. Retrieve All Variable Names from Data

Retrieving all variable names from our data is a standard technique that we cover in Sort Variables in SPSS with Python. We'll do it with the syntax below once again.

*Look up all variable names.

begin program python3.
import spss
for ind in range(spss.GetVariableCount()):
    varNam = spss.GetVariableName(ind)
    print(varNam)
end program.

2. Retrieve All Variable Labels from Data

Note that our first 3 variables already have a label. In order to ensure we don't overwrite them, we'll now inspect all variable labels as well, which is a simple function covered by the spss module.

*Look up all variable labels.

begin program python3.
import spss
for ind in range(spss.GetVariableCount()):
    varNam = spss.GetVariableName(ind)
    varLab = spss.GetVariableLabel(ind)
    print(varLab)
end program.

3. Create Variable Labels with Python

If some variable does not have a label yet, Python will return an empty string. We'll check if this holds with if not varLab:, which is True if the label is empty. For those variables, we'll create a variable label by replacing the underscores in their names by spaces. For now, we'll just print these labels.

*If variable label empty, variable label = variable name with underscores replaced by spaces.

begin program python3.
import spss
for ind in range(spss.GetVariableCount()):
    varNam = spss.GetVariableName(ind)
    varLab = spss.GetVariableLabel(ind)
    if not varLab: # True if varLab is empty string (no VARIABLE LABEL set)
        varLab = varNam.replace("_"," ")
        print(varLab)
end program.
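
The two ingredients here are the truthiness of an empty string and str.replace, and both can be checked without SPSS. A minimal sketch, assuming a hypothetical list of (name, label) pairs standing in for the SPSS dictionary:

```python
# Hypothetical (name, label) pairs standing in for the SPSS dictionary.
variables = [
    ("id", "Respondent identifier"),
    ("Trial_1_Answer", ""),  # no label set: empty string
    ("Trial_1_Reaction_Time_Milliseconds", ""),
]

new_labels = {}
for var_name, var_label in variables:
    if not var_label:  # an empty string is falsy in Python
        new_labels[var_name] = var_name.replace("_", " ")
print(new_labels)
```

The variable that already has a label is left out of new_labels, so its label would never be overwritten.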

4. Create VARIABLE LABELS Commands

We'll now create and inspect VARIABLE LABELS commands with "VARIABLE LABELS %s '%s'."%(varNam,varLab). The first %s is replaced by the variable name, the second by its newly created label. This technique is explained in SPSS Python Text Replacement Tutorial.

*Create and inspect required VARIABLE LABELS commands.

begin program python3.
import spss
for ind in range(spss.GetVariableCount()):
    varNam = spss.GetVariableName(ind)
    varLab = spss.GetVariableLabel(ind)
    if not varLab:
        varLab = varNam.replace("_"," ")
        print("VARIABLE LABELS %s '%s'."%(varNam,varLab))
end program.

Note: since we use single quotes around the variable label, the label itself must not contain any unescaped single quotes. This is no issue here but if it is, escape each single quote within the label by doubling it (two single quotes).

5. Run VARIABLE LABELS Commands

Since our VARIABLE LABELS commands look fine, we'll now have Python run them in SPSS. We basically just replace print with spss.Submit and add some parentheses. This concludes the first part of our job.

*Have Python run VARIABLE LABELS commands.

begin program python3.
import spss
for ind in range(spss.GetVariableCount()):
    varNam = spss.GetVariableName(ind)
    varLab = spss.GetVariableLabel(ind)
    if not varLab:
        varLab = varNam.replace("_"," ")
        spss.Submit("VARIABLE LABELS %s '%s'."%(varNam,varLab))
end program.

6. RENAME VARIABLES

At this point, at least our output will look good if we display only variable labels (not names) with SET TVARS LABELS. However, we still prefer nice and short variable names. If we use the TO keyword, a single line RENAME VARIABLES command does the trick for us.

*Set nice, short variable names for reaction times and answers.

rename variables(Trial_1_Reaction_Time_Milliseconds to Trial_20_Answer = time_1 to time_20 answer_1 to answer_20).

Final Result

As shown, our variable names are now nice and short and we've decent variable labels as well. This'll surely pay off when further editing or analyzing these data...

Thanks for reading!

Looping over SPSS Commands with Python

A nasty limitation of SPSS is that some commands take only one variable. Now, DO REPEAT and LOOP allow us to loop over variables but they are limited to SPSS transformation commands.

Python, however, allows us to loop over any command. On top of that, we can use the TO keyword, thus circumventing the need to spell out all variable names.

This lesson, which covers both techniques, is among the most important of this course. Let's dive in!

Population Pyramids

We previously cleaned up some reaction time data and this resulted in trials-renamed.sav, part of which is shown below.

We'd now like to visualize the performance of our female versus male participants. A great way of doing so is running population pyramids. They allow for a “quick and dirty” comparison of distributions between males and females. We'll first just create one by following the screenshots below.

SPSS Population Pyramid Syntax

*Population pyramid syntax, as pasted.

XGRAPH CHART=[HISTOBAR] BY time_1[s] BY gender[c]
/COORDINATE SPLIT=YES
/BIN START=AUTO SIZE=AUTO
/TITLES TITLE='Reaction Time by Gender'.

Apparently, we need 4 lines of syntax for one population pyramid and we'd like to have a quick peek at 20 of them. Admittedly, we could shorten the syntax somewhat but let's not waste time on it. So we need at least 80 lines of syntax. Right?

Expanding SPSS’ TO Keyword

Wrong. As in previous lessons, we'll have Python loop over this command and use a different variable name in each iteration. In this case, our reaction time variables have such simple names that we can generate all of them with a Python list comprehension like ["time_%s"%ind for ind in range(1,21)].

But what about variable names that don't follow such a simple pattern? In SPSS, we'll usually specify a block of variables with the first and last variable names separated by SPSS’ TO keyword. Additional variables may be added, separated by spaces, as in time_1 time_3 to time_6 time_8 time_12.

Python can expand this SPSS variable specification into a Python list of variable names. This is among the most important SPSS Python techniques and it's demonstrated below.

*Retrieve block of variables with TO keyword.

begin program python3.
import spssaux
sDict = spssaux.VariableDict(caseless = True)
varList = sDict.expand("tiME_1 to time_20")
print(varList)
end program.

Note: since Python is case sensitive, we often need to use the correct casing for variable names and any other SPSS objects. For VariableDict(), however, adding caseless = True allows us to use any casing we like, which is usually all lowercase.
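
For comparison, the list comprehension mentioned earlier needs no SPSS modules at all. A small sketch of it, which only works because these names follow a simple numbered pattern:

```python
# Generate time_1 through time_20; range(1, 21) stops before 21.
time_names = ["time_%s" % ind for ind in range(1, 21)]
print(time_names[0], time_names[-1], len(time_names))
```

For names without such a pattern, expanding a TO specification from the SPSS dictionary remains the more general route.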

Running our Population Pyramids

We can now use a simple Python for loop for iterating over our XGRAPH commands. In each iteration, Python replaces %s with a variable name. This gets our job done.

*Run population pyramid over block of variables.

begin program python3.
import spssaux,spss
sDict = spssaux.VariableDict(caseless = True)
varList = sDict.expand("time_1 to time_20")
for var in varList:
    spss.Submit('''
XGRAPH CHART=[HISTOBAR] BY %s[s] BY gender[c]
/COORDINATE SPLIT=YES
/BIN START=AUTO SIZE=AUTO
/TITLES TITLE='Reaction Time by Gender'.
'''%var)
end program.

Insert Variable Labels into Titles

We got our basic job done. However, we'll now insert our variable labels into our chart titles. Now, spss.GetVariableLabel() can retrieve variable labels by variable indices. For getting variable labels by variable names, however, the aforementioned VariableDict() object comes in handy (line 8, below).
In our final example, we have two text replacements in each XGRAPH command. When using multiple text replacements, using locals() is often a nice way to get things done.

*Use locals() for text replacements and throw variable label into title.

begin program python3.
import spssaux,spss
sDict = spssaux.VariableDict(caseless = True)
varList = sDict.expand("time_1 to time_20")
for var in varList:
    varLab = sDict[var].VariableLabel
    spss.Submit('''
XGRAPH CHART=[HISTOBAR] BY %(var)s[s] BY gender[c]
/COORDINATE SPLIT=YES
/BIN START=AUTO SIZE=AUTO
/TITLES TITLE='%(varLab)s by Gender'.
'''%locals())
end program.
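
What locals() contributes can be seen in a small standalone sketch. The variable name and label below are hypothetical, and the XGRAPH command is shortened to one line for readability. Named placeholders such as %(var)s are looked up by key in whatever dict follows the % operator.

```python
var = "time_1"                    # hypothetical variable name
varLab = "Trial 1 Reaction Time"  # hypothetical variable label

template = ("XGRAPH CHART=[HISTOBAR] BY %(var)s[s] BY gender[c] "
            "/TITLES TITLE='%(varLab)s by Gender'.")

# locals() returns a dict of the names currently in scope ...
via_locals = template % locals()
# ... but an explicit dict works the same way and shows what is substituted.
via_dict = template % {"var": var, "varLab": varLab}
print(via_dict)
```

The explicit dict is slightly more typing but makes it obvious which values end up in the syntax.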

So I guess that'll do for this lesson. If you've any feedback, please let us know.

Hope you found it helpful!

SPSS – Extract ISO Weeks from Date Variable

For 2022, the Spring holidays in the Netherlands fall in week 8. Now, this “week 8” refers to ISO week 8, 2022, which runs from Monday 21 through Sunday 27 February 2022.

Now, if we've an SPSS data file containing dates, how can we extract ISO week numbers from those? We'll quickly walk you through.

Creating Test Data

Let's first create some test data: the syntax below creates an SPSS date variable for today and its previous 999 days.

*CREATE DATE VARIABLE FOR TODAY + PREVIOUS 999 DAYS.

input program.
loop mydate = 1 to 1000.
end case.
end loop.
end file.
end input program.
execute.

compute mydate = datesum($time,-mydate + 1,'days').
execute.

formats mydate (date11).

Result

SPSS Extract Isoweek From Date Test Data

SPSS’ Nonsensical Week Numbers

Right. Now, SPSS seems able to extract week numbers, which is done with something like compute week = xdate.week(mydate). Sadly, these week numbers are nonsensical and useless as discussed in SPSS Computes Wrong Week Numbers? For example, some weeks may consist of only a single day as shown below.

SPSS Week Numbers Wrong

In short: SPSS can't extract “normal” week numbers from dates. However, Python presents us with a workaround.

Week Numbers from Python Isocalendar

Python's datetime module supports the ISO calendar: calling isocalendar() on a Python date or datetime object returns a tuple containing its ISO year, week and weekday.

Fortunately, the datetime module is part of Python's standard library, which comes with the SPSS Python Essentials, so there's no need for installing it. The syntax below presents a very quick example of how to use it.

*CREATE PYTHON DATETIME OBJECT AND EXTRACT ISOCALENDAR FROM IT.

begin program python3.
import datetime
myDate = datetime.datetime(2022,1,1,8,31,0)
print("My date is {}.".format(myDate))
print("But my ISOdate is {}.".format(myDate.isocalendar()))
end program.

Result

Python Datetime Isocalendar Result

Note that 1 January 2022 falls in ISO week 52 for 2021. Our Google calendar confirms that this is correct as shown below.

Isoweeks In Google Calendar
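
To see how ISO weeks behave around New Year, the small sketch below prints the ISO year, week and weekday for four consecutive dates. It only uses the standard datetime module, so it runs outside SPSS as well.

```python
import datetime

# Dates around New Year 2022; ISO weeks run from Monday (1) through Sunday (7).
for day in (datetime.date(2021, 12, 31), datetime.date(2022, 1, 1),
            datetime.date(2022, 1, 2), datetime.date(2022, 1, 3)):
    iso_year, iso_week, iso_day = day.isocalendar()
    print(day, iso_year, iso_week, iso_day)
```

The first three dates still belong to ISO week 52 of 2021; ISO week 1 of 2022 only starts on Monday 3 January.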

Right. So now we're left with 2 challenges: converting SPSS date values into Python datetime objects and applying a Python function to an SPSS variable.

The first step is done with the CvtSpssDatetime function from the spssdata module. Next up, we can apply any Python function to one or many SPSS variables with an SPSS extension, Programmability Transformation, as found under Transform → Programmability Transformation. If this is not present in your SPSS version, you may download and install SPSSINC_TRANS.spe from this GitHub page. Let's now apply both fixes with our final syntax below.

Final Syntax

*DEFINE FUNCTION FOR ISO WEEK FROM SPSS DATE.

begin program python3.
import datetime
from spssdata import CvtSpssDatetime
def iso(spssdate):
    return CvtSpssDatetime(spssdate).isocalendar()
end program.

*APPLY FUNCTION TO SPSS VARIABLE.

spssinc trans result=isoyear isoweek isoday TYPE = 0
/formula "iso(mydate)".
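
For testing the idea outside SPSS, the conversion can be approximated in plain Python. This is a rough sketch under two assumptions: SPSS date values are numbers of seconds since midnight, 14 October 1582, and CvtSpssDatetime does something along these lines. The helper names spss_seconds_to_datetime and iso_from_seconds are hypothetical and not part of any SPSS module.

```python
import datetime

# Assumed zero point of SPSS date values: midnight, 14 October 1582.
SPSS_EPOCH = datetime.datetime(1582, 10, 14)

def spss_seconds_to_datetime(seconds):
    """Hypothetical helper: convert an SPSS date value (seconds) to datetime."""
    return SPSS_EPOCH + datetime.timedelta(seconds=seconds)

def iso_from_seconds(seconds):
    """Return (ISO year, ISO week, ISO weekday) for an SPSS date value."""
    return tuple(spss_seconds_to_datetime(seconds).isocalendar())

# Round trip: build the SPSS value for 1 January 2022, 8:31 and convert it back.
value = (datetime.datetime(2022, 1, 1, 8, 31) - SPSS_EPOCH).total_seconds()
print(iso_from_seconds(value))
```

Inside SPSS, the CvtSpssDatetime function from spssdata remains the safer choice.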

Result

SPSS Extract Isoweek From Date Result

As shown, we have now correctly added the ISO year, week and day for our SPSS date variable.

Final Notes

Before I round off, I'd like to emphasize that SPSSINC TRANS is useful for many more interesting challenges.

This extension really deserves a separate tutorial and I may write just that if time permits.

Credits

I'd like to thank my dear colleague Jon Peck for creating the SPSSINC TRANS extension as well as the final syntax for this tutorial.

SPSS – Edit Value Labels with Python

A local supermarket held a small survey, the data of which are in minisurvey.sav. Unfortunately, the software for downloading the data in SPSS format prefixes all variable and value labels with the variable names. The screenshot below shows part of the data.

SPSS Batch Change Value Labels With Python

Undesired Prefixes in Value Labels

Clicking on some value labels in variable view confirms that they've undesired prefixes as shown below. Obviously, we don't want to see these value labels in our output but we don't want to adjust all of them manually either. Fortunately, SPSS with Python allows us to fix the problem with just a few lines of code.

SPSS Remove Prefix From Value Labels

Removing Characters with Python

First off, you need to have the SPSS Python Essentials properly installed for running this tutorial’s syntax. We'll first create a string holding just one value label. We adjust it by extracting a substring in Python. Precisely, we want characters 9 through last. Since Python starts counting at 0, valLab[8:] does just that.

*Extract characters 9 through last with Python substring.

begin program python3.
valLab = 'v13_2A: Neutral'
print(valLab[8:])
end program.

Finding the Colon in our Label

Unfortunately, our prefixes have different lengths so we can't just extract characters 9 through last. However, we do see that the prefixes always end with a colon and a space. The position of the (first) colon is found with find and tells us which characters to extract.

*Find (first occurrence of) ": ".

begin program python3.
print(valLab.find(": "))
end program.

Fixing One Value Label

Our colon and space occur at position 6. Because we want our label to start after these 2 characters, we'll add another 2 to it as shown below. In short, valLab[valLab.find(": ") + 2:] always returns the desired value label.

*Remove prefix from just 1 value label.

begin program python3.
valLab = 'v1: Neutral'
print(valLab[valLab.find(": ") + 2:])
end program.

Result

SPSS Python Substring Value Labels
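
The same expression can be wrapped in a small function and tried on prefixes of different lengths. The function name strip_prefix is hypothetical. The last call illustrates a caveat: if ": " is absent, find returns -1, so the slice starts at position 1.

```python
def strip_prefix(label):
    """Return the part of label after the first ': '."""
    return label[label.find(": ") + 2:]

print(strip_prefix("v13_2A: Neutral"))  # prefix of 8 characters
print(strip_prefix("v1: Neutral"))      # prefix of 4 characters

# Caveat: without ": " in the label, find returns -1 and the slice
# silently drops the first character.
print(strip_prefix("Neutral"))
```

This is why it's worth checking that ": " is actually present before slicing.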

Look up SPSS Dictionary Information

We can easily look up SPSS dictionary information with the Python spss module. Some examples are spss.GetVariableName(ind) and spss.GetVariableLabel(ind), where ind is the Python variable index (0 for the first variable, 1 for the second and so on). For value labels, however, we prefer using VariableDict() from the spssaux module. But let's first just find all variable names.

*Inspect variable information with spssaux.VariableDict().

begin program python3.
import spssaux
sDict = spssaux.VariableDict()
for var in sDict:
    print(var,type(var))
end program.

Result

SPSS Python Vardict Vars

Look up Value Labels

We'll now look up our value labels. For each variable, we'll get a Python dictionary holding each labeled value and its label. Don't confuse a Python dict object with the SPSS dictionary; these are totally unrelated.

*Retrieve value labels (Python dict objects).

begin program python3.
import spssaux
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    print(valLabs)
end program.

Result

SPSS Value Labels As Python Dict In Output

Loop Over Values and Labels

A Python dict holds key-value pairs of which the keys are unique within the dict. We'll loop over these pairs and look up the key and value by using items() as shown below.

*For each variable, loop through values with labels.

begin program python3.
import spssaux
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    for key,val in valLabs.items():
        print(key,val)
end program.

A major source of confusion here is that SPSS values are keys in our Python dict. The Python dict values hold SPSS value labels. The figure below illustrates this mapping.

SPSS Value Labels As Python Dict
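
The mapping can also be mimicked with a hand-made dict. The values and labels below are hypothetical, and the exact key type that spssaux returns may differ from the integers used here.

```python
# Hypothetical value labels for one variable: SPSS value -> SPSS value label.
valLabs = {1: "Bad", 2: "Neutral", 3: "Good"}

for key, val in valLabs.items():
    # key is the SPSS value, val is its value label
    print("value %s has label '%s'" % (key, val))

# Keys are unique, so looking up a label by its value is unambiguous.
print(valLabs[2])
```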

Create ADD VALUE LABELS Commands

So far, our syntax finds all variable names, values and value labels. Inserting these into ADD VALUE LABELS commands will set all value labels for the entire dataset. We'll create this syntax by concatenating these commands in a loop. Note that \n adds a line break after each line.

*Create basic SPSS syntax for adjusting all value labels.

begin program python3.
import spssaux
spssSyntax = ''
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    for key,val in valLabs.items():
        spssSyntax += "ADD VALUE LABELS %s %s '%s'.\n"%(var,key,val)
print(spssSyntax)
end program.

Create Syntax for Adjusting Value Labels

At this point we'll add the correction for each value label that we developed earlier. The resulting syntax is almost what we need. Bonus points if you detect a problem with it before reading on.

*Create SPSS syntax for adjusting value labels.

begin program python3.
import spssaux
spssSyntax = ''
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    for key,val in valLabs.items():
        val = val[val.find(": ") + 2:]
        spssSyntax += "ADD VALUE LABELS %s %s '%s'.\n"%(var,key,val)
print(spssSyntax)
end program.

Result

If you're really good with SPSS, you'll see that some value labels contain a single quote. Since the labels are enclosed in single quotes too, they'll end the label prematurely. In Python, we'd escape them with \' but this is SPSS syntax so we need '' instead.

SPSS Escape Single Quote With Two Single Quotes

Final Syntax

We'll now replace all single quotes within value labels by two single quotes. Second, we'll check if the colon and space we're looking for are actually present in each label and if not, we'll leave the text of that label as it is. Third, we'll now run our SPSS syntax with spss.Submit so we need to import the spss module as well as spssaux.

*Create and run final syntax.

begin program python3.
import spssaux,spss
spssSyntax = ''
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    for key,val in valLabs.items():
        if ": " in val:
            val = val[val.find(": ") + 2:]
        val = val.replace("'","''")
        spssSyntax += "ADD VALUE LABELS %s %s '%s'.\n"%(var,key,val)
spss.Submit(spssSyntax)
end program.

*We're all done after running this.
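
The label cleaning logic can be tested without SPSS by moving it into small functions. This is only a sketch: the function names clean_label and add_value_label_command are hypothetical, and so are the variable name, value and label in the example call.

```python
def clean_label(label):
    """Strip a 'name: ' prefix if present, then double single quotes."""
    if ": " in label:
        label = label[label.find(": ") + 2:]
    return label.replace("'", "''")

def add_value_label_command(var_name, value, label):
    """Build one ADD VALUE LABELS command as a string."""
    return "ADD VALUE LABELS %s %s '%s'." % (var_name, value, clean_label(label))

# Hypothetical variable name, value and label.
print(add_value_label_command("v1", 3, "v1: Don't know"))
```

Keeping the string handling in functions like these makes it easy to check tricky labels before submitting anything to SPSS.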

Final Notes

Our final syntax does the job: when running FREQUENCIES or some other command, we'll have nice, clean value labels in our output. I'm sure our clients will appreciate it.

SPSS Clean Value Labels In Output

The syntax could be shorter but it's simple and readable. You can easily modify it for capitalizing value labels or removing unwanted characters from them. I hope this tutorial also shows how to develop SPSS Python syntax in small steps.

Thanks for reading!