SPSS tutorials


SPSS – Change Value Labels with Python

A local supermarket held a small survey, the data of which are in minisurvey.sav. Unfortunately, the software for downloading the data in SPSS format prefixes all variable and value labels with the variable names. The screenshot below shows part of the data.

[Image: SPSS Batch Change Value Labels With Python]

Undesired Prefixes in Value Labels

Clicking on some value labels in variable view confirms that they have undesired prefixes, as shown below. Obviously, we don't want to see these prefixes in our output but we don't want to adjust all value labels manually either. Fortunately, SPSS with Python allows us to fix the problem with just a few lines of code.

[Image: SPSS Remove Prefix From Value Labels]

Removing Characters with Python

First off, running this tutorial's syntax requires the SPSS Python Essentials to be properly installed. We'll start by creating a string holding just one value label and adjust it by extracting a substring in Python. Precisely, we want characters 9 through last. Since Python starts counting at 0, the 9th character has index 8, so valLab[8:] does just that.

*Extract characters 9 through last with Python substring.

begin program.
valLab = 'v13_2A: Neutral'
print valLab[8:]
end program.
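Outside SPSS, the same slice can be tried in any plain Python session. The sketch below is only an illustration: it uses Python 3 syntax (print as a function), whereas the tutorial's own code uses the Python 2 print statement, and the names prefix and label are ours.

```python
# Slicing sketch: indices start at 0, so index 8 is the 9th character.
val_lab = 'v13_2A: Neutral'

prefix = val_lab[:8]   # characters 1 through 8 (indices 0-7)
label = val_lab[8:]    # characters 9 through last (index 8 onwards)

print(repr(prefix))
print(repr(label))
```

The two slices never overlap and never leave a gap: prefix + label always gives back the original string.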

Finding the Colon in our Label

Unfortunately, our prefixes have different lengths so we can't just extract characters 9 through last. However, we do see that the prefixes always end with a colon and a space. The position of the (first) colon is found with find and tells us which characters to extract.

*Find (first occurrence of) ": ".

begin program.
print valLab.find(": ")
end program.
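To see how find behaves for prefixes of different lengths, here is a small stand-alone sketch in Python 3 (so outside SPSS). The three labels are made up for illustration; the last one has no prefix at all.

```python
# str.find returns the 0-based index of the first match, or -1 if there is none.
labels = ['v13_2A: Neutral', 'v1: Neutral', 'Neutral']  # made-up examples

positions = [lab.find(': ') for lab in labels]
print(positions)
```

Note the -1 for the label without a prefix: find does not raise an error when nothing is found.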

Fixing One Value Label

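For 'v13_2A: Neutral', find returns 6: the (zero-based) position at which our colon and space start. Because we want our label to start after these 2 characters, we add 2 to this position. In short, valLab[valLab.find(": ") + 2:] returns the desired value label for any prefix length, provided that the label actually contains a colon and a space. The syntax below demonstrates this for a shorter prefix.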

*Remove prefix from just 1 value label.

begin program.
valLab = 'v1: Neutral'
print valLab[valLab.find(": ") + 2:]
end program.

Result

[Image: SPSS Python Substring Value Labels]
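The same expression can be wrapped in a small helper and tried on several labels at once. This is a minimal Python 3 sketch, not part of the tutorial's syntax: the function name strip_prefix and the example labels are hypothetical. It also shows what goes wrong when a label has no colon and space, which is why the final syntax checks for them first.

```python
def strip_prefix(val_lab, sep=': '):
    """Return the text after the first occurrence of sep (no check if sep exists)."""
    return val_lab[val_lab.find(sep) + len(sep):]

examples = ['v13_2A: Neutral', 'v1: Neutral', 'q2: Agree: strongly']
cleaned = [strip_prefix(lab) for lab in examples]
print(cleaned)

# Pitfall: without a match, find returns -1, so the slice starts at -1 + 2 = 1
# and silently drops the first character of the label.
damaged = strip_prefix('Neutral')
print(damaged)
```

Only the first colon and space are treated as the end of the prefix, so a label that contains a second colon keeps it.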

Look up SPSS Dictionary Information

We can easily look up SPSS dictionary information with the Python spss module. Functions such as spss.GetVariableName(ind) and spss.GetVariableLabel(ind), for example, return the name and label of a variable, where ind is the Python variable index (0 for the first variable, 1 for the second and so on). For value labels, however, we prefer using VariableDict() from the spssaux module. But let's first just find all variable names.

*Inspect variable information with spssaux.VariableDict().

begin program.
import spssaux
sDict = spssaux.VariableDict()
for var in sDict:
    print var,type(var)
end program.

Result

[Image: SPSS Python Vardict Vars]

Look up Value Labels

We'll now look up our value labels. For each variable, we'll get a Python dictionary holding each labeled value and its label. Don't confuse a Python dict object with the SPSS dictionary; these are totally unrelated.

*Retrieve value labels (Python dict objects).

begin program.
import spssaux
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    print valLabs
end program.

Result

[Image: SPSS Value Labels As Python Dict In Output]

Loop Over Values and Labels

A Python dict holds key-value pairs of which the keys are unique within the dict. We'll loop over these pairs and look up the key and value by using iteritems() as shown below. Note that iteritems() is a Python 2 method; in Python 3, items() does the same job.

*For each variable, loop through values with labels.

begin program.
import spssaux
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    for key,val in valLabs.iteritems():
        print key,val
end program.

A major source of confusion here is that SPSS values are keys in our Python dict. The Python dict values hold SPSS value labels. The figure below illustrates this mapping.

[Image: SPSS Value Labels As Python Dict]
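The mapping can also be tried without SPSS. In the Python 3 sketch below, val_labs is a made-up stand-in for the value labels of one variable; the values, the labels and the use of integer keys are assumptions for illustration only.

```python
# Made-up stand-in for one variable's value labels: SPSS value -> SPSS value label.
val_labs = {1: 'v1: Bad', 2: 'v1: Neutral', 3: 'v1: Good'}

pairs = []
for key, val in val_labs.items():   # items() is the Python 3 way to get key-value pairs
    pairs.append((key, val))        # key = SPSS value, val = SPSS value label
    print(key, val)
```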

Create ADD VALUE LABELS Commands

So far, our syntax finds all variable names, values and value labels. Inserting these into ADD VALUE LABELS commands will set all value labels for the entire dataset. We'll create this syntax by concatenating these commands in a loop. Note that \n adds a line break after each line.

*Create basic SPSS syntax for adjusting all value labels.

begin program.
import spssaux
spssSyntax = ''
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    for key,val in valLabs.iteritems():
        spssSyntax += "ADD VALUE LABELS %s %s '%s'.\n"%(var,key,val)
print spssSyntax
end program.
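To get a feel for the generated commands without an open dataset, here is a stand-alone Python 3 sketch. The nested dict labels_by_var is a hypothetical stand-in for what the loop over spssaux.VariableDict() provides. As a variation on the tutorial's approach, the commands are collected in a list and joined with line breaks instead of being concatenated one by one; the resulting string is the same.

```python
# Hypothetical stand-in for the dataset: variable name -> {value: value label}.
labels_by_var = {
    'v1': {1: 'v1: Bad', 2: 'v1: Good'},
    'v2': {0: 'v2: No', 1: 'v2: Yes'},
}

commands = []
for var, val_labs in labels_by_var.items():
    for key, val in val_labs.items():
        # One ADD VALUE LABELS command per labeled value.
        commands.append("ADD VALUE LABELS %s %s '%s'." % (var, key, val))

spss_syntax = '\n'.join(commands) + '\n'   # line break after each command
print(spss_syntax)
```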

Create Syntax for Adjusting Value Labels

At this point we'll add the correction for each value label that we developed earlier. The resulting syntax is almost what we need. Bonus points if you detect a problem with it before reading on.

*Create SPSS syntax for adjusting value labels.

begin program.
import spssaux
spssSyntax = ''
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    for key,val in valLabs.iteritems():
        val = val[val.find(": ") + 2:]
        spssSyntax += "ADD VALUE LABELS %s %s '%s'.\n"%(var,key,val)
print spssSyntax
end program.

Result

If you're really good with SPSS, you'll see that some value labels contain a single quote. Since the labels themselves are enclosed in single quotes, such a quote ends the label prematurely. In Python, we'd escape it with \' but this is SPSS syntax so we need '' instead.

[Image: SPSS Escape Single Quote With Two Single Quotes]
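The escaping itself is a one-line replace. The Python 3 sketch below isolates it in a helper; the name quote_for_spss and the example label are hypothetical.

```python
def quote_for_spss(text):
    """Wrap text in single quotes, doubling any single quotes inside it."""
    return "'" + text.replace("'", "''") + "'"

quoted = quote_for_spss("Don't know")
print(quoted)
```

Labels without a single quote pass through unchanged apart from the enclosing quotes.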

Final Syntax

First, we'll replace all single quotes within value labels by two single quotes. Second, we'll check if the colon and space we're looking for are actually present in each label; if not, we leave the label text as it is rather than cutting off its first characters. Third, we'll now run our SPSS syntax with spss.Submit so we need to import the spss module as well as spssaux.

*Create and run final syntax.

begin program.
import spssaux,spss
spssSyntax = ''
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    for key,val in valLabs.iteritems():
        if ": " in val:
            val = val[val.find(": ") + 2:]
        val = val.replace("'","''")
        spssSyntax += "ADD VALUE LABELS %s %s '%s'.\n"%(var,key,val)
spss.Submit(spssSyntax)
end program.

*We're all done after running this.
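The label correction in the final syntax can be checked in isolation before submitting anything to SPSS. The Python 3 sketch below mirrors the two steps inside the inner loop (guarded prefix removal, then quote doubling); the function name clean_label and the test labels are hypothetical.

```python
def clean_label(val_lab, sep=': '):
    """Remove the prefix only if sep is present, then double single quotes."""
    if sep in val_lab:
        val_lab = val_lab[val_lab.find(sep) + len(sep):]
    return val_lab.replace("'", "''")

tests = ["v1: Don't know", 'v13_2A: Neutral', 'Neutral']
results = [clean_label(lab) for lab in tests]
print(results)
```

The last label has no prefix and comes back intact, which is exactly what the check is for.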

Final Notes

Our final syntax does the job: when running FREQUENCIES or some other command, we'll have nice, clean value labels in our output. I'm sure our clients will appreciate it.

[Image: SPSS Clean Value Labels In Output]

The syntax could be shorter but it's simple and readable. You can easily modify it for capitalizing value labels or removing unwanted characters from them. I hope this tutorial also shows how to develop SPSS Python syntax in small steps.
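As a rough idea of such a modification, the Python 3 sketch below removes some unwanted characters and capitalizes a label. The function name tidy_label, the set of unwanted characters and the example label are all assumptions for illustration. Keep in mind that capitalize() uppercases the first character and lowercases all others.

```python
def tidy_label(val_lab, unwanted='*#'):
    """Drop unwanted characters, trim surrounding spaces and capitalize the label."""
    for ch in unwanted:
        val_lab = val_lab.replace(ch, '')
    return val_lab.strip().capitalize()

tidied = tidy_label('  very GOOD* ')
print(tidied)
```

In the final syntax, a step like this would go right before the single quotes are doubled.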

Thanks for reading!
