# SPSS – Change Value Labels with Python

A local supermarket held a small survey, the data of which are in minisurvey.sav. Unfortunately, the software for downloading the data in SPSS format prefixes all variable and value labels with the variable names. The screenshot below shows part of the data.

## Undesired Prefixes in Value Labels

Clicking on some value labels in variable view confirms that they have undesired prefixes as shown below. Obviously, we don't want to see these prefixes in our output but we don't want to remove all of them manually either. Fortunately, SPSS with Python allows us to fix the problem with just a few lines of code.

## Removing Characters with Python

First off, you need to have the SPSS Python Essentials properly installed for running this tutorial's syntax. We'll first create a string holding just one value label and adjust it by extracting a substring in Python. Precisely, we want characters 9 through last. Since Python starts counting at 0, the 9th character has index 8, so `valLab[8:]` does just that.

```
*Extract characters 9 through last with Python substring.

begin program.
valLab = 'v13_2A: Neutral'
print valLab[8:]
end program.
```

## Finding the Colon in our Label

Unfortunately, our prefixes have different lengths so we can't just extract characters 9 through last. However, we do see that the prefixes always end with a colon and a space. The position of the (first) colon is found with `find` and tells us which characters to extract.

```
*Find (first occurrence of) ": ".

begin program.
print valLab.find(": ")
end program.
```

## Fixing One Value Label

Our colon and space occur at position 6. Because we want our label to start after these 2 characters, we'll add another 2 to it as shown below. In short, `valLab[valLab.find(": ") + 2:]` always returns the desired value label.

```
*Remove prefix from just 1 value label.

begin program.
valLab = 'v1: Neutral'
print valLab[valLab.find(": ") + 2:]
end program.
```
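One caveat is worth checking: `find` returns -1 when the search string is absent, so the slice then starts at index 1 and silently drops the first character of the label. The sketch below is plain Python (no SPSS needed) with made-up labels; `strip_prefix` is a hypothetical helper name, not part of this tutorial's syntax.

```python
def strip_prefix(label, sep=": "):
    """Return the part of label after the first sep, or label unchanged."""
    pos = label.find(sep)  # -1 if sep does not occur in label
    if pos == -1:
        return label
    return label[pos + len(sep):]

# Without the check: find gives -1, so this is "Neutral"[1:].
naive = "Neutral"["Neutral".find(": ") + 2:]

print(naive)                            # eutral
print(strip_prefix("Neutral"))          # Neutral
print(strip_prefix("v13_2A: Neutral"))  # Neutral
```

Only the first separator is used, so a label that itself contains a colon and a space keeps everything after the prefix.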

## Look up SPSS Dictionary Information

We can easily look up SPSS dictionary information with the Python `spss` module. Some examples are

• `spss.GetVariableName(ind)`
• `spss.GetVariableLabel(ind)`
• `spss.GetVariableType(ind)`
• `spss.GetVariableFormat(ind)`

where `ind` is the Python variable index (0 for the first variable, 1 for the second and so on). For value labels, however, we prefer using `VariableDict()` from the `spssaux` module. But let's first just find all variable names.

```
*Inspect variable information with spssaux.VariableDict().

begin program.
import spssaux
sDict = spssaux.VariableDict()
for var in sDict:
    print var,type(var)
end program.
```

## Look up Value Labels

We'll now look up our value labels. For each variable, we'll get a Python dictionary holding each labeled value and its label. Don't confuse a Python dict object with the SPSS dictionary; these are totally unrelated.

```
*Retrieve value labels (Python dict objects).

begin program.
import spssaux
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    print valLabs
end program.
```

## Loop Over Values and Labels

A Python dict holds key-value pairs of which the keys are unique within the dict. We'll loop over these pairs and look up the key and value by using `iteritems()` as shown below.

```
*For each variable, loop through values with labels.

begin program.
import spssaux
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    for key,val in valLabs.iteritems():
        print key,val
end program.
```

A major source of confusion here is that SPSS values are keys in our Python dict. The Python dict values hold SPSS value labels. The figure below illustrates this mapping.
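As a rough illustration of this mapping, here is a hand-written dict resembling what `ValueLabels` might hold for one variable. The values and labels are invented, and showing the keys as strings is an assumption; inspect the output of the previous program to see the actual key type in your SPSS version. `items()` is used instead of `iteritems()` so the sketch also runs under Python 3.

```python
# Hypothetical stand-in for var.ValueLabels of a single variable.
val_labs = {"1": "v1: Bad", "2": "v1: Neutral", "3": "v1: Good"}

for spss_value, spss_label in sorted(val_labs.items()):
    # dict key   -> SPSS value
    # dict value -> SPSS value label
    print("value %s has label '%s'" % (spss_value, spss_label))
```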

## Create ADD VALUE LABELS Commands

So far, our syntax finds all variable names, values and value labels. Inserting these into ADD VALUE LABELS commands will set all value labels for the entire dataset. We'll create this syntax by concatenating these commands in a loop. Note that `\n` adds a line break after each line.

```
*Create basic SPSS syntax for adjusting all value labels.

begin program.
import spssaux
spssSyntax = ''
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    for key,val in valLabs.iteritems():
        spssSyntax += "ADD VALUE LABELS %s %s '%s'.\n"%(var,key,val)
print spssSyntax
end program.
```
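To try the string-building step without an open dataset, the loop can be sketched on a plain nested dict. `labels_by_var` and `build_syntax` are illustrative names, and the variable names and labels are invented; in the real program they come from `spssaux.VariableDict()`.

```python
def build_syntax(labels_by_var):
    """Concatenate one ADD VALUE LABELS command per labeled value."""
    syntax = ""
    for var in sorted(labels_by_var):
        for key, val in sorted(labels_by_var[var].items()):
            syntax += "ADD VALUE LABELS %s %s '%s'.\n" % (var, key, val)
    return syntax

# Invented example data: {variable name: {value: value label}}.
labels_by_var = {
    "v1": {"1": "v1: Bad", "2": "v1: Good"},
    "v2": {"0": "v2: No", "1": "v2: Yes"},
}
spss_syntax = build_syntax(labels_by_var)
print(spss_syntax)
```

Each command ends with a period and a line break, so the resulting string is one SPSS command per line.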

## Create Syntax for Adjusting Value Labels

At this point we'll add the correction for each value label that we developed earlier. The resulting syntax is almost what we need. Bonus points if you detect a problem with it before reading on.

```
*Create SPSS syntax for adjusting value labels.

begin program.
import spssaux
spssSyntax = ''
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    for key,val in valLabs.iteritems():
        val = val[val.find(": ") + 2:]
        spssSyntax += "ADD VALUE LABELS %s %s '%s'.\n"%(var,key,val)
print spssSyntax
end program.
```

## Result

If you're really good with SPSS, you'll see that some value labels contain a single quote. Since the labels are enclosed in single quotes too, they'll end the label prematurely. In Python, we'd escape them with `\'` but this is SPSS syntax so we need `''` instead.
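The problem and its fix can be seen on a single label. This is a plain-Python sketch; the variable name, value and label text are made up for illustration.

```python
label = "Don't know"

# Unescaped: the apostrophe closes the quoted label after "Don".
broken = "ADD VALUE LABELS v1 9 '%s'." % label
# Escaped the SPSS way: every single quote inside the label is doubled.
fixed = "ADD VALUE LABELS v1 9 '%s'." % label.replace("'", "''")

print(broken)  # ADD VALUE LABELS v1 9 'Don't know'.
print(fixed)   # ADD VALUE LABELS v1 9 'Don''t know'.
```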

## Final Syntax

We'll now replace all single quotes within value labels by two single quotes. Second, we'll check whether the colon and space we're looking for are actually present in each label; if not, we leave the label unchanged rather than cutting into it. Third, we'll now run our SPSS syntax with `spss.Submit` so we need to import the `spss` module as well as `spssaux`.

```
*Create and run final syntax.

begin program.
import spssaux,spss
spssSyntax = ''
sDict = spssaux.VariableDict()
for var in sDict:
    valLabs = var.ValueLabels
    for key,val in valLabs.iteritems():
        if ": " in val:
            val = val[val.find(": ") + 2:]
        val = val.replace("'","''")
        spssSyntax += "ADD VALUE LABELS %s %s '%s'.\n"%(var,key,val)
spss.Submit(spssSyntax)
end program.

*We're all done after running this.
```

## Final Notes

Our final syntax does the job: when running FREQUENCIES or some other command, we'll have nice, clean value labels in our output. I'm sure our clients will appreciate it.

The syntax could be shorter but it's simple and readable. You can easily modify it for capitalizing value labels or removing unwanted characters from them. I hope this tutorial also shows how to develop SPSS Python syntax in small steps.
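As one possible modification, the label-cleaning steps could be collected in a small function and extended. The sketch below strips the prefix, capitalizes the first character and escapes single quotes. `clean_label` is a hypothetical helper, not part of the syntax above; inside SPSS, its result would take the place of `val` in the final loop.

```python
def clean_label(label, sep=": "):
    """Strip a 'name: ' prefix, capitalize the first letter, escape quotes."""
    if sep in label:
        label = label[label.find(sep) + len(sep):]
    label = label[:1].upper() + label[1:]  # leave the rest of the label as typed
    return label.replace("'", "''")        # SPSS escapes ' by doubling it

print(clean_label("v3: don't know"))  # Don''t know
print(clean_label("neutral"))         # Neutral
```

Keeping the cleaning in one function makes it easy to test on a handful of labels before submitting any syntax to SPSS.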

# THIS TUTORIAL HAS 4 COMMENTS:

• ### By Lea on April 22nd, 2014

Thanks for the example.

Is there any way to achieve the same thing with the regular SPSS syntax commands?
I'm writing a syntax that will be used on several computers that might not have the Python Essentials installed.
To avoid the hassle I'd like to use regular syntax only.

• ### By Ruben Geert van den Berg on April 23rd, 2014

Dear Lea,

For this particular case you can. However, it requires more syntax, allows for less control and is less "safe". An old trick is using syntax generating syntax. We'll use OMS to create a new dataset containing the value labels and use string manipulations to create a huge string variable containing syntax (text). We'll save that variable as a text file with the ".sps" extension for SPSS syntax. Finally, we'll run this generated syntax file on the original data. For an example with some comments, please see below. If you need any further assistance, please get back to us.

Kind regards,

Ruben

cd 'd:temp'.
get file 'supermarket.sav'.
dataset name supermarket.

*I'll use Python for ADDING prefixes to value labels but REMOVING them can be done without Python, as shown below.

begin program.
variables = 'v5 to v14' #Please specify variables whose value labels should be prefixed.
import spss,spssaux
sdict = spssaux.VariableDict()
for var in sdict.expand(variables):
    pref = var + ': ' #Assumed prefix (not defined in the posted code): variable name, colon, space.
    vallabs = sdict[sdict.VariableIndex(var)].ValueLabels
    for val,lab in vallabs.iteritems():
        vallabs[val] = pref + lab
    sdict[var].ValueLabels=vallabs
end program.

* OMS.
DATASET DECLARE vals.
OMS
/SELECT TABLES
/IF COMMANDS=['File Information'] SUBTYPES=['Variable Values']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_
OUTFILE='vals' VIEWER=NO
/TAG='vals'.

display dictionary.

omsend tag = [vals].

dataset activate vals. /*dataset containing value labels*/

string stripped_label(a1000). /*will probably be long enough but not guaranteed.
compute stripped_label = label.
compute separator = index(stripped_label,': ').
if separator > 0 stripped_label = char.substring(stripped_label,separator + 2). /*"+ 2" => 1 for the ":" and 1 for the space behind it.
compute stripped_label = replace(stripped_label,"'","''")./*double up ("escape") single quotes within labels since we'll use single quotes around labels.
exe.

string syntax(a2000)./*will probably be long enough but not guaranteed.
compute syntax = concatenate('add value labels ',rtrim(var1),' ',string(var2,f2),"'",rtrim(stripped_label),"'.").
exe./*Now you'll see a variable containing syntax in the dataset "vals".

write outfile 'value_label_syntax.sps' / syntax. /*Write SPSS syntax file containing written syntax.
exe.

dataset close vals. /*Close temp dataset.
dataset activate supermarket. /*Revert to actual data.

insert file = 'value_label_syntax.sps'. /*Run generated syntax file.

• ### By Lea on July 8th, 2015

Thank you for your quick answer and sorry for taking so long to say thank you!

After fiddling around a bit, I ended up using a spreadsheet to rename the variables - Excel, LibreOffice Calc and Google Spreadsheets all work equally well. This might also be a workaround for people who find SPSS syntax a little daunting:
I wrote all the different syntax elements in different cells and then used the concatenate function to combine them - the rest is copy/paste.

It is advisable to write only the first cell of any given element and make all other cells refer to the first one. That way, if someone asks if you could just add a prefix to a few hundred variables you only have to make one change, copy/paste the syntax and run it again!
I used it to rename several hundred variables + variable labels that were all just a little bit different. So far I have no regrets using a spreadsheet for renaming. It's very easy, very flexible, and very robust.

But thank you for your suggestion. When working on my own data, I'll be sure to try your way!

• ### By Ruben Geert van den Berg on July 8th, 2015

You're welcome! We often used this "spreadsheet trick" too in the past. However, a big problem with manual edits is that they preclude smoothly rerunning projects. Nowadays, more and more clients and supervisors demand that they can quickly rerun your project themselves and it doesn't look very professional to say "and here I copied and typed in ...".

So spreadsheets allow for quick and dirty fixes but we nevertheless discourage making it a habit. Also note that if your "formula" is so simple you can generate syntax with a spreadsheet, the proper Python solution isn't too far away - but does require quite some initial effort.

Best regards and good luck with the project!

Ruben