# SPSS Tutorials


# Creating Dummy Variables in SPSS

Dummy coding a variable means representing each of its values by a separate dichotomous variable. These so-called dummy variables contain only ones and zeroes (and sometimes missing values). The figure below shows how the variable “pet” from favorite_pets.sav has been dummy coded as pet_d1 through pet_d4.

Dummy Variables in SPSS

Note that, for example, pet_d2 represents the value 2 on pet; all cases having 2 on pet have a 1 on pet_d2 and a 0 otherwise. The same logic goes for the other three dummy variables, representing values 1, 3 and 4.
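The indicator logic described above can also be sketched outside SPSS in a few lines of plain Python. The pet values below are made up for illustration and are not the contents of favorite_pets.sav; the helper name `dummy_code` is hypothetical as well.

```python
# Hypothetical values of "pet" for six cases (not the real data file).
pet = [1, 2, 2, 4, 3, 1]


def dummy_code(values, categories, prefix):
    """Return one 0/1 list per category: 1 where a case has that value, else 0."""
    return {
        f"{prefix}_d{cat}": [1 if value == cat else 0 for value in values]
        for cat in categories
    }


dummies = dummy_code(pet, categories=[1, 2, 3, 4], prefix="pet")

# Print each dummy variable next to its name.
for name, column in dummies.items():
    print(name, column)
```

Each case ends up with a 1 on exactly one of the four dummy variables, which is the pattern shown in the figure.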

## Dummy Variables - Why Use Them?

Dummy coding is mainly used for including nominal and ordinal variables in linear regression analysis. Since such variables don't have a fixed unit of measurement, assuming a linear relation between them and an outcome variable doesn't make sense. Dichotomous variables, however, are metric by definition: since they have only two values, there is only a single interval.

There are various schemes for creating dummy variables. The one presented here is known as indicator coding. Note that for each original variable, exactly one of its dummy variables should be excluded from regression analysis. Cases having `1` on this excluded dummy variable are referred to as the reference group.

A more in-depth theoretical discussion of dummy variables is beyond the scope of this tutorial, but you'll find one in most standard texts on multivariate statistics.
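Why exactly one dummy variable is left out can be illustrated with a small sketch. Under indicator coding, the dummy variables of one original variable sum to 1 for every case with a valid value, so the full set carries the same information as the regression constant. The data and the choice of pet = 1 as the reference group are assumptions made for this example only.

```python
# Hypothetical values of "pet" for six cases.
pet = [1, 2, 2, 4, 3, 1]
categories = [1, 2, 3, 4]

# Indicator coding: one 0/1 column per category.
dummies = {cat: [int(value == cat) for value in pet] for cat in categories}

# Per case, the four dummies add up to exactly 1.
row_sums = [sum(column[i] for column in dummies.values()) for i in range(len(pet))]
print(row_sums)

# Assumption: pet = 1 serves as the reference group, so its dummy is dropped.
reference = 1
predictors = {cat: col for cat, col in dummies.items() if cat != reference}

# Cases in the reference group score 0 on all remaining dummies.
reference_rows = [
    [col[i] for col in predictors.values()]
    for i, value in enumerate(pet)
    if value == reference
]
print(reference_rows)
```

The reference group is thus identified by zeroes on all dummies that do enter the regression.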

## Creating Dummy Variables in SPSS

We recommend using our SPSS Create Dummy Variables Tool for creating dummy variables in SPSS. However, we'll now show how to do so manually as well; we'll create dummy variables for “pet” from favorite_pets.sav.

One reasonable option for doing so is using RECODE with DO REPEAT: each distinct value in pet is recoded to a 1 in a separate new variable, and all other values of pet are recoded to 0 in that variable. The syntax below shows how to do this, and the next screenshot should further clarify how it works.

## SPSS Create Dummy Variables Syntax Example

    *1. Create dummy variables pet_d1 through pet_d4, representing values 1 through 4 in pet.

    do repeat #newvar = pet_d1 to pet_d4 / #petval = 1 to 4.
    recode pet (#petval = 1)(else = 0) into #newvar.
    end repeat print.

    *2. Make result visible in data view.

    execute.

Adding PRINT to END REPEAT has SPSS print back the commands that result from expanding the DO REPEAT command. The screenshot below shows the result in the Output Viewer window (just ignore the first columns with line numbers).

Commands Generated by DO REPEAT in Output Window
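The expansion that DO REPEAT performs can be mimicked in plain Python: the two stand-in lists are walked in parallel and each pair is substituted into the RECODE template. This is only a sketch of the idea; the exact formatting SPSS prints in the Output Viewer may differ.

```python
# The two stand-in lists from the DO REPEAT command.
new_vars = [f"pet_d{i}" for i in range(1, 5)]  # pet_d1 to pet_d4
pet_values = range(1, 5)                       # 1 to 4

# Pair the lists element by element and fill in the RECODE template.
commands = [
    f"recode pet ({value} = 1)(else = 0) into {var}."
    for var, value in zip(new_vars, pet_values)
]

for command in commands:
    print(command)
```

Running this prints four RECODE commands, one for each dummy variable.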

## Labelling the Dummy Variables

In principle, we're done now. However, it's usually good practice to label any new variables.

First, this will make our output more readable. For instance, a variable label such as “Dummy variable for pet = 1 (Dog)” conveys much more meaning than simply “pet_d1”. Second, if we (or somebody else) return to our project after some time, there's a good chance we won't be sure what “pet_d1” actually represents unless a variable label explains it.

Unfortunately, SPSS doesn't offer an efficient way of applying proper variable labels here. The fastest way is to write a single VARIABLE LABELS command, copy-paste it once for each remaining dummy variable and then adjust the second through last commands as needed (see syntax below).

Although the meaning of the values (0 and 1) is reasonably obvious, we could apply basic value labels to them as well. We'll do so for all variables in one go as shown in step 2 below.

    *1. Apply variable labels.

    variable labels pet_d1 "Dummy variable for pet = 1 (Dog)".
    variable labels pet_d2 "Dummy variable for pet = 2 (Cat)".
    *And so on...

    *2. Apply value labels.

    value labels pet_d1 to pet_d4 0 'False' 1 'True'.
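As an alternative to copy-pasting, the repetitive VARIABLE LABELS commands could be generated as text and pasted into a syntax window. The sketch below assumes the value labels are typed into a dictionary by hand; it contains only the two labels quoted in this tutorial, and the function name is hypothetical.

```python
# Value labels of "pet" that are mentioned in this tutorial.
# Labels for values 3 and 4 would be added in the same way.
value_labels = {1: "Dog", 2: "Cat"}


def variable_label_commands(source, labels):
    """Build one VARIABLE LABELS command per labelled value of `source`."""
    return [
        f'variable labels {source}_d{value} '
        f'"Dummy variable for {source} = {value} ({label})".'
        for value, label in sorted(labels.items())
    ]


label_commands = variable_label_commands("pet", value_labels)
for command in label_commands:
    print(command)
```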

## Labelling Dummy Variables - Result

Variable Labels Applied to Dummy Variables

## Creating Dummy Variables - Possible Complications

So far, creating dummy variables for a single nominal variable hasn't been too much of a hassle, except perhaps for applying variable labels. However, the scenario we covered is the simplest you'll encounter in practice. First, if a variable has values other than 1, 2, ..., n, it may not be possible to simplify things with DO REPEAT like we did.

Second, when many variables have to be dummy coded, this results in quite a lot of syntax; writing it takes some effort and invites typos or other imperfections.

Fortunately, the process of creating dummy variables can be automated, but basic syntax or a macro won't do the job because access to the value labels of the original variables is needed. As often, the only reasonable way to go here is Python. Because the required syntax is rather advanced, we wrapped it up in an SPSS custom dialog. This easy-to-use tool is explained on, and can be freely downloaded from, SPSS Create Dummy Variables Tool. We hope you'll like it!
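To give a rough idea of what such automation involves, the sketch below generates RECODE commands for a variable whose values are not simply 1 through n. It is not the code behind the tool: the variable "region", its values and the function name are hypothetical, and value labels are ignored altogether.

```python
def recode_commands(variable, observed_values):
    """Build one RECODE command per distinct value found in the data.

    Dummy variables are numbered 1, 2, ... in ascending order of the
    original values, so the values themselves need not be consecutive.
    """
    distinct = sorted(set(observed_values))
    return [
        f"recode {variable} ({value} = 1)(else = 0) into {variable}_d{index}."
        for index, value in enumerate(distinct, start=1)
    ]


# Hypothetical variable with non-consecutive codes.
region = [10, 99, 20, 10, 20]
region_commands = recode_commands("region", region)
for command in region_commands:
    print(command)
```

Numbering the dummy variables by position rather than by value keeps the new variable names valid even if the original values are negative or contain decimals.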


# This tutorial has 23 comments

### By V on November 11th, 2013

... supermarket.sav was the only dataset.

    GET
      FILE='D:\supermarket.sav'.
    DATASET NAME DataSet1 WINDOW=FRONT.

ERROR : (1) No error.

    Traceback (most recent call last):
      File "D:\my_spss.py", line 23, in <module>
        dummify('v1 to v5') #Please specify variables that should be dummy coded.
      File "D:\my_spss.py", line 4, in dummify
        for var in vdict.expand(variables):
      File "C:\Program Files\IBM\SPSS\Statistics\22\Python\lib\site-packages\spssaux\spssaux.py", line 1179, in expand
        raise ValueError, _msg19 + v
    ValueError: Invalid variable or TO usage: v1

And I get the same error when I have no active database... Then I renamed the vars (in script and in database), but the error appeared once again. I use Windows 8 Pro, SPSS 22...

I also tested the following script from the Python Reference Guide for IBM SPSS:

    import spss
    string1="DESCRIPTIVES VARIABLES="
    N=spss.GetVariableCount()
    scaleVarList=[]
    for i in xrange(N):
        if spss.GetVariableMeasurementLevel(i)=='scale':
            scaleVarList.append(spss.GetVariableName(i))
    string2="."
    spss.Submit([string1, ' '.join(scaleVarList), string2])

and get error:

    >Error # 105. Command name: DESCRIPTIVES
    >This command is not valid before a working file has been defined.
    >Execution of this command stops.
    Traceback (most recent call last):
      File "D:\my_spss.py", line 9, in <module>
        spss.Submit([string1, ' '.join(scaleVarList), string2])
      File "C:\Program Files\IBM\SPSS\Statistics\22\Python\lib\site-packages\spss\spss.py", line 1527, in Submit
        raise SpssError,error
    spss.errMsg.SpssError: [errLevel 3] Serious error.

that means also that my database is not open...

### By admin on November 9th, 2013

Did you have more than one SPSS Dataset open when you ran your syntax? A likely cause for this error is that a different Dataset (without `v1` followed at some point by `v5`) was the Active Dataset (on which the syntax is run).

Please try again with supermarket.sav being the only open Dataset and let me know whether that works. On my system that (still) runs fine.

By the way, two brief tutorials on SPSS Datasets are coming up in a couple of days - perhaps even tomorrow.

### By V on November 8th, 2013

It is strange, because I work with your dataset (supermarket.sav). Could it be caused by two other versions of Python being installed? Also, I deleted the "begin program." and "end program." lines because they aren't Python syntax...

### By admin on November 5th, 2013

There are two likely causes for this error.

1) There's no variable "v1" in the active dataset. Note that, in contrast to SPSS, variable names are case sensitive in Python. You can't reference "V1" as "v1" in the tool since it has Python under the hood.

2) "v1 to v5" will only run if there is a variable v1 as well as a variable v5, and v1 comes before v5 in the active dataset.

Please check whether any of these is the case.
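Both conditions can be illustrated with a small stand-alone sketch. The function below is a hypothetical re-implementation written for this explanation only; it is not the spssaux code and only handles a single "first to last" specification.

```python
def expand_to(spec, file_order):
    """Expand 'first to last' into the variables between them in file order.

    Hypothetical helper for illustration; this is not the spssaux code.
    """
    first, keyword, last = spec.split()
    if keyword.lower() != "to":
        raise ValueError("Expected 'first to last', got: " + spec)
    # Name lookups are exact, so "V1" does not match "v1".
    for name in (first, last):
        if name not in file_order:
            raise ValueError("Invalid variable or TO usage: " + name)
    start, end = file_order.index(first), file_order.index(last)
    # The first variable has to come before the last one in the file.
    if start > end:
        raise ValueError("Invalid variable or TO usage: " + first)
    return file_order[start:end + 1]


# Works: both names exist with matching case and in the right order.
print(expand_to("v1 to v3", ["id", "v1", "v2", "v3", "v4"]))

# Fails: the dataset holds "V1" rather than "v1".
try:
    expand_to("v1 to v3", ["id", "V1", "V2", "V3"])
except ValueError as err:
    print(err)
```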

### By V on November 5th, 2013

Get ERROR : (1) No error.

    Traceback (most recent call last):
      File "D:\my_spss.py", line 23, in <module>
        dummify('v1 to v5') #Please specify variables that should be dummy coded.
      File "D:\my_spss.py", line 4, in dummify
        for var in vdict.expand(variables):
      File "C:\Program Files\IBM\SPSS\Statistics\22\Python\lib\site-packages\spssaux\spssaux.py", line 1179, in expand
        raise ValueError, _msg19 + v
    ValueError: Invalid variable or TO usage: v1