SPSS Tutorials

Creating Dummy Variables in SPSS

Dummy coding a variable means representing each of its values by a separate dichotomous variable. These so-called dummy variables contain only ones and zeroes (and sometimes missing values). The figure below shows how the variable “pet” from favorite_pets.sav has been dummy coded as pet_d1 through pet_d4.

Figure: Dummy Variables in SPSS

Note that, for example, pet_d2 represents the value 2 on pet: cases having 2 on pet have a 1 on pet_d2 and all other cases have a 0. The same logic holds for the other three dummy variables, which represent values 1, 3 and 4.

Dummy Variables - Why Use Them?

Dummy coding is mainly used for including nominal and ordinal variables in linear regression analysis. Since such variables don't have a fixed unit of measurement, assuming a linear relation between them and an outcome variable doesn't make sense. However, dichotomous variables are metric by definition; since they have only two values, there is only a single interval.
There are various schemes for creating dummy variables. The one presented here is known as indicator coding. Note that for each original variable, exactly one of its dummy variables should be excluded from regression analysis: including all of them alongside the constant would make the predictors perfectly collinear. Cases having 1 on this excluded dummy variable are referred to as the reference group.
A more in-depth theoretical discussion on dummy variables is beyond the scope of this tutorial but you'll find one in most standard texts on multivariate statistics.
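As a minimal sketch of how the reference group works in practice, the syntax below enters three of the four dummy variables into a regression. It leaves out pet_d1, so cases with pet = 1 form the reference group. The outcome variable happiness is hypothetical (it is not taken from favorite_pets.sav), and the dummy variables are assumed to exist already.

```spss
* Assumption: happiness is a hypothetical metric outcome variable.
* pet_d1 is left out, so cases with pet = 1 serve as the reference group.
regression
  /dependent happiness
  /method = enter pet_d2 pet_d3 pet_d4.
```

With only these dummy variables as predictors, each b-coefficient estimates the difference in mean outcome between its own group and the reference group.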

Creating Dummy Variables in SPSS

We recommend using our SPSS Create Dummy Variables Tool for creating dummy variables in SPSS. However, we'll now show how to do so manually as well; we'll create dummy variables for “pet” from favorite_pets.sav.
One reasonable option for doing so is using RECODE with DO REPEAT; each distinct value in pet is recoded to a 1 in a separate new variable. In this new variable, all other values of pet are recoded to 0. The syntax below shows how to do this and the next screenshot should further clarify how it works.

SPSS Create Dummy Variables Syntax Example

*1. Create dummy variables pet_d1 through pet_d4, representing values 1 through 4 in pet.

do repeat #newvar = pet_d1 to pet_d4 / #petval = 1 to 4.
recode pet (#petval = 1)(else = 0) into #newvar.
end repeat print.

*2. Make result visible in data view.

execute.
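One way to verify the result is a contingency table of pet against each dummy variable. Also note that ELSE in RECODE catches missing values too, so cases with a missing value on pet end up with 0 on every dummy variable. The second part of the sketch below is an optional variant, not part of the syntax above, that keeps such cases missing instead.

```spss
* Check: every case with pet = 1 should have 1 on pet_d1 only, and so on.
crosstabs /tables = pet by pet_d1 to pet_d4.

* Optional variant: keep missing values on pet missing in the dummy variables.
do repeat #newvar = pet_d1 to pet_d4 / #petval = 1 to 4.
recode pet (#petval = 1)(missing = sysmis)(else = 0) into #newvar.
end repeat.
execute.
```

Whether missing values should become 0 or stay missing depends on the analysis, so treat the variant as a suggestion rather than a required step.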

Adding PRINT to END REPEAT has SPSS print back the commands that result from expanding the DO REPEAT command. The screenshot below shows the result in the Output Viewer window (just ignore the first columns with line numbers).

Figure: Commands Generated by DO REPEAT in Output Window
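For readers without access to the screenshot, the DO REPEAT block should expand into roughly the four RECODE commands below, one per pair of new variable and value. This is a reconstruction from the syntax rather than a copy of the Output Viewer contents, which also show line numbers.

```spss
* Equivalent commands after expanding DO REPEAT.
recode pet (1 = 1)(else = 0) into pet_d1.
recode pet (2 = 1)(else = 0) into pet_d2.
recode pet (3 = 1)(else = 0) into pet_d3.
recode pet (4 = 1)(else = 0) into pet_d4.
```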

Labelling the Dummy Variables

In principle, we're done now. However, it's usually good practice to label any new variables.
First, this will make our output more readable. For instance, a variable label such as “Dummy variable for pet = 1 (Dog)” conveys much more meaning than simply “pet_d1”. Second, if we (or somebody else) return to our project after some time, we may no longer be sure what “pet_d1” actually represents unless its meaning is explained in a variable label.
Unfortunately, SPSS doesn't offer an efficient way of applying proper variable labels here. The fastest way is to write a single VARIABLE LABELS command, copy-paste it once for each remaining dummy variable and then adjust the copies as needed (see syntax below).
Although the meaning of the values (0 and 1) is reasonably obvious, we could apply basic value labels to them as well. We'll do so for all variables in one go as shown in step 2 below.

*1. Apply variable labels.

variable labels pet_d1 "Dummy variable for pet = 1 (Dog)".
variable labels pet_d2 "Dummy variable for pet = 2 (Cat)".
*And so on...

*2. Apply value labels.

value labels pet_d1 to pet_d4 0 'False' 1 'True'.

Labelling Dummy Variables - Result

Figure: Variable Labels Applied to Dummy Variables
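If the screenshot isn't at hand, one way to confirm that the labels were applied is to have SPSS list the dictionary information for the new variables. This small sketch assumes pet_d1 through pet_d4 are adjacent in the data file, as they are when created with the DO REPEAT syntax shown earlier.

```spss
* Show variable and value labels for the four dummy variables.
display dictionary /variables = pet_d1 to pet_d4.
```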

Creating Dummy Variables - Possible Complications

So far, creating dummy variables for a single nominal variable hasn't been too much of a hassle, except perhaps for applying variable labels. However, the scenario we covered is the simplest you'll encounter in practice. First, if a variable has values other than 1, 2, ..., n, it may not be possible to simplify things with DO REPEAT like we did.
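DO REPEAT also accepts explicit lists of variable names and values, so one possible workaround for non-consecutive codes is sketched below. The variable region and its values 10, 20 and 35 are hypothetical and serve only as an illustration. The two lists must have equal lengths and matching order.

```spss
* Assumption: region is a hypothetical variable holding values 10, 20 and 35.
do repeat #newvar = region_d10 region_d20 region_d35 / #val = 10 20 35.
recode region (#val = 1)(else = 0) into #newvar.
end repeat print.
execute.
```

This still requires typing each value by hand, so it scales poorly to variables with many categories.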
Second, when many variables have to be dummy coded, this results in a lot of syntax, which takes effort to write and is prone to typos and other mistakes.
Fortunately, the process of creating dummy variables can be automated. However, basic syntax or a macro won't do the job because access to the value labels of the original variables is needed. As is often the case, the only reasonable way to go here is Python. Because the required syntax is rather advanced, we wrapped it up in an SPSS custom dialog. This easy-to-use tool is explained in, and can be freely downloaded from, SPSS Create Dummy Variables Tool. We hope you'll like it!
