Dummy coding a variable means representing each of its values by a separate dichotomous variable. These so-called dummy variables contain only ones and zeroes (and sometimes missing values). The figure below shows how the variable “pet” from favorite_pets.sav has been dummy coded as pet_d1 through pet_d4.

*Figure: Dummy Variables in SPSS*

Note that, for example, pet_d2 represents the value 2 on pet: all cases having 2 on pet have a 1 on pet_d2 and a 0 otherwise. The same logic applies to the other three dummy variables, which represent values 1, 3 and 4.
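The same indicator logic can be sketched outside SPSS. The Python snippet below is only an illustration, not how SPSS implements it. It assumes the values of pet are held in a plain list and uses a made-up sample of six cases. A missing value is represented as `None` and is kept missing on every dummy variable.

```python
from __future__ import annotations

from typing import Optional, Sequence


def dummy_code(
    values: Sequence[Optional[int]], categories: Sequence[int], prefix: str
) -> dict[str, list[Optional[int]]]:
    """Return one 0/1 column per category, named <prefix>_d<value>."""
    dummies: dict[str, list[Optional[int]]] = {}
    for cat in categories:
        # 1 if the case has this value, 0 if it has another value,
        # None if the original value is missing.
        dummies[f"{prefix}_d{cat}"] = [
            None if v is None else int(v == cat) for v in values
        ]
    return dummies


# Made-up sample: six cases, the last one missing on pet.
pet = [1, 2, 2, 4, 3, None]
dummies = dummy_code(pet, categories=[1, 2, 3, 4], prefix="pet")
for name, column in dummies.items():
    print(name, column)
```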

## Dummy Variables - Why Use Them?

Dummy coding is mainly used for including nominal and ordinal variables in **linear regression** analysis.
Since such variables don't have a fixed unit of measurement, assuming a linear relation between them and an outcome variable doesn't make sense. However, **dichotomous variables are metric** by definition; since they have only two values, there is only a single interval.
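A small numeric check makes the "single interval" argument concrete. For a 0/1 predictor, the least-squares line passes through both group means. Its slope therefore equals the difference between the two means, and its intercept equals the mean of the group coded 0. The Python sketch below verifies this with plain formulas on made-up numbers. It is an illustration, not SPSS's REGRESSION procedure.

```python
from __future__ import annotations

from statistics import mean


def simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for a single predictor."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up data: a dummy variable (0/1) and an outcome variable.
dummy = [0, 0, 0, 1, 1]
outcome = [2.0, 4.0, 3.0, 7.0, 9.0]

slope, intercept = simple_ols(dummy, outcome)
mean_0 = mean(y for d, y in zip(dummy, outcome) if d == 0)
mean_1 = mean(y for d, y in zip(dummy, outcome) if d == 1)
print(slope, mean_1 - mean_0)  # equal up to floating-point rounding
print(intercept, mean_0)       # equal up to floating-point rounding
```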

There are various schemes for creating dummy variables. The one presented here is known as **indicator coding**.
Note that for each original variable, exactly one of its dummy variables should be **excluded** from the regression analysis. Cases having `1` on this excluded dummy variable are referred to as the **reference group**.
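Why one dummy has to go becomes clear from the data. Every case with a valid value has exactly one 1 across the dummies of a variable, so the dummies always add up to 1. That sum duplicates the constant (intercept) term of a regression model. Dropping one dummy removes this redundancy. The cases with 1 on the dropped dummy are then exactly those with 0 on all remaining dummies.

The Python sketch below checks both facts on made-up data without missing values. Choosing value 1 as the reference is an arbitrary assumption made for illustration.

```python
from __future__ import annotations


def indicator_columns(values: list[int], categories: list[int]) -> dict[int, list[int]]:
    """One 0/1 column per category (no missing values in this sketch)."""
    return {c: [int(v == c) for v in values] for c in categories}


pet = [1, 2, 2, 4, 3, 1]  # made-up data without missing values
columns = indicator_columns(pet, [1, 2, 3, 4])

# Each row sums to 1: the full set of dummies equals the intercept column.
row_sums = [sum(col[i] for col in columns.values()) for i in range(len(pet))]
print(row_sums)

# Exclude the dummy for value 1; those cases become the reference group.
reference = 1
kept = {c: col for c, col in columns.items() if c != reference}
reference_rows = [
    i for i in range(len(pet)) if all(col[i] == 0 for col in kept.values())
]
print(reference_rows)  # the positions of the cases with pet = 1
```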

A more in-depth theoretical discussion of dummy variables is beyond the scope of this tutorial, but you'll find one in most standard texts on multivariate statistics.

## Creating Dummy Variables in SPSS

We recommend using our SPSS Create Dummy Variables Tool for creating dummy variables in SPSS. However, we'll now show how to do so manually as well; we'll create dummy variables for “pet” from favorite_pets.sav.

One reasonable option is to use RECODE within DO REPEAT. Each distinct value of pet is recoded into a 1 in a separate new variable, and all other values of pet are recoded into 0 in that new variable. The syntax below shows how to do this, and the screenshot that follows clarifies how it works.

## SPSS Create Dummy Variables Syntax Example

*1. Create dummy variables pet_d1 through pet_d4, representing values 1 through 4 in pet.

*Missing values on pet (user-missing or system-missing) become system-missing on the dummy variables instead of 0.

do repeat #newvar = pet_d1 to pet_d4 / #petval = 1 to 4.

recode pet (#petval = 1)(missing = sysmis)(else = 0) into #newvar.

end repeat print.

*2. Make result visible in data view.

execute.

Adding PRINT to END REPEAT has SPSS print back the commands that result from expanding the DO REPEAT command. The screenshot below shows the result in the Output Viewer window (just ignore the first columns with line numbers).
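Conceptually, DO REPEAT works by text substitution. In the i-th repetition, every stand-in name is replaced by the i-th item of its replacement list, which is why all lists must have the same length. The Python sketch below mimics that idea to reproduce the four RECODE commands. It is a simplification: it assumes stand-ins start with `#` and ignores SPSS's actual parsing rules and the layout of the PRINT output.

```python
from __future__ import annotations

import re


def expand_do_repeat(template: str, replacements: dict[str, list[str]]) -> list[str]:
    """Substitute stand-in names (e.g. '#newvar') item by item, like DO REPEAT."""
    lengths = {len(items) for items in replacements.values()}
    if len(lengths) != 1:
        raise ValueError("all replacement lists must have the same length")
    (n,) = lengths

    commands = []
    for i in range(n):
        def substitute(match: re.Match, i: int = i) -> str:
            token = match.group(0)
            # Replace known stand-ins; leave any other '#' token untouched.
            return replacements[token][i] if token in replacements else token

        commands.append(re.sub(r"#\w+", substitute, template))
    return commands


commands = expand_do_repeat(
    "recode pet (#petval = 1)(else = 0) into #newvar.",
    {
        "#newvar": [f"pet_d{k}" for k in range(1, 5)],  # pet_d1 to pet_d4
        "#petval": [str(k) for k in range(1, 5)],       # 1 to 4
    },
)
for command in commands:
    print(command)
```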

*Figure: Commands Generated by DO REPEAT in Output Window*

## Labelling the Dummy Variables

In principle, we're done now. However, it's usually good practice to label any new variables.

First, this will make our **output more readable**. For instance, a variable label such as “Dummy variable for pet = 1 (Dog)” conveys much more meaning than a bare “pet_d1”. Second, if we (or somebody else) return to our project after some time, we may no longer be sure what “pet_d1” represents unless a variable label explains it.

Unfortunately, SPSS doesn't offer an efficient way to apply proper variable labels here. The fastest manual approach is to write a single VARIABLE LABELS command, copy-paste it once for each remaining dummy variable and adjust each copy as needed (see syntax below).

Although the meaning of the values (0 and 1) is reasonably obvious, we could apply basic value labels to them as well. We'll do so for all variables in one go as shown in step 2 below.

*1. Apply variable labels.

variable labels pet_d1 "Dummy variable for pet = 1 (Dog)".

variable labels pet_d2 "Dummy variable for pet = 2 (Cat)".

*And so on...

*2. Apply value labels.

value labels pet_d1 to pet_d4 0 'False' 1 'True'.
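Because the labels follow a fixed pattern, they can also be generated from the value labels of the original variable instead of being typed by hand. The Python sketch below writes the VARIABLE LABELS and VALUE LABELS commands as text. It makes two assumptions:

* The value labels are available as a dictionary. Only Dog and Cat are known from this tutorial, so the example stops there.
* The dummies are named pet_d1, pet_d2 and so on.

```python
from __future__ import annotations


def label_syntax(var: str, value_labels: dict[int, str]) -> list[str]:
    """SPSS label commands for dummies named <var>_d<value>."""
    commands = []
    names = []
    for value, label in value_labels.items():
        name = f"{var}_d{value}"
        names.append(name)
        # Double any embedded double quotes so the SPSS string stays valid.
        text = f"Dummy variable for {var} = {value} ({label})".replace('"', '""')
        commands.append(f'variable labels {name} "{text}".')
    # Listing all names avoids relying on the dummies being adjacent (TO).
    commands.append(f"value labels {' '.join(names)} 0 'False' 1 'True'.")
    return commands


for command in label_syntax("pet", {1: "Dog", 2: "Cat"}):
    print(command)
```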

## Labelling Dummy Variables - Result

*Figure: Variable Labels Applied to Dummy Variables*

## Creating Dummy Variables - Possible Complications

So far, creating dummy variables for a single nominal variable hasn't been much of a hassle, except perhaps for applying variable labels. However, this was the simplest scenario you'll encounter in practice. First, if a variable has values other than 1, 2, ..., n, it may not be possible to simplify things with DO REPEAT as we did.
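To see why consecutive codes matter, suppose (hypothetically) that pet were coded 1, 3 and 7. A 1-to-n scheme would create pet_d2, on which no case can ever score 1, and it would miss value 7 altogether. A sturdier approach, sketched in Python below, builds the list of categories from the values that actually occur.

This is an illustrative assumption rather than the tutorial's method. One consequence is that a value which is defined but never observed would get no dummy.

```python
from __future__ import annotations

from typing import Optional


def observed_categories(values: list[Optional[int]]) -> list[int]:
    """Distinct non-missing values, sorted."""
    return sorted({v for v in values if v is not None})


codes = [1, 7, 3, 7, None, 1]  # hypothetical, non-consecutive codes

naive = list(range(1, len(observed_categories(codes)) + 1))  # assumes 1..n
actual = observed_categories(codes)                          # values present

dummies = {
    f"pet_d{c}": [None if v is None else int(v == c) for v in codes]
    for c in actual
}
print(naive, actual)
print(dummies["pet_d7"])
```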

Second, when **many variables** have to be dummified, this results in quite a lot of syntax, which takes effort to write and is prone to typos and other mistakes.

Fortunately, **creating dummy variables can be automated**. However, basic syntax or a macro won't do the job, because the procedure needs access to the value labels of the original variables. As is often the case, the only reasonable way to go here is Python. Because the required syntax is rather advanced, we wrapped it up in an SPSS custom dialog. This easy-to-use tool is explained on, and can be freely downloaded from, the SPSS Create Dummy Variables Tool page. We hope you'll like it!
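Once the value labels are available, the core of such automation is straightforward: loop over each variable and each labelled value, and write out the corresponding syntax. The Python sketch below only generates SPSS syntax as text from a plain dictionary; it does not use SPSS's own Python integration. The variable color and its labels are a hypothetical addition. As a design choice, each generated RECODE maps missing values of the original variable to system-missing via `(missing = sysmis)`.

```python
from __future__ import annotations


def dummy_syntax(value_labels_by_var: dict[str, dict[int, str]]) -> str:
    """SPSS syntax creating and labelling indicator dummies for several variables."""
    lines: list[str] = []
    for var, value_labels in value_labels_by_var.items():
        for value, label in value_labels.items():
            dummy = f"{var}_d{value}"
            # Missing on the original variable stays missing on the dummy.
            lines.append(
                f"recode {var} ({value} = 1)(missing = sysmis)(else = 0) into {dummy}."
            )
            lines.append(
                f'variable labels {dummy} "Dummy variable for {var} = {value} ({label})".'
            )
    lines.append("execute.")
    return "\n".join(lines)


# "pet" labels as shown in this tutorial; "color" is a hypothetical variable.
syntax = dummy_syntax(
    {
        "pet": {1: "Dog", 2: "Cat"},
        "color": {1: "Red", 2: "Blue"},
    }
)
print(syntax)
```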

## This tutorial has 24 comments

## By Ruben Geert van den Berg on July 5th, 2016

Hi! I'm currently on holiday so I can be of limited help.

However, it seems as if you're running CATREG or something, not the usual linear regression analysis. "Values less than or equal to zero" are obviously created by standard dummification and are fine for linear regression (REGRESSION command in SPSS). For other regression procedures such as CATREG, such values are seen as missing, which may be the cause of your problem.

## By AB on July 4th, 2016

This tutorial was fantastic and got me farther than I ever would have otherwise, but I'm having an issue that I am hoping someone might be able to help me with.

I dummy coded and have 0's and 1's in my columns but I keep getting the following message-

The number of valid active objects is less than 3. This may be due to treating missing data as listwise, or to the specification of supplementary objects, or to weighting objects with zero weights.

Execution of this command stops.

The following variables have values less than or equal to zero, which are considered as missing in this procedure:

After the last sentence, it then lists every variable I input. The output also says that every single one of my 1110 cases is an active case with a missing value. Any thoughts on why? Any help would be much appreciated. Thanks!

## By helal islam on March 28th, 2016

very nice. i love it.

## By Romlus on August 19th, 2015

Very well captured.

## By Ruben Geert van den Berg on August 6th, 2015

Thanks for your comment. This tutorial is restricted to the data editing part (hence, "creating" rather than "using" dummy variables). We'd love to discuss the main multivariate analyses - including regression with dummy variables and interaction effects - but our main focus right now is improving and expanding our beginners' tutorials. Honestly, regression tutorials are probably not coming up any time soon.