SPSS tutorials


SPSS Create Dummy Variables Tool

Summary

Creating dummy variables for several categorical variables with basic syntax is usually not hard. However, applying proper variable labels to the newly created dummy variables takes quite a bit of effort. The tool presented in this tutorial takes care of this, and of some other issues, more easily.

SPSS Dummy Variables Tool


Instructions

Overview Result Dummy Variables Tool

Let's say we'd like to dummify “education_type” from employees.sav. First note that 5 value labels have been defined for this variable. This can be seen under variable view as shown below.

[Screenshot: SPSS Dummy Variables Tool - Value Labels]

Using the dummy variables tool results in 6 new dummy variables. Their variable labels contain the values they represent and the corresponding value labels. This is shown in the screenshot below.

[Screenshot: SPSS Dummy Variables Tool - Result]
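To make the idea of dummifying concrete, here is a minimal sketch in plain Python, outside SPSS. It is not the tool itself: the function name `dummify_values` and the data are made up for illustration, and we simply assume five distinct codes plus one system missing value (represented as `None`).

```python
def dummify_values(values):
    """Return one 0/1 indicator list per distinct value, system missing (None) first."""
    # Sort on (is-not-missing, value) so None never gets compared to a number.
    distinct = sorted(set(values), key=lambda v: (v is not None, v))
    return {val: [1 if v == val else 0 for v in values] for val in distinct}

# Hypothetical data: five codes and one system missing value.
education_type = [3.0, None, 1.0, 5.0, 2.0, 4.0, 1.0]
dummies = dummify_values(education_type)

print(len(dummies))   # 6
print(dummies[None])  # [0, 1, 0, 0, 0, 0, 0]
print(dummies[1.0])   # [0, 0, 1, 0, 0, 0, 1]
```

Each case scores 1 on exactly one dummy and 0 on all others.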

Final Notes

First, note that the suffixes for the new variable names (for instance, “_d1” in our example) don't have any substantive meaning. That is, “_d1” says nothing about which value this variable represents (in fact, it represents a system missing value in our example). The actual meaning of the dummy variables is solely contained in their variable labels.
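The sketch below shows why the suffix carries no meaning. It is plain Python rather than SPSS, and it assumes the suffix is just the position of a value among the sorted distinct values, with system missing (`None`) first. The function name `names_and_labels` and the value labels are hypothetical.

```python
def names_and_labels(var_name, values, value_labels, sep="_d"):
    """Pair each new dummy variable name with the variable label describing it."""
    distinct = sorted(set(values), key=lambda v: (v is not None, v))
    result = []
    for position, val in enumerate(distinct, start=1):
        name = "%s%s%d" % (var_name, sep, position)  # suffix = position only
        if val is None:
            label = "Dummy variable indicating that %s = (system missing)." % var_name
        else:
            ext = ' ("%s")' % value_labels[val] if val in value_labels else ""
            label = "Dummy variable indicating that %s = %s%s." % (var_name, val, ext)
        result.append((name, label))
    return result

# Hypothetical values and value labels.
result = names_and_labels("education_type", [2.0, None, 1.0],
                          {1.0: "First label", 2.0: "Second label"})
for name, label in result:
    print(name, "->", label)
```

Here “education_type_d2” stands for value 1, not value 2, so only the label tells us what a dummy means.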

Second, a dummy variable will be created for each distinct value that's actually present in the original variable, regardless of whether a value label has been defined for it. A value that does not occur in the original variable (but may have a value label nevertheless) does not need a dummy variable and is therefore skipped.
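This rule boils down to simple set logic, sketched below in plain Python. The function name `compare_values_and_labels` and the data are made up for illustration.

```python
def compare_values_and_labels(values, value_labels):
    """Split values into those that get a dummy and labelled values that are skipped."""
    present = set(values)         # values that actually occur in the data
    labelled = set(value_labels)  # values that have a value label
    return {
        "dummified": present,
        "dummified_without_label": present - labelled,
        "labelled_but_skipped": labelled - present,
    }

# Hypothetical data: 3 occurs but has no label, 2 has a label but never occurs.
outcome = compare_values_and_labels(
    [0.0, 1.0, 3.0, 1.0],
    {0.0: "No", 1.0: "Yes", 2.0: "Don't know"},
)
print(outcome["dummified"])                # the set {0.0, 1.0, 3.0}
print(outcome["dummified_without_label"])  # {3.0}
print(outcome["labelled_but_skipped"])     # {2.0}
```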

SPSS Python Syntax Example

Instead of using the Custom Dialog we just discussed, you may use the SPSS Python syntax version of this tool shown below. It includes the creation of some nasty test data we used for testing the tool (containing labelled and unlabelled string variables, user missing values, system missing values and so on).

********00. CREATE NIGHTMARE TEST DATA FOR DUMMIFYING.

set seed 2.

data list free/v1(a10).
begin data
"don't know" rat bat dog cat '' 'don"t know'
end data.

string v2(a10).
compute v2 = v1.

do repeat @v = v3 to v4.
compute @v = rv.binom(3,.5).
end repeat.

if $casenum = 5 v3 = $sysmis.
missing values v3 v4 (2).
if $casenum = 4 v3 = 4.

value labels v2 'cat' 'CAT!' 'rat' 'RAT!'.
value labels v3 0 'No' 1 'Yes' 2 'Don''t know' 4 'Don"t know'.
value labels v4 0 'Bad' 1 'Good' 2 'Not applicable'.

execute.

********10. DEFINE FUNCTION.

begin program.
def dummify(varSpec,sep = '_d'):
    import spss,spssaux,spssdata
    varDict = spssaux.VariableDict()
    varList = varDict.expand(varSpec)
    varList.sort(key = lambda x: varDict.VariableIndex(x)) # process variables in file order
    for var in varList:
        varType = varDict.VariableType(var) # 0 = numeric, else strlength
        rows = spssdata.Spssdata(var,convertUserMissing=False).fetchall() # user missing values are kept as regular values
        if varType == 0: #numeric variable, spssdata as floats but ValueLabels strings so convert to floats
            valSet = set(row[0] for row in rows)
            valLabs = dict((float(key),val) for key,val in varDict.ValueLabels(var).items())
        else: #string variable, strip padding spaces from the values
            valSet = set(row[0].strip() for row in rows)
            valLabs = varDict.ValueLabels(var)
        valList = sorted(valSet,key = lambda x: (x is not None,x)) # system missing (None) first; plain sorted() fails on None versus float under Python 3
        for cnt,val in enumerate(valList):
            varLabExt = ' (""' + valLabs[val].replace('"','""') + '"")' if val in valLabs else ''
            if varType > 0:
                val = val.strip().replace('"','""') # double any double quotes for use inside a quoted syntax string
                spss.Submit('''recode %s ("%s" = 1)(else = 0) into %s%s%d.'''%(var,val,var,sep,cnt + 1))
                spss.Submit('''variable labels %s%s%d "Dummy variable indicating that %s = ""%s""%s.".'''\
                %(var,sep,cnt + 1,var,val,varLabExt))
            elif val is None: # system missing value
                spss.Submit('''recode %s (sysmis = 1)(else = 0) into %s%s%d.'''%(var,var,sep,cnt + 1))
                spss.Submit('''variable labels %s%s%d "Dummy variable indicating that %s = (system missing).".'''%(var,sep,cnt + 1,var))
            else:
                spss.Submit('''recode %s (%f = 1)(else = 0) into %s%s%d.'''%(var,val,var,sep,cnt + 1))
                spss.Submit('''variable labels %s%s%d "Dummy variable indicating that %s = %s%s.".'''%(var,sep,cnt + 1,var,str(val),varLabExt)) #converting float to str suppresses excessive decimal places in varlab
            spss.Submit('''value labels %s%s%d 0 'False' 1 'True'.'''%(var,sep,cnt + 1))
    spss.Submit('execute.')
end program.

******20. TEST V1******.

*match files file */keep v1 to v4.
*execute.
output close all.

begin program.
dummify('v1')
end program.

******30. TEST V2******.

*match files file */keep v1 to v4.
execute.
output close all.

begin program.
dummify('v2')
end program.

******40. TEST V3******.

*match files file */keep v1 to v4.
execute.
output close all.

begin program.
dummify('v3',sep = '_dummy_')
end program.

******50. TEST V4******.

*match files file */keep v1 to v4.
execute.
output close all.

begin program.
dummify('v4')
end program.
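One detail in the function above deserves a closer look: string values may themselves contain double quotes, like 'don"t know' in our test data. Inside a double quoted string in SPSS syntax, such a quote has to be doubled. The sketch below isolates that step in plain Python so it can be tried without SPSS. The helper name `recode_command` and the target variable names are hypothetical and not part of the tool.

```python
def recode_command(var_name, value, target):
    """Build a RECODE command for one string value, doubling embedded double quotes."""
    escaped = value.strip().replace('"', '""')
    return 'recode %s ("%s" = 1)(else = 0) into %s.' % (var_name, escaped, target)

print(recode_command("v1", 'don"t know', "v1_d4"))
# recode v1 ("don""t know" = 1)(else = 0) into v1_d4.

print(recode_command("v1", "don't know", "v1_d5"))
# recode v1 ("don't know" = 1)(else = 0) into v1_d5.
```

A single quote needs no special treatment here because the value is wrapped in double quotes.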
