
SPSS – Cloning Variables with Python

In this lesson, we'll develop our own SPSS Python module. As we're about to see, this is easier and more efficient than you might think. We'll use hotel-evaluation.sav, part of which is shown below.

SPSS Python Hotel Evaluation Data View

Cloning Variables

Whenever I use RECODE, I prefer recoding into the same variable. So how can I compare the new values with the old ones? Well, I'll first make a copy of the variable and then recode the original. A problem here is that the copy does not have any dictionary information such as labels, missing values and formats.
We're going to solve that by cloning variables with all their dictionary properties. The final tool (available at SPSS Clone Variables Tool) is among my favorites. This lesson will deal with its syntax.

The RECODE Problem

Let's first demonstrate the problem with q2. We'll make a copy with the syntax below and compare it to the original.


*Show variable names, values and labels in output.

set tnumbers both tvars both.

*Set 6 ("no answer") as user missing value.

missing values q2 (6).

*Copy q2 into ori_q2.

recode q2 (else = copy) into ori_q2.

*Inspect result.

crosstabs q2 by ori_q2.

Result

SPSS Crosstabs No Dictionary Information

APPLY DICTIONARY

We could manually set all labels/missing values/format and so on for our new variable. However, an SPSS command that does everything in one go is APPLY DICTIONARY. We'll demonstrate it below and rerun our table.

*Apply all dictionary properties from q2 to ori_q2.

apply dictionary from *
/source variables = q2
/target variables = ori_q2.

*Now we have a true clone as we can verify by running...

crosstabs q2 by ori_q2.

*Delete new variable for now, we need something better.

delete variables ori_q2.

Result

SPSS Crosstabs Dictionary Information

Create Clone Module

APPLY DICTIONARY as we used it here takes only one variable at a time. Therefore, cloning several variables is still cumbersome, at least for now. We'll speed things up by creating a module in Notepad++. We'll first just open it and set its language to Python as shown below.

Notepad++ Set Language To Python

We now add the following code to our module in Notepad++.

def clone(varSpec,prefix):
    import spssaux
    sDict = spssaux.VariableDict(caseless = True)
    varList = sDict.expand(varSpec)
    print(varList)

This defines a Python function that we call clone. It doesn't clone anything yet (it only prints the names of the variables it receives), but we'll extend it step by step. For one thing, our function needs to know which variables to clone. We'll tell it by passing an argument that we call varSpec (short for “variable specification”).
Now which names should be used for our clones? A simple option is some prefix and the original variable name. We'll pass our prefix as a second argument to our function.

Move clone.py to Site-Packages Folder

Now we save this file as clone.py in some easy location (for me, that's the Windows desktop) and then we'll move it into the site-packages folder of the Python installation that SPSS uses. For the python3 programs in this lesson, that's something like C:\Program Files\IBM\SPSS Statistics\Python3\Lib\site-packages, but the exact path depends on your SPSS version and installation. We may get a Windows warning as shown below. Just click “Continue” here.

Windows Destination Folder Access Denied

Import Module and Run Function

We now turn back to SPSS Syntax Editor. We'll import our module and run our function as shown below. Note that it specifies all variables with SPSS’ ALL keyword and it uses ori_ (short for “original”) as a prefix.

*After creating C:\Program Files\IBM\SPSS Statistics\Python3\Lib\site-packages\clone.py, we'll import it.

begin program python3.
import clone
clone.clone(varSpec = 'all',prefix = 'ori_')
end program.

Result

SPSS Python Output Window Variable Names List Clone

Create New Variable Names

We'll now reopen clone.py in Notepad++ and develop it step by step. After each step, we'll save it in Notepad++ and then import and run it in SPSS. Let's first create our new variable names by concatenating the prefix to the old names and print the result.

New Contents Clone.py (Notepad++)

def clone(varSpec,prefix):
    import spssaux
    sDict = spssaux.VariableDict(caseless = True)
    varList = sDict.expand(varSpec)
    for var in varList:
        newVar = prefix + var # concatenation
        print (var, newVar)

Because we already imported clone.py, Python keeps the module in memory and ignores any subsequent import requests during the same session. We changed the file after our first import, however, so we need Python to read it again. We'll therefore reload it with the syntax below.

*reload module after each edit from now on.

begin program python3.
import clone,importlib
importlib.reload(clone)
clone.clone(varSpec = 'all',prefix = 'ori_')
end program.
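
The difference between importing and reloading can also be tried outside SPSS. The sketch below is only an illustration: the module name demo_clone and its version function are made up for this demo and are not part of clone.py. It writes a tiny module to a temporary folder, edits it, and shows that a second import keeps the old code while importlib.reload picks up the edit.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical throwaway module, written to a temporary folder for this demo.
sys.dont_write_bytecode = True  # keep cached bytecode out of the picture
folder = tempfile.mkdtemp()
sys.path.insert(0, folder)
module_path = os.path.join(folder, "demo_clone.py")

def write_module(body):
    """(Over)write the demo module and let the import system notice the change."""
    with open(module_path, "w") as handle:
        handle.write(body)
    importlib.invalidate_caches()

write_module("def version():\n    return 'first draft'\n")
import demo_clone
before = demo_clone.version()

write_module("def version():\n    return 'second draft, edited'\n")
import demo_clone              # no effect: the module is already in memory
after_import = demo_clone.version()

importlib.reload(demo_clone)   # re-executes the edited source file
after_reload = demo_clone.version()

print(before, "|", after_import, "|", after_reload)
```

Only the last call returns the edited text, which is why each edit to clone.py is followed by importlib.reload in SPSS.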

Result

SPSS Python Old New Variable Names In Output Clone

Add RECODE and APPLY DICTIONARY to Function

We'll now have our function create an empty Python string called spssSyntax. We'll concatenate our SPSS syntax to it while looping over our variables.
The syntax we'll add to it is basically just the RECODE and APPLY DICTIONARY commands that we used earlier. We'll replace all instances of the old variable name by %(var)s. %(newVar)s is our placeholder for our new variable name. This is explained in SPSS Python Text Replacement Tutorial.
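
Before we put this into our module, here's a minimal sketch of the text replacement on its own. The function name demo_syntax is just an assumption for this illustration. Each %(name)s placeholder is looked up by name in the dictionary that follows the % operator, and locals() supplies the function's local variables as that dictionary.

```python
def demo_syntax(var, newVar):
    # Triple quotes allow line breaks inside the string.
    template = '''
RECODE %(var)s (ELSE = COPY) INTO %(newVar)s.
'''
    # locals() holds var, newVar and template; unused keys are simply ignored.
    return template % locals()

print(demo_syntax('q2', 'ori_q2'))
```

Passing an explicit dictionary such as {'var': 'q2', 'newVar': 'ori_q2'} instead of locals() would work just as well. We'll stick to locals() because it's shorter.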

New Contents Clone.py (Notepad++)

def clone(varSpec,prefix):
    import spssaux
    spssSyntax = '' # empty string for concatenating to
    sDict = spssaux.VariableDict(caseless = True)
    varList = sDict.expand(varSpec)
    for var in varList:
        newVar = prefix + var
        # three quotes below because line breaks in string
        spssSyntax += '''
RECODE %(var)s (ELSE = COPY) INTO %(newVar)s.

APPLY DICTIONARY FROM *
/SOURCE VARIABLES = %(var)s
/TARGET VARIABLES = %(newVar)s.

'''%locals()
    print(spssSyntax)

Just as previously, we reload our module and run our function with the syntax below.

*reload module after each edit from now on.

begin program python3.
import clone,importlib
importlib.reload(clone)
clone.clone(varSpec = 'all',prefix = 'ori_')
end program.

Result

SPSS Python Apply Dictionary Commands In Output Window

Check for String Variables

Our syntax looks great but there's one problem: our RECODE will crash on string variables. We first need to declare those with something like STRING ori_fname (A18). We can detect string variables and their lengths with sDict[var].VariableType. This returns the string length for string variables and 0 for numeric variables. Let's try that.

New Contents Clone.py (Notepad++)

def clone(varSpec,prefix):
    import spssaux
    spssSyntax = ''
    sDict = spssaux.VariableDict(caseless = True)
    varList = sDict.expand(varSpec)
    for var in varList:
        newVar = prefix + var
        varTyp = sDict[var].VariableType # 0 = numeric, > 0 = string length
        print(var,varTyp)
        spssSyntax += '''
RECODE %(var)s (ELSE = COPY) INTO %(newVar)s.

APPLY DICTIONARY FROM *
/SOURCE VARIABLES = %(var)s
/TARGET VARIABLES = %(newVar)s.
'''%locals()
    print(spssSyntax)

(Reload and run in SPSS as previously.)

Result

SPSS Python Variable Types In Output Window Clone

Declare Strings Before RECODE

For each string variable we specified, we'll now add the appropriate STRING command to our syntax.

def clone(varSpec,prefix):
    import spssaux
    spssSyntax = ''
    sDict = spssaux.VariableDict(caseless = True)
    varList = sDict.expand(varSpec)
    for var in varList:
        newVar = prefix + var
        varTyp = sDict[var].VariableType # 0 = numeric, > 0 = string length
        if varTyp > 0: # need to declare new string variable in SPSS
            spssSyntax += 'STRING %(newVar)s (A%(varTyp)s).'%locals()
        spssSyntax += '''
RECODE %(var)s (ELSE = COPY) INTO %(newVar)s.

APPLY DICTIONARY FROM *
/SOURCE VARIABLES = %(var)s
/TARGET VARIABLES = %(newVar)s.

'''%locals()
    print(spssSyntax)
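
If you'd like to try out this logic without SPSS running, the syntax building can be separated from the dictionary lookup. The sketch below is an assumption for illustration rather than part of the final tool: the helper build_clone_syntax takes (name, type code) pairs directly, where the type code follows the VariableType convention (0 = numeric, > 0 = string length).

```python
def build_clone_syntax(variables, prefix):
    """Return clone syntax for (name, type code) pairs; 0 = numeric, > 0 = string length."""
    parts = []
    for var, varTyp in variables:
        newVar = prefix + var
        if varTyp > 0:  # new string variables must be declared before RECODE
            parts.append('STRING %(newVar)s (A%(varTyp)s).' % locals())
        parts.append('RECODE %(var)s (ELSE = COPY) INTO %(newVar)s.' % locals())
        parts.append('APPLY DICTIONARY FROM *\n'
                     '/SOURCE VARIABLES = %(var)s\n'
                     '/TARGET VARIABLES = %(newVar)s.' % locals())
    return '\n'.join(parts)

# Hypothetical input: one numeric variable and one 18-character string variable.
syntax = build_clone_syntax([('q2', 0), ('fname', 18)], 'ori_')
print(syntax)
```

Only the string variable gets a STRING command. The numeric variable goes straight to RECODE.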

Run All SPSS Syntax

At this point, we can use our clone function: we'll comment out our print statement and replace it with spss.Submit for running our syntax. As we'll see, Python now creates perfect clones of all variables we specified.

def clone(varSpec,prefix):
    import spssaux,spss # spss module needed for submitting syntax
    spssSyntax = ''
    sDict = spssaux.VariableDict(caseless = True)
    varList = sDict.expand(varSpec)
    for var in varList:
        newVar = prefix + var
        varTyp = sDict[var].VariableType
        if varTyp > 0:
            spssSyntax += 'STRING %(newVar)s (A%(varTyp)s).'%locals()
        spssSyntax += '''
RECODE %(var)s (ELSE = COPY) INTO %(newVar)s.

APPLY DICTIONARY FROM *
/SOURCE VARIABLES = %(var)s
/TARGET VARIABLES = %(newVar)s.

'''%locals()
    spssSyntax += "EXECUTE." # execute RECODE (transformation) commands
    #print(spssSyntax) # comment out, uncomment if anything goes wrong
    spss.Submit(spssSyntax) # have Python run spssSyntax

Final Notes

Our clone function works fine but there's still one more thing we could add: a check that the new variable names don't exist yet. Since today's lesson may be somewhat challenging already, we'll leave this as an exercise to the reader.
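
As a possible starting point for that exercise, the comparison itself can be written in plain Python. The helper name find_name_clashes and the example names are assumptions for this sketch. Inside clone, the list of existing names could come from the variable dictionary (for instance by expanding 'all'). Names are compared in lower case because SPSS variable names are case-insensitive.

```python
def find_name_clashes(existing_names, selected_names, prefix):
    """Return the new names that would collide with variables already in the data."""
    taken = {name.lower() for name in existing_names}
    return [prefix + name for name in selected_names
            if (prefix + name).lower() in taken]

# Hypothetical data: ORI_q2 already exists, so cloning q2 with prefix ori_ would clash.
clashes = find_name_clashes(['q1', 'q2', 'ORI_q2'], ['q1', 'q2'], 'ori_')
if clashes:
    print('Already present: ' + ', '.join(clashes))
```

What to do with a clash (stop with an error, skip the variable or ask for a different prefix) is a design choice we leave open here.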
