SPSS Tutorials


Split String Variable into Components


"I have a long string variable in my data that actually holds the answers to several questions. These are separated by a semicolon (";"). How can I split this variable into the original answers?"

SPSS Python Syntax Example

The first two blocks of SPSS syntax have to be run just once, unaltered: the first creates test data and the second defines the splitting function. After that, splitting a string variable takes a single line of syntax, as demonstrated in the last program block.
*1. Create Test Data.

begin program.
import random,spss
data = ''
for case in range(10):
    val = '"'
    for novars in range(random.randrange(12)):
        for vallen in range(random.randrange(8)):
            val += chr(random.randrange(97,123))
        val += ';'
    val += '"'
    data += val + '\n'
spss.Submit('''data list list/s1(a%s).\nbegin data\n%send data.'''%(max(len(s) for s in data.split('"')),data))
end program.

*2. Define the function.

begin program.
def stringsplitter(varNam,sep):
    import spss,spssaux
    varInd = spssaux.VariableDict().VariableIndex(varNam)
    stringLengths = []
    curs_1 = spss.Cursor(accessType='r')
    for case in range(curs_1.GetCaseCount()):
        for cnt,val in enumerate(curs_1.fetchone()[varInd].split(sep)):
            if not len(stringLengths)>cnt:
                stringLengths.append(len(val.strip())) #strip() because SPSS right padding causes excessive lengths otherwise.
            elif len(val.strip())>stringLengths[cnt]:
                stringLengths[cnt] = len(val.strip())
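    curs_1.close() #Only one cursor can be open at a time.
    curs_2 = spss.Cursor(accessType='w')
    curs_2.SetVarNameAndType([varNam + '_s' + str(cnt + 1) for cnt in range(len(stringLengths))],[1 if leng==0 else leng for leng in stringLengths])
    curs_2.CommitDictionary() #Create the new variables before writing any values.
    for case in range(curs_2.GetCaseCount()):
        for cnt,val in enumerate(curs_2.fetchone()[varInd].split(sep)):
            curs_2.SetValueChar(varNam + '_s' + str(cnt + 1),val.strip())
        curs_2.CommitCase() #Write the values for the current case.
    curs_2.close()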
end program.
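
The first pass of the function (finding the width needed for each new variable) can be tried outside SPSS. The following is a minimal sketch in plain Python that does not use the spss module; the name component_widths and the sample values are made up for illustration only.

```python
def component_widths(values, sep):
    """Return the longest stripped length found at each component position."""
    widths = []
    for value in values:
        for pos, part in enumerate(value.split(sep)):
            length = len(part.strip())  # ignore right padding, as the function does
            if pos >= len(widths):
                widths.append(length)   # first time this position is seen
            elif length > widths[pos]:
                widths[pos] = length    # a longer component at a known position
    # A string variable needs a width of at least 1, hence the floor.
    return [max(1, width) for width in widths]

# Hypothetical sample values, right padded as SPSS would return them.
sample = ["ab;cde;   ", "f;;ghij;  "]
print(component_widths(sample, ";"))
```

Each position keeps the largest width seen over all cases, so a value with more components than any earlier one simply extends the list.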

*3. Apply the function.

begin program.
stringsplitter('s1',';') #Please specify string variable and separator.
end program.
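
To preview which new variables a value would fill, the naming scheme used above (variable name, then _s, then a counter) can be mimicked in plain Python. This is only a sketch: preview_split is a hypothetical helper, not part of the spss module, and the input value is invented.

```python
def preview_split(var_name, value, sep):
    """Pair each component of one value with the variable name it would get."""
    parts = [part.strip() for part in value.split(sep)]
    return [("%s_s%d" % (var_name, pos + 1), part)
            for pos, part in enumerate(parts)]

# A value that ends with the separator, like those in the test data.
for name, part in preview_split("s1", "abc;de;   ", ";"):
    print(name, repr(part))
```

Because the value ends with a separator, the last component is empty: the sketch returns s1_s1 = 'abc', s1_s2 = 'de' and an empty s1_s3. The same happens with the test data, where every answer is followed by a semicolon.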



