# SPSS Tutorials


# Split String Variable into Components

## Question

"I have a long string variable in my data that actually holds the answers to several questions. These are separated by a semicolon (";"). How can I split this variable into the original answers?"

## SPSS Python Syntax Example

Run the first two blocks of SPSS syntax unaltered, just once per session. After that, splitting a string variable takes only a single line of syntax, as demonstrated in the last program block.

*1. Create Test Data.

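begin program.
import random,spss
random.seed(1)
data = ''
# Build 10 cases, each a quoted string of up to 11 random lowercase components.
for case in range(10):
    val = '"'
    for novars in range(random.randrange(12)):
        for vallen in range(random.randrange(8)):
            val += chr(random.randrange(97,123))
        val += ';' # Every component is followed by the separator.
    val += '"'
    data += val + '\n'
# The width of s1 is the length of the longest generated value.
# data already ends with a newline, so "end data." starts on its own line.
spss.Submit('''data list list/s1(a%s).\nbegin data\n%send data.'''%(max(len(s) for s in data.split('"')),data))
end program.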

*2. Define the function.

begin program.
def stringsplitter(varNam,sep):
    import spss,spssaux
    varInd = spssaux.VariableDict().VariableIndex(varNam)
    stringLengths = []
    # Pass 1: find the longest value of each component over all cases.
    curs_1 = spss.Cursor(accessType='r')
    for case in range(curs_1.GetCaseCount()):
        for cnt,val in enumerate(curs_1.fetchone()[varInd].split(sep)):
            if not len(stringLengths)>cnt:
                stringLengths.append(len(val.strip())) #strip() because SPSS right padding causes excessive lengths otherwise.
            elif len(val.strip())>stringLengths[cnt]:
                stringLengths[cnt] = len(val.strip())
    curs_1.close()
    # Pass 2: create the new string variables and fill them case by case.
    curs_2 = spss.Cursor(accessType='w')
    # Length 0 is not allowed for string variables, so use 1 instead.
    curs_2.SetVarNameAndType([varNam + '_s' + str(cnt + 1) for cnt in range(len(stringLengths))],[1 if leng==0 else leng for leng in stringLengths])
    curs_2.CommitDictionary()
    for case in range(curs_2.GetCaseCount()):
        for cnt,val in enumerate(curs_2.fetchone()[varInd].split(sep)):
            curs_2.SetValueChar(varNam + '_s' + str(cnt + 1),val.strip())
        curs_2.CommitCase()
    curs_2.close()
end program.

*3. Apply the function.

begin program.
stringsplitter('s1',';') #Please specify string variable and separator.
end program.
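
To see what the function does without an SPSS session, its two passes can be sketched in plain Python. This is an illustration only, not the SPSS code itself: the list `values` stands in for the cases of the string variable, and `component_widths` and `split_cases` are hypothetical helper names.

```python
def component_widths(values, sep):
    """Pass 1: longest stripped length of each component across all cases."""
    widths = []
    for value in values:
        for cnt, part in enumerate(value.split(sep)):
            length = len(part.strip())
            if cnt >= len(widths):
                widths.append(length)
            elif length > widths[cnt]:
                widths[cnt] = length
    # SPSS does not allow string variables of length 0, so use at least 1.
    return [max(1, width) for width in widths]


def split_cases(values, sep):
    """Pass 2: split each case; components that a case lacks stay empty."""
    n_vars = len(component_widths(values, sep))
    rows = []
    for value in values:
        parts = [part.strip() for part in value.split(sep)]
        rows.append(parts + [''] * (n_vars - len(parts)))
    return rows


# Hypothetical sample values, right padded the way SPSS pads strings.
values = ['ab;cde;  ', 'f;;ghij;k']
print(component_widths(values, ';'))
print(split_cases(values, ';'))
```

The first pass is needed because SPSS string variables have a fixed width that must be known before any values are written.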

## Description

• Note that this syntax uses Python, so you need the SPSS Python Essentials installed to run it.
• The first program block creates a test data set containing a single (long) string variable. If you already have your actual data open in SPSS, you may skip it.
• The second program block defines the function that splits a string variable into components, given some separator. After running this block just once, the function can be used as many times as necessary until the end of your session. This definition is something you'd typically place in a module.
• After stringsplitter has been defined, only one short line of code is needed to actually use the function. This is demonstrated in the third program block. Note that the name of the input variable comes first, followed by the separator, and that both are quoted.
• The new variable names consist of the original variable name suffixed by "_sn", where n refers to the nth component of the string.
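
The naming rule can be previewed in plain Python before running the splitter. The sketch below is an assumption-laden illustration: `new_names` and `clashing_names` are hypothetical helpers, and the list of existing names is made up. The comparison ignores case because SPSS variable names are not case sensitive.

```python
def new_names(var_name, n_components, suffix='_s'):
    """Build names such as s1_s1, s1_s2, ... for the split components."""
    return [var_name + suffix + str(n + 1) for n in range(n_components)]


def clashing_names(candidates, existing):
    """Return the candidate names that already exist, ignoring case."""
    taken = {name.lower() for name in existing}
    return [name for name in candidates if name.lower() in taken]


names = new_names('s1', 3)
print(names)
# Hypothetical names already present in the data.
print(clashing_names(names, ['id', 'S1', 's1_s2']))
```

If the second list is not empty, renaming the existing variables or choosing a different suffix avoids the clash.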

## Assumptions

• It is assumed that the variables to be created do not yet exist in the data to which you apply the function. If they do, you may first rename them or modify the default suffix ("_s").
• It is assumed that every occurrence of the separator is meaningful. So if the separator is ";" and a string value ";no;;yes;yes;" occurs, it will be split into 6 new variables holding the values (missing),"no",(missing),"yes","yes",(missing). If this is not to your liking, an easy solution may be to apply basic SPSS string functions (most likely RTRIM, LTRIM and REPLACE) to your string before using the splitter.
• Elaborating on the previous point, if a new variable is empty for all cases, it will be an empty string variable with length 1 (ideally it would have length 0 but this is not allowed in SPSS). It is again presumed that the empty values are present in the string for a reason.
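
The way every separator counts can be checked in plain Python, since the function relies on Python's `split`. The sketch below also shows one possible cleanup for data in which empty components are not meaningful. `clean_separators` is a hypothetical helper that assumes a single-character separator; it is an alternative to the SPSS string functions mentioned above, not part of the splitter.

```python
import re


def clean_separators(value, sep=';'):
    """Drop leading/trailing separators and collapse repeated ones.

    Assumes sep is a single character.
    """
    collapsed = re.sub('(?:' + re.escape(sep) + ')+', sep, value.strip())
    return collapsed.strip(sep)


raw = ';no;;yes;yes;'
print(raw.split(';'))                    # six components, three of them empty
print(clean_separators(raw).split(';'))  # only the non-empty answers remain
```

Keep in mind that this cleanup shifts answers to earlier positions, so it is only appropriate when the position of an answer within the string carries no meaning.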


# This tutorial has 4 comments

• ### By Ruben Geert van den Berg on March 5th, 2015

@Natalka: could you share (a sample of) your data with me by email? That would be very helpful in troubleshooting and -if necessary- improving the script.

• ### By Natalka on March 4th, 2015

Hi Ruben, you have a great website! I am trying to use your prog for splitting the string var. My variable is called 'Name', and my separator is '(', and my variable is already existing in the data file. I was trying to simply replace your 's1' in *3 by my 'Name', and your ';' by my ')', but it's obviously not enough. What else do I have to do: feed my symbols into *2, or define my variable differently? thanks, N

• ### By Ruben Geert van den Berg on January 5th, 2015

Dear Martin,

Thank you. The 'two mistakes' refer to replacing "sep" by ";", right? I'll correct it in the original syntax.

Best,

Ruben

• ### By Martin on January 4th, 2015

Dear Ruben:

Thanks for the code at https://www.spss-tutorials.com/split-string-variable-into-components/. I found two errors and corrected them:

def stringsplitter(variable,sep):
    import spss,spssaux
    lens = []
    curs = spss.Cursor([spssaux.VariableDict().VariableIndex(variable)],accessType='w')
    for case in range(curs.GetCaseCount()):
        for cnt,val in enumerate(curs.fetchone()[0].split(sep)):
            print cnt, val
            if not len(lens)>cnt:
                lens.append(len(val.strip()))
            elif len(val.strip())>lens[cnt]:
                lens[cnt] = len(val.strip())
    curs.close()
    curs = spss.Cursor(accessType='w')
    curs.SetVarNameAndType([variable + '_s' + str(cnt + 1) for cnt in range(len(lens))],[1 if leng==0 else leng for leng in lens])
    curs.CommitDictionary()
    for case in range(curs.GetCaseCount()):
        for cnt,val in enumerate(curs.fetchone()[spssaux.VariableDict().VariableIndex(variable)].split(sep)):
            curs.SetValueChar(variable+'_s'+str(cnt + 1),str.lstrip(val))
        curs.CommitCase()
    curs.close()

Best regards,

Martin