A teacher held a course evaluation. The data in SPSS format are in course-evaluation.sav, part of which is shown below. Students from different majors participated and now the education director wants separate Excel files for each major. A super easy way for creating these is by using SPSS with Python. We'll walk you through.
1. Save SPSS Data as Excel File
We'll first just create syntax for saving all data as a single Excel file from SPSS’ menu. We'll navigate to
and fill it out as below.This results in the syntax below.
SAVE TRANSLATE OUTFILE='D:\data\course_evaluation.xlsx'
/TYPE=XLS
/VERSION=12
/FIELDNAMES
/CELLS=LABELS
/REPLACE.
2. Selecting Which Cases to Save
Our first syntax saves all cases -that is, all majors- as one Excel file. A neat way to save only a subset of cases -one major at the time- is using SELECT IF and preceding it with TEMPORARY. Like so, we could copy-paste-edit our syntax for each major. We'll have Python do just that for us. But first we'll convert our study majors into an SPSS string variable with the syntax below.
string smajor (a20).
compute smajor = valuelabel(major).
crosstabs major by smajor.
Result
3. Look Up Data Values with Spssdata
A Python SPSS module developed for looking up data values is spssdata
. Doing so is super easy as shown by the syntax below. The main pitfall here is using the correct casing, which is spssdata.Spssdata
.
begin program python3.
import spssdata
for case in spssdata.Spssdata('smajor'):
print(case)
end program.
Result
4. Cleaning Up Our Majors
As we see, Python returns a tuple for each case holding a student’s major as a Python string. We'll first extract this first (and only) element from each tuple by succeeding it with [0]
as shown below.
Note that each major is right padded with spaces up to a total length of 20. This is because its variable format
in SPSS is A20. In Python, we'll remove these spaces with strip()
.
begin program python3.
import spssdata
for case in spssdata.Spssdata('smajor'):
print(case[0].strip())
end program.
5. Unduplicating Our Majors
At this point, Python retrieves all majors in a good, clean way. But since we only need the create one Excel file for each distinct major, we need to unduplicate these. We'll first create an empty Python list object. We'll then add each major to this list only if it's not yet in there.
begin program python3.
import spssdata
majors = []
for case in spssdata.Spssdata('smajor'):
major = case[0].strip()
if major not in majors:
majors.append(major)
print(majors)
end program.
Result
6. Create and Inspect SPSS Syntax
Our Python list of distinct majors looks great! If you're very paranoid, run
FREQUENCIES SMAJOR.
in SPSS to confirm that these really are the majors in our data. Now that we have this in place, we won't recreate it but proceed with the second step.
The syntax below first creates an empty Python string. Next, we'll loop over distinct majors and concatenate some SPSS syntax to it that we created earlier from the menu. In this syntax, we added %s
which we use for a Python text replacement. The result is a substantial block of SPSS syntax which we'll first inspect.
begin program python3.
spssSyntax = ''
for major in majors:
spssSyntax += '''
TEMPORARY.
SELECT IF (smajor = '%s').
SAVE TRANSLATE OUTFILE='D:\data\course_evaluation_%s.xlsx'
/TYPE=XLS
/VERSION=12
/FIELDNAMES
/CELLS=LABELS
/REPLACE.
'''%(major,major.lower())
print(spssSyntax)
end program.
Result
7. Run Our Syntax
Since our syntax looks fine, we'll run it with spss.Submit
. In order to use this, we still need to import the spss module because we hadn't done so thus far.
begin program python3.
import spss
spss.Submit(spssSyntax)
end program.
Result
Final Notes
In short, Python can very easily look up data values with spssdata.Spssdata()
. Restrict it to a single variable by specifying its name in the parentheses. Leaving the parentheses empty looks up the data values of all variables.
Once we have our data values in Python, we can insert them into SPSS syntax or do a ton of other stuff with them.
Thanks for reading!
THIS TUTORIAL HAS 34 COMMENTS:
By Ruben Geert van den Berg on August 23rd, 2018
Hey man!
"So I've tried the OMS but this is not really doable, I've tried on a big dataset and it is really too long."
How is it not doable? I do such stuff for breakfast and it tends to be super fast and pretty easy. Is it the processing time?
What's the size of your data (cases * vars)? If you have tons of vars but (way) fewer cases, perhaps FLIP the dataset? If this succeeds, then I think the remainder of the process can be very fast.
In any case, you want to minimize the number of values you need to inspect so preferably
break
as soon as you find a missing -either in Python or LOOP. Also, you probably don't want to loop through all cases variable after variable. Going through all variables just once should be faster as explained in SPSS EXECUTE – What and Why?.I'm not sure how this generalizes to spssdata.Spssdata though.
By LoC on August 24th, 2018
Hey! My bad I tried with frequencies and not DESCRIPTIVES. Very fast with descriptives, but this is ignoring the strings. I put the MISSING VALUES ' ' on strings variables, and DESCRIPTIVES is not doing anything on them. Will have to think a little, but thank you very much for your help!
By Ruben Geert van den Berg on August 24th, 2018
DESCRIPTIVES will always ignore string variables. Besides, string variables don't have any missing values by default so they should never end up in your list of variables with missings anyway.
However, if you feel empty string values should be seen as "missing", try something like RECODE s1 s2 s3 ('' = copy)(else = 0). and then ALTER TYPE them into numeric variables. You don't always need to know all variable names for doing so, something like ALTER TYPE ALL (A = F1). may work here. I didn't test that but I use similar constructions quite often. Now, this'll obviously screw up your data but you can restore it by closing and reopening it after collecting all variables with "missings".
P.s. it would be nice to connect on LinkedIn or Facebook, I'm kinda curious who you are and what you're working on.
By Python Training in Bangalore on February 18th, 2019
Thanks a lot for sharing us about this update. Hope you will not get tired on making posts as informative as this.
By python Training in Chennai on April 30th, 2019
Just found your post by searching on the Google, I am Impressed and Learned Lot of new thing about how to create several excel files with python, from your post.