SPSS tutorials website header logo SPSS TUTORIALS BASICS ANOVA REGRESSION FACTOR CORRELATION

SPSS – Batch Process Files with Python

Running syntax over several SPSS data files in one go is fairly easy. If we use SPSS with Python we don't even have to type in the file names. The Python os (for operating system) module will do it for us.
Try it for yourself by downloading spssfiles.zip. Unzip these files into d:\spssfiles as shown below and you're good to go.

SPSS Data And Syntax Files In Folder

Find All Files and Folders in Root Directory

The syntax below creates a Python list of files and folders in rDir, our root directory. Prefixing it with an r as in r'D:\spssfiles' ensures that the backslash doesn't do anything weird.

*Find all files and folders in root directory.

begin program.
import os
rDir = r'D:\spssfiles'
print os.listdir(rDir)
end program.

Result

Python List Of All Files In Folder

Filter Out All .Sav Files

As we see, os.listdir() creates a list of all files and folders in rDir but we only want SPSS data files. For filtering them out, we first create and empty list with savs = []. Next, we'll add each file to this list if it endswith(".sav").

*Add all .sav (SPSS data) files to Python list.

begin program.
import os
rDir = r'D:\spssfiles'
savs = []
for fil in os.listdir(rDir):
    if fil.endswith(".sav"):
        savs.append(fil)
print savs
end program.

Using Full Paths for SPSS Files

For doing anything whatsoever with our data files, we probably want to open them. For doing so, SPSS needs to know in which folder they are located. We could simply set a default directory in SPSS with CD as in CD "d:\spssfiles". However, having Python create full paths to our files with os.path.join() is a more fool proof approach for this.

*Create full paths to all .sav files.

begin program.
import os
rDir = r'D:\spssfiles'
savs = []
for fil in os.listdir(rDir):
    if fil.endswith(".sav"):
        savs.append(os.path.join(rDir,fil))
for sav in savs:
    print sav
end program.

Result

SPSS Full Paths To Sav Files In Output

Have SPSS Open Each Data File

Generally, we open a data file in SPSS with something like GET FILE "d:\spssfiles\mydata.sav". If we replace the file name with each of the paths in our Python list, we'll open each data file, one by one. We could then add some syntax we'd like to run on each file. Finally, we could save our edits with SAVE OUTFILE "...". and that'll batch process multiple files. In this example, however, we'll simply look up which variables each file contains with spssaux.GetVariableNamesList().

*Open all SPSS data files and print the variables they contain.

begin program.
import os,spss,spssaux
rDir = r'D:\spssfiles'
savs = []
for fil in os.listdir(rDir):
    if fil.endswith(".sav"):
        savs.append(os.path.join(rDir,fil))
for sav in savs:
    spss.Submit("GET FILE '%s'."%sav)
    print sav,spssaux.GetVariableNamesList()
end program.

Result

SPSS File Names And Variable Names With Python

Inspect which Files Contain “Salary”

Now suppose we'd like to know which of our files contain some variable “salary”. We'll simply check if it's present in our variable names list and -if so- print back the name of the data file.

*Report all .sav files that contain a variable "salary" (case sensitive).

begin program.
import os,spss,spssaux
rDir = r'D:\spssfiles'
findVar = 'salary'
savs = []
for fil in os.listdir(rDir):
    if fil.endswith(".sav"):
        savs.append(os.path.join(rDir,fil))
for sav in savs:
    spss.Submit("get file '%s'."%sav)
    if findVar in spssaux.GetVariableNamesList():
        print sav
end program.

Result

SPSS Find Variable Across Files With Python

Circumvent Python’s Case Sensitivity

There's one more point I'd like to cover: since we search for “salary”, Python won't detect “Salary” or “SALARY” because it's fully case sensitive. I you don't like that, the simple solution is to convert all variable names for all files to lower()case.
A basic way to change all items in a Python list is [i... for i in list] where i... is a modified version of i, in our case i.lower(). This technique is known as a Python list comprehension and the syntax below uses it to lowercase all variable names (line 13).

*Report all .sav files that contain a variable "salary" (case insensitive).

begin program.
import os,spss,spssaux
rDir = r'D:\spssfiles'
findVar = 'salary'
savs = []
for fil in os.listdir(rDir):
    if fil.endswith(".sav"):
        savs.append(os.path.join(rDir,fil))
for sav in savs:
    spss.Submit("get file '%s'."%sav)
    if findVar.lower() in [varNam.lower() for varNam in spssaux.GetVariableNamesList()]:
        print sav
end program.

Note: since I usually avoid all uppercasing in SPSS variable names, the result is identical to our case sensitive search.

Thanks for reading.

Tell us what you think!

*Required field. Your comment will show up after approval from a moderator.

THIS TUTORIAL HAS 4 COMMENTS: