Running syntax over several SPSS data files in one go is fairly easy. If we use SPSS with Python we don't even have to type in the file names. The Python os (for operating system) module will do it for us.
Try it for yourself by downloading spssfiles.zip. Unzip these files into d:\spssfiles and you're good to go.
Find All Files and Folders in Root Directory
The syntax below creates a Python list of files and folders in rDir, our root directory. Prefixing the path with an r, as in r'D:\spssfiles', makes it a raw string, so Python doesn't treat the backslash as the start of an escape sequence.
begin program python3.
import os
rDir = r'D:\spssfiles'
print(os.listdir(rDir))
end program.
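To see what the r prefix guards against, here's a minimal sketch in plain Python that runs outside SPSS. The folder name D:\temp is hypothetical; I picked it because \t happens to be the escape sequence for a tab.

```python
# Hypothetical folder name, chosen because "\t" is a tab escape sequence.
plain = 'D:\temp'     # "\t" is read as ONE tab character
raw = r'D:\temp'      # raw string: the backslash and the "t" stay separate

print(len(plain))     # 6
print(len(raw))       # 7
print('\t' in plain)  # True: the path now contains a tab
print('\\' in raw)    # True: the backslash survived
```

Our own path, D:\spssfiles, would survive without the r because \s is not a recognized escape sequence, but recent Python versions warn about such strings, so the r prefix is the safer habit.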
Filter Out All .Sav Files
As we see, os.listdir() creates a list of all files and folders in rDir but we only want SPSS data files. To pick these out, we first create an empty list with savs = []. Next, we'll add each file to this list if it endswith(".sav").
begin program python3.
import os
rDir = r'D:\spssfiles'
savs = []
for fil in os.listdir(rDir):
    if fil.endswith(".sav"):
        savs.append(fil)
print(savs)
end program.
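One thing to keep in mind is that endswith(".sav") is case sensitive, so a file named B.SAV would be skipped. The sketch below is a possible variation, not part of the original syntax: it wraps the loop in a hypothetical helper, list_savs(), that lowercases each name before checking it. It runs in plain Python on a temporary folder with made-up file names.

```python
import os
import tempfile

def list_savs(folder):
    """Return the names in folder that end in .sav, ignoring case."""
    savs = []
    for fil in sorted(os.listdir(folder)):
        if fil.lower().endswith(".sav"):
            savs.append(fil)
    return savs

# Demo on a temporary folder holding three made-up (empty) files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.sav", "B.SAV", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    found = list_savs(tmp)

print(found)  # both .sav files, but not notes.txt
```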
Using Full Paths for SPSS Files
To do anything with our data files, we first need to open them, and for that SPSS needs to know which folder they're in. We could simply set a default directory in SPSS with CD as in
CD "d:\spssfiles".
However, having Python create full paths to our files with os.path.join() is a more fool-proof approach for this.
begin program python3.
import os
rDir = r'D:\spssfiles'
savs = []
for fil in os.listdir(rDir):
    if fil.endswith(".sav"):
        savs.append(os.path.join(rDir,fil))
for sav in savs:
    print(sav)
end program.
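As a rough illustration of why os.path.join() beats gluing strings together, the sketch below compares the two. It imports ntpath, the Windows implementation of os.path, directly so that it prints the same thing on any operating system. That's a choice made for this demo only; in the syntax above, plain os.path.join() is all you need.

```python
import ntpath  # the Windows flavour of os.path, importable on any OS

rDir = r'D:\spssfiles'

# Naive concatenation forgets the separator.
glued = rDir + 'mydata.sav'
print(glued)   # D:\spssfilesmydata.sav

# join() inserts exactly one backslash, whether or not rDir ends with one.
joined = ntpath.join(rDir, 'mydata.sav')
joined_trailing = ntpath.join(rDir + '\\', 'mydata.sav')
print(joined)           # D:\spssfiles\mydata.sav
print(joined_trailing)  # D:\spssfiles\mydata.sav
```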
Have SPSS Open Each Data File
Generally, we open a data file in SPSS with something like
GET FILE "d:\spssfiles\mydata.sav".
If we replace the file name with each of the paths in our Python list, we'll open each data file, one by one. We could then add some syntax we'd like to run on each file. Finally, we could save our edits with
SAVE OUTFILE "...".
and that'll batch process multiple files. In this example, however, we'll simply look up which variables each file contains with spssaux.GetVariableNamesList().
begin program python3.
import os,spss,spssaux
rDir = r'D:\spssfiles'
savs = []
for fil in os.listdir(rDir):
    if fil.endswith(".sav"):
        savs.append(os.path.join(rDir,fil))
for sav in savs:
    spss.Submit("GET FILE '%s'."%sav)
    print(sav,spssaux.GetVariableNamesList())
end program.
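The open, edit and save cycle described above could be sketched as follows. This is only an illustration under a few assumptions: build_syntax() is a hypothetical helper, DESCRIPTIVES VARIABLES=ALL is a placeholder for whatever syntax you'd really like to run, and the output folder D:\spssfiles\edited is made up and would have to exist already. The sketch only builds the syntax as text, so it runs in plain Python; inside SPSS, the resulting string could be handed to spss.Submit().

```python
import ntpath  # Windows flavour of os.path, so the demo prints the same on any OS

def build_syntax(sav, out_dir):
    """Return SPSS syntax that opens sav, runs a placeholder command
    and saves the result under the same file name in out_dir."""
    out = ntpath.join(out_dir, ntpath.basename(sav))
    return "\n".join([
        "GET FILE '%s'." % sav,
        "DESCRIPTIVES VARIABLES=ALL.",  # placeholder: put your own syntax here
        "SAVE OUTFILE '%s'." % out,
    ])

# Hypothetical input file and output folder.
syntax = build_syntax(r'D:\spssfiles\mydata.sav', r'D:\spssfiles\edited')
print(syntax)
```

Saving to a separate folder rather than over the original files leaves the raw data untouched if something goes wrong halfway through.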
Inspect which Files Contain “Salary”
Now suppose we'd like to know which of our files contain some variable “salary”. We'll simply check if it's present in our variable names list and, if so, print the name of the data file.
begin program python3.
import os,spss,spssaux
rDir = r'D:\spssfiles'
findVar = 'salary'
savs = []
for fil in os.listdir(rDir):
    if fil.endswith(".sav"):
        savs.append(os.path.join(rDir,fil))
for sav in savs:
    spss.Submit("get file '%s'."%sav)
    if findVar in spssaux.GetVariableNamesList():
        print(sav)
end program.
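If you'd rather keep the matching files for later use than just print them, you could collect them in a list. The sketch below shows the idea in plain Python. The file names and variable lists are made up, standing in for what spssaux.GetVariableNamesList() might return after opening each file.

```python
# Made-up variable lists standing in for what spssaux.GetVariableNamesList()
# might return per file; none of these names come from the real data.
var_names = {
    r'D:\spssfiles\file_a.sav': ['id', 'salary', 'gender'],
    r'D:\spssfiles\file_b.sav': ['id', 'Salary'],
    r'D:\spssfiles\file_c.sav': ['id', 'age'],
}

findVar = 'salary'
hits = []
for sav, names in var_names.items():
    if findVar in names:  # exact, case-sensitive match on whole names
        hits.append(sav)

print(hits)  # only file_a.sav: 'Salary' in file_b.sav does not match
```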
Circumvent Python’s Case Sensitivity
There's one more point I'd like to cover: since we search for “salary”, Python won't detect “Salary” or “SALARY” because it's fully case sensitive. If you don't like that, the simple solution is to convert all variable names for all files to lowercase with lower().
A basic way to change all items in a Python list is
[i... for i in list]
where i... is a modified version of i, in our case i.lower(). This technique is known as a Python list comprehension and the syntax below uses it to lowercase all variable names (in the final if statement).
begin program python3.
import os,spss,spssaux
rDir = r'D:\spssfiles'
findVar = 'salary'
savs = []
for fil in os.listdir(rDir):
    if fil.endswith(".sav"):
        savs.append(os.path.join(rDir,fil))
for sav in savs:
    spss.Submit("get file '%s'."%sav)
    if findVar.lower() in [varNam.lower() for varNam in spssaux.GetVariableNamesList()]:
        print(sav)
end program.
Note: since I usually avoid uppercase letters in SPSS variable names, the result is identical to that of our case-sensitive search.
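If list comprehensions are new to you, this small stand-alone sketch may help. It uses made-up variable names and shows that the comprehension does the same as an ordinary for loop with append().

```python
names = ['ID', 'Salary', 'jobType']   # made-up variable names

# List comprehension: a lowercased copy of every item.
lowered = [i.lower() for i in names]
print(lowered)                        # ['id', 'salary', 'jobtype']

# The same thing written as an ordinary loop.
looped = []
for i in names:
    looped.append(i.lower())
print(looped == lowered)              # True

print('salary' in names)              # False: 'Salary' is not 'salary'
print('salary' in lowered)            # True
```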
Thanks for reading!