Summary
Reading multiple sheet Excel workbooks into SPSS is easily done with this Custom Dialog. This tutorial demonstrates how to use it.
Before You Start
SPSS Read and Merge Excel Files Tool- Make sure you have the SPSS Python Essentials installed.
- Download and install the xlrd module.
- If you'd like to generate some test data as done in the syntax example, you'll need the xlwt module as well.
- Download and install Excel to SPSS Tool. Note that this is an SPSS custom dialog. You'll now find under .
- Close all datasets in SPSS.
SPSS Syntax Example for Generating Test Data
* Create some small Excel workbooks for testing.
begin program.
rdir=r'd:\temp' # Specify folder for writing test files.
import xlwt,random,datetime,os
fmt = xlwt.easyxf(num_format_str='M/D/YY')
wBooks = ["book_" + str(cnt) for cnt in range(1,5)]
for noSheets,wBook in enumerate(wBooks):
wb=xlwt.Workbook()
for sheetNo in range(noSheets + 1):
ws=wb.add_sheet("sheet_%d"%(sheetNo + 1))
for col,cont in enumerate(['date','ID','JobTitle','Revenue']):
ws.write(0,col,cont)
for row in range(1,6):
ws.write(row,0,datetime.datetime(2008 + sheetNo,1,1) + datetime.timedelta(days=random.randrange(1,365)),fmt)
ws.write(row,1,random.choice([None,104,21,60,2,1030]))
ws.write(row,2,random.choice([None,'Developer','Tester','Manager']))
ws.write(row,3,random.randrange(40,80)*1000)
wb.save(os.path.join(rdir,wBook + '.xls'))
end program.
begin program.
rdir=r'd:\temp' # Specify folder for writing test files.
import xlwt,random,datetime,os
fmt = xlwt.easyxf(num_format_str='M/D/YY')
wBooks = ["book_" + str(cnt) for cnt in range(1,5)]
for noSheets,wBook in enumerate(wBooks):
wb=xlwt.Workbook()
for sheetNo in range(noSheets + 1):
ws=wb.add_sheet("sheet_%d"%(sheetNo + 1))
for col,cont in enumerate(['date','ID','JobTitle','Revenue']):
ws.write(0,col,cont)
for row in range(1,6):
ws.write(row,0,datetime.datetime(2008 + sheetNo,1,1) + datetime.timedelta(days=random.randrange(1,365)),fmt)
ws.write(row,1,random.choice([None,104,21,60,2,1030]))
ws.write(row,2,random.choice([None,'Developer','Tester','Manager']))
ws.write(row,3,random.randrange(40,80)*1000)
wb.save(os.path.join(rdir,wBook + '.xls'))
end program.
Reading All Data Into SPSS
Since we created our test data in
d:\temp
, this folder will hold the Excel files. We can simply copy-paste this into the dialog. Other than that, we don't have to change anything. The first row holds the variable names and we'd like all sheets from all workbooks to be read.Description
- By default, the program will read in all .xls files in a folder specified by the user.
- By default, all data from all sheets will be imported. The default of all sheets can be overridden by specifying one or more sheets (see below).
- In order for this to make sense, all sheets in all workbooks are assumed to have similar formats (numbers of columns, column contents).
- By default, it is assumed that the first row of each sheet contains column names. If these conflict, the column names of the last sheet of the last workbook that's read will be used. If no column names are present,
column_1
,column_2
and so on will be used as variable names in SPSS.
Converting Date Variables
Date variables in the Excel files are not automatically converted to SPSS date variables. After reading in the data, they can be converted with the syntax below.
* Convert "date" to date format.
compute date=datesum(date.dmy(30,12,1899),date,"days").
format date(edate10).
exe.
compute date=datesum(date.dmy(30,12,1899),date,"days").
format date(edate10).
exe.
What if I Don't Want All Sheets to be Read?
- In this case, the desired sheets can be specified. Note that the first sheet is referenced by 1 (rather than 0).
- If two or more sheets are to be read, separate them with commas.
- If sheets that are specified do not exist in one or more workbooks, the command will not run. An error message will indicate the first workbook where this occurred.
What if I Don't Want All Workbooks to be Read?
This default can not be overridden. A workaround may be to move irrelevant workbooks to a different folder.
THIS TUTORIAL HAS 17 COMMENTS:
By Yaser Pourdavar on May 4th, 2016
Hi Ruben
When I run the code followed by
"SPSS Syntax Example for Generating Test Data",
it sends the following error messages:
>Warning # 207 in column 11. Text: book_
>A '+' was found following a text string, indicating continuation, but the next
>non-blank character was not a quotation mark or an apostrophe.
and
>Error # 5712 in column 8. Text: xlwt
>A valid IMPORT subcommand name is expected but not found. Recognized
>subcommands are FILE, TYPE, KEEP, DROP, RENAME, and MAP.
(The above messages related to line 5)
Thanks in advance
By Ruben Geert van den Berg on May 4th, 2016
Hi Yaser, great to hear from you!
The errors are SPSS errors but they refer to lines of Python so if anything's wrong, you should receive Python warnings instead. This usually means that Python lines aren't recognized as such because
-you don't have the SPSS Python Essentials properly installed or
-you didn't run the first line ("BEGIN PROGRAM.") with the rest of the syntax.
On which SPSS version are you?
By Yaser Pourdavar on May 7th, 2016
Thanks Ruben
By Carolina da Motta on July 28th, 2016
I am trying to import an excel file with 17 worksheets to spss. There is something I can't follow in this tutorial. After going to all the trouble of installing the modules and dialog box, I set the path where my excel book is and nothing happens. I only get a lot of text in the output file. From this point on, I had to deduce that I should paste everything between the lines "begin program" and "end program" on the syntax editor, but nothing happens either after I run it.
By Tal Spalter on May 1st, 2017
Hi Ruben,
I think you are doing a GREAT job! Thank you!
I need to read multiple excel sheets into SPSS (version 24) and used your tutorial for that. I downloaded Python 3.6 (I have a 64 bit OS and installed Python to 32 bit - is this a problem?) and installed it.
I downloaded XLWT and XLRD and I installed the "read and merge excel files" custom dialog which seems fine on the SPSS. However,
When I run the text syntax or try to consolidate three excel example sheets through the custom dialog of read and merge, I get this error:
DATASET ACTIVATE DataSet0.
begin program.
'''
This syntax was pasted from an SPSS custom dialog found at http://www.spss-tutorials.com/read-and-merge-multiple-sheet-excel-workbooks/.
Version: 0.1.
'''
def xlsToSpss(rdir,varnames=True,sheets='all'):
import xlrd,spss,os
fils=[fil for fil in os.listdir(rdir) if fil.endswith(".xls")]
allData=[]
firstSheet = True #First sheet that's actually read => only get varnames once
for cnt,fil in enumerate(fils):
#print fil
wb=xlrd.open_workbook(os.path.join(rdir,fil))
if sheets != 'all':
if "," in sheets: #multiple comma separated sheet indices (starting from 1!)
sheetIndices = [int(i) - 1 for i in sheets.split(',')]
else: #single sheet index
sheetIndices = [int(sheets) - 1]
else: #all sheets in each workbook
sheetIndices = range(wb.nsheets)
for sheetIndex in sheetIndices:
ws = wb.sheets()[sheetIndex]
if firstSheet: #specify variable names just once
firstSheet = False
if varnames: #read variable names from sheet
vNames = ["source_file"]+["source_sheet"]+ws.row_values(0)
else: #standard variable names
vNames = ["source_file"]+["source_sheet"]+["column_%d"%(i + 1) for i in range(ws.ncols)]
#print vNames
fRow = 1 if varnames else 0 #if varnames, first row doesn't hold data values
for row in range(fRow,ws.nrows):
allData.append([fil]+[ws.name]+[val for val in ws.row_values(row)])
mxLens=[0]*len(vNames) #find required lengths for SPSS string variables, zero for numeric
for line in allData:
for cnt in range(len(line)):
if isinstance(line[cnt],basestring) and len(line[cnt])>mxLens[cnt]:
mxLens[cnt]=len(line[cnt])
with spss.DataStep(): #start sending data to SPSS
nds = spss.Dataset('*')
for var in zip(vNames,mxLens):
nds.varlist.append(var[0],var[1]) #var[1] is zero (numeric) or length (string)
for line in allData:
nds.cases.append([None if val=='' else val for val in line])
xlsToSpss(r'C:\Users\SpalterTa2\Desktop\SPSS\Read and merge excel files_test',varnames=True,sheets='all')
end program.
Traceback (most recent call last):
File "", line 44, in
File "", line 7, in xlsToSpss
ImportError: No module named xlrd
Can you help me? What do I need to do?
Thank you in advance,
Tal.