Summary
Reading multi-sheet Excel workbooks into SPSS is easy with this custom dialog. This tutorial demonstrates how to use it.
Before You Start
- Make sure you have the SPSS Python Essentials installed.
- Download and install the xlrd module.
- If you'd like to generate some test data as done in the syntax example, you'll need the xlwt module as well.
- Download and install the Excel to SPSS Tool. Note that this is an SPSS custom dialog; after installation, you'll find it under .
- Close all datasets in SPSS.
SPSS Syntax Example for Generating Test Data
* Create some small Excel workbooks for testing.
begin program.
import xlwt, random, datetime, os
rdir = r'd:\temp' # Specify folder for writing test files.
# Date format for the first column.
fmt = xlwt.easyxf(num_format_str='M/D/YY')
# Workbook names: book_1 through book_4.
wBooks = ["book_" + str(cnt) for cnt in range(1, 5)]
# enumerate starts at 0, so book_1 gets 1 sheet, book_2 gets 2 sheets and so on.
for noSheets, wBook in enumerate(wBooks):
    wb = xlwt.Workbook()
    for sheetNo in range(noSheets + 1):
        ws = wb.add_sheet("sheet_%d" % (sheetNo + 1))
        # Header row holding the variable names.
        for col, cont in enumerate(['date', 'ID', 'JobTitle', 'Revenue']):
            ws.write(0, col, cont)
        # 5 rows of random data; writing None leaves the cell empty.
        for row in range(1, 6):
            ws.write(row, 0, datetime.datetime(2008 + sheetNo, 1, 1) + datetime.timedelta(days=random.randrange(1, 365)), fmt)
            ws.write(row, 1, random.choice([None, 104, 21, 60, 2, 1030]))
            ws.write(row, 2, random.choice([None, 'Developer', 'Tester', 'Manager']))
            ws.write(row, 3, random.randrange(40, 80) * 1000)
    wb.save(os.path.join(rdir, wBook + '.xls'))
end program.
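Before reading the test data back in, it helps to know what to expect. The sketch below is plain Python with no xlwt required. It mirrors the loop structure of the generator: workbook book_k gets k sheets, and each sheet has a header row plus 5 data rows. Assuming the tool reads every data row, a complete merge should therefore produce 50 cases. The helper name expected_layout is hypothetical.

```python
# Mirror the nested loops of the test-data generator without writing any files.
# Assumes the same settings as the syntax above: 4 workbooks, 5 data rows per sheet.
def expected_layout(n_books=4, rows_per_sheet=5):
    layout = {}
    books = ["book_" + str(cnt) for cnt in range(1, n_books + 1)]
    for no_sheets, book in enumerate(books):
        # enumerate starts at 0, so book_1 gets 1 sheet, book_2 gets 2 sheets, ...
        layout[book] = ["sheet_%d" % (sheet_no + 1) for sheet_no in range(no_sheets + 1)]
    # 1 + 2 + 3 + 4 = 10 sheets in total, each holding rows_per_sheet data rows.
    total_rows = rows_per_sheet * sum(len(sheets) for sheets in layout.values())
    return layout, total_rows


layout, total_rows = expected_layout()
for book, sheets in layout.items():
    print(book, sheets)
print("Expected number of cases after merging:", total_rows)
```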
Reading All Data Into SPSS
Since we created our test data in d:\temp, this folder holds our Excel files, so we can simply copy-paste its path into the dialog. Other than that, we don't have to change anything: the first row holds the variable names and we'd like all sheets from all workbooks to be read.
Description
- By default, the program will read in all .xls files in a folder specified by the user.
- By default, all data from all sheets will be imported. You can override this by specifying one or more sheets (see below).
- In order for this to make sense, all sheets in all workbooks are assumed to have similar formats (numbers of columns, column contents).
- By default, it is assumed that the first row of each sheet contains column names. If these conflict, the column names of the last sheet of the last workbook that's read will be used. If no column names are present, column_1, column_2 and so on will be used as variable names in SPSS.
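The merge rules listed above can be sketched in a few lines of Python. This is a hypothetical illustration of the described behavior, not the tool's actual code. Each sheet is assumed to be a list of rows, each row a list of cell values, and the sheets are assumed to arrive in reading order (workbook by workbook, sheet by sheet). The function name merge_sheets is made up for this example.

```python
# Hypothetical sketch of the merge rules described above, NOT the tool's actual code.
# Each sheet is a list of rows (each row a list of cell values), given in the
# order the sheets are read: workbook by workbook, sheet by sheet.
def merge_sheets(sheets, first_row_has_names=True):
    names = None
    cases = []
    for rows in sheets:
        if not rows:
            continue  # skip empty sheets
        if first_row_has_names:
            # Every sheet overwrites the names, so the last sheet read wins.
            names = list(rows[0])
            data = rows[1:]
        else:
            data = rows
        # Stack the data rows of all sheets on top of each other.
        cases.extend(list(row) for row in data)
    if not first_row_has_names:
        # No header rows: fall back to column_1, column_2, ...
        width = max((len(row) for row in cases), default=0)
        names = ["column_%d" % (i + 1) for i in range(width)]
    return names, cases


sheets = [
    [["date", "ID"], [39448, 104]],
    [["Date", "id"], [39449, 21], [39450, 60]],
]
names, cases = merge_sheets(sheets)
print(names)       # ['Date', 'id']
print(len(cases))  # 3
```

Because every sheet simply overwrites the stored names, conflicting headers resolve to whatever the last sheet of the last workbook says, which matches the rule above.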
Converting Date Variables
Date variables in the Excel files are not automatically converted to SPSS date variables. After reading in the data, they can be converted with the syntax below.
* Convert "date" to date format.
compute date=datesum(date.dmy(30,12,1899),date,"days").
format date(edate10).
exe.
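Why count from 30 December 1899? Excel stores dates as day numbers and wrongly treats 1900 as a leap year. Counting from 30 December 1899 compensates for that fictitious 29 February 1900, so the results are correct for dates from 1 March 1900 onwards. The Python sketch below performs the same arithmetic as the SPSS syntax above. It assumes the workbooks use Excel's default 1900 date system; workbooks saved with the 1904 date system count from 1 January 1904 instead. The function name excel_serial_to_date is hypothetical.

```python
from datetime import date, timedelta

# Same arithmetic as the SPSS syntax above: day number N becomes 30 Dec 1899 + N days.
# Assumes Excel's default 1900 date system; correct from 1 March 1900 onwards.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial):
    # int() drops any fractional part, i.e. the time of day.
    return EXCEL_EPOCH + timedelta(days=int(serial))


print(excel_serial_to_date(39448))  # 2008-01-01
```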
What if I Don't Want All Sheets to be Read?
- In this case, the desired sheets can be specified. Note that the first sheet is referenced by 1 (rather than 0).
- If two or more sheets are to be read, separate them with commas.
- If sheets that are specified do not exist in one or more workbooks, the command will not run. An error message will indicate the first workbook where this occurred.
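The sheet-selection rules above can be sketched as follows. The tool's actual input handling may differ. The function names parse_sheet_spec and check_sheets are hypothetical, and the sheet_counts dictionary is an assumed input that maps each workbook to its number of sheets in reading order. Sheet numbers are entered 1-based and converted to the 0-based positions Python uses. The first workbook that lacks a requested sheet stops the run with an error naming that workbook.

```python
# Hypothetical sketch of the sheet-selection rules above; not the tool's actual code.
def parse_sheet_spec(spec):
    # "1, 3" -> [0, 2]: 1-based sheet numbers become 0-based positions.
    return [int(part) - 1 for part in spec.split(",") if part.strip()]


def check_sheets(sheet_counts, spec):
    """sheet_counts maps workbook name -> number of sheets, in reading order.
    Raises ValueError for the first workbook lacking a requested sheet."""
    indices = parse_sheet_spec(spec)
    for book, n_sheets in sheet_counts.items():
        missing = [i + 1 for i in indices if i >= n_sheets]
        if missing:
            raise ValueError("%s has no sheet(s) %s" % (book, missing))
    return indices


counts = {"book_1.xls": 1, "book_2.xls": 2, "book_3.xls": 3, "book_4.xls": 4}
print(check_sheets(counts, "1"))  # [0]
# check_sheets(counts, "1, 2") raises ValueError: book_1.xls has no sheet(s) [2]
```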
What if I Don't Want All Workbooks to be Read?
This default cannot be overridden. A workaround is to move irrelevant workbooks to a different folder.
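If there are many workbooks, moving them by hand gets tedious. Below is a minimal Python 3 sketch of the workaround. It moves every .xls file that is not on a keep list into a separate target folder. The target lies outside the input folder because it is not documented here whether the tool also looks into subfolders. The function name move_out and the example paths are hypothetical.

```python
import glob
import os
import shutil


# Move every .xls workbook that should NOT be read out of the input folder.
# keep holds the file names to leave in place; target is a separate folder.
def move_out(folder, keep, target):
    os.makedirs(target, exist_ok=True)
    moved = []
    for path in glob.glob(os.path.join(folder, "*.xls")):
        name = os.path.basename(path)
        if name not in keep:
            shutil.move(path, os.path.join(target, name))
            moved.append(name)
    return sorted(moved)


# Example with hypothetical paths: keep only book_1.xls and book_2.xls in d:\temp.
# move_out(r"d:\temp", {"book_1.xls", "book_2.xls"}, r"d:\temp_excluded")
```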
THIS TUTORIAL HAS 17 COMMENTS:
By Ruben Geert van den Berg on June 14th, 2018
... so make really sure that you also select the BEGIN PROGRAM. line before running it. The F2 shortcut key does not work properly here, as it will not select the entire program block including BEGIN PROGRAM. and END PROGRAM. You must select and run these 2 lines (!) and everything between them.
By Kim on June 14th, 2018
I Love You!!!
hahahah...
Selecting the code before running was my solution.
Now I'm fighting with my header format (no spaces and no blank column headers allowed).
Thank you very much for your quick and helpful words.
Kim
By Kim on June 21st, 2018
Dear Ruben,
Hi again!
After a lot of pre-processing work on my Excel files (such as making all headers equal and without spaces), I ran your "Read and Merge Excel Files" routine and noticed that it breaks when it reaches the "date" header.
It does not finish the job; it just stops.
An error saying that only strings are allowed made me realize this was because the date variable is automatically set as a string by the routine.
The same thing happens when it finds an integer value in a column that previously contained strings (so the variable had been set as a string accordingly).
What format should my dates have in the .xls files so they work properly after import (and can be imported at all)?
Thanks a lot!
Best regards, Kim
By Ruben Geert van den Berg on June 21st, 2018
Hi Kim!
I wrote the tool ages ago and I haven't worked on it ever since, so I'm not sure. I don't really have the time to look into it either. But how many xls workbooks do you have and how many sheets does each one have? Perhaps there's a workaround.
By Kim on June 21st, 2018
Thanks Ruben.
I understand.
I have 10 groups x 48 Excel workbooks x 1 sheet x approx. 30,000 rows.
Yes, I'll try to pre-format my columns properly.
Only if possible for you: any suggestions would be useful.
Kind regards, Kim