"I have a data file on which I'd like to carry out several regression analyses. I have four dependent variables, v1 through v4. The independent variables (v5 through v14) are the same for all analyses. How can I carry out these four analyses in an efficient way that would also work for 100 dependent variables?"
SPSS Python Syntax Example
*Run REGRESSION repeatedly over different dependent variables.
begin program.
import spss,spssaux
dependent = 'v1 to v4' # dependent variables.
spssSyntax = '' # empty Python string that we add SPSS REGRESSION commands to
depList = spssaux.VariableDict(caseless = True).expand(dependent) # create Python list of variable names
for dep in depList: # "+=" (below) appends one REGRESSION command per dependent variable to spssSyntax
    spssSyntax += '''
REGRESSION
/MISSING PAIRWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT %s
/METHOD=STEPWISE v5 to v14.
'''%dep # replace "%s" in syntax by the dependent variable
print(spssSyntax) # prints REGRESSION commands to SPSS output window
end program.
*If REGRESSION commands look good, have SPSS run them.
begin program.
spss.Submit(spssSyntax)
end program.
Description
- This syntax uses Python, so you need to have the SPSS Python Essentials installed in order to run it;
- The syntax simply runs a standard SPSS regression analysis over different dependent variables, one by one;
- Except for the occurrence of %s, Python will submit to SPSS a textbook example of regression syntax generated by the GUI. It can be modified as desired.
- The TO and ALL keywords may be used for specifying the dependent and independent variables. The entire specification is enclosed in quotes.
- As a test file for this solution, you could use supermarket.sav.
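The string-building step can also be tried outside SPSS. Below is a minimal sketch in plain Python, assuming the variable names are typed in as a list instead of being expanded by spssaux. The names TEMPLATE, build_syntax, dep_list and regression_syntax are illustrative only and do not come from the syntax above.

```python
# Plain-Python sketch: build one REGRESSION command per dependent variable.
# No spss or spssaux module is needed; the list below stands in for the
# result of spssaux.VariableDict(...).expand('v1 to v4').
TEMPLATE = '''
REGRESSION
/MISSING PAIRWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT %s
/METHOD=STEPWISE v5 to v14.
'''

def build_syntax(dependents, template=TEMPLATE):
    """Return one string holding a command for every dependent variable."""
    syntax = ''
    for dep in dependents:
        syntax += template % dep  # "%s" is replaced by the current variable name
    return syntax

dep_list = ['v1', 'v2', 'v3', 'v4']  # assumed variable names
regression_syntax = build_syntax(dep_list)
print(regression_syntax)  # inspect the commands before submitting them to SPSS
```

Inside SPSS, the resulting string would be passed to spss.Submit, exactly as in the second program block above.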
THIS TUTORIAL HAS 22 COMMENTS:
By K.G. on December 6th, 2017
I have a question about batch data analysis using SPSS. I have binary data that I want to do Fisher Chi square 2x2 analysis on. I have 5 dependent variables and ~6800 independent variables I want to analyze. SPSS has a limit of 76 variables that it will do Chi square analysis on at one time. Is there a way to automate this process or do I have to manually input the variables for each iteration (which will be a little less than 90 sets I will have to manually input x 5)? There is a second set of independent variables I have decided to hold off on analyzing which has 36000 variables until I can find a way to do this that is not very labor & time intensive. I see on this page the automation for regression analysis but I need to find a way to do this for Chi Square analysis. Any help with this is greatly appreciated.
By Ruben Geert van den Berg on December 6th, 2017
Dear Kassatihun,
Sure, you could modify the syntax presented here for a chi-square test as well, or basically any test. It may take some, or a lot of, processing time, depending on the number of cases in your data and how you set things up. That may be a real issue for your data.
The more interesting question is: what are you looking for in the first place? How do you plan to make sense of so many results? I suggest you read up on SPSS OMS Tutorial - Creating Data from Output.
It sounds like quite a challenge even with the right tools. I can probably run the whole thing for you but I'd have to charge a couple of hours for doing so.
Hope that helps!
By K. G. on December 6th, 2017
There are 25 cases, so not too high an N value, which is why I am doing a 2x2 analysis: I don't have enough power to do Pearson Chi square or use other statistical tests which require higher power. I am new to statistics, SPSS and Python, although I have Unix programming experience. I am taking this on as a challenge. From what I've done so far with just 76 variables at a time, the data runs very fast. I just need to automate it, since the time-limiting step is me literally selecting variables, adding them to the Independent Variable box, and removing the ones that already ran. With regard to output, I've passed the "panic" stage ... finally. My basic plan was to do steps 2-4 of what you listed on your OMS site. I will now revise that approach ;-)
My question now is how do I modify the algorithm above for Chi Square Fisher instead of Regression? Do I replace "REGRESSION" on line 10 with "FISHER" ... ?
Thank you.
By Ruben Geert van den Berg on December 6th, 2017
First off, I think you shouldn't run so many tests on 25 observations only. If you apply a Bonferroni correction, perhaps nothing could be statistically significant even in theory.
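To put a rough number on the Bonferroni point: the sketch below divides a conventional alpha of .05 by the number of tests implied by the counts mentioned earlier in this thread (5 dependent variables and roughly 6800 independent variables). The alpha level and the variable names are assumptions made for illustration.

```python
# Bonferroni-corrected significance level for the number of tests discussed above.
alpha = 0.05            # assumed overall significance level
n_dependent = 5         # number of dependent variables mentioned in the thread
n_independent = 6800    # approximate number of independent variables

n_tests = n_dependent * n_independent   # one test per variable pair
bonferroni_alpha = alpha / n_tests      # per-test threshold after correction

print(n_tests, bonferroni_alpha)
```

With these inputs, each individual test would need a p-value below roughly 0.0000015 to count as significant.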
But if you want to proceed anyway, build things up in steps: first create the syntax for one test. Then build the Python loop around it for some 10 variables. If that runs ok, only then expand the loop for including all variables. That's usually the right order to set things up.
Hope that helps!
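The step-by-step approach suggested above could be sketched as follows. This is only an illustration in plain Python: the variable names are hypothetical, the batch size of 76 is simply the limit mentioned in the question, and the functions chunks and crosstabs_syntax are made up for this example. CROSSTABS with /STATISTICS=CHISQ should include Fisher's exact test for 2x2 tables, but verify that in your own output before scaling up.

```python
def chunks(names, size):
    """Split a list of variable names into consecutive batches of at most `size`."""
    return [names[i:i + size] for i in range(0, len(names), size)]

def crosstabs_syntax(dependents, independents, batch_size=76):
    """Build one CROSSTABS command per batch of independent variables."""
    commands = []
    for batch in chunks(independents, batch_size):
        commands.append(
            'CROSSTABS\n'
            '/TABLES=%s BY %s\n'
            '/STATISTICS=CHISQ\n'
            '/CELLS=COUNT.\n' % (' '.join(dependents), ' '.join(batch))
        )
    return '\n'.join(commands)

# Step 1: start small with hypothetical names, 10 independent variables only.
deps = ['dep1', 'dep2']
indeps = ['x%d' % i for i in range(1, 11)]
test_syntax = crosstabs_syntax(deps, indeps, batch_size=4)
print(test_syntax)  # inspect the generated commands before running anything
```

Once the printed commands look right, the same string could be passed to spss.Submit inside a begin program block, and only then would the variable list be expanded to the full set.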
By Ruben Geert van den Berg on December 6th, 2017
P.s. another tutorial that comes close to what you're looking for is SPSS with Python - Looping over Scatterplots.