Quick Overview Contents
In SPSS, SELECT IF permanently removes
a selection of cases (rows) from your data.
- Example 1 - Selection for 1 Variable
- Example 2 - Selection for 2 Variables
- Example 3 - Selection for (Non) Missing Values
- Tip 1 - Inspect Selection Before Deletion
- Tip 2 - Use TEMPORARY
Summary
SELECT IF in SPSS basically means “delete all cases that don't satisfy one or more conditions”. Like so, select if(gender = 'female'). permanently deletes all cases whose gender is not female. Let's now walk through some real world examples using bank_clean.sav, partly shown below.
Example 1 - Selection for 1 Variable
Let's first delete all cases who don't have at least a Bachelor's degree. The syntax below:
- inspects the frequency distribution for education level;
- deletes unneeded cases;
- inspects the results.
set tnumbers both.
*Run minimal frequencies table.
frequencies educ.
*Select cases with a Bachelor's degree or higher. Delete all other cases.
select if(educ >= 4).
*Reinspect frequencies.
frequencies educ.
Result
As we see, our data now only contain cases having a Bachelor's, Master's or PhD degree. Importantly, cases having
on education level have been removed from the data as well.
Example 2 - Selection for 2 Variables
The syntax below selects cases based on gender and education level: we'll keep only female respondents having at least a Bachelor's degree in our data.
crosstabs educ by gender.
*Select females having a Bachelor's degree or higher.
select if(gender = 0 & educ >= 4).
*Reinspect contingency table.
crosstabs educ by gender.
Result
Example 3 - Selection for (Non) Missing Values
Selections based on (non) missing values are straightforward if you master SPSS Missing Values Functions. For example, the syntax below shows 2 options for deleting cases having fewer than 7 valid values on the last 10 variables (overall to q9).
select if(nvalid(overall to q9) >= 7)./*At least 7 valid values or at most 3 missings.
execute.
*Alternative way, exact same result.
select if(nmiss(overall to q9) < 4)./*Fewer than 4 missings or more than 6 valid values.
execute.
Tip 1 - Inspect Selection Before Deletion
Before deleting cases, I sometimes want to have a quick look at them. A good way for doing so is creating a FILTER variable. The syntax below shows the right way for doing so.
compute filt_1 = 0.
*Set filter variable to 1 for cases we want to keep in data.
if(nvalid(overall to q9) >= 7) filt_1 = 1.
*Move unselected cases to bottom of dataset.
sort cases by filt_1 (d).
*Scroll to bottom of dataset now. Note that cases 459 - 464 will be deleted because they have 0 on filt_1.
*If selection as desired, delete other cases.
select if(filt_1).
execute.
Quick note: select if(filt_1). is a shorthand for select if(filt_1 <> 0). and deletes cases having either a zero or a missing value on filt_1.
Result
Cases that will be deleted are at the bottom of our data. We also readily see we'll have 458 cases left after doing so.Tip 2 - Use TEMPORARY
A final tip I want to mention is combining SELECT IF with TEMPORARY. By doing so, SELECT IF only applies to the first procedure that follows it. For a quick example, compare the results of the first and second FREQUENCIES commands below.
temporary.
*Select only female cases.
select if(gender = 0).
*Any procedure now uses only female cases. This also reverses case selection.
frequencies gender educ.
*Rerunning frequencies now uses all cases in data again.
frequencies gender educ.
Final Notes
First off, parentheses around conditions in syntax are not required. Therefore, select if(gender = 0). can also be written as select if gender = 0. I used to think that shorter syntax is always better but I changed my mind over the years. Readability and clear structure are important too. I therefore use (and recommend) parentheses around conditions. This also goes for IF and DO IF.
Right, I guess that should do. Did I miss anything? Please let me know by throwing a comment below.
Thanks for reading!
THIS TUTORIAL HAS 29 COMMENTS:
By Renate on May 19th, 2018
Got it made: SELECT IF (v_223 = 3) or (Feedb ~= '-99').
But now it is getting more complicated. Participants were randomly assigned to one of three conditions andat the end they were asked what condition they remember. There are two variables: V=1,2,3 and V2=1,2,3. Now I want all cases with V=1 or 2 and V2 =1 or 2 but not V2 = 3 and all cases V =3 and V2=3 but not V2= 1 or 2. Is that understandable? And is ther a solution for it? Thanks
By Ruben Geert van den Berg on May 19th, 2018
Hi Renate, I'll fix this in half an hour or so. I'm starving so I really gotta have lunch first.
By Ruben Geert van den Berg on May 19th, 2018
Hi Renate, please try the syntax below. I'll first create and inspect a selector variable. Only if that's correct, then I'll delete unneeded cases.
*Set up test data with all possible combinations, assuming there's no missings.
data list free/v v2.
begin data
1 1 2 1 3 1 1 2 2 2 3 2 1 3 2 3 3 3
end data.
*Compute selector, all zeroes.
compute selector = 0.
execute.
*Set selector to 1 if either condition below is met.
if(any(v,1,2) & any(v2,1,2)) selector = 1.
if(v = 3 & v2 = 3) selector = 1.
execute.
*If correct, delete unneeded cases.
select if(selector = 1).
execute.
delete variables selector.
*Note: you only need the last execute command.
By Nik on May 22nd, 2018
Hi Ruben! Thank you for your answer! I was wrong tor write. Really I wanted to write that "a ~= b is true if a is not b". So, for a variable N, I need SPSS select not only all numeric values different than b but ALSO missing values.
For example, I have a variable where value 1 has 10 frequencies, value 2 has 20 frequencies, variable 3 has 30 frequencies and 100 Missing values. I need to select all cases different from 1 including ALSO 100 missing cases. If now I write ANY(N,2,3) will be selected only 50 cases (2 and 3) but I would select also missing values. How could i do it?
I hope you can help me!
Thank you!
By Ruben Geert van den Berg on May 22nd, 2018
Hi Nik!
Maybe something like
IF(ANY(N,2,3) OR MISSING(N)) ....
will do the trick?