## Quick Overview Contents

In SPSS, SELECT IF permanently removes

a selection of cases (rows) from your data.

- Example 1 - Selection for 1 Variable
- Example 2 - Selection for 2 Variables
- Example 3 - Selection for (Non) Missing Values
- Tip 1 - Inspect Selection Before Deletion
- Tip 2 - Use TEMPORARY

## Summary

SELECT IF in SPSS basically means “delete all cases that *don't* satisfy one or more conditions”. Like so,
select if(gender = 'female').
permanently deletes all cases whose gender is *not* female. Let's now walk through some real world examples using bank_clean.sav, partly shown below.

## Example 1 - Selection for 1 Variable

Let's first delete all cases who don't have at least a Bachelor's degree. The syntax below:

- inspects the frequency distribution for education level;
- deletes unneeded cases;
- inspects the results.

***Show values and value labels in new output tables.**

set tnumbers both.

***Run minimal frequencies table.**

frequencies educ.

***Select cases with a Bachelor's degree or higher. Delete all other cases.**

select if(educ >= 4).

***Reinspect frequencies.**

frequencies educ.

## Result

As we see, our data now only contain cases having a Bachelor's, Master's or PhD degree. Importantly, cases having

on education level have been removed from the data as well.

## Example 2 - Selection for 2 Variables

The syntax below selects cases based on gender *and* education level: we'll keep only female respondents having at least a Bachelor's degree in our data.

***Inspect contingency table sex and education.**

crosstabs educ by gender.

***Select females having a Bachelor's degree or higher.**

select if(gender = 0 & educ >= 4).

***Reinspect contingency table.**

crosstabs educ by gender.

## Result

## Example 3 - Selection for (Non) Missing Values

Selections based on (non) missing values are straightforward if you master SPSS Missing Values Functions. For example, the syntax below shows 2 options for deleting cases having fewer than 7 valid values on the last 10 variables (overall to q9).

***Select cases having at least 7 non missing values out of last 10 questions.**

select if(nvalid(overall to q9) >= 7)./*At least 7 valid values or at most 3 missings.

execute.

***Alternative way, exact same result.**

select if(nmiss(overall to q9) < 4)./*Fewer than 4 missings or more than 6 valid values.

execute.

## Tip 1 - Inspect Selection Before Deletion

Before deleting cases, I sometimes want to have a quick look at them. A good way for doing so is creating a FILTER variable. The syntax below shows the right way for doing so.

***Create filter variable holding only zeroes.**

compute filt_1 = 0.

***Set filter variable to 1 for cases we want to keep in data.**

if(nvalid(overall to q9) >= 7) filt_1 = 1.

***Move unselected cases to bottom of dataset.**

sort cases by filt_1 (d).

***Scroll to bottom of dataset now. Note that cases 459 - 464 will be deleted because they have 0 on filt_1.**

***If selection as desired, delete other cases.**

select if(filt_1).

execute.

Quick note: select if(filt_1). is a shorthand for select if(filt_1 <> 0). and deletes cases having either a zero or a missing value on filt_1.

## Result

Cases that will be deleted are at the bottom of our data. We also readily see we'll have 458 cases left after doing so.## Tip 2 - Use TEMPORARY

A final tip I want to mention is combining SELECT IF with TEMPORARY. By doing so, SELECT IF only applies to the first procedure that follows it. For a quick example, compare the results of the first and second FREQUENCIES commands below.

***Make sure case deletion only applies to first procedure.**

temporary.

***Select only female cases.**

select if(gender = 0).

***Any procedure now uses only female cases. This also reverses case selection.**

frequencies gender educ.

***Rerunning frequencies now uses all cases in data again.**

frequencies gender educ.

## Final Notes

First off, parentheses around conditions in syntax are not required. Therefore, select if(gender = 0). can also be written as select if gender = 0. I used to think that shorter syntax is always better but I changed my mind over the years. Readability and clear structure are important too. I therefore use (and recommend) parentheses around conditions. This also goes for IF and DO IF.

Right, I guess that should do. Did I miss anything? Please let me know by throwing a comment below.

Thanks for reading!

## THIS TUTORIAL HAS 29 COMMENTS:

## By Renate on May 19th, 2018

Got it made: SELECT IF (v_223 = 3) or (Feedb ~= '-99').

But now it is getting more complicated. Participants were randomly assigned to one of three conditions andat the end they were asked what condition they remember. There are two variables: V=1,2,3 and V2=1,2,3. Now I want all cases with V=1 or 2 and V2 =1 or 2 but not V2 = 3 and all cases V =3 and V2=3 but not V2= 1 or 2. Is that understandable? And is ther a solution for it? Thanks

## By Ruben Geert van den Berg on May 19th, 2018

Hi Renate, I'll fix this in half an hour or so. I'm starving so I really gotta have lunch first.

## By Ruben Geert van den Berg on May 19th, 2018

Hi Renate, please try the syntax below. I'll first create and inspect a selector variable. Only if that's correct, then I'll delete unneeded cases.

`*Set up test data with all possible combinations, assuming there's no missings.`

data list free/v v2.

begin data

1 1 2 1 3 1 1 2 2 2 3 2 1 3 2 3 3 3

end data.

*Compute selector, all zeroes.

compute selector = 0.

execute.

*Set selector to 1 if either condition below is met.

if(any(v,1,2) & any(v2,1,2)) selector = 1.

if(v = 3 & v2 = 3) selector = 1.

execute.

*If correct, delete unneeded cases.

select if(selector = 1).

execute.

delete variables selector.

`*Note: you only need the last execute command.`

## By Nik on May 22nd, 2018

Hi Ruben! Thank you for your answer! I was wrong tor write. Really I wanted to write that "a ~= b is true if a is not b". So, for a variable N, I need SPSS select not only all numeric values different than b but ALSO missing values.

For example, I have a variable where value 1 has 10 frequencies, value 2 has 20 frequencies, variable 3 has 30 frequencies and 100 Missing values. I need to select all cases different from 1 including ALSO 100 missing cases. If now I write ANY(N,2,3) will be selected only 50 cases (2 and 3) but I would select also missing values. How could i do it?

I hope you can help me!

Thank you!

## By Ruben Geert van den Berg on May 22nd, 2018

Hi Nik!

Maybe something like

`IF(ANY(N,2,3) OR MISSING(N)) ....`

will do the trick?