SPSS TUTORIALS FULL COURSE BASICS ANOVA REGRESSION FACTOR

SPSS SELECT IF – Tutorial & Examples

Quick Overview Contents

In SPSS, SELECT IF permanently removes
a selection of cases (rows) from your data.

Summary

SELECT IF in SPSS basically means “delete all cases that don't satisfy one or more conditions”. Like so, select if(gender = 'female'). permanently deletes all cases whose gender is not female. Let's now walk through some real world examples using bank_clean.sav, partly shown below.

Example 1 - Selection for 1 Variable

Let's first delete all cases who don't have at least a Bachelor's degree. The syntax below:

• inspects the frequency distribution for education level;
• deletes unneeded cases;
• inspects the results.
*Show values and value labels in new output tables.

set tnumbers both.

*Run minimal frequencies table.

frequencies educ.

*Select cases with a Bachelor's degree or higher. Delete all other cases.

select if(educ >= 4).

*Reinspect frequencies.

frequencies educ.

Result

As we see, our data now only contain cases having a Bachelor's, Master's or PhD degree. Importantly, cases having

on education level have been removed from the data as well.

Example 2 - Selection for 2 Variables

The syntax below selects cases based on gender and education level: we'll keep only female respondents having at least a Bachelor's degree in our data.

*Inspect contingency table sex and education.

crosstabs educ by gender.

*Select females having a Bachelor's degree or higher.

select if(gender = 0 & educ >= 4).

*Reinspect contingency table.

crosstabs educ by gender.

Example 3 - Selection for (Non) Missing Values

Selections based on (non) missing values are straightforward if you master SPSS Missing Values Functions. For example, the syntax below shows 2 options for deleting cases having fewer than 7 valid values on the last 10 variables (overall to q9).

*Select cases having at least 7 non missing values out of last 10 questions.

select if(nvalid(overall to q9) >= 7)./*At least 7 valid values or at most 3 missings.
execute.

*Alternative way, exact same result.

select if(nmiss(overall to q9) < 4)./*Fewer than 4 missings or more than 6 valid values.
execute.

Tip 1 - Inspect Selection Before Deletion

Before deleting cases, I sometimes want to have a quick look at them. A good way for doing so is creating a FILTER variable. The syntax below shows the right way for doing so.

*Create filter variable holding only zeroes.

compute filt_1 = 0.

*Set filter variable to 1 for cases we want to keep in data.

if(nvalid(overall to q9) >= 7) filt_1 = 1.

*Move unselected cases to bottom of dataset.

sort cases by filt_1 (d).

*Scroll to bottom of dataset now. Note that cases 459 - 464 will be deleted because they have 0 on filt_1.

*If selection as desired, delete other cases.

select if(filt_1).
execute.

Quick note: select if(filt_1). is a shorthand for select if(filt_1 <> 0). and deletes cases having either a zero or a missing value on filt_1.

Result

Cases that will be deleted are at the bottom of our data. We also readily see we'll have 458 cases left after doing so.

Tip 2 - Use TEMPORARY

A final tip I want to mention is combining SELECT IF with TEMPORARY. By doing so, SELECT IF only applies to the first procedure that follows it. For a quick example, compare the results of the first and second FREQUENCIES commands below.

*Make sure case deletion only applies to first procedure.

temporary.

*Select only female cases.

select if(gender = 0).

*Any procedure now uses only female cases. This also reverses case selection.

frequencies gender educ.

*Rerunning frequencies now uses all cases in data again.

frequencies gender educ.

Final Notes

First off, parentheses around conditions in syntax are not required. Therefore, select if(gender = 0). can also be written as select if gender = 0. I used to think that shorter syntax is always better but I changed my mind over the years. Readability and clear structure are important too. I therefore use (and recommend) parentheses around conditions. This also goes for IF and DO IF.

Right, I guess that should do. Did I miss anything? Please let me know by throwing a comment below.

Tell us what you think!

*Required field. Your comment will show up after approval from a moderator.

• By Renate on May 19th, 2018

Got it made: SELECT IF (v_223 = 3) or (Feedb ~= '-99').
But now it is getting more complicated. Participants were randomly assigned to one of three conditions andat the end they were asked what condition they remember. There are two variables: V=1,2,3 and V2=1,2,3. Now I want all cases with V=1 or 2 and V2 =1 or 2 but not V2 = 3 and all cases V =3 and V2=3 but not V2= 1 or 2. Is that understandable? And is ther a solution for it? Thanks

• By Ruben Geert van den Berg on May 19th, 2018

Hi Renate, I'll fix this in half an hour or so. I'm starving so I really gotta have lunch first.

• By Ruben Geert van den Berg on May 19th, 2018

Hi Renate, please try the syntax below. I'll first create and inspect a selector variable. Only if that's correct, then I'll delete unneeded cases.

```*Set up test data with all possible combinations, assuming there's no missings. data list free/v v2. begin data 1 1 2 1 3 1 1 2 2 2 3 2 1 3 2 3 3 3 end data.```

``` *Compute selector, all zeroes. compute selector = 0. execute. *Set selector to 1 if either condition below is met. if(any(v,1,2) & any(v2,1,2)) selector = 1. if(v = 3 & v2 = 3) selector = 1. execute. *If correct, delete unneeded cases. select if(selector = 1). execute. delete variables selector. ```

`*Note: you only need the last execute command.`

• By Nik on May 22nd, 2018

Hi Ruben! Thank you for your answer! I was wrong tor write. Really I wanted to write that "a ~= b is true if a is not b". So, for a variable N, I need SPSS select not only all numeric values different than b but ALSO missing values.
For example, I have a variable where value 1 has 10 frequencies, value 2 has 20 frequencies, variable 3 has 30 frequencies and 100 Missing values. I need to select all cases different from 1 including ALSO 100 missing cases. If now I write ANY(N,2,3) will be selected only 50 cases (2 and 3) but I would select also missing values. How could i do it?
I hope you can help me!
Thank you!

• By Ruben Geert van den Berg on May 22nd, 2018

Hi Nik!

Maybe something like `IF(ANY(N,2,3) OR MISSING(N)) .... ` will do the trick?