SPSS FILTER temporarily excludes a selection of cases from all data analyses.
For excluding cases from data editing, use DO IF or IF instead.
Quick Overview Contents
- SPSS Filtering Basics
- Example 1 - Exclude Cases with Many Missing Values
- Example 2 - Filter on 2 Variables
- Example 3 - Filter without Filter Variable
- Tip - Commands with Built-In Filters
- Warning - Data Editing with Filter
SPSS FILTER - Example Data
I'll use bank_clean.sav -partly shown below- for all examples in this tutorial. This file contains the data from a small bank employee survey. Feel free to download these data and rerun the examples yourself.
SPSS Filtering Basics
Filtering in SPSS usually involves 4 steps:
- create a filter variable;
- activate the filter variable;
- run one or many analyses -such as correlations, ANOVA or a chi-square test- with the filter variable in effect;
- deactivate the filter variable.
In theory, any variable can be used as a filter variable. After activating it, cases with
- zeroes,
- user missing values or
- system missing values
on the filter variable are excluded from all analyses until you deactivate the filter. For the sake of clarity, I recommend you only use filter variables containing 0 or 1 for each case. Enough theory. Let's put things into practice.
Example 1 - Exclude Cases with Many Missing Values
At the end of our data, we find 9 rating scales: q1 to q9. Perhaps we'd like to run a factor analysis on them or use them as predictors in regression analysis. In any case, we may want to exclude cases having many missing values on these variables. We'll first just count them by running the syntax below.
compute mis_1 = nmiss(q1 to q9).
*Apply variable label.
variable labels mis_1 'Number of missings on q1 to q9'.
*Check frequencies.
frequencies mis_1.
Result
Based on this frequency distribution, we decided to exclude the 8 cases having 3 or more missing values on q1 to q9. We'll create our filter variable with a simple RECODE as shown below.
recode mis_1 (lo thru 2 = 1)(else = 0) into filt_1.
*Apply variable label.
variable labels filt_1 'Filter out cases with 3 or more missings on q1 to q9'.
*Activate filter variable.
filter by filt_1.
*Reinspect numbers of missings over q1 to q9.
frequencies mis_1.
Result
Note that SPSS now reports 456 instead of 464 cases. The 8 cases with 3 or more missing values are still in our data but they are excluded from all analyses. We can see why in data view as shown below.
Case 21 has 8 missing values on q1 to q9 and we recoded this into zero on our filter variable.
The strikethrough on its $casenum shows that case 21 is currently filtered out.
The status bar confirms that a filter variable is in effect.
Finally, let's deactivate our filter by simply running
FILTER OFF.
We'll leave our filter variable filt_1 in the data. It won't bother us in any way.
Example 2 - Filter on 2 Variables
For some other analysis, we'd like to use only female respondents working in sales or marketing. A good starting point is running a very simple contingency table as shown below.
set tnumbers both.
*Show frequencies for job type per gender.
crosstabs gender by jtype.
Result
As our table shows, we have 181 female respondents working in either sales or marketing. We'll now create a new filter variable holding only zeroes. We'll then set it to 1 for our case selection with a simple IF command.
compute filt_2 = 0.
*Set filter to 1 for females in job types 1 and 2.
if(gender = 0 & jtype <= 2) filt_2 = 1.
*Apply variable label.
variable labels filt_2 'Filter in females working in sales and marketing'.
*Activate filter.
filter by filt_2.
*Confirm filter working properly.
crosstabs gender by jtype.
Rerunning our contingency table (not shown) confirms that SPSS now reports only the 181 female cases working in marketing or sales. Also note that we now have 2 filter variables in our data. That's just fine, but only 1 filter variable can be active at any time: running FILTER BY with a different variable replaces whichever filter was active before. Ok. Let's deactivate our new filter variable as well with FILTER OFF.
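Since only 1 filter variable can be active at a time, applying two selections together means combining them into a single filter variable. The lines below are a minimal sketch of that idea. They assume filt_1 and filt_2 from the examples above are still in the data; filt_3 is a hypothetical name introduced here for illustration only.

```spss
*Hypothetical combined filter: 1 only if a case passes both filt_1 and filt_2.
compute filt_3 = (filt_1 = 1 & filt_2 = 1).
*Apply variable label.
variable labels filt_3 'Filter in sales and marketing females with fewer than 3 missings'.
*Activate combined filter, inspect and switch off again.
filter by filt_3.
frequencies mis_1.
filter off.
```

The logical expression evaluates to 1 when both conditions hold and to 0 otherwise, so the result follows the 0/1 convention recommended earlier.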
Example 3 - Filter without Filter Variable
Experienced SPSS users may know that
- TEMPORARY can “undo” some data editing that follows it and
- SELECT IF permanently deletes cases from your data.
By combining them, you can circumvent the need for creating a filter variable, but only for 1 analysis at a time. The example below shows just that: the first CROSSTABS is limited to a selection of cases but also rolls back our case deletion. The second CROSSTABS therefore includes all cases again.
temporary.
*Delete cases unless gender = 1 & jtype = 3.
select if (gender = 1 & jtype = 3).
*Crosstabs includes only males in IT and rolls back case selection.
crosstabs gender by jtype.
*Crosstabs includes all cases again.
crosstabs gender by jtype.
Tip - Commands with Built-In Filters
Something else you may want to know is that some commands have a built-in filter. These are
- REGRESSION,
- LOGISTIC REGRESSION,
- FACTOR and
- DISCRIMINANT.
The dialog suggests you can filter cases -for this command only- based on just 1 variable. I suspect you can enter more complex conditions on the resulting /SELECT subcommand as well. I haven't tried it.
In any case, I think these built-in filters can be very handy and it kinda puzzles me they're limited to just the 4 aforementioned commands.
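For illustration, a built-in filter on REGRESSION could look roughly like the sketch below. The /SELECT subcommand takes a variable, a relation (such as EQ or GE) and a value. Using salary as the outcome and q1 to q3 as predictors is purely an assumption for this example, not an analysis from this tutorial.

```spss
*Sketch only: regression limited to cases with gender = 0 (females) without any FILTER.
*The outcome and predictors are assumed for illustration.
regression
/select = gender eq 0
/dependent = salary
/method = enter q1 q2 q3.
```

The selection applies to this command only, so no FILTER OFF is needed afterwards.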
Warning - Data Editing with Filter
Most data editing in SPSS is unaffected by filtering. For example, computing means over variables -as shown below- affects all cases, regardless of whatever filter is active. We therefore need DO IF or IF to restrict this transformation to a selection of cases. However, an active filter does affect functions over cases. Some examples that we'll demonstrate below are
- adding a case count with AGGREGATE;
- computing z-scores for one or many variables;
- adding ranks, or percentiles with RANK.
SPSS Data Editing Affected by Filter Examples
filter by filt_2.
*Not affected by filter: add mean over q1 to q9 to data.
compute mean_1 = mean(q1 to q9).
execute.
*Affected by filter: add case count to data.
aggregate outfile * mode addvariables
/ofreq = n.
*Affected by filter: add z-scores salary to data.
descriptives salary
/save.
*Affected by filter: add median groups salary to data.
rank salary
/ntiles(2) into med_salary.
Result
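Since COMPUTE ignores the active filter, restricting a transformation to the filtered cases requires stating the condition explicitly. The lines below sketch both options mentioned above. They assume filt_2 from Example 2 is in the data; mean_2 and mean_3 are hypothetical variable names used only here.

```spss
*Option 1: IF computes the mean only for cases with filt_2 = 1.
if(filt_2 = 1) mean_2 = mean(q1 to q9).
*Option 2: DO IF does the same and can hold several transformations.
do if(filt_2 = 1).
compute mean_3 = mean(q1 to q9).
end if.
execute.
```

Cases that don't meet the condition are left system missing on the new variables.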
Right. So that's pretty much all about filtering in SPSS. I hope you found this tutorial helpful.
Thanks for reading!
THIS TUTORIAL HAS 27 COMMENTS:
By Tiffany G on August 24th, 2016
I think the reason that SPSS does not strikethrough the user missing data is because when you create a Filter Variable using the point-and-click method, SPSS only allows 3 values: 0s, 1s, or system-missing data (.), but not any user-missing data (-99, 999, etc.).
For example, say a researcher wants to create a filter so they can compare (run an ANOVA on) the answers for 3 specific ethnic groups. The researcher would take the participants' answers and convert them into a filter (probably using syntax, and not by hand). For the filter, even if the participant skipped the question (user-missing), it will be coded as 0 in your filter, because it does not meet the requirement.
Here is some example syntax from SPSS when I created a filter:
USE ALL.
COMPUTE filter_$MEIM234=(MEIM.EG = 2 OR MEIM.EG = 3 OR MEIM.EG = 4 ).
VARIABLE LABELS filter_$MEIM234 'MEIM.EG = 2 OR MEIM.EG = 3 OR MEIM.EG = 4 (FILTER)'.
VALUE LABELS filter_$MEIM234 0 'Not Selected' 1 'Selected'.
FORMATS filter_$MEIM234 (f1.0).
FILTER BY filter_$MEIM234.
EXECUTE.
This filter only selects participants who indicated their ethnic group was a 2, 3, or 4 on the MEIM in our survey.
By Ruben Geert van den Berg on August 25th, 2016
Hi Tiffany!
Why should I even bother about any point-click menu at all? The syntax it generates -in this example anyway- is crap. Especially
COMPUTE filter_$MEIM234=(MEIM.EG = 2 OR MEIM.EG = 3 OR MEIM.EG = 4 ).
is hopelessly inefficient because some of the most useful functions and commands are not available from the menu at all.
Let me show a couple of examples for how to do the exact same thing properly.
*Set up test data.
data list free/mein.eg.
begin data
1 1 1 2 2 2 3 3 3 4 4 4
end data.
compute id = $casenum.
*Use ANY function for creating filter variable.
compute filt1 = any(mein.eg,2,3,4).
variable labels filt1 "Filter out cases whose mein.eg is not 2, 3 or 4".
filter by filt1.
frequencies id.
*Even shorter: use RANGE function for creating filter.
compute filt2 = range(mein.eg,2,4).
variable labels filt2 "Filter out cases whose mein.eg is not 2 through 4".
filter by filt2.
frequencies id.
*FILTER without filter variable.
missing values mein.eg(1).
filter by mein.eg.
frequencies id.
*Conclusion: FILTER works fine but no strikethrough so it seems like there's no filter in effect.
By Maria on November 19th, 2016
Hi Ruben,
How do I create a filter for multiple variables so that I can re-run my stats for the whole dataset while checking for the missing cases? Thank you,
Maria
By Ruben Geert van den Berg on November 20th, 2016
Hi Maria!
If I understand your question correctly, the easiest option is using IF.
Next, use AND or OR for combining conditions.
You can sometimes simplify things by using ANY or RANGE too.
For example, create a filter for gender = 1 and age between 20 and 40:
compute filt1 = 0.
*Set filter to 1 if gender = 1 AND age between 20 and 40 (inclusive).
if(gender = 1 and range(age,20,40)) filt1 = 1.
*Add variable label.
variable labels filt1 "Filter for analyzing cases with gender = 1 and age between 20 and 40 only".
*Switch filter on.
filter by filt1.
*FINISHED!!!.
Does that make any sense AND or OR answer your question?
By Arun Kumar on June 22nd, 2017
This missing value bug got fixed in v24