SPSS FILTER Tutorial - Exclude Cases from Analyses

SPSS FILTER – Quick & Simple Tutorial

SPSS FILTER temporarily excludes a selection of cases from all data analyses. For excluding cases from data editing, use DO IF or IF instead.

Quick Overview Contents

SPSS Filtering Basics
Example 1 - Exclude Cases with Many Missing Values
Example 2 - Filter on 2 Variables
Example 3 - Filter without Filter Variable
Tip - Commands with Built-In Filters
Warning - Data Editing with Filter

SPSS FILTER - Example Data

I'll use bank_clean.sav -partly shown below- for all examples in this tutorial. This file contains the data from a small bank employee survey. Feel free to download these data and rerun the examples yourself.

SPSS Filtering Basics

Filtering in SPSS usually involves 4 steps:

create a filter variable;
activate the filter variable;
run one or many analyses -such as correlations, ANOVA or a chi-square test- with the filter variable in effect;
deactivate the filter variable.

In theory, any variable can be used as a filter variable. After activating it, cases with

zeroes,
user missing values or
system missing values

on the filter variable are excluded from all analyses until you deactivate the filter. For the sake of clarity, I recommend you only use filter variables containing 0 or 1 for each case. Enough theory. Let's put things into practice.
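The 4 steps above can be sketched in syntax as follows. This is only a minimal illustration: filt_demo is a hypothetical variable name, and the selection (cases without any missing values on the rating scales q1 to q9 from the example data) is an assumption chosen for the sake of the example.

```spss
*1. Create filter variable: 1 if no missings on q1 to q9, else 0.
compute filt_demo = (nmiss(q1 to q9) = 0).

*2. Activate filter variable.
filter by filt_demo.

*3. Run one or many analyses on the filtered cases.
descriptives q1 to q9.

*4. Deactivate filter.
filter off.
```

Because the logical expression evaluates to 1 (true) or 0 (false), the resulting filter variable follows the recommendation of holding only 0 or 1 for each case.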

Example 1 - Exclude Cases with Many Missing Values

At the end of our data, we find 9 rating scales: q1 to q9. Perhaps we'd like to run a factor analysis on them or use them as predictors in regression analysis. In any case, we may want to exclude cases having many missing values on these variables. We'll first just count them by running the syntax below.

*Create new variable holding number of missings over q1 to q9.
compute mis_1 = nmiss(q1 to q9).

*Apply variable label.
variable labels mis_1 'Number of missings on q1 to q9'.

*Check frequencies.
frequencies mis_1.

Result

Based on this frequency distribution, we decide to exclude the 8 cases having 3 or more missing values on q1 to q9. We'll create our filter variable with a simple RECODE as shown below.

*Recode mis_1 into filter variable.
recode mis_1 (lo thru 2 = 1)(else = 0) into filt_1.

*Apply variable label.
variable labels filt_1 'Filter out cases with 3 or more missings on q1 to q9'.

*Activate filter variable.
filter by filt_1.

*Reinspect numbers of missings over q1 to q9.
frequencies mis_1.

Result

Note that SPSS now reports 456 instead of 464 cases. The 8 cases with 3 or more missing values are still in our data but they are excluded from all analyses. We can see why in data view as shown below.

Case 21 has 8 missing values on q1 to q9 and we recoded this into zero on our filter variable.
The strikethrough on its $casenum shows that case 21 is currently filtered out.
The status bar confirms that a filter variable is in effect.

Finally, let's deactivate our filter by simply running FILTER OFF. We'll leave our filter variable filt_1 in the data. It won't bother us in any way.
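In syntax, deactivating the filter is a single command. The sketch below also reruns FREQUENCIES as a quick check: with the filter switched off, it should report all 464 cases again instead of 456.

```spss
*Deactivate filter: all cases are included in analyses again.
filter off.

*Confirm that all cases are back.
frequencies mis_1.
```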

Example 2 - Filter on 2 Variables

For some other analysis, we'd like to use only female respondents working in sales or marketing. A good starting point is running a very simple contingency table as shown below.

*Show values and value labels in subsequent output tables.
set tnumbers both.

*Show frequencies for job type per gender.
crosstabs gender by jtype.

Result

As our table shows, we have 181 female respondents working in either sales or marketing. We'll now create a new filter variable holding only zeroes. We'll then set it to 1 for our case selection with a simple IF command.

*Create filter variable holding only zeroes.
compute filt_2 = 0.

*Set filter to 1 for females in job types 1 and 2.
if(gender = 0 & jtype <= 2) filt_2 = 1.

*Apply variable label.
variable labels filt_2 'Filter in females working in sales and marketing'.

*Activate filter.
filter by filt_2.

*Confirm filter working properly.
crosstabs gender by jtype.

Rerunning our contingency table (not shown) confirms that SPSS now reports only the 181 female cases working in marketing or sales. Also note that we now have 2 filter variables in our data. That's just fine, but only 1 filter variable can be active at any time. Ok. Let's deactivate our new filter variable as well with FILTER OFF.
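As an aside, the same selection can also be written as a single COMPUTE with a logical expression. The sketch below is an alternative, not the approach used in this example, and filt_2b is a hypothetical variable name. One difference to keep in mind: cases with a missing value on gender or jtype may end up system missing rather than zero on filt_2b. Either way, FILTER excludes them.

```spss
*Deactivate filter so all cases are visible in the check below.
filter off.

*Alternative: create filter variable from a single logical expression.
compute filt_2b = (gender = 0 & jtype <= 2).

*Check new filter variable against filt_2.
crosstabs filt_2 by filt_2b.
```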

Example 3 - Filter without Filter Variable

Experienced SPSS users may know that

TEMPORARY can “undo” some data editing that follows it and
SELECT IF permanently deletes cases from your data.

By combining them, you can circumvent the need for creating a filter variable, but only for 1 analysis at a time. The example below shows just that: the first CROSSTABS is limited to a selection of cases, after which the case deletion is rolled back. The second CROSSTABS therefore includes all cases again.

*Make following transformation(s) temporary.
temporary.

*Delete cases unless gender = 1 & jtype = 3.
select if (gender = 1 & jtype = 3).

*Crosstabs includes only males in IT and rolls back case selection.
crosstabs gender by jtype.

*Crosstabs includes all cases again.
crosstabs gender by jtype.
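The "1 analysis at a time" limitation means that TEMPORARY and SELECT IF must be repeated before every analysis that should use the selection. A minimal sketch, in which the choice of DESCRIPTIVES on salary and FREQUENCIES on q1 is just an assumption for illustration:

```spss
*First analysis on males in IT only.
temporary.
select if (gender = 1 & jtype = 3).
descriptives salary.

*Second analysis on the same selection: TEMPORARY and SELECT IF must be repeated.
temporary.
select if (gender = 1 & jtype = 3).
frequencies q1.
```

For more than a couple of analyses, a filter variable is usually the less repetitive option.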

Tip - Commands with Built-In Filters

Something else you may want to know is that some commands have a built-in filter. These are

REGRESSION,
LOGISTIC REGRESSION,
FACTOR and
DISCRIMINANT.

SPSS Filter Built Into Regression Dialog

The dialog suggests you can filter cases -for this command only- based on just 1 variable. I suspect you can enter more complex conditions on the resulting /SELECT subcommand as well. I haven't tried it.

In any case, I think these built-in filters can be very handy and it kinda puzzles me that they're limited to the 4 aforementioned commands.
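For illustration, such a built-in filter could look roughly like the sketch below. It was not part of the original examples: using salary as the dependent variable and q1 to q9 as predictors is purely an assumption. The documented form of /SELECT for REGRESSION appears to be a single variable, a relation such as EQ and a value, so that is what the sketch sticks to.

```spss
*Run regression on female respondents (gender = 0) only.
regression
/select = gender eq 0
/dependent = salary
/method = enter q1 to q9.
```

The selection applies to this command only. No filter variable is activated, so nothing needs to be switched off afterwards.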

Warning - Data Editing with Filter

Most data editing in SPSS is unaffected by filtering. For example, computing means over variables -as shown below- affects all cases, regardless of whatever filter is active. We therefore need DO IF or IF to restrict this transformation to a selection of cases. However, an active filter does affect functions over cases. Some examples that we'll demonstrate below are

adding a case count with AGGREGATE;
computing z-scores for one or many variables;
adding ranks or percentile groups with RANK.
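Restricting a transformation with IF, as mentioned above, could look like the sketch below. It assumes that filt_2 from example 2 is still in the data; mean_2 is a hypothetical variable name.

```spss
*Compute mean over q1 to q9 only for cases having 1 on filt_2.
if (filt_2 = 1) mean_2 = mean(q1 to q9).
execute.
```

Cases that don't meet the condition are left system missing on mean_2, regardless of whether a filter is active.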

SPSS Data Editing Affected by Filter Examples

*Reactivate female sales filter.
filter by filt_2.

*Not affected by filter: add mean over q1 to q9 to data.
compute mean_1 = mean(q1 to q9).
execute.

*Affected by filter: add case count to data.
aggregate outfile * mode addvariables
/ofreq = n.

*Affected by filter: add z-scores for salary to data.
descriptives salary
/save.

*Affected by filter: add median groups salary to data.
rank salary
/ntiles(2) into med_salary.

Result

Right. So that's pretty much all about filtering in SPSS. I hope you found this tutorial helpful and

Thanks for reading!

SPSS FREQUENCIES – Quick Tutorial

The SPSS FREQUENCIES command can be used for much more than frequency tables: it's also the easiest way to obtain basic charts such as histograms and bar charts. On top of that, it provides us with percentiles and some other statistics. Plenty of reasons for taking a closer look at this ubiquitous SPSS command. We'll use employees.sav throughout this tutorial.

SPSS FREQUENCIES - Basic Table

The most basic way to use FREQUENCIES is simply generating a frequency table. For example, the frequency table for job_type is obtained by running the following line of SPSS syntax: frequencies job_type.

By default, the rows of this table are sorted ascendingly by value. Note that this may not be obvious when only value labels are displayed. We'll next take a look at different options for sorting the table rows.

SPSS FREQUENCIES - Sort Order

The default sort order, ascending by value, can be changed by adding a FORMAT subcommand. Possible values are AVALUE and DVALUE (ascending and descending values) or AFREQ and DFREQ (ascending and descending frequencies). For example, the syntax below sorts the rows from the value with the highest frequency (yes, that's the mode) through the value with the lowest frequency.

*Frequency table sorted descendingly by frequency.

frequencies job_type
/format dfreq.

SPSS FREQUENCIES - Bar Chart

The SPSS FREQUENCIES command is the easiest way to create one or more bar charts for categorical variables. Just add the BARCHART subcommand. Note that you can combine it with a sort order, resulting in the bars being ordered from highest through lowest frequency as shown below.

*Create frequency table and bar chart, both ordered by descending frequencies.

frequencies job_type
/format dfreq
/barchart.

SPSS FREQUENCIES - Pie Chart

An alternative visualization for categorical variables is a pie chart. In order to generate it, simply add a PIECHART subcommand to FREQUENCIES. The syntax below creates a pie chart for education_type.

*Create pie chart for education_type.

frequencies education_type
/piechart.

SPSS FREQUENCIES - Histogram

Frequency tables, bar charts and pie charts can all be used for both metric and categorical variables, including string variables. However, they are not useful for metric variables with many distinct values; in this case, tables get too many rows and graphs too many elements.
The ideal way to visualize such variables is a histogram, obtained by the HISTOGRAM subcommand. Apart from that, we can suppress frequency tables by specifying NOTABLE on the FORMAT subcommand. Like so, the syntax below generates a histogram for monthly_income.

*Generate histogram (without table) for monthly_income.

frequencies monthly_income
/format notable
/histogram.
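
A small variation that may be useful: adding the NORMAL keyword to the HISTOGRAM subcommand superimposes a normal curve on the histogram. A minimal sketch on the same variable:

```spss
*Generate histogram with normal curve (without table) for monthly_income.
frequencies monthly_income
/format notable
/histogram normal.
```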

SPSS FREQUENCIES - Percentiles

SPSS FREQUENCIES provides a nice way to obtain percentiles: just add a PERCENTILES subcommand followed by the desired percentiles in parentheses. The syntax below gives an example. Keep in mind that percentiles are not meaningful for nominal variables.

*Calculate 25th, 50th and 75th percentiles for monthly_income.

frequencies monthly_income
/format notable
/percentiles (25 50 75).

SPSS FREQUENCIES - Ntiles

Ntiles are easily obtained with SPSS FREQUENCIES: simply add the NTILES subcommand with the number of ntiles behind it in parentheses. If you want to assign cases to ntile groups, use RANK; it creates a new variable holding the ntile for each case on a given variable. Both options are shown in the syntax below.

*1. Inspect ntiles for monthly_income.

frequencies monthly_income
/format notable
/ntiles (5).

*2. Create monthly_income ntile group variable in data.

rank monthly_income/ntiles(5).
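
Without further specification, RANK chooses a name for the new variable itself. If you'd rather choose it yourself, the INTO keyword can be added as sketched below; income_group is a hypothetical variable name.

```spss
*Create ntile group variable with a self-chosen name.
rank monthly_income
/ntiles(5) into income_group.
```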

SPSS FREQUENCIES - Statistics

SPSS FREQUENCIES can compute all statistics obtained from DESCRIPTIVES plus the median and mode. Note that the statistics table from FREQUENCIES has a different layout with variables in columns and statistics in rows. For obtaining them, add a STATISTICS subcommand. Just as with DESCRIPTIVES, specifying the ALL keyword returns all available statistics.

*Obtain all available statistics from monthly_income.

frequencies monthly_income
/format notable
/statistics all.
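
If ALL returns more than you need, the STATISTICS subcommand also accepts a list of individual keywords. The selection below is just an example.

```spss
*Obtain a selection of statistics for monthly_income.
frequencies monthly_income
/format notable
/statistics mean median mode stddev minimum maximum.
```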

SPSS FREQUENCIES - Multiple Variables

Obviously, FREQUENCIES can be run for multiple variables, possibly using TO or ALL. If multiple types of output (frequency table, chart and so on) are generated, you can have them sorted by variable or output type by specifying VARIABLE or ANALYSIS on an ORDER subcommand.

*1. Sort output by variable.

frequencies education_type to job_type
/format dfreq
/barchart
/order variable.

*2. Sort output by output type (first tables for all variables, then charts for all variables).

frequencies education_type to job_type
/format dfreq
/barchart
/order analysis.

SPSS FORMATS – Set Display Format for Variables

SPSS FORMATS sets formats -decimal places, dates, percent signs and more- for numeric variables. Setting variable formats in SPSS does not change your actual data values. However, formats determine how your data are displayed -in the data viewer as well as the output window. Two main uses of FORMATS are

increasing or decreasing the decimal places of standard numeric variables;
displaying date, time and datetime values (consisting of numbers of seconds) as normal dates and times.

So let's try it and see how it works. All examples in this tutorial use employees.sav.

Setting Decimal Places

One of the main uses of FORMATS is setting decimal places for standard numeric variables by specifying their desired F formats. For example, we don't see any decimal places for monthly_income in data view except for the formula bar when we select a value (see screenshot).

Its current format, F4, not only hides all decimals in data view but also affects some of the output. We can see this by running the following syntax: descriptives monthly_income.

SPSS FORMATS Syntax Example 1

We'll now set two decimal places for monthly_income and rerun the exact same DESCRIPTIVES command with the syntax below.

*1. Specify two decimal places with FORMATS command.

formats monthly_income(f7.2).

*2. Rerun descriptives.

descriptives monthly_income.

Note how all decimal places in the output table have increased by 2 by changing the variable's format. Apart from that, the F7.2 format also displays 2 decimal places in data view now (see next screenshot). Keep in mind that the actual values don't change in any way by running FORMATS.
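One way to confirm which format a variable currently has is to inspect its dictionary information. The sketch below is an optional check rather than a required step.

```spss
*Show dictionary information, including formats, for monthly_income.
display dictionary
/variables = monthly_income.
```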

Setting Date, Time and Datetime Formats

When new date variables, time variables and datetime variables are created, they may initially hold huge numbers that don't look like dates and times at all. These huge numbers are their actual values in numbers of seconds. These are only shown as normal dates and times after setting their formats appropriately. The syntax below demonstrates how to do so.

SPSS FORMATS Syntax Example 2

*1. Compute date of 50th birthday for respondents.

compute birthday_50 = datesum(date_of_birth,50,'years').
execute.

*2. Show numbers of seconds as normal date values.

formats birthday_50(date11).

The reason why this is usually necessary is that SPSS date, time and datetime variables are numeric variables. In SPSS, new numeric variables initially have an F format, usually F8.2. Those who really want to know can confirm this by running show format.
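
DATE11 is just one of several date formats. As a sketch, the same variable can be displayed in some other common layouts as well. Only the display changes; the underlying number of seconds stays the same.

```spss
*European date format: day, month, year.
formats birthday_50(edate10).

*American date format: month, day, year.
formats birthday_50(adate10).

*Sortable date format: year, month, day.
formats birthday_50(sdate10).
```

Since each FORMATS command overrides the previous one, birthday_50 ends up with the last format specified.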

Multiple Variables

Formats can be set for multiple variables at once; after FORMATS, specify one or more variable names followed by their format. If desired, the command may continue with more variable names, again followed by their format. The syntax below gives an example.

SPSS FORMATS Syntax Example 3

*Set different formats for different sets of variables at once.

formats education_type to experience_years(f2.1) monthly_income(dollar6) birthday_50(datetime20).