
# SPSS IF – A Quick Tutorial

In SPSS, IF sets the values of a new or existing variable for a selection of cases only.
To analyze a selection of cases, use FILTER or SELECT IF instead.
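
To make the contrast concrete, here is a minimal sketch of analyzing (rather than computing for) a selection of cases. It assumes the variables gender and whours from the example file introduced in the next section; filt_tmp is a hypothetical helper name.

```spss
*Analyze only cases with gender = 0 without deleting any cases.
compute filt_tmp = (gender = 0).
filter by filt_tmp.
descriptives whours.
filter off.

*Alternative: TEMPORARY limits SELECT IF to the next procedure only.
temporary.
select if(gender = 0).
descriptives whours.
```

Neither variant changes any data values, which is the key difference from IF.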

## Data File Used for Examples

All examples use bank.sav, a short survey of bank employees. Part of the data is shown below. To get the most out of this tutorial, we recommend you download the file and try the examples for yourself.

## Example 1 - Flag Cases Based on Date Function

Let's flag all respondents born during the 1980s. The syntax below first computes our flag variable -born80s- as a column of zeroes. We then set it to one if the year -extracted from the date of birth- is in the RANGE 1980 through 1989.

*Create new variable holding only zeroes.
compute born80s = 0.

*Set value to 1 if respondent born between 1980 and 1989.
if(range(xdate.year(dob),1980,1989)) born80s = 1.
execute.

add value labels born80s 0 'Not born during 80s' 1 'Born during 80s'.
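
As a side note, the same flag can probably be created in one step, because a logical expression in COMPUTE evaluates to 1 (true) or 0 (false). The sketch below assumes this behavior and uses a hypothetical variable name, born80s_alt. One difference to keep in mind: cases with a missing date of birth should end up system missing here, whereas the COMPUTE plus IF route above leaves them at 0.

```spss
*Sketch: compute the flag directly from the logical expression.
compute born80s_alt = range(xdate.year(dob),1980,1989).
execute.

*Compare both flags, including their numbers of missing values.
frequencies born80s born80s_alt.
```

Which of the two behaviors you want depends on whether "date of birth unknown" should count as "not born during the 80s".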

## Example 2 - Replace Range of Values by Function

Next, if we ran a histogram on weekly working hours -whours- we'd see values of 160 hours and over. However, weeks only hold (24 * 7 =) 168 hours. Even Kim Jong Un wouldn't claim he works 160 hours per week!
We assume these respondents filled out their monthly -rather than weekly- working hours. On average, months hold (52 / 12 =) 4.33 weeks. So we'll divide the reported hours by 4.33 but only for cases scoring 160 or over.

*Sort cases descendingly on weekly hours.
sort cases by whours (d).

*Divide 160 or more hours by 4.33 (average weeks per month).
if(whours >= 160) whours = whours / 4.33.
execute.
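
Since the IF command above overwrites the original values, a more cautious variant is to write the corrected hours to a new variable. The sketch below is meant to be run instead of (not after) the IF command above; whours_fix is a hypothetical variable name.

```spss
*Sketch: keep the raw values and store corrected hours separately.
compute whours_fix = whours.
if(whours >= 160) whours_fix = whours / 4.33.
execute.

*List only the cases that were corrected.
temporary.
select if(whours >= 160).
list variables = whours whours_fix.
```

This way, the correction can be inspected or undone later on.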

## Note

We could have done this correction with RECODE as well: RECODE whours (160 = 36.95)(180 = 41.57). Note, however, that RECODE becomes tedious as the number of distinct values to correct grows: each one needs its own hand-computed replacement. It works reasonably well for this variable, but IF handles any number of distinct values with a single formula.

## Example 3 - Compute Variable Differently Based on Gender

We'll now flag cases who work fulltime. However, “fulltime” means 40 hours for male employees and 36 hours for female employees. So we need to use different formulas based on gender. The IF command below does just that.

*Compute fulltime holding only zeroes.
compute fulltime = 0.

*Set fulltime to 1 if whours >= 36 for females or whours >= 40 for males.
if(gender = 0 & whours >= 36) fulltime = 1.
if(gender = 1 & whours >= 40) fulltime = 1.

add value labels fulltime 0 'Not working fulltime' 1 'Working fulltime'.

*Quick check.
means whours by gender by fulltime
/cells min max mean stddev.

## Result

Our syntax ends with a MEANS table showing minima, maxima, means and standard deviations per gender per group. This table -shown below- is a nice way to check the results.

The maximum for females not working fulltime is below 36. The minimum for females working fulltime is 36. And so on.

## SPSS IF Versus DO IF

Some SPSS users may be familiar with DO IF. The main differences between DO IF and IF are that

• IF is a single line command while DO IF requires at least 3 lines: DO IF, some transformation(s) and END IF.
• IF is a conditional COMPUTE command whereas DO IF can affect other transformations -such as RECODE or COUNT- as well.
• If cases meet more than 1 condition, the first condition prevails when using DO IF - ELSE IF. If you use multiple IF commands instead, the last condition met by each case takes effect. The syntax below sketches this idea.

## DO IF - ELSE IF Versus Multiple IF Commands

*DO IF: respondents meeting both conditions get result_1.
do if(condition_1).
result_1.
else if(condition_2). /*excludes cases meeting condition_1.
result_2.
end if.

*IF: respondents meeting both conditions get result_2.
if(condition_1) result_1.
if(condition_2) result_2. /*includes cases meeting condition_1.
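
For a concrete version of this sketch, the syntax below applies two overlapping conditions to whours. The variables group_a and group_b are hypothetical names used for illustration only.

```spss
*DO IF - ELSE IF: the first condition met determines the value.
do if(whours >= 40).
compute group_a = 2.
else if(whours >= 36).
compute group_a = 1.
else.
compute group_a = 0.
end if.

*Multiple IF commands: the last condition met determines the value.
compute group_b = 0.
if(whours >= 40) group_b = 2.
if(whours >= 36) group_b = 1.
execute.

*Quick check.
crosstabs group_a by group_b.
```

A respondent working 45 hours meets both conditions and therefore gets group_a = 2 but group_b = 1. To make the IF version match, the two IF commands would have to be swapped. Also, as far as I know, cases with missing whours skip the entire DO IF structure, so they stay system missing on group_a but keep 0 on group_b.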

## SPSS IF Versus RECODE

In many cases, RECODE is an easier alternative to IF. However, RECODE has more limitations too.
First off, RECODE only replaces (ranges of) constants -such as 0, 99 or system missing values- by other constants. So something like recode overall (sysmis = q1). is not possible -q1 is a variable, not a constant- but if(sysmis(overall)) overall = q1. works fine. You can't RECODE a function -mean, sum or whatever- into anything nor recode anything into a function. You'll need IF for doing so.
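
As an illustration, the sketch below fills in missing values with a function result, something RECODE can't do. It assumes that overall is accompanied by adjacent variables q1 to q5, which is a hypothetical setup that's not part of bank.sav.

```spss
*Sketch: q1 to q5 are hypothetical adjacent variables.
if(sysmis(overall)) overall = mean(q1 to q5).
execute.
```

Note that MEAN returns a value as long as at least one of its arguments is valid, so overall only stays missing for cases having no valid values on q1 to q5 at all.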

Second, RECODE can only set values based on a single variable. This is the reason why you can't recode 2 variables into one but you can use an IF condition involving multiple variables: if(gender = 0 & whours >= 36) fulltime = 1. is perfectly possible.

You can get around this limitation by combining RECODE with DO IF, however. Our last example uses this combination as a different route to flag fulltime working males and females using different criteria.

## Example 4 - Compute Variable Differently Based on Gender II

*Recode whours into fulltime for everyone.
recode whours (40 thru hi = 1)(else = 0) into fulltime2.

*Apply different recode for female respondents.
do if(gender = 0).
recode whours (36 thru hi = 1)(else = 0) into fulltime2.
end if.

add value labels fulltime2 0 'Not working fulltime' 1 'Working fulltime'.

*Quick check.
means whours by gender by fulltime2
/cells min max mean stddev.

## Final Notes

This tutorial presented a brief discussion of the IF command with a couple of examples. I hope you found them helpful. If I missed anything essential, please throw me a comment below.


# THIS TUTORIAL HAS 6 COMMENTS:

• ### By Jon Peck on December 8th, 2022

One thing that often confuses users of conditional statements such as IF and DO IF is the logic of comparisons with SYSMIS. Such a comparison is neither true nor false, so it has to be handled with a special function where relevant. Even ELSE does not catch it. SYSMIS is not equal to SYSMIS!

• ### By Ruben Geert van den Berg on December 8th, 2022

Hi Jon!

First off, I think SYSMIS and ELSE only apply to RECODE, not to (DO) IF.

Also, I do believe they actually do capture system missing values (see examples below).

What really surprises me is that IF(ID = $SYSMIS) ... does not seem to capture system missings. I could have sworn this should work?!

I also think it's counter intuitive that LO THRU HI captures user missing values.

Obviously, such issues can be circumvented by using stuff like

RECODE (MISSING = COPY)(....)

and

IF(MISSING(ID)) ....

but I'm sure you're well aware of that.

data list free/id.
begin data
1 2 3 4 5 6 ''
end data.

missing values id (6).

compute v01 = 0.
if(id = $sysmis) v01 = 1.

recode id (else = 1) into v02.

recode id (lo thru hi = 1)(else = 0) into v03.

recode id (sysmis = 1)(else = 0) into v04.

execute.

• ### By Jon K Peck on December 8th, 2022

SYSMIS and ELSE do apply to DO IF. SYSMIS is an issue with any logical condition. ELSE is the final catchall for a DO IF loop, but it doesn't capture SYSMIS.

Think of SYSMIS like infinity. Mathematically, infinity does not equal infinity. SYSMIS was really intended for situations like division by zero, but because it can so easily be used without declaring variable missing values, people often use it for just missing. That's not good practice.

The MISSING, VALUE, and SYSMIS functions give the user control over missing value logic.

LO and HIGH in RECODE do include user missing values, but transformations overall treat missing conditions a little differently from procedures, because they manipulate values.

• ### By Ruben Geert van den Berg on December 9th, 2022

Ah, of course, you're right:

There's the SYSMIS function (besides the SYSMIS keyword in RECODE).

And indeed you could make a DO IF - ELSE - END IF clause (which I hardly ever use). Again, that's a different ELSE than the RECODE keyword.

I totally agree that having (or even creating!) system missing values is never a really good practice. My simple argument is that they're usually treated the same as user missings except that you can't apply any value label to them which clarifies why they are missing.

P.s. technically, a system missing value is truly just a (very unlikely) double-precision floating point number that's merely displayed as a dot, isn't it? There's a SYSMIS subcommand in SHOW that's puzzled me for a while...

• ### By Jon K Peck on December 9th, 2022

Sysmis is conceptually something that is impossible as a number. The IEEE floating point standard and modern hardware actually implement it that way, but the SPSS implementation is a carefully chosen floating point number that would never occur, since it predates that standard. It is carefully constructed as a specific hexadecimal constant in order to ensure that compilers can't compile it slightly differently.

Sysmis is treated differently from user missing. For example, those values can never be included as valid in procedures while user missing values usually can.

I am surprised that you don't use DO IF/ELSE IF/ELSE much. I find that often to be better. One could just write a bunch of IF statements that accomplish the same thing, but a DO IF construct is much easier to understand when there are multiple conditions, and it can be slightly faster, although the difference would rarely matter.