In SPSS, IF is a conditional COMPUTE command. It calculates a (possibly new) variable but only for those cases that satisfy some condition(s). This tutorial walks you through some typical examples of the IF command.
Example 1 - Replace Missing Values
With the syntax below we'll first create some test data. Next we'll set the existing variable score to 100 for all respondents (only one in this case) who have a missing value on score. An alternative here is RECODE score (missing = 100). The effect only becomes visible after SPSS passes the data, which SORT CASES does while also putting the cases in a more convenient order. This is because IF is technically a transformation: SPSS keeps it pending until a procedure or an EXECUTE command runs.
SPSS IF Syntax Example 1
*1. Create test data.
data list free/gender score.
begin data
0 80 1 85 0 90 1 95 0 '' 1 105 0 110 1 115
end data.
*2. Replace missing value with 100.
if missing(score) score = 100.
*3. Sort cases.
sort cases gender.
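The RECODE alternative mentioned above would take the place of step 2. The sketch below assumes the test data created above are the active dataset; the EXECUTE command is only there to run the pending transformation so the result shows up in Data View without sorting first.

```spss
*Alternative to step 2: RECODE with the MISSING keyword (covers system- and user-missing values).
recode score (missing = 100).
*EXECUTE runs the pending transformation right away.
execute.
```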
Example 2 - Score Groups
Next, we'll create score groups. Respondents scoring under 100 points get a 1 (‘low score’). The others get a 2 (‘high score’). We'll demonstrate three ways to do so. The third may seem a little weird. It's explained in Compute A = B = C.
SPSS IF Syntax Example 2
*1. Create score groups option 1.
if score lt 100 group_a = 1.
if score ge 100 group_a = 2.
exe.
*2. Create score groups option 2.
recode score (100 thru hi = 2) (else = 1) into group_b.
exe.
*3. Create score groups option 3.
compute group_c = (score ge 100) + 1.
exe.
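A quick way to confirm that the three options agree, and to see the logic behind option 3, is the sketch below: each comparison in parentheses evaluates to 1 (true) or 0 (false), just like (score ge 100) does. The variable name all_equal is a hypothetical choice. With the test data above, all eight cases should get a 1.

```spss
*Each comparison returns 1 (true) or 0 (false), and AND combines them.
compute all_equal = (group_a eq group_b) and (group_b eq group_c).
frequencies all_equal.
```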
Example 3 - Gender-Score Groups
Now we'll create score groups for female and male respondents separately. At this point we can't use a simple RECODE anymore because the conditions now involve two variables, gender and score. A simple approach here is using four IF statements, each holding two conditions (gender and score). A shorter but harder-to-read equivalent is a single COMPUTE command.
SPSS IF Syntax Example 3
*1. Gender-score groups option 1.
if score lt 100 and gender eq 0 group_d = 1.
if score ge 100 and gender eq 0 group_d = 2.
if score lt 100 and gender eq 1 group_d = 3.
if score ge 100 and gender eq 1 group_d = 4.
exe.
*2. Gender-score groups option 2.
compute group_e = 2 * gender + (score ge 100) + 1.
exe.
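The single COMPUTE in option 2 works because 2 * gender adds 0 or 2, (score ge 100) adds 0 or 1, and the final + 1 shifts the result into the range 1 through 4, reproducing the coding of option 1. The sketch below labels both variables and crosstabulates them as a check. The label texts are hypothetical since the tutorial doesn't say which gender code is female. If both options agree, all cases fall on the diagonal of the table.

```spss
*Hypothetical labels: gender codes 0 and 1 as in the test data, score cut at 100.
value labels group_d group_e
 1 'gender 0, score under 100'
 2 'gender 0, score 100 or over'
 3 'gender 1, score under 100'
 4 'gender 1, score 100 or over'.
*Both options should line up on the diagonal of this table.
crosstabs group_d by group_e.
```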
Difference Between IF and DO IF
Very similar to the IF commands we showed is DO IF-ELSE IF-END IF. Apart from the latter usually requiring more syntax, there's an important difference between the two. It shows up when conditions are not mutually exclusive, meaning that a single case may satisfy two or more conditions simultaneously. In that case, the following happens:
- With IF the last condition that holds prevails. Since IF statements are completely separate commands, later ones simply overwrite the results of previous ones.
- With DO IF-ELSE IF-END IF the first condition that holds prevails. The trick is in ELSE IF. The “ELSE” here means “only if the preceding condition(s) don't hold...”
The final syntax example demonstrates this difference between IF and DO IF-ELSE IF-END IF.
SPSS IF Syntax Example 4
*1. Create score groups with DO IF.
compute group_f = 1.
do if score ge 100.
compute group_f = 3.
else if score ge 90.
compute group_f = 2.
end if.
*2. Sort cases.
sort cases score.
*3. Seemingly equivalent IF statements give a different result.
compute group_g = 1.
if score ge 100 group_g = 3.
if score ge 90 group_g = 2.
exe.
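Since the last IF that holds prevails, the IF version can be made to match DO IF by reversing the order of the statements: the broader condition (score ge 90) runs first and the narrower one (score ge 100) then overwrites it. A sketch, with group_h and same_as_f as hypothetical names; the final FREQUENCIES should show a 1 for every case.

```spss
*Reordered IF statements: broader condition first, narrower condition overwrites it.
compute group_h = 1.
if score ge 90 group_h = 2.
if score ge 100 group_h = 3.
*Compare with the DO IF result.
compute same_as_f = (group_h eq group_f).
frequencies same_as_f.
```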
This tutorial has 36 comments
By Ruben Geert van den Berg on March 24th, 2016
Hi Miranda!
I'm not sure what exactly you're looking for, but two options come to mind: 1) avoiding any COMPUTE commands, as they set values for all cases, or 2) first computing new values and converting them to missing if any missings are present on the input variables. Below is a mini syntax example showing both approaches.
*Mini test data.
data list free/perp vict.
begin data
1 1 2 1 "" 1 1 2 2 2 "" 2 1 "" 2 "" "" ""
end data.
value labels perp vict 1 'Black' 2 'White'.
*Option 1: avoid COMPUTE.
if perp = 1 and vict = 1 comb = 1.
if perp = 1 and vict = 2 comb = 2.
if perp = 2 and vict = 1 comb = 3.
if perp = 2 and vict = 2 comb = 4.
execute.
*Option 2: set system missings afterwards for cases having at least one missing value on input variables.
*Here, test = 1 stands in for whatever COMPUTE creates the new variable.
compute test = 1.
if(nvalid(perp,vict) < 2) test = $sysmis.
execute.
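For the dummy-variable version of this question, one likely cause of the extra zeros is that SPSS evaluates a logical expression such as (perp = 2 and vict = 2) as false (0) as soon as one side is false, even if the other variable is missing. A minimal sketch that avoids this wraps the COMPUTE commands in DO IF so they only run for cases with valid values on both variables. The four dummy names are hypothetical; the coding follows the value labels in the mini test data.

```spss
*Hypothetical dummy names; perp and vict coded 1 = Black, 2 = White as in the mini test data.
*New variables start out system-missing, so cases skipped by DO IF stay missing.
do if nvalid(perp, vict) = 2.
compute both_black = (perp = 1 and vict = 1).
compute both_white = (perp = 2 and vict = 2).
compute wperp_bvict = (perp = 2 and vict = 1).
compute bperp_wvict = (perp = 1 and vict = 2).
end if.
execute.
```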
By Miranda on March 24th, 2016
So I want to create a series of dummy variables that tells me whether a perpetrator and a victim are both white, both black, white perp/black vic, or black perp/white vic.
I ran the syntax as stated above, and I have valid positive scores. Unfortunately, all my missing variables became 0's, so I have significantly more 0's than I should.
Is there a way to make sure that the new variable counts cases where there are not valid scores for both original variables as missing rather than 0?
By Ruben Geert van den Berg on March 18th, 2016
Hi Suci!
You need one extra step. Like COMPUTE and RECODE, IF is a horizontal function (over variables, for each case separately). However, computing percentiles is a vertical function (over cases, for each variable separately).
Most vertical functions (sums and means over cases) are done with AGGREGATE. However, for percentiles you need RANK:
rank x/percent.
returns percentiles for x in a new variable called Px (P for percentile and x for x). For the final step you can use RANGE:
compute flag = 0.
if (range(Px,0.5,99.5)) flag = 1.
Or, shorter:
compute flag = range(Px,0.5,99.5).
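Putting the steps together under the name from the question, a sketch could look like this. It assumes the variable is called x, as in the reply. RANK names the new percentage-rank variable Px by default, and RANGE returns 1 for values within the bounds (inclusive) and 0 otherwise, which covers both CUT=1 and CUT=0 in one command.

```spss
*Percentage ranks of x go into a new variable, Px by default.
rank variables = x /percent.
*cut = 1 if Px lies within 0.5 through 99.5, 0 otherwise; missing x gives missing cut.
compute cut = range(Px, 0.5, 99.5).
execute.
```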
By Suci on March 17th, 2016
Hi, I want to ask: I want to compute a new variable with the IF command where CUT=1 if X is between the 0.5 percentile and the 99.5 percentile of X, and CUT=0 if X is below the 0.5 percentile or above the 99.5 percentile. Can we use IF syntax? What is the syntax for percentiles then? Thank you.
By lion on November 24th, 2015
Hi Ruben,
Great work! It helped me a lot.