LOOP is a command for running one or many SPSS transformation commands repetitively. SPSS LOOP is often used together with VECTOR. An (often) easier alternative is DO REPEAT.
- There are several ways of looping in SPSS. Which one(s) you can use depends on the specifics of the situation. Note that these options are only available in syntax.
- An option for looping over transformations is the LOOP command. We'll explain it with some examples a bit later in this tutorial.
- A second option for transformations is the DO REPEAT command.
- For looping over procedures, the way to go is Python. For a very basic example, see Regression over Many Dependent Variables.
Example: Replacing Double by Single Spaces
- Say we have data containing sentences. The sentences contain double, triple (and so on) spaces which we'd like to replace by single spaces.
- Test data for this example are created by running the syntax below.
*Create mini test dataset.
data list free/sentence(a45).
begin data
'a b  c   d    e     f      g       h        i'
end data.
SPSS LOOP - Minimal Specification
- Note that simply replacing double spaces by single ones won't be sufficient. This is because 'new' double spaces may be created by the replacement process if it encounters triple(+) spaces.
- However, if we perform this replacement repeatedly, all double spaces will at some point be gone. The most basic way for doing this is simply putting the replacement in a loop.
- The SPSS LOOP command indicates that the commands following it should be repeated. Conversely, END LOOP marks the end of the repeated block: commands after it are not repeated.
- The syntax below demonstrates the most basic use of LOOP. We'll use REPLACE for removing double spaces.
SPSS LOOP Syntax Example 1
*Wrong way (triggers warning #534) to replace double spaces by single ones.
loop.
compute sentence = replace(sentence,'  ',' ').
end loop.
execute.
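As a rough illustration of why a single replacement pass is not enough, here is a minimal pure-Python sketch (not SPSS syntax). It assumes that Python's `str.replace` scans left to right without overlapping matches, which is the behavior the tutorial describes for REPLACE. The variable names are made up for this sketch.

```python
# Build a sentence with 1, 2, 3 and 4 spaces between the letters.
sentence = "".join(letter + " " * n for n, letter in enumerate("abcde", start=1)).rstrip()

# One pass: every pair of spaces becomes one space, but longer runs
# still leave double spaces behind.
one_pass = sentence.replace("  ", " ")

# Repeating the same replacement until no double space is left.
repeated = sentence
while "  " in repeated:
    repeated = repeated.replace("  ", " ")

print(repr(one_pass))
print(repr(repeated))
```

After one pass the runs of 3 and 4 spaces have shrunk to 2 spaces each, so the result still contains double spaces. Only the repeated version ends up with single spaces throughout.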
The LOOP Index Variable
- The preceding syntax example will do its job but it's very inefficient and even raises a warning (#534). This is because nothing tells SPSS to stop looping except for a predefined maximum number of iterations (the MXLOOPS setting, 40 by default).
- A very basic way to circumvent this is to use a loop index variable. This is a variable whose value changes over iterations. This way we can specify exactly how many times our command(s) are repeated.
- Assuming our sentence does not hold more than 8 spaces in a row, we'll need to repeat our replace command only 3 times. On the first iteration, 8 or 7 adjacent spaces will become 4 spaces. The second iteration will replace these 4 spaces with two spaces. The 2 spaces will be replaced by a single space on the last iteration.
- For a demonstration, recreate the test data from the first example and try the syntax below.
SPSS LOOP Syntax Example 2
*Replace double spaces by single ones exactly three times.
loop repetition = 1 to 3.
compute sentence = replace(sentence,'  ',' ').
end loop.
execute.
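The claim that three passes suffice for up to 8 adjacent spaces can be checked with a small pure-Python sketch. It is only an illustration: `passes_needed` is a hypothetical helper, and Python's `str.replace` is assumed to mimic the pairwise, left-to-right replacement described above.

```python
def passes_needed(n_spaces: int) -> int:
    """Count replacement passes until a run of n_spaces collapses to one space."""
    text = "a" + " " * n_spaces + "b"
    passes = 0
    while "  " in text:
        # Each pass roughly halves the run: n spaces become ceil(n / 2).
        text = text.replace("  ", " ")
        passes += 1
    return passes

for n in range(1, 10):
    print(n, "spaces ->", passes_needed(n), "passes")
```

Under these assumptions, runs of 5 through 8 spaces all need 3 passes, while a run of 9 spaces would need a fourth. That is why the fixed count of 3 depends on knowing the data in advance.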
LOOP Index as Scratch Variable
- The example above basically works as follows: the variable 'repetition' takes on the value 1 and the replace command is performed. Next, it takes on the value 2 and the replace command is performed a second time. 'Repetition' becomes 3 and the third iteration takes place.
- Next, 'repetition' becomes 4 but since this exceeds the threshold of 3 that we set, the loop stops and the replace command is not carried out a fourth time.
- Three iterations are exactly enough for the data at hand. However, we do end up with a useless loop index ('repetition') in our data. We could delete it after the loop but a more common solution is to ensure it doesn't show up in the first place.
- This is done by using a scratch variable as the loop index. In a nutshell, just start the variable name with "#" and it won't show up.
- Like so, you could use #repetition instead of repetition. In practice you'll often see #i (i for index) being used as the loop index. However, just # is also a valid name for a scratch variable so we'll stick with that.
- These points are demonstrated in the syntax below.
SPSS LOOP Syntax Example 3
*Replace double spaces by single ones exactly three times.
loop # = 1 to 3.
compute sentence = replace(sentence,'  ',' ').
end loop.
exe.
END LOOP IF
- The last syntax example wasn't too bad but it has two problems. First, we need to know in advance how many loops we'll need. This is not always the case. To ensure sufficient iterations, we could simply loop a large number of times but this may slow down the process on large datasets.
- Second, if there are many cases then perhaps some need more iterations than others.
- Both points can be taken into account by dropping the loop index. Instead, we'll end the loop for each case as soon as it no longer contains any double spaces. After each iteration we check this with the CHAR.INDEX function, which returns 0 when the double space is not present. The syntax below demonstrates this.
SPSS LOOP Syntax Example 4
*Stop looping when double spaces aren't present anymore.
loop.
compute sentence = replace(sentence,'  ',' ').
end loop if char.index(sentence,'  ') = 0.
exe.
LOOP IF
- The previous syntax example still has a tiny shortcoming: it will perform the replace command even if no double spaces are present in a sentence at all.
- A more efficient approach is to only start the loop for cases containing at least one double space. So for some cases zero iterations will take place while for others three (or more) iterations may be carried out.
- This is accomplished by using LOOP IF. The condition for looping is the presence of a double space. The syntax below demonstrates this.
SPSS LOOP Syntax Example 5
*Start an iteration if a double space is present.
loop if char.index(sentence,'  ') > 0.
compute sentence = replace(sentence,'  ',' ').
end loop.
exe.
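The difference between END LOOP IF (condition checked after each iteration) and LOOP IF (condition checked before each iteration) can be sketched in pure Python. This is an analogy rather than SPSS syntax, and the function names `end_loop_if` and `loop_if` are invented for this sketch.

```python
def end_loop_if(text):
    """Post-test loop: the body always runs once, the check comes afterwards."""
    iterations = 0
    while True:
        text = text.replace("  ", " ")
        iterations += 1
        if "  " not in text:
            break
    return text, iterations


def loop_if(text):
    """Pre-test loop: the check comes first, so the body may never run."""
    iterations = 0
    while "  " in text:
        text = text.replace("  ", " ")
        iterations += 1
    return text, iterations


clean = "a b c"                  # no double spaces at all
messy = "a" + " " * 8 + "b"      # one run of 8 spaces

for label, text in (("clean", clean), ("messy", messy)):
    print(label, end_loop_if(text), loop_if(text))
```

For the clean sentence the post-test version still performs one (useless) replacement while the pre-test version performs none. For the messy sentence both need three iterations.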
Using the LOOP Index
- The previous syntax examples using a loop index didn't use this index within the commands that were repeated. It merely indicated a fixed number of repetitions for each case.
- However, it's common for the index itself to be used within the loop as well. On each iteration, the index takes on the next of the numbers that are being looped over.
- This is demonstrated in the syntax below (using different test data than the previous examples). It will count the occurrence of the letter 'e' in each name. For each case the number of iterations is equal to the number of letters in their name.
- If you're unfamiliar with the string functions used in the example, see our SPSS String Variables Tutorial.
SPSS LOOP Syntax Example 6
*1. Create mini test dataset.
data list free/name(a10).
begin data
Anneke Martin Stefan
end data.
*2. Count occurrence of 'e' by looping through letters in name.
compute count_e = 0.
loop # = 1 to char.length(name).
if char.substr(name,#,1) = 'e' count_e = count_e + 1.
end loop.
exe.
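For readers who find the index logic easier to follow outside SPSS, here is a minimal pure-Python sketch of the same counting idea. It keeps the 1-based positions that CHAR.SUBSTR uses. The function name `count_e` is only chosen to match the variable in the example.

```python
names = ["Anneke", "Martin", "Stefan"]


def count_e(name):
    """Count lowercase 'e' by visiting positions 1 through the length of the name."""
    count = 0
    for position in range(1, len(name) + 1):
        # SPSS positions start at 1, Python's at 0, hence the "- 1".
        if name[position - 1] == "e":
            count += 1
    return count


counts = {name: count_e(name) for name in names}
print(counts)
```

As in the SPSS example, the comparison is case sensitive, so an uppercase 'E' would not be counted.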
The BY Keyword
- When a loop index is used, it may increment/decrement in steps smaller or larger than one. This is specified by the BY keyword.
- For instance, 3 TO 12 BY 3 increments from 3 through 12 by steps of 3. It thus returns 3, 6, 9 and 12.
- When combined with VECTOR, this can be used to compute sums over groups of variables. Like so, the final syntax example calculates sums over (v1, v2, v3), (v4, v5, v6) and so on.
SPSS LOOP Syntax Example 7
*1. Create mini test dataset.
data list free/v1 to v12 (12f1.0).
begin data
0 0 0 0 0 1 0 1 1 1 1 1
end data.
*2. Compute 4 sums, each over 3 adjacent variables.
vector v = v1 to v12 / s(4).
loop # = 3 to 12 by 3.
compute s(# / 3) = sum(v(#),v(# - 1),v(# - 2)).
end loop.
exe.
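The index arithmetic in this last example can be traced with a short pure-Python sketch. It is an analogy only: the lists `v` and `s` stand in for the two vectors, and the index shifts are needed because Python lists start at 0 whereas SPSS vectors start at 1.

```python
v = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]  # v1 to v12 from the test data
s = [None] * 4                             # s1 to s4

steps = list(range(3, 13, 3))              # like 3 TO 12 BY 3
for i in steps:
    # 1-based v(i), v(i - 1), v(i - 2) are v[i - 1], v[i - 2], v[i - 3] here.
    # 1-based s(i / 3) is s[i // 3 - 1] here.
    s[i // 3 - 1] = v[i - 1] + v[i - 2] + v[i - 3]

print(steps)
print(s)
```

The index visits 3, 6, 9 and 12, and each visit fills one of the four sums from the three variables ending at that position.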
THIS TUTORIAL HAS 19 COMMENTS:
By Oliver on March 10th, 2016
Hi,
first of all, thank you for your nice tutorials. I've learned a lot!
Today I was searching for an opportunity to make things easy using LOOP. I have a bunch of variables with missing data and want to replace them with SMEAN. Sadly there are 50 subgroups and each subgroup has its specific mean. I started like that:
DO IF (VarA = 1).
RMV /rVAR1=SMEAN(VAR1).
RMV /rVAR2=SMEAN(VAR2).
RMV /rVAR3=SMEAN(VAR3).
END IF.
EXE.
DO IF (VarA = 2).
RMV /rVAR1=SMEAN(VAR1).
RMV /rVAR2=SMEAN(VAR2).
RMV /rVAR3=SMEAN(VAR3).
END IF.
EXE.
[...]
Is LOOP an option in this case?
Maybe like (not working yet):
LOOP (VarA = i to 50).
RMV /rVAR1=SMEAN(VAR1).
RMV /rVAR2=SMEAN(VAR2).
RMV /rVAR3=SMEAN(VAR3).
END LOOP.
EXE.
Thank you in advance.
Best,
Oliver
By Ruben Geert van den Berg on March 11th, 2016
Hi Oliver!
If you consult the CSR on RMV, the first thing you'll see is that
"This command reads the active dataset..."
This implies that RMV is a procedure and you can use only Transformations in LOOP and DO REPEAT. So that's not going to work.
However, you can accomplish an identical result by computing the variable means per group with a single AGGREGATE command with MODE ADDVARIABLES. Use the TO keyword for specifying the old and new variables.
Now you can use IF for replacing the missing values by the group means. And because IF is a transformation, you can now loop over your variables with DO REPEAT with something like
do repeat #vars = v1 to v50/#means = mv1 to mv50.
if(missing(#vars)) #vars = #means.
end repeat.
execute.
By Oliver on March 11th, 2016
Hi Ruben,
thank you for your quick reply. Sadly I didn't understand how to replace the placeholder in your example.
Again, this is my case (not working):
DO IF (VarA = 1).
RMV /rVAR1=SMEAN(VAR1).
RMV /rVAR2=SMEAN(VAR2).
[...]
RMV /rVAR50=SMEAN(VAR50).
END IF.
EXE.
DO IF (VarA = 2).
RMV /rVAR1=SMEAN(VAR1).
RMV /rVAR2=SMEAN(VAR2).
[...]
RMV /rVAR3=SMEAN(VAR3).
END IF.
EXE.
[...]
DO IF (VarA = 35).
RMV /rVAR1=SMEAN(VAR1).
RMV /rVAR2=SMEAN(VAR2).
[...]
RMV /rVAR3=SMEAN(VAR3).
END IF.
EXE.
I understand that this isn't just an awful lot of work: it also won't work because RMV is a procedure. So I've tried:
do repeat #vars = VAR1 to VAR50/#means = mVAR1 to mVAR50.
if(missing(#vars)) #vars = #means.
end repeat.
exe.
But where will I put the filter VarA = 1, VarA = 2 [...] VarA = 35?
By Ruben Geert van den Berg on March 11th, 2016
Hi Oliver!
Right, say you have 50 variables called var1 to var50 and they're adjacent in your data. Groups are indicated by a variable called "group".
We first create the 50 variable means for groups separately by
aggregate outfile * mode addvariables
/break group
/mvar1 to mvar50 = mean(var1 to var50).
We then loop over the original variables and the 50 new variables (holding group means) simultaneously with
do repeat #old = var1 to var50 / #new = mvar1 to mvar50.
if (missing(#old)) #old = #new.
end repeat.
execute.
Does that make a bit more sense to you?
By Oliver on March 14th, 2016
Hi Ruben,
that definitely makes sense to me. It works perfectly and saved me hours of recoding.
Thank you very much!