LOOP is a command for running one or many SPSS transformation commands repetitively. SPSS LOOP is often used together with VECTOR. An (often) easier alternative is DO REPEAT.
- There are several ways to loop in SPSS. Which one(s) you can use depends on the specifics of the situation. Note that all of these options are available in syntax only.
- An option for looping over transformations is the LOOP command. We'll explain it with some examples a bit later in this tutorial.
- A second option for transformations is the DO REPEAT command.
- For looping over procedures, the way to go is Python. For a very basic example, see Regression over Many Dependent Variables.
Example: Replacing Double by Single Spaces
- Say we have data containing sentences. The sentences contain double, triple (and so on) spaces which we'd like to replace by single spaces.
- Test data for this example are created by running the syntax below.
*Create mini test dataset (letters separated by 1 through 8 spaces).
data list free/sentence(a45).
begin data
'a b  c   d    e     f      g       h        i'
end data.
SPSS LOOP - Minimal Specification
- Note that simply replacing double spaces by single ones won't be sufficient. This is because 'new' double spaces may be created by the replacement process if it encounters triple(+) spaces.
- However, if we perform this replacement repeatedly, all double spaces will at some point be gone. The most basic way of doing this is simply putting the replacement in a loop.
- The SPSS LOOP command indicates that subsequent commands should be repeated. Conversely, END LOOP indicates that the commands following it should not be repeated.
- The syntax below demonstrates the most basic use of LOOP. We'll use REPLACE for removing double spaces.
SPSS LOOP Syntax Example 1
*Wrong way (triggers warning #534) to replace double spaces by single ones.
loop.
compute sentence = replace(sentence,'  ',' ').
end loop.
execute.
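Why a single replacement is not enough can be checked outside SPSS. The Python sketch below is an illustration only, not SPSS syntax. It assumes that Python's `str.replace` mimics the behavior described above: matches are found from left to right and do not overlap. The helper name `one_pass` is made up for this example.

```python
def one_pass(text):
    """Replace every non-overlapping double space with a single space, once."""
    return text.replace("  ", " ")

# Assumption for illustration: three spaces between 'a' and 'b'.
sample = "a" + " " * 3 + "b"

after_one = one_pass(sample)     # the first two spaces collapse, the third stays
after_two = one_pass(after_one)  # the remaining double space collapses

print(repr(after_one))
print(repr(after_two))
```

After one pass a double space is still left; only the second pass yields a single space.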
The LOOP Index Variable
- The preceding syntax example will do its job but it's very inefficient and even raises a warning (#534). This is because nothing tells SPSS to stop looping except for a predefined maximum number of iterations (the MXLOOPS setting).
- A very basic way to avoid this is to use a loop index variable: a variable whose value changes over iterations. This way we can specify exactly how many times our command(s) should be repeated.
- Assuming our sentence does not hold more than 8 spaces in a row, we'll need to repeat our replace command only 3 times. On the first iteration, 8 or 7 adjacent spaces will become 4 spaces. The second iteration will replace these 4 spaces with two spaces. The 2 spaces will be replaced by a single space on the last iteration.
- For a demonstration, recreate the test data from the first example and try the syntax below.
SPSS LOOP Syntax Example 2
*Replace double spaces by single ones exactly three times.
loop repetition = 1 to 3.
compute sentence = replace(sentence,'  ',' ').
end loop.
execute.
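The claim that three iterations suffice for up to 8 adjacent spaces can be traced with a small Python sketch. This is an illustration under the assumption that Python's `str.replace` handles non-overlapping matches the same way as the replacement in the loop; `shrink_trace` is a hypothetical helper, not part of SPSS.

```python
def shrink_trace(n_spaces):
    """Return the length of a run of spaces before and after each pass."""
    run = " " * n_spaces
    lengths = [len(run)]
    while "  " in run:
        run = run.replace("  ", " ")  # one pass over non-overlapping pairs
        lengths.append(len(run))
    return lengths

for n in (7, 8, 9):
    trace = shrink_trace(n)
    print(n, trace, "passes:", len(trace) - 1)
```

Runs of 7 and 8 spaces both shrink to 4, then 2, then 1, so three passes are enough. A run of 9 spaces needs a fourth pass, which is why the fixed count of 3 depends on the assumption of at most 8 spaces in a row.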
LOOP Index as Scratch Variable
- The example above basically works as follows: the variable 'repetition' takes on the value 1 and the replace command is performed. Next, it takes on the value 2 and the replace command is performed a second time. 'Repetition' becomes 3 and the third iteration takes place.
- Next, 'repetition' becomes 4 but since this exceeds the threshold of 3 that we set, the loop stops and the replace command is not carried out a fourth time.
- Three iterations are exactly enough for the data at hand. However, we do end up with a useless loop index ('repetition') in our data. We could delete it after the loop but a more common solution is to ensure it doesn't show up in the first place.
- This is done by using a scratch variable as the loop index. In a nutshell, just start the variable name with "#" and it won't show up.
- Like so, you could use #repetition instead of repetition. In practice you'll often see #i (i for index) being used as the loop index. However, just # is also a valid name for a scratch variable so we'll stick with that.
- These points are demonstrated in the syntax below.
SPSS LOOP Syntax Example 3
*Replace double spaces by single ones exactly three times.
loop # = 1 to 3.
compute sentence = replace(sentence,'  ',' ').
end loop.
exe.
END LOOP IF
- The last syntax example wasn't too bad but it has two problems. First, we need to know in advance how many iterations we'll need, which is not always possible. To ensure sufficient iterations, we could simply loop a large number of times but this may slow down the process on large datasets.
- Second, if there are many cases then perhaps some need more iterations than others.
- Both points can be taken into account by dropping the loop index. Instead, we'll end the loop for each case as soon as it no longer contains any double spaces. During each iteration we'll check whether this is the case by using the CHAR.INDEX function, which returns 0 when no double space is present. The syntax below demonstrates this.
SPSS LOOP Syntax Example 4
*Stop looping when double spaces aren't present anymore.
loop.
compute sentence = replace(sentence,'  ',' ').
end loop if char.index(sentence,'  ') = 0.
exe.
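The control flow of a loop that tests its exit condition at the end can be sketched in Python. This is only an analogy: it ignores SPSS specifics such as the maximum number of iterations and the padding of string variables, and `collapse_post_test` is a hypothetical name.

```python
def collapse_post_test(sentence):
    """Run the replacement first, then check whether to stop (post-test loop)."""
    passes = 0
    while True:
        sentence = sentence.replace("  ", " ")
        passes += 1
        if "  " not in sentence:  # exit test at the end of each iteration
            break
    return sentence, passes

# Hypothetical cases: no double space, one double space, eight spaces.
cases = ["a b", "a  b", "a" + " " * 8 + "b"]
for case in cases:
    print(collapse_post_test(case))
```

Each case gets its own number of passes (1, 1 and 3 here), but even the sentence without any double space goes through the replacement once.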
LOOP IF
- The previous syntax example still has a tiny shortcoming: it will perform the replace command even if no double spaces are present in a sentence at all.
- A more efficient approach is to only start the loop for cases containing at least one double space. So for some cases zero iterations will take place while for others three (or more) iterations may be carried out.
- This is accomplished by using LOOP IF. The condition for looping is the presence of a double space. The syntax below demonstrates this.
SPSS LOOP Syntax Example 5
*Start an iteration if a double space is present.
loop if char.index(sentence,'  ') > 0.
compute sentence = replace(sentence,'  ',' ').
end loop.
exe.
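A loop that tests its condition before each iteration behaves like a plain `while` loop. The Python sketch below illustrates that idea only; it is not SPSS syntax and `collapse_pre_test` is a hypothetical name.

```python
def collapse_pre_test(sentence):
    """Check for a double space first; only then run the replacement."""
    passes = 0
    while "  " in sentence:  # entry test before each iteration
        sentence = sentence.replace("  ", " ")
        passes += 1
    return sentence, passes

# Hypothetical cases: no double space, one double space, eight spaces.
cases = ["a b", "a  b", "a" + " " * 8 + "b"]
for case in cases:
    print(collapse_pre_test(case))
```

The sentence without double spaces now takes zero passes, while the one with eight adjacent spaces still takes three.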
Using the LOOP Index
- The previous syntax examples using a loop index didn't use this index within the commands that were repeated. It merely indicated a fixed number of repetitions for each case.
- However, it's common that the index itself is used within the loop as well. Over the iterations, the index is replaced by each of the numbers that's being looped over.
- This is demonstrated in the syntax below (using different test data than the previous examples). It will count the occurrence of the letter 'e' in each name. For each case the number of iterations is equal to the number of letters in their name.
- If you're unfamiliar with the string functions used in the example, see our SPSS String Variables Tutorial.
SPSS LOOP Syntax Example 6
*1. Create mini test dataset.
data list free/name(a10).
begin data
Anneke Martin Stefan
end data.
*2. Count occurrence of 'e' by looping through letters in name.
compute count_e = 0.
loop # = 1 to char.length(name).
if char.substr(name,#,1) = 'e' count_e = count_e + 1.
end loop.
exe.
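To check the expected counts by hand, the same logic can be mirrored in Python. This is a sketch for verification only; it keeps the 1-based position of the loop index and, like the comparison above, it is case-sensitive. The function name `count_e` is borrowed from the variable name in the example.

```python
def count_e(name):
    """Count lowercase 'e' by visiting each character position in turn."""
    count = 0
    for position in range(1, len(name) + 1):  # 1-based, like the loop index
        if name[position - 1] == "e":         # one character at this position
            count += 1
    return count

names = ["Anneke", "Martin", "Stefan"]
counts = {name: count_e(name) for name in names}
print(counts)
```

For the three test names this gives 2, 0 and 1 respectively.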
The BY Keyword
- When a loop index is used, it may increment/decrement in steps smaller or larger than one. This is specified by the BY keyword.
- For instance, 3 TO 12 BY 3 increments from 3 through 12 by steps of 3. It thus returns 3, 6, 9 and 12.
- When combined with VECTOR, this can be used to compute sums over groups of variables. Like so, the final syntax example calculates sums over (v1, v2, v3), (v4, v5, v6) and so on.
SPSS LOOP Syntax Example 7
*1. Create mini test dataset.
data list free/v1 to v12 (12f1.0).
begin data
0 0 0 0 0 1 0 1 1 1 1 1
end data.
*2. Compute 4 sums, each over 3 adjacent variables.
vector v = v1 to v12 / s(4).
loop # = 3 to 12 by 3.
compute s(# / 3) = sum(v(#),v(# - 1),v(# - 2)).
end loop.
exe.
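The index arithmetic in this example can be traced with a short Python sketch. It is an illustration only: `range(3, 13, 3)` stands in for 3 TO 12 BY 3, and the helper `v` is a hypothetical 1-based accessor that plays the role of the vector. No missing values are involved in the test data.

```python
# Values of v1 through v12 from the test data above.
values = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

def v(i):
    """Return the i-th value using 1-based positions, like a vector element."""
    return values[i - 1]

steps = list(range(3, 13, 3))  # 3, 6, 9, 12
sums = {}
for index in steps:
    # index // 3 maps 3, 6, 9, 12 onto positions 1, 2, 3, 4
    sums[index // 3] = v(index) + v(index - 1) + v(index - 2)

print(steps)
print(sums)
```

For this single case the four sums come out as 0, 1, 2 and 3.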
THIS TUTORIAL HAS 19 COMMENTS:
By Ruben Geert van den Berg on June 21st, 2016
Hi Young!
Thanks for your comment! The double space in the syntax wasn't shown because of how it's rendered in HTML. That's been fixed now.
The warning is supposed to be triggered as it says right after the example: “The preceding syntax example will do its job but it's very inefficient and even raises a warning (#534).”. It would have been better to announce it before running the syntax so that's been fixed as well.
Hope that helps!
By Jon on November 28th, 2017
Thanks for great tutorials! In my web browser (IE 11, windows) the double spaces in your example don't show properly. Took me some time to figure out why your syntax didn't work. Now I've added the "missing" double space and it runs perfectly.
By Linh on May 8th, 2018
Thanks for your detailed instruction.
I ran the Syntax below but it doesn't work. Supposed that the String_Pattern should be double space. I have changed and practiced smoothly then.
------------------
loop repetition = 1 to 3.
compute sentence = replace(sentence,' ',' ').
end loop.
exe.
------------------
By Ruben Geert van den Berg on May 8th, 2018
Hi Linh, thanks for your comment!
You're right. The problem is that web browsers show multiple consecutive spaces as 1 space. However, I'll fix the error this afternoon with a special trick.
Thanks for letting me know!