LOOP is a command for running one or many SPSS transformation commands repetitively. SPSS LOOP is often used together with VECTOR. An (often) easier alternative is DO REPEAT.
- There are several ways for looping in SPSS. It depends on the specifics of the situation which one(s) you can use. Note that these options are only available in syntax.
- An option for looping over transformations is the LOOP command. We'll explain it with some examples a bit later in this tutorial.
- A second option for transformations is the DO REPEAT command.
- For looping over procedures, the way to go is Python. For a very basic example, see Regression over Many Dependent Variables.
Example: Replacing Double by Single Spaces
- Say we have data containing sentences. The sentences contain double, triple (and so on) spaces which we'd like to replace by single spaces.
- Test data for this example are created by running the syntax below.
*Create mini test dataset.
data list free/sentence(a45).
begin data
'a b  c   d    e     f      g       h        i'
end data.
SPSS LOOP - Minimal Specification
- Note that simply replacing double spaces by single ones won't be sufficient. This is because 'new' double spaces may be created by the replacement process if it encounters triple(+) spaces.
- However, if we perform this replacement repeatedly, all double spaces will at some point be gone. The most basic way for doing this is simply putting the replacement in a loop.
- The SPSS LOOP command indicates that subsequent commands should be repeated. Conversely, END LOOP marks the end of the loop: commands following it are not repeated.
- The syntax below demonstrates the most basic use of LOOP. We'll use REPLACE for removing double spaces.
SPSS LOOP Syntax Example 1
*Wrong way (triggers warning #534) to replace double spaces by single ones.
loop.
compute sentence = replace(sentence,'  ',' ').
end loop.
execute.
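Why a single replacement pass is not sufficient can be checked outside SPSS. The sketch below is plain Python rather than SPSS syntax, and `squeeze_once` is a hypothetical helper name. It assumes that Python's `str.replace` mirrors a single REPLACE call, replacing non-overlapping double spaces from left to right.

```python
DOUBLE = " " * 2   # two adjacent spaces
SINGLE = " "       # one space

def squeeze_once(text):
    """One pass: replace every non-overlapping double space by a single space."""
    return text.replace(DOUBLE, SINGLE)

triple = "a" + " " * 3 + "b"          # three spaces between a and b
after_one = squeeze_once(triple)      # a double space is still left
after_two = squeeze_once(after_one)   # now only a single space remains

print(repr(after_one))
print(repr(after_two))
```

After the first pass the triple space has shrunk to a double space, which is why the replacement has to be repeated.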
The LOOP Index Variable
- The preceding syntax example will do its job but it's very inefficient and even raises a warning (#534). This is because nothing tells SPSS when to stop looping, so it keeps going until it hits a predefined maximum number of iterations (the MXLOOPS setting).
- A very basic way to circumvent this is to use a loop index variable. This is a variable whose values change over iterations. This way we can specify exactly how many times our command(s) should be repeated.
- Assuming our sentence does not hold more than 8 spaces in a row, we'll need to repeat our replace command only 3 times. On the first iteration, 8 or 7 adjacent spaces will become 4 spaces. The second iteration will replace these 4 spaces with two spaces. The 2 spaces will be replaced by a single space on the last iteration.
- For a demonstration, recreate the test data from the first example and try the syntax below.
SPSS LOOP Syntax Example 2
*Replace double spaces by single ones exactly three times.
loop repetition = 1 to 3.
compute sentence = replace(sentence,'  ',' ').
end loop.
execute.
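The arithmetic behind "three iterations are enough for up to 8 spaces" can be traced with a small Python sketch. It is not SPSS syntax, and `run_lengths` is a hypothetical helper. It assumes that each pass turns every pair of adjacent spaces into one space and leaves an odd leftover space untouched.

```python
def run_lengths(n):
    """Length of a run of n spaces after each double -> single replacement pass."""
    lengths = [n]
    while n > 1:
        # Each pair of spaces becomes one space; an odd leftover space stays.
        n = n // 2 + n % 2
        lengths.append(n)
    return lengths

for spaces in (7, 8):
    trace = run_lengths(spaces)
    # Number of passes = number of steps in the trace.
    print(spaces, trace, "passes:", len(trace) - 1)
```

Both 7 and 8 spaces pass through 4 and 2 before reaching 1, so three passes suffice. A run of 9 spaces would need a fourth pass.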
LOOP Index as Scratch Variable
- The example above basically works as follows: the variable 'repetition' takes on the value 1 and the replace command is performed. Next, it takes on the value 2 and the replace command is performed a second time. 'Repetition' becomes 3 and the third iteration takes place.
- Next, 'repetition' becomes 4 but since this exceeds the threshold of 3 that we set, the loop stops and the replace command is not carried out a fourth time.
- Three iterations are exactly enough for the data at hand. However, we do end up with a useless loop index ('repetition') in our data. We could delete it after the loop but a more common solution is to ensure it doesn't show up in the first place.
- This is done by using a scratch variable as the loop index. In a nutshell, just start the variable name with "#" and it won't show up.
- Like so, you could use #repetition instead of repetition. In practice you'll often see #i (i for index) being used as the loop index. However, just # is also a valid name for a scratch variable so we'll stick with that.
- These points are demonstrated in the syntax below.
SPSS LOOP Syntax Example 3
*Replace double spaces by single ones exactly three times.
loop # = 1 to 3.
compute sentence = replace(sentence,'  ',' ').
end loop.
exe.
END LOOP IF
- The last syntax example wasn't too bad but it has two problems. First, we need to know in advance how many loops we'll need. This is not always the case. To ensure sufficient iterations, we could simply loop a large number of times but this may slow down the process on large datasets.
- Second, if there are many cases then perhaps some need more iterations than others.
- Both points can be taken into account by dropping the loop index. Instead, we'll end the loop for each case as soon as it no longer contains any double spaces. After each iteration we check this with the CHAR.INDEX function, which returns 0 when the double space is not present. The syntax below demonstrates this.
SPSS LOOP Syntax Example 4
*Stop looping when double spaces aren't present anymore.
loop.
compute sentence = replace(sentence,'  ',' ').
end loop if char.index(sentence,'  ') = 0.
exe.
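The logic of END LOOP IF, where the body runs first and the exit condition is tested afterwards, can be sketched in Python. This is an illustration rather than SPSS syntax: `squeeze_post_test` and the three test sentences are made up for this sketch, and `str.find(...) == -1` stands in for CHAR.INDEX returning 0.

```python
DOUBLE = " " * 2

def squeeze_post_test(sentence):
    """Run the replacement, then check whether to stop (exit test at the end)."""
    passes = 0
    while True:
        sentence = sentence.replace(DOUBLE, " ")
        passes += 1
        if sentence.find(DOUBLE) == -1:   # no double space left: end the loop
            break
    return sentence, passes

# Hypothetical cases with 1, 2 and 8 spaces between the letters.
cases = ["a" + " " * n + "b" for n in (1, 2, 8)]
results = [squeeze_post_test(case) for case in cases]
for cleaned, passes in results:
    print(repr(cleaned), passes)
```

Each case gets only as many passes as it needs, but never fewer than one because the check comes after the replacement.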
LOOP IF
- The previous syntax example still has a tiny shortcoming: it will perform the replace command even if no double spaces are present in a sentence at all.
- A more efficient approach is to only start the loop for cases containing at least one double space. So for some cases zero iterations will take place while for others three (or more) iterations may be carried out.
- This is accomplished by using LOOP IF. The condition for looping is the presence of a double space. The syntax below demonstrates this.
SPSS LOOP Syntax Example 5
*Start an iteration if a double space is present.
loop if char.index(sentence,'  ') > 0.
compute sentence = replace(sentence,'  ',' ').
end loop.
exe.
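LOOP IF tests its condition before each iteration, which corresponds to an ordinary `while` loop in most languages. The Python sketch below illustrates this under that assumption. `squeeze_pre_test` is a hypothetical name and the two sentences are made-up test values.

```python
DOUBLE = " " * 2

def squeeze_pre_test(sentence):
    """Only start an iteration while a double space is still present."""
    passes = 0
    while DOUBLE in sentence:             # condition is checked up front
        sentence = sentence.replace(DOUBLE, " ")
        passes += 1
    return sentence, passes

clean_case = squeeze_pre_test("a b")                  # nothing to fix
messy_case = squeeze_pre_test("a" + " " * 8 + "b")    # eight spaces in a row
print(clean_case)
print(messy_case)
```

A sentence without double spaces now triggers zero replacements, while the sentence with eight adjacent spaces still gets the three passes it needs.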
Using the LOOP Index
- The previous syntax examples using a loop index didn't use this index within the commands that were repeated. It merely indicated a fixed number of repetitions for each case.
- However, it's common that the index itself is used within the loop as well. On each iteration, the index takes on the next of the numbers that are being looped over.
- This is demonstrated in the syntax below (using different test data than the previous examples). It will count the occurrence of the letter 'e' in each name. For each case the number of iterations is equal to the number of letters in their name.
- If you're unfamiliar with the string functions used in the example, see our SPSS String Variables Tutorial.
SPSS LOOP Syntax Example 6
*1. Create mini test dataset.
data list free/name(a10).
begin data
Anneke Martin Stefan
end data.
*2. Count occurrence of 'e' by looping through letters in name.
compute count_e = 0.
loop # = 1 to char.length(name).
if char.substr(name,#,1) = 'e' count_e = count_e + 1.
end loop.
exe.
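For readers who find it easier to follow in a general-purpose language, the same counting logic can be sketched in Python. This is an illustration, not SPSS syntax. `count_e` is a hypothetical function name, and the index runs from 1 to mimic the 1-based positions that CHAR.SUBSTR uses.

```python
def count_e(name):
    """Count lowercase 'e' by stepping through the name one position at a time."""
    count = 0
    for i in range(1, len(name) + 1):     # i = 1 .. length of name
        if name[i - 1] == "e":            # Python strings are 0-based
            count += 1
    return count

names = ["Anneke", "Martin", "Stefan"]
counts = {name: count_e(name) for name in names}
print(counts)
```

The comparison is case sensitive here, so an uppercase 'E' would not be counted.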
The BY Keyword
- When a loop index is used, it may increment/decrement in steps smaller or larger than one. This is specified by the BY keyword.
- For instance, 3 TO 12 BY 3 increments from 3 through 12 by steps of 3. It thus returns 3, 6, 9 and 12.
- When combined with VECTOR, this can be used to compute sums over groups of variables. Like so, the final syntax example calculates sums over (v1, v2, v3), (v4, v5, v6) and so on.
SPSS LOOP Syntax Example 7
*1. Create mini test dataset.
data list free/v1 to v12 (12f1.0).
begin data
0 0 0 0 0 1 0 1 1 1 1 1
end data.
*2. Compute 4 sums, each over 3 adjacent variables.
vector v = v1 to v12 / s(4).
loop # = 3 to 12 by 3.
compute s(# / 3) = sum(v(#),v(# - 1),v(# - 2)).
end loop.
exe.
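The index arithmetic in this example can be traced in Python. The sketch below is an illustration rather than SPSS syntax: the list `v` holds the twelve test values and the dictionary `s` stands in for the vector of four new variables. Positions are shifted by one because Python lists are 0-based while SPSS vectors are 1-based.

```python
v = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]   # v1 through v12 for the single test case

indices = list(range(3, 13, 3))            # 3 TO 12 BY 3
s = {}
for i in indices:
    # v(#), v(# - 1) and v(# - 2) in 1-based SPSS terms.
    s[i // 3] = v[i - 1] + v[i - 2] + v[i - 3]

print(indices)
print(s)
```

The index takes the values 3, 6, 9 and 12, and dividing it by 3 gives the positions 1 through 4 of the sums.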
THIS TUTORIAL HAS 19 COMMENTS:
By Valentina on March 21st, 2016
hi!
I'd like to add value labels to a numeric variable in a loop (for 1 variable). For example, 2 should be labelled as '2 cigarettes', 3 labelled as '3 cigarettes', etc. (Data isn't sorted in any way).
It should be like - for each line of this variable with value #, I attach value label '# cigarettes'.
*preferably without python coding if possible, the simplest.
Maybe with the help of other string variable which concatenates # and cigarettes, and has values '# cigarettes'.
Looking forward to your reply!
By Ruben Geert van den Berg on March 21st, 2016
Hi Valentina!
You can't do it with LOOP or DO REPEAT because the ADD VALUE LABELS commands you need aren't transformations.
The only reasonable way for doing it without Python is creating a new string variable. Use a concatenation for creating values such as
01 cigarettes
02 cigarettes
If there are numbers over 99, you'll need 001... 002... Finally, AUTORECODE this new string variable.
With Python, it's much easier. Try something like
compute somevariable = $casenum.
begin program.
import spss
for i in range(1000):
    spss.Submit('add value labels somevariable %d "%d cigarettes".'%(i,i))
end program.
Finally, I do kinda wonder why you need such value labels in the first place. If the variable label says "number of cigarettes smoked", then there can't be any confusion regarding the meaning of the values, right?
By Valentina on March 22nd, 2016
Thank you, Ruben!
Alright, I will try.
Why - there are also values like 'I never smoked', 'I quit smoking', 'I never tried' and others. So I thought it would be useful to add labels to the numbers also, to make it look nicer in analytics.
By Ruben Geert van den Berg on March 22nd, 2016
Hi Valentina!
You could consider labeling only values that don't express a number of cigarettes smoked per day. In this case, the table will look fine. For a quick example, try the syntax below.
data list free/smoking.
begin data
5 5 5 10 10 20 20 20 20 9999 9998
end data.
value labels smoking 9999 "I quit smoking" 9998 "I never smoked".
formats smoking(f4).
variable labels smoking "Number of cigarettes smoked per day".
set tnumbers labels tvars labels.
frequencies smoking.
By Young on June 20th, 2016
SPSS LOOP Syntax Example 1 is not working.
It shows ">Warning # 534
>Execution of a loop was terminated after MXLOOPS trips. The value of MXLOOPS
>can be displayed with the SHOW command and changed with the SET command."
>Command line: 44 Current case: 1 Current splitfile group: 1