LOOP is a command for running one or many SPSS transformation commands repetitively. SPSS LOOP is often used together with VECTOR. An (often) easier alternative is DO REPEAT.
- There are several ways of looping in SPSS. Which one(s) you can use depends on the specifics of the situation. Note that these options are only available in syntax.
- An option for looping over transformations is the LOOP command. We'll explain it with some examples a bit later in this tutorial.
- A second option for transformations is the DO REPEAT command.
- For looping over procedures, the way to go is Python. For a very basic example, see Regression over Many Dependent Variables.
Example: Replacing Double by Single Spaces
- Say we have data containing sentences. The sentences contain double, triple (and so on) spaces which we'd like to replace by single spaces.
- Test data for this example are created by running the syntax below.
*Create mini test dataset (runs of 1 through 8 spaces between letters).
data list free/sentence(a45).
begin data
'a b  c   d    e     f      g       h        i'
end data.
SPSS LOOP - Minimal Specification
- Note that simply replacing double spaces by single ones won't be sufficient. This is because 'new' double spaces may be created by the replacement process if it encounters triple(+) spaces.
- However, if we perform this replacement repeatedly, all double spaces will at some point be gone. The most basic way for doing this is simply putting the replacement in a loop.
- The SPSS LOOP command indicates that subsequent commands should be repeated. Conversely, END LOOP indicates that commands following it do not have to be repeated.
- The syntax below demonstrates the most basic use of LOOP. We'll use REPLACE for removing double spaces.
SPSS LOOP Syntax Example 1
*Wrong way (triggers warning #534) to replace double spaces by single ones.
loop.
compute sentence = replace(sentence,'  ',' ').
end loop.
execute.
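Why a single replacement pass is not enough can be checked outside SPSS. The sketch below is plain Python rather than SPSS syntax. It assumes that SPSS's REPLACE, like Python's str.replace, substitutes non-overlapping matches from left to right; the function name replace_once is made up for illustration.

```python
DOUBLE = " " * 2  # a double space, built explicitly so it stays visible

def replace_once(sentence: str) -> str:
    """One pass: replace each non-overlapping double space by a single space."""
    return sentence.replace(DOUBLE, " ")

sentence = "a" + " " * 3 + "b"       # three spaces between a and b
after_one = replace_once(sentence)    # the triple space shrinks to a double space
after_two = replace_once(after_one)   # the second pass removes it

print(repr(after_one))
print(repr(after_two))
```

After one pass the triple space has become a 'new' double space, which is why the replacement has to be repeated.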
The LOOP Index Variable
- The preceding syntax example will do its job but it's very inefficient and even raises a warning (#534). This is because nothing tells SPSS to stop looping except for a predefined maximum number of iterations (the MXLOOPS setting, 40 by default).
- A very basic way to circumvent this is to use a loop index variable. This is a variable whose values change over iterations. Like so we can specify exactly how many iterations we'd like over our command(s).
- Assuming our sentence does not hold more than 8 spaces in a row, we'll need to repeat our replace command only 3 times. On the first iteration, 8 or 7 adjacent spaces will become 4 spaces. The second iteration will replace these 4 spaces with two spaces. The 2 spaces will be replaced by a single space on the last iteration.
- For a demonstration, recreate the test data from the first example and try the syntax below.
SPSS LOOP Syntax Example 2
*Replace double spaces by single ones exactly three times.
loop repetition = 1 to 3.
compute sentence = replace(sentence,'  ',' ').
end loop.
execute.
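The claim that three passes suffice for runs of up to 8 spaces can be verified with a short simulation. This is a plain Python sketch, not SPSS syntax, and it assumes that REPLACE works on non-overlapping matches from left to right, so each pass turns a run of n spaces into a run of n / 2 spaces, rounded up. The helper passes_needed is hypothetical.

```python
DOUBLE = " " * 2  # a double space

def passes_needed(run_length: int) -> int:
    """Count replacement passes until a run of spaces holds no double space."""
    text = " " * run_length
    passes = 0
    while DOUBLE in text:
        text = text.replace(DOUBLE, " ")
        passes += 1
    return passes

# Passes needed for runs of 1 through 8 spaces.
for n in range(1, 9):
    print(n, passes_needed(n))

worst_case = max(passes_needed(n) for n in range(1, 9))
```

Under this assumption no run of 8 or fewer spaces needs more than 3 passes, whereas a run of 9 spaces would need a fourth.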
LOOP Index as Scratch Variable
- The example above basically works as follows: the variable 'repetition' takes on the value 1 and the replace command is performed. Next, it takes on the value 2 and the replace command is performed a second time. 'Repetition' becomes 3 and the third iteration takes place.
- Next, 'repetition' becomes 4 but since this exceeds the threshold of 3 that we set, the loop stops and the replace command is not carried out a fourth time.
- Three iterations are exactly enough for the data at hand. However, we do end up with a useless loop index ('repetition') in our data. We could delete it after the loop but a more common solution is to ensure it doesn't show up in the first place.
- This is done by using a scratch variable as the loop index. In a nutshell, just start the variable name with "#" and it won't show up.
- Like so, you could use #repetition instead of repetition. In practice you'll often see #i (i for index) being used as the loop index. However, just # is also a valid name for a scratch variable so we'll stick with that.
- These points are demonstrated in the syntax below.
SPSS LOOP Syntax Example 3
*Replace double spaces by single ones exactly three times.
loop # = 1 to 3.
compute sentence = replace(sentence,'  ',' ').
end loop.
exe.
END LOOP IF
- The last syntax example wasn't too bad but it has two problems. First, we need to know in advance how many loops we'll need. This is not always the case. To ensure sufficient iterations, we could simply loop a large number of times but this may slow down the process on large datasets.
- Second, if there are many cases then perhaps some need more iterations than others.
- Both points can be taken into account by dropping the loop index. Instead, we'll end the loop for each case as soon as it contains no more double spaces. After each iteration we'll check whether this is the case by using the CHAR.INDEX function, which returns 0 when the double space is not present. The syntax below demonstrates this.
SPSS LOOP Syntax Example 4
*Stop looping when double spaces aren't present anymore.
loop.
compute sentence = replace(sentence,'  ',' ').
end loop if char.index(sentence,'  ') = 0.
exe.
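END LOOP IF behaves like a post-test loop: the body runs first and the exit condition is checked afterwards. The sketch below models this per case in plain Python (not SPSS syntax). The function name end_loop_if and the three test sentences are made up for illustration, and the SPSS maximum number of iterations is not modelled.

```python
DOUBLE = " " * 2  # a double space

def end_loop_if(sentence: str):
    """Post-test loop: run the body, then check the exit condition."""
    passes = 0
    while True:
        sentence = sentence.replace(DOUBLE, " ")   # the COMPUTE inside the loop
        passes += 1
        if DOUBLE not in sentence:                 # END LOOP IF ... = 0
            break
    return sentence, passes

cases = ["a" + " " * 8 + "b", "a" + " " * 3 + "b", "a b"]
results = [end_loop_if(case) for case in cases]

for cleaned, passes in results:
    print(repr(cleaned), passes)
```

Each case gets only as many passes as it needs, but note that even the sentence without any double space still goes through the body once.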
LOOP IF
- The previous syntax example still has a tiny shortcoming: it will perform the replace command even if no double spaces are present in a sentence at all.
- A more efficient approach is to only start the loop for cases containing at least one double space. So for some cases zero iterations will take place while for others three (or more) iterations may be carried out.
- This is accomplished by using LOOP IF. The condition for looping is the presence of a double space. The syntax below demonstrates this.
SPSS LOOP Syntax Example 5
*Start an iteration if a double space is present.
loop if char.index(sentence,'  ') > 0.
compute sentence = replace(sentence,'  ',' ').
end loop.
exe.
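LOOP IF corresponds to a pre-test loop: the condition is checked before every iteration, so a case that meets no condition gets zero iterations. A plain Python sketch of that logic (not SPSS syntax; the function name loop_if and the test sentences are hypothetical) could look like this:

```python
DOUBLE = " " * 2  # a double space

def loop_if(sentence: str):
    """Pre-test loop: check the condition before each iteration."""
    passes = 0
    while DOUBLE in sentence:                      # LOOP IF char.index(...) > 0
        sentence = sentence.replace(DOUBLE, " ")   # the COMPUTE inside the loop
        passes += 1
    return sentence, passes

cases = ["a" + " " * 8 + "b", "a" + " " * 3 + "b", "a b"]
results = [loop_if(case) for case in cases]

for cleaned, passes in results:
    print(repr(cleaned), passes)
```

The sentence that holds only single spaces is left alone entirely, which is the efficiency gain this approach is after.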
Using the LOOP Index
- The previous syntax examples using a loop index didn't use this index within the commands that were repeated. It merely indicated a fixed number of repetitions for each case.
- However, it's common that the index itself is used within the loop as well. Over the iterations, the index takes on each of the numbers that are being looped over.
- This is demonstrated in the syntax below (using different test data than the previous examples). It will count the occurrence of the letter 'e' in each name. For each case the number of iterations is equal to the number of letters in their name.
- If you're unfamiliar with the string functions used in the example, see our SPSS String Variables Tutorial.
SPSS LOOP Syntax Example 6
*1. Create mini test dataset.
data list free/name(a10).
begin data
Anneke Martin Stefan
end data.
*2. Count occurrence of 'e' by looping through letters in name.
compute count_e = 0.
loop # = 1 to char.length(name).
if char.substr(name,#,1) = 'e' count_e = count_e + 1.
end loop.
exe.
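For readers more at home in a general-purpose language, the same counting logic can be written out in plain Python. This is only a sketch of what the loop does for each case, not SPSS syntax; the function name count_e is made up, and the index runs from 1 to mimic the 1-based positions that CHAR.SUBSTR uses.

```python
def count_e(name: str) -> int:
    """Count lowercase 'e' by stepping through the name one position at a time."""
    count = 0
    for i in range(1, len(name) + 1):   # LOOP # = 1 TO CHAR.LENGTH(name)
        if name[i - 1] == "e":          # CHAR.SUBSTR(name, #, 1), 1-based position
            count += 1
    return count

counts = {name: count_e(name) for name in ["Anneke", "Martin", "Stefan"]}
print(counts)  # {'Anneke': 2, 'Martin': 0, 'Stefan': 1}
```

The comparison is case sensitive, so an uppercase 'E' would not be counted.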
The BY Keyword
- When a loop index is used, it may increment/decrement in steps smaller or larger than one. This is specified by the BY keyword.
- For instance, 3 TO 12 BY 3 increments from 3 through 12 by steps of 3. It thus returns 3, 6, 9 and 12.
- When combined with VECTOR, this can be used to compute sums over groups of variables. Like so, the final syntax example calculates sums over (v1, v2, v3), (v4, v5, v6) and so on.
SPSS LOOP Syntax Example 7
*1. Create mini test dataset.
data list free/v1 to v12 (12f1.0).
begin data
0 0 0 0 0 1 0 1 1 1 1 1
end data.
*2. Compute 4 sums, each over 3 adjacent variables.
vector v = v1 to v12 / s(4).
loop # = 3 to 12 by 3.
compute s(# / 3) = sum(v(#),v(# - 1),v(# - 2)).
end loop.
exe.
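The index arithmetic in this last example is easy to get wrong, so here it is traced in plain Python for the single test case. This is a sketch and not SPSS syntax: the lists v and s stand in for the two vectors, and the handling of missing values by SUM is not modelled.

```python
v = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]   # v1 to v12 for the test case
s = [None] * 4                              # s1 to s4

for i in range(3, 13, 3):                   # LOOP # = 3 TO 12 BY 3 gives 3, 6, 9, 12
    # SPSS vectors are 1-based and Python lists are 0-based, hence the extra "- 1".
    s[i // 3 - 1] = v[i - 1] + v[i - 2] + v[i - 3]

print(s)  # [0, 1, 2, 3]
```

Because the index only takes on multiples of 3, dividing it by 3 yields the positions 1 through 4 in the vector of sums.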
THIS TUTORIAL HAS 19 COMMENTS:
By Aurore de Beaumont on January 19th, 2016
Hello
I would like to cut a sentence using LOOP and VECTOR, ... in order not to worry about the number of words to be cut.
I tried different syntax without achieving the desired result.
RO =
"dress women size 42"
" shirt men winter"
"....."
STRING Mot1 Mot2 A(24) R1 R2 (A100).
COMPUTE #P1 =CHAR.INDEX (RO," ").
IF (#P1>0) Mot1=CHAR.SUBTSR (R0, 1, #P1-1).
IF (#P1>0) R1=CHAR.SUBTSR (R0, #P1+1).
IF (#P1>0) R1=LTRIM(RTRIM(R1)).
IF (#P1=0) Mot1=LTRIM(RTRIM(R0)).
COMPUTE #P2 =CHAR.INDEX (R1," ").
IF (#P2>0) Mot2=CHAR.SUBTSR (R1, 1, #P2-1).
IF (#P2>0) R2=CHAR.SUBTSR (R1, #P2+1).
IF (#P2>0) R2=LTRIM(RTRIM(R2)).
IF (#P2=0) Mot2=LTRIM(RTRIM(R1)).
EXECUTE.
Instead can we do this ?
VECTOR Mot(2, A24).
VECTOR R(2,A100).
LOOP xi=1 to 2.
COMPUTE #P(xi) =CHAR.INDEX (R(xi-1)," ").
IF (#P(xi)>0) Mot(xi)=CHAR.SUBTSR (R(xi-1), 1, #P(xi)-1).
IF (#P(xi)>0) R(xi)=CHAR.SUBTSR (R(xi-1), #P(xi)+1).
IF (#P(xi)>0) R(xi)=LTRIM(RTRIM(R(xi))).
IF (#P(xi)=0) Mot(xi)=LTRIM(RTRIM(R(xi-1))).
EXECUTE.
But it doesn't work !
Can you help me ?
Thanks a lot
Aurore from Paris
By LM on February 23rd, 2016
I am wondering whether I can use loop for recoding variables? I have multiple variables in string data that I need to recode into numeric with the same values. It is frustrating having to input every variable name and the new recoded variable names... e.g.:
RECODE ISI_1 ISI_2 ISI_3 (CONVERT) ('NONE'=0) ('MILD'=1) ('MODERATE'=2) ('SEVERE'=3) ('VERY'=4) INTO ISI_1_RC ISI_2_RC ISI_3_RC.
EXECUTE.
Any suggestions would be appreciated.
By Ruben Geert van den Berg on February 23rd, 2016
Hi Lauren!
First of all, try AUTORECODE and then our SPSS Recode Values with Value Labels Tool. Like so, you'll have all of your values nicely labelled and the entire process should require no more than 2 lines of syntax even for many variables if all variables have identical recoding schemes.
If you somehow can't use that, I suggest you use a prefix instead of a suffix for your new variables. Like so you can probably use the TO keyword as in
RECODE ISI_1 to ISI_3 (CONVERT) ('NONE'=0) ('MILD'=1) ('MODERATE'=2) ('SEVERE'=3) ('VERY'=4) INTO nISI_1 to nISI_3.
This should work if the variable names have the structure you proposed in the example. However, you'll still have to apply value labels after doing so.
In this case, LOOP and DO REPEAT won't help because you can't concatenate a prefix or suffix to variable names within such a loop.
However, you could perform the necessary concatenation within a macro loop (not recommended) or an SPSS Python loop (much better idea). This assumes that all variables have identical recoding schemes.
Let me know whether that solves the problem, ok?
By Lauren on February 29th, 2016
Hi Ruben,
Thank you for getting back to me. Unfortunately from what I can tell autorecode automatically assigns the numbers itself, whereas I need to assign specific numbers to specific words. The use of the 'to' in the recode would save some time, but it looks like I still have to type out all the variable names for the first part? (e.g. ISI_1 ISI_2 ISI_3...). I was hoping to save some time with all that typing, and obviously it's frustrating if you miss a typo and something doesn't recode! Any other shortcuts would be greatly appreciated.
Thanks,
Lauren
By Ruben Geert van den Berg on February 29th, 2016
Hi Lauren!
If your input variables are adjacent in your data, you can obviously use TO here as well. I slightly modified my previous comment for illustrating this. If your input variables are scattered throughout your data file,
SORT VARIABLES BY NAME.
may or may not solve that problem.
Second, AUTORECODE assigns integer values (1, 2, ...) to alphabetically sorted string values. However, you'll usually want a different order as suggested by your syntax.
In this case, use AUTORECODE anyway. After doing so, you can easily reorder the value-value label pairs with the tool presented in this tutorial. This tool was specifically designed for handling the exact situation you're describing.
Let me know whether one of these two options works for you, ok? If not, then there are different ways of getting things done fast but they'll require more challenging (Python) syntax too.