Computing Sums in SPSS – 3 Easy Options
In SPSS, SUM(v1,v2) is not always equivalent to v1 + v2. This tutorial explains the difference and shows how to make the right choice here.
Different Ways of Taking Sums have Different Outcomes when Missing Values are Present
Explanation
- In SPSS, v1 + v2 + v3 will result in a system missing value if at least one missing value is present in v1, v2 or v3.
- The first alternative, SUM(v1, v2, v3), implicitly replaces missing values with zeroes. (If all three values are missing, the result is still a system missing value.)
- The second alternative, MEAN(v1, v2, v3) * 3, implicitly replaces missing values with the mean of the non missing values.
- The third alternative, MEAN.2(v1, v2, v3) * 3, is almost identical to the second. However, by suffixing MEAN with .2, you ensure that a mean is only calculated if at least two non missing values are present in v1, v2 and v3.
- These points are demonstrated by the syntax below.
SPSS Syntax Demonstration
* Create three cases in which '' is read as a system missing value.
data list free/v1 v2 v3.
begin data
1 3 5
1 3 ''
1 '' ''
end data.
* Compute the sum in four different ways.
compute sum_by_sum = sum(v1,v2,v3).
compute sum_by_plus = v1 + v2 + v3.
compute sum_by_mean = mean(v1 to v3) * 3.
compute sum_by_mean.2 = mean.2(v1 to v3) * 3.
exe.
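To check these claims, one could list the new variables after running the syntax above. The values in the comments are worked out by hand from the rules explained earlier; they are not pasted from SPSS output.

```spss
* List the inputs and the four computed sums for all cases.
list v1 v2 v3 sum_by_sum sum_by_plus sum_by_mean sum_by_mean.2.

* Expected values, derived by hand from the rules explained above:
* case 1 (1, 3, 5): all four sums equal 9.
* case 2 (1, 3, missing): sum_by_sum = 4, sum_by_plus = missing, sum_by_mean = 6, sum_by_mean.2 = 6.
* case 3 (1, missing, missing): sum_by_sum = 1, sum_by_plus = missing, sum_by_mean = 3, sum_by_mean.2 = missing.
```

Note how the second case already separates SUM (4) from the mean based sums (6), while the third case is the only one where MEAN and MEAN.2 disagree.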
So Which One Is Best?
- This question is rather hard to answer. First, it may depend on the meaning of the missing values (question skipped? technical problem?). Also, what are the individual questions and the sum supposed to reflect?
- Second, the number of missing values and the sample size may be taken into account. Do they permit excluding some observations with missing values? Will this affect representativeness and if so, is that a real problem?
- For one thing, sums calculated by SUM may be biased towards zero. For instance, if v1 through v3 measure components of satisfaction, respondents will be seen as "less satisfied" insofar as they have more missing values. That conclusion may be misleading.
- Using the + operator does not induce such bias but may result in many missing values in the sum. This problem becomes larger as more missing values are present in the input variables and a sum is taken over more variables.
- Multiplying the mean by the number of variables may be a better alternative. However, it will always come up with a sum if there's at least one non missing value. Especially with many input variables, a single value may be judged insufficient for inferring a summation measure.
- But perhaps none of these options is expected to yield sufficiently accurate results. In this case, one could partly circumvent the problem with a (multiple) imputation of missing values.
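One way to put these considerations into practice is to first inspect how many values are missing per case, and only then decide how many valid values a sum should require. The sketch below assumes the variables v1 to v3 from the demonstration. The names n_miss and sum_min2 and the cut-off of at most one missing value are illustrative choices, not recommendations.

```spss
* Count the number of missing values over v1 to v3 for each case.
compute n_miss = nmiss(v1 to v3).
execute.

* Inspect how many cases have 0, 1, 2 or 3 missing values.
frequencies n_miss.

* Mean based sum only for cases having at most 1 missing value.
* For three input variables this is equivalent to mean.2(v1 to v3) * 3.
if (n_miss <= 1) sum_min2 = mean(v1 to v3) * 3.
execute.
```

Keeping n_miss as a separate variable makes the threshold easy to change later and lets you report how many cases were excluded by it.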