## Summary

In SPSS, `SUM(v1,v2)` is not always equivalent to `v1 + v2`. This tutorial explains the difference and shows how to make the right choice.

## Explanation

- In SPSS, `v1 + v2 + v3` results in a system missing value if **at least one missing value** is present in v1, v2 or v3.
- The first alternative, `SUM(v1, v2, v3)`, implicitly replaces missing values with **zeroes** (as long as at least one value is non-missing).
- The second alternative, `MEAN(v1, v2, v3) * 3`, implicitly replaces missing values with the **mean of the non-missing values**.
- The third alternative, `MEAN.2(v1, v2, v3) * 3`, is almost identical to the second. However, by suffixing `MEAN` with `.2`, you ensure that a mean is only calculated if **at least two non-missing values** are present in v1, v2 and v3.
- These points are demonstrated by the syntax below.

## SPSS Syntax Demonstration

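    * Three test cases with zero, one and two missing values.
    data list free/v1 v2 v3.
    begin data
    1 3 5
    1 3 ''
    1 '' ''
    end data.
    * SUM skips missing values.
    compute sum_by_sum = sum(v1,v2,v3).
    * The + operator returns system missing if any value is missing.
    compute sum_by_plus = v1 + v2 + v3.
    * MEAN uses whatever non-missing values are present.
    compute sum_by_mean = mean(v1 to v3) * 3.
    * MEAN.2 requires at least two non-missing values.
    compute sum_by_mean.2 = mean.2(v1 to v3) * 3.
    execute.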
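To check the results, the new variables can be listed in the output window. This is a minimal sketch that assumes the syntax above has been run and that the three cases were read as intended.

```spss
* Print all cases with the four computed sums.
list.
```

Under the rules described earlier, the first case should show 9 for all four sums. The second case (1, 3, missing) should show 4 for `sum_by_sum`, a system missing value for `sum_by_plus`, and 6 for both mean-based sums (the mean of 1 and 3 is 2, times 3). The third case (1, missing, missing) should show 1 for `sum_by_sum`, 3 for `sum_by_mean`, and system missing values for `sum_by_plus` and `sum_by_mean.2`.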


## So Which One Is Best?

- This question is rather hard to answer. First, it may depend on the **meaning of the missing values** (question skipped? technical problem?). Also, what are the individual questions and the sum supposed to reflect?
- Second, the number of missing values and the **sample size** may be taken into account. Does the sample size permit excluding some observations with missing values? Will this affect representativeness and, if so, is that a real problem?
- Sums calculated by `SUM` may be **biased towards zero**. For instance, if v1 through v3 measure components of satisfaction, respondents will be seen as "less satisfied" insofar as they have more missing values. That conclusion may be misleading.
- Using the `+` operator does not induce such bias but may result in **many missing values** in the sum. This problem grows as more missing values are present in the input variables and as the sum is taken over more variables.
- Multiplying the mean by the number of variables may be a better alternative. However, plain `MEAN` will always come up with a sum if there's at least one non-missing value. Especially with many input variables, a single value may be judged **insufficient for inferring** a summation measure.
- But perhaps none of these options is expected to yield sufficiently accurate results. In this case, one could partly circumvent the problem with **(multiple) imputation of missing values**.
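One way to weigh these points is to count the valid values per case before settling on a method. The sketch below is only an illustration: the variable names `n_valid` and `sum_checked` and the threshold of 2 are assumptions, not part of the demonstration above.

```spss
* Count the non-missing values per case (n_valid is a hypothetical name).
compute n_valid = nvalid(v1 to v3).
* Inspect how many cases have 0, 1, 2 or 3 valid values.
frequencies variables = n_valid.
* Compute a mean-based sum only for cases meeting the assumed threshold of 2.
if (n_valid >= 2) sum_checked = mean(v1 to v3) * 3.
execute.
```

The frequency table shows how many cases would end up without a sum under a given threshold. With a threshold of 2, the `if` line gives the same result as `MEAN.2(v1 to v3) * 3`; cases below the threshold keep a system missing value.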
