This tutorial walks you through SPSS' main statistical functions. They're mainly used with the COMPUTE and IF commands. Note that these are all within-subjects (or “horizontal”) functions: they combine values over variables within each case.
For between-subjects (or “vertical”) functions, see AGGREGATE.
All examples in this tutorial use hospital.sav, which is freely downloadable.
SPSS Statistical Functions - Missing Values
SPSS statistical functions only return a system missing value if all their input values are missing. If a single input value is valid, the output value will be valid too. This holds for all functions we'll cover in this tutorial except SD and VARIANCE: these divide by (n - 1) and therefore need at least two valid input values.
Remember that the opposite holds for SPSS numeric functions: these only return a valid value if all their input values are valid.
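Arithmetic with the + operator follows the same all-valid rule, which makes the contrast easy to see. A minimal sketch, assuming hospital.sav (see below) is already open; mean_fn and mean_arith are hypothetical names. For a case with a valid doctor_rating but a missing facilities_rating, mean_fn holds the valid rating whereas mean_arith is system missing.
*Statistical function: uses whichever input values are valid.
compute mean_fn = mean(doctor_rating, facilities_rating).
*Arithmetic: system missing as soon as one input value is missing.
compute mean_arith = (doctor_rating + facilities_rating) / 2.
exe.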
SPSS Statistical Functions - Dot Operator
A minimum number of valid input values can be specified for statistical functions. This is done by suffixing the function with a period followed by the required number of valid values. For example,
compute mean_v = mean.3(v1 to v5).
means “Compute mean_v only for cases having at least 3 valid values over v1 to v5. Cases with fewer valid values must get a system missing value on mean_v.”
The dot operator can be used with all functions covered in this tutorial. Don't overlook it: although it's little known among SPSS users, it's a terrific time-saving feature.
Data Preparation
We'll use only the last 5 variables in our data, doctor_rating to facilities_rating. Strictly speaking, calculations are not allowed on such ordinal variables (however, see Assumption of Equal Intervals). The functions we'll demonstrate may return incorrect values if we fail to specify user missing values, so we'll first do a quick check by running FREQUENCIES with the syntax below. Note the TO keyword in steps 4 and 5.
*1. Set default directory.
cd 'd:/temp'.
*2. Open data file.
get file 'hospital.sav'.
*3. Show values and value labels in output.
set tnumbers both.
*4. Inspect frequencies.
frequencies doctor_rating to facilities_rating.
*5. Specify 6 as user missing value for all variables involved.
missing values doctor_rating to facilities_rating(6).
SPSS MEAN Function
SPSS MEAN function returns the mean over variables. If missing values are present, the sum of the valid values is divided by the number of valid values. The syntax below shows how to compute within-subjects means.
compute mean_rating = mean(doctor_rating to facilities_rating).
exe.
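Combining MEAN with the dot operator lets us skip cases with too few valid ratings. A minimal sketch; mean_rating_4 is a hypothetical name and the threshold of 4 out of 5 ratings is an arbitrary choice. DESCRIPTIVES then shows how many cases each version leaves with a valid mean.
*Mean only for cases with at least 4 valid ratings; other cases get system missing.
compute mean_rating_4 = mean.4(doctor_rating to facilities_rating).
descriptives mean_rating mean_rating_4.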
SPSS SUM Function
SPSS SUM function returns the sum over a number of variables. In the presence of missing values, the sum over all valid values is returned. Keep in mind that the result may be misleading in this case: a respondent with 3 valid ratings will usually get a lower sum than a similar respondent with 5 valid ratings (also see SPSS Sum - Cautionary Note). The syntax below computes the within-subjects sum over our rating variables.
compute sum_rating = sum(doctor_rating to facilities_rating).
exe.
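Given the caution above, we may want to restrict sums to cases with complete data by combining SUM with the dot operator. A sketch with hypothetical variable names; the prorated version (5 times the mean over at least 4 valid ratings) is one common alternative that we compute ourselves, not something SUM does for us.
*Sum only for cases with all 5 ratings valid; other cases get system missing.
compute sum_rating_5 = sum.5(doctor_rating to facilities_rating).
*Alternative: prorated sum based on at least 4 valid ratings.
compute sum_rating_prorated = 5 * mean.4(doctor_rating to facilities_rating).
exe.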
SPSS MIN Function
SPSS MIN function returns the minimum (smallest value) over a number of values. We normally use MIN on numeric variables, but it can technically be used on string variables as well. It's demonstrated on our rating variables by the syntax below.
compute min_rating = min(doctor_rating to facilities_rating).
exe.
SPSS MAX Function
SPSS MAX function returns the maximum (largest value) over a number of values. Just like MIN, it can be used on string variables too. The syntax below computes the maximum over the rating variables.
compute max_rating = max(doctor_rating to facilities_rating).
exe.
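MIN and MAX are often combined. For instance, the within-subjects range is simply the maximum minus the minimum. A minimal sketch, where range_rating is a hypothetical name; note that SPSS' own RANGE function does something different (it tests whether a value falls within given bounds). A range of zero indicates that all valid ratings of a case are identical.
*Within-subjects range: highest minus lowest valid rating.
compute range_rating = max(doctor_rating to facilities_rating) - min(doctor_rating to facilities_rating).
exe.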
SPSS SD Function
SPSS SD function returns the standard deviation over a number of variables. Keep in mind that we're referring to the within-subjects standard deviation here. SPSS divides by (n - 1) when computing it.
Computing within-subjects standard deviations comes in handy in survey research for detecting straightliners: respondents who give the same answer to all questions will have a standard deviation of zero over these questions. This may indicate that the questions weren't answered seriously, in which case you may want to exclude such cases from analysis (see SELECT IF and FILTER for the most likely options here).
SPSS SD Function Syntax Example
*1. Compute within-subjects standard deviations.
compute sd_rating = sd(doctor_rating to facilities_rating).
exe.
*2. Move straightliners to top of file.
sort cases by sd_rating.
*3. Delete straightliners from data (SELECT IF also drops cases with a system missing sd_rating).
select if sd_rating > 0.
exe.
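If you'd rather keep the straightliners in your data, FILTER is an alternative to step 3 above. A minimal sketch, where keep_case is a hypothetical name: cases with a 0 or system missing value on keep_case are excluded from analyses until the filter is switched off, but they're not deleted.
*Flag cases that are not straightliners: 1 = keep, 0 = straightliner.
compute keep_case = (sd_rating > 0).
*Exclude straightliners from subsequent analyses without deleting them.
filter by keep_case.
*Run your analyses here, then switch the filter off.
filter off.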
SPSS VARIANCE Function
SPSS VARIANCE function computes the within-subjects variance over a number of variables. It's simply the squared standard deviation.
compute variance_rating = variance(doctor_rating to facilities_rating).
exe.
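Since the variance is simply the squared standard deviation, a quick sanity check is to compare variance_rating with sd_rating from the previous section. In this sketch var_check is a hypothetical name; its values should all be zero up to rounding error.
*Difference between the variance and the squared standard deviation.
compute var_check = variance_rating - sd_rating ** 2.
descriptives var_check.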
SPSS MEDIAN Function
Finally, the median over a number of values is returned by SPSS MEDIAN function. Again, note that we refer to the within-subjects median. The syntax below demonstrates it on our rating variables.
compute median_rating = median(doctor_rating to facilities_rating).
exe.
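Because a single extreme rating pulls the mean but not the median, comparing both may point out cases with one outlying rating. A sketch, assuming mean_rating from the MEAN section has been computed; mean_minus_median is a hypothetical name.
*Positive values: mean above median, negative values: mean below median.
compute mean_minus_median = mean_rating - median_rating.
descriptives mean_minus_median.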
THIS TUTORIAL HAS 16 COMMENTS:
By Darija on October 17th, 2016
Hi, I need some help with combining max and if function. I need to detect the highest value among 7 different variables and then create a new categorical variable that will point out which one of those 7 was the highest. For example, if variable 3 had the highest value, my new variable would get value 3 and so on. Do you know how syntax would go? Thanks ahead.
By Ruben Geert van den Berg on October 18th, 2016
Hi Darija! What's supposed to happen if there are ties (multiple variables within a case holding the maximum value)? If these don't occur, you can use the example found at Find Variable Holding Max Value within Cases.
Hope that helps!
By Darija on October 20th, 2016
Thank you, that was very helpful. In case of ties, I would mark that case with "8". In this specific data there were no ties, but they might occur in the future, so what would you suggest in that case?
By Ruben Geert van den Berg on October 22nd, 2016
Hi Darija!
It's generally a good idea to choose user missing values that would be very unlikely (or impossible) to occur as valid values. We typically see values such as -9999 and -9998 as user missing values, which makes them easily recognizable as such. Choosing "8" sounds somewhat tricky because it looks like a possible (perhaps even likely) valid value. If somebody really scores "8", you won't be able to use it as a user missing value anymore.
Hope that helps!
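For Darija's question, here's a minimal syntax sketch that also applies her tie rule (code 8). It assumes 7 hypothetical numeric variables v1 to v7 that are adjacent in the data file; vals, max_v, n_max and max_pos are hypothetical names as well, and this isn't necessarily the approach taken in Find Variable Holding Max Value within Cases.
*Hypothetical variables v1 to v7; max_pos = 1 to 7, or 8 if the maximum is tied.
compute max_v = max(v1 to v7).
compute n_max = 0.
vector vals = v1 to v7.
*Count how many variables hold the maximum and remember the position.
loop #i = 1 to 7.
if (vals(#i) = max_v) n_max = n_max + 1.
if (vals(#i) = max_v) max_pos = #i.
end loop.
*Ties get code 8.
if (n_max > 1) max_pos = 8.
exe.
Cases with all 7 values missing keep a system missing value on max_pos.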
By Eline on March 1st, 2017
I have a problem regarding the analysis of a big dataset that I can't seem to tackle by myself! Help would be really appreciated, and maybe one of the methods discussed above is suitable:
This is the situation: I did an experiment in which participants had to evaluate the taste of 5 samples with different flavours. They would drink them out of small cups with blinded codes on them, and all would have a randomized drinking order of the 5 samples.
User = participant number
Loopvalue code 1, 2, 3, 4 and 5 = variables representing the five blinding codes that were used for the 5 samples, which were given in random order to the participants.
Q15.1, Q15.2, Q15.3, Q15.4 and Q15.5 = variables holding values between 0 and 100 that indicate the experienced pleasantness of each sample.
Based on the Q15 scores I would like to select the two values (out of five) for each participant that are closest to 50. I would like to make new variables for these values:
Example for user t10:
Middle_low = 50.6
Middle_high = 50.9
I don't know how to select the two values closest to 50 using spss. Doing it manually would take way too much time (I would have to select thousands of numbers).
Anyone have a suggestion?
Another question is how to link the blinding code to the matching Q15 question? Example:
Middle_low = 50.6
Middle_high = 50.9
Middle_low_nr = 390
Middle_high_nr = 159
Thank you in advance for your help!
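For Eline's question, one possible approach loops over the five scores, keeps track of the two positions whose scores are closest to 50, and then looks up the matching blinding codes. A sketch under explicit assumptions: Q15.1 to Q15.5 are adjacent in the data file, hypothetical numeric variables code1 to code5 hold the blinding code for the sample scored in the Q15 variable at the same position, and qv and bc are hypothetical vector names. If the blinding codes are strings, middle_low_nr and middle_high_nr must first be declared with STRING.
*Hypothetical vectors: qv holds the scores, bc the matching blinding codes.
vector qv = Q15.1 to Q15.5.
vector bc = code1 to code5.
*Scratch variables: #d1 and #p1 hold the smallest distance to 50 and its position, #d2 and #p2 the runner-up.
compute #d1 = 999.
compute #d2 = 999.
compute #p1 = 0.
compute #p2 = 0.
*A new closest score moves the previous closest score to runner-up.
loop #i = 1 to 5.
do if not missing(qv(#i)).
compute #d = abs(qv(#i) - 50).
do if (#d < #d1).
compute #d2 = #d1.
compute #p2 = #p1.
compute #d1 = #d.
compute #p1 = #i.
else if (#d < #d2).
compute #d2 = #d.
compute #p2 = #i.
end if.
end if.
end loop.
*Only fill in the new variables if at least two valid scores were found.
do if (#p2 > 0).
do if (qv(#p1) <= qv(#p2)).
compute middle_low = qv(#p1).
compute middle_low_nr = bc(#p1).
compute middle_high = qv(#p2).
compute middle_high_nr = bc(#p2).
else.
compute middle_low = qv(#p2).
compute middle_low_nr = bc(#p2).
compute middle_high = qv(#p1).
compute middle_high_nr = bc(#p1).
end if.
end if.
exe.
Cases with fewer than two valid Q15 scores get system missing values on all four new variables. Ties in the distance to 50 are resolved in favor of the Q15 variable that comes first.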