
# SPSS – Quick Overview Statistical Functions

This tutorial walks you through SPSS' main statistical functions. They are mainly used with COMPUTE and IF. Note that these are all within-subjects (or “horizontal”) functions.

For between-subjects (or “vertical”) functions, see AGGREGATE.

Within-subjects versus between-subjects functions.

## SPSS Statistical Functions - Missing Values

SPSS statistical functions return a system missing value only if too few of their input values are valid. For most functions covered in this tutorial, a single valid input value is enough to get a valid output value. The exceptions are SD and VARIANCE, which need at least 2 valid input values by default.
Remember that the opposite holds for SPSS numeric functions: the latter only return a valid value if all their input values are valid.
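
As a minimal sketch of this difference, assuming the hospital.sav data used later in this tutorial are open, the syntax below computes the same average twice. The names mean_function and mean_arithmetic are made up for illustration.

```spss
*Statistical function: valid if at least one of the two ratings is valid.

compute mean_function = mean(doctor_rating, facilities_rating).

*Plain arithmetic: system missing if either rating is missing.

compute mean_arithmetic = (doctor_rating + facilities_rating) / 2.
exe.
```

Cases with exactly one valid rating should get a value on mean_function but a system missing value on mean_arithmetic.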

## SPSS Statistical Functions - Dot Operator

A minimal number of valid input values can be specified for statistical functions. This is done by suffixing the function name with a period followed by the required number of valid values. For example,

compute mean_v = mean.3(v1 to v5).

means “Compute mean_v only for cases having at least 3 valid values over v1 to v5. Cases with fewer valid values get a system missing value on mean_v.”
The dot operator can be used with all functions covered in this tutorial. Don't overlook it. Although it's little known among SPSS users, it's a terrific time saving feature.

Compute mean only for cases with at least 3 valid values on the input variables
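
The same suffix can be attached to the other functions. The sketch below assumes hypothetical variables v1 to v5 that are adjacent in the data. The names sum_v and sd_v are illustrative only.

```spss
*Sum only for cases with all 5 values valid.

compute sum_v = sum.5(v1 to v5).

*Standard deviation only for cases with at least 4 valid values.

compute sd_v = sd.4(v1 to v5).
exe.
```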

## Data Preparation

We'll use only the last 5 variables in our data. Strictly, calculations are not allowed on such ordinal variables. However, see Assumption of Equal Intervals. The functions we'll demonstrate on them may return incorrect values if we fail to specify user missing values. We'll therefore do a quick check by running FREQUENCIES with the syntax below. Note the TO keyword in steps 4 and 5.

*1. Specify folder where data are located.

cd 'd:/temp'.

*2. Open data file.

get file 'hospital.sav'.

*3. Show values and value labels in output.

set tnumbers both.

*4. Inspect frequencies.

frequencies doctor_rating to facilities_rating.

*5. Specify 6 as user missing value for all variables involved.

missing values doctor_rating to facilities_rating(6).

## SPSS MEAN Function

Means over variables are returned by SPSS MEAN function. If missing values are present, the sum of the valid values is divided by the number of valid values. The syntax below shows how to compute within-subjects means.

*Compute mean_rating as mean over all 5 ratings.

compute mean_rating = mean(doctor_rating to facilities_rating).
exe.
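
To see the division by the number of valid values at work, the result can be rebuilt by hand. This is only a sketch: n_valid and mean_check are made-up names, and NVALID is the SPSS function that counts valid values within a case.

```spss
*Count valid ratings per case.

compute n_valid = nvalid(doctor_rating to facilities_rating).

*Sum of valid ratings divided by their number.

compute mean_check = sum(doctor_rating to facilities_rating) / n_valid.
exe.
```

If everything works as described, mean_check should equal mean_rating for every case.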

## SPSS SUM Function

SPSS SUM function returns the sum over a number of variables. In the presence of missing values, the sum over all valid values is returned. Keep in mind that the result may be somewhat misleading in this case, because cases with missing values have their sums computed over fewer variables than complete cases. Also see SPSS Sum - Cautionary Note. The syntax below computes the within-subjects sum over our rating variables.

*Compute sum over 5 ratings.

compute sum_rating = sum(doctor_rating to facilities_rating).
exe.
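
Two possible ways around such misleading sums are sketched below. Both assume exactly 5 rating variables, and the names sum_complete and sum_rescaled are made up. The second option implicitly replaces each missing rating by the case's own mean, which may or may not be appropriate for your data.

```spss
*Sum only for cases without any missing ratings.

compute sum_complete = sum.5(doctor_rating to facilities_rating).

*Alternative: rescale the mean to a 5-item sum.

compute sum_rescaled = 5 * mean(doctor_rating to facilities_rating).
exe.
```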

## SPSS MIN Function

The minimum (smallest value) over a number of values is returned by SPSS MIN function. We normally use MIN for numeric variables but it can technically be used on string variables as well. It's demonstrated on our rating variables by the syntax below.

*Compute minimum value over variables.

compute min_rating = min(doctor_rating to facilities_rating).
exe.

## SPSS MAX Function

SPSS MAX function returns the maximum (largest value) over a number of values. Just like MIN, it can be used on string variables too. The syntax below computes the maximum over the rating variables.

*Compute maximum rating.

compute max_rating = max(doctor_rating to facilities_rating).
exe.
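
MAX and MIN can also be combined. As a sketch, the within-subjects range is the difference between the two. The name range_rating is made up for illustration.

```spss
*Compute within-subjects range as maximum minus minimum.

compute range_rating = max(doctor_rating to facilities_rating) - min(doctor_rating to facilities_rating).
exe.
```

A range of zero means that all valid ratings of a case are identical.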

## SPSS SD Function

The standard deviation over a number of variables is returned by SPSS SD function. Keep in mind that we're referring to the within-subjects standard deviation here. SPSS divides by (n-1) when computing the standard deviation.
Computing within-subjects standard deviations comes in handy in survey research for detecting straightliners: respondents who give the same answer to all questions will have a standard deviation of zero over these questions. This may be an indication that the questions weren't answered seriously, in which case you may want to exclude such cases from analysis. See SELECT IF and FILTER for the most likely options here.

## SPSS SD Function Syntax Example

*1. Compute within-subjects standard deviation over rating variables.

compute sd_rating = sd(doctor_rating to facilities_rating).
exe.

*2. Move straightliners to top of file.

sort cases by sd_rating.

*3. Delete straightliners from data.

select if sd_rating > 0.
exe.
Detecting potential straightliners with SPSS SD function.
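
If you'd rather not delete cases permanently, a FILTER can be used instead of step 3. The sketch below assumes sd_rating has been computed and that straightliners are still in the data. The variable name no_straightliner is made up.

```spss
*Flag cases with some variation in their ratings.

compute no_straightliner = (sd_rating > 0).

*Exclude straightliners from analyses without deleting them.

filter by no_straightliner.

*Run your analyses here, then switch the filter off again.

filter off.
```

Cases with a system missing value on sd_rating are also excluded while the filter is on, just like SELECT IF in step 3 drops them.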

## SPSS VARIANCE Function

SPSS VARIANCE function computes the within-subjects variance over a number of variables. It's simply the squared standard deviation.

*Compute within-subjects variance over rating variables.

compute variance_rating = variance(doctor_rating to facilities_rating).
exe.
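
The relation to the standard deviation can be checked with a sketch like the one below. The name variance_check is made up for illustration.

```spss
*Square the within-subjects standard deviation.

compute variance_check = sd(doctor_rating to facilities_rating) ** 2.
exe.
```

Apart from tiny rounding differences, variance_check should match variance_rating.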

## SPSS MEDIAN Function

Finally, the median over a number of values is returned by SPSS MEDIAN function. Again, note that we refer to the within-subjects median. The syntax below demonstrates it on our rating variables.

*Compute within-subjects median over rating variables.

compute median_rating = median(doctor_rating to facilities_rating).
exe.
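
As a quick sanity check, the new variables can be inspected in one go. This sketch assumes all examples in this tutorial have been run, so that each variable listed actually exists in the data.

```spss
*Inspect n, minimum, maximum, mean and standard deviation of the new variables.

descriptives mean_rating sum_rating min_rating max_rating sd_rating variance_rating median_rating.
```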


# THIS TUTORIAL HAS 16 COMMENTS:

• ### By Darija on October 17th, 2016

Hi, I need some help with combining max and if function. I need to detect the highest value among 7 different variables and then create a new categorical variable that will point out which one of those 7 was the highest. For example, if variable 3 had the highest value, my new variable would get value 3 and so on. Do you know how syntax would go? Thanks ahead.

• ### By Ruben Geert van den Berg on October 18th, 2016

Hi Darija! What's supposed to happen if there are ties (multiple values in case hold maximum value)? If these don't occur, you can use the example found at Find Variable Holding Max Value within Cases.

Hope that helps!

• ### By Darija on October 20th, 2016

Thank you, that was very helpful. In case of ties, I would mark that case with "8". In this specific data there were no ties, but they might occur in the future, so what would you suggest in that case?

• ### By Ruben Geert van den Berg on October 22nd, 2016

Hi Darija!

It's generally a good idea to choose user missing values that would be very unlikely (or impossible) to occur as valid values. We typically see values such as -9999 and -9998 as user missing values to make sure they're easily recognizable as such. Choosing "8" sounds somewhat tricky because it sounds like a possible (perhaps even likely) valid value. Then if somebody really scores "8", you won't be able to use this as a user missing value anymore.

Hope that helps!

• ### By Eline on March 1st, 2017

I have a problem regarding the analysis of a big dataset, that I can't seem to tackle by myself! Help would be really appreciated, and maybe one of the methods discussed above is suitable:

This is the situation: I did an experiment in which participants had to evaluate the taste of 5 samples with different flavours. They would drink them out of small cups with blinded codes on them, and all would have a randomized drinking order of the 5 samples.

User = participant number
Loopvalue code 1, 2, 3, 4 and 5 = Variables representing five blinding codes that were used for the 5 samples that were given in random order to the participants.
Q15.1, Q15.2, Q15.3, Q15.4 and Q15.5 = variables representing values between 0 and 100 that give information about the experienced pleasantness of each sample.

Based on the Q15 scores I would like to select the two values (out of five) for each participant that are closest to 50. I would like to make new variables for these values:

Example for user t10:
Middle_low = 50.6
Middle_high = 50.9

I don't know how to select the two values closest to 50 using spss. Doing it manually would take way too much time (I would have to select thousands of numbers).

Anyone have a suggestion?

Another question is how to link the blinding code to the matching Q15 question? Example:

Middle_low = 50.6
Middle_high = 50.9
Middle_low_nr = 390
Middle_high_nr = 159