Many easy options have been proposed for combining the values of categorical variables in SPSS. However, the real information is usually in the value labels instead of the values. This tutorial proposes a simple trick for combining categorical variables and automatically applying correct value labels to the result.
SPSS Combine Categorical Variables Example
You may follow along by downloading and opening hospital.sav. Now say we'd like to combine “doctor_rating” and “nurse_rating” (near the end of the file). The result is shown in the screenshot below. Note that all variables are numeric with proper value labels applied to them.
SPSS Combine Categorical Variables Syntax
We first present the syntax that does the trick. Next, we'll point out how it how to easily use it on other data files.
string tmp(a1000).
*2. Combine values and value labels of doctor_rating and nurse_rating into tmp string variable.
compute tmp = concat(
"doctor_rating = ",string(doctor_rating,f1)," (",rtrim(valuelabels(doctor_rating)),") ",
"nurse_rating = ",string(nurse_rating,f1)," (",rtrim(valuelabels(nurse_rating)),") "
).
*3. Convert string variable into numeric.
autorecode tmp
/into doctor_and_nurse_rating.
*4. Delete tmp string variable.
delete variables tmp.
*5. Optionally, apply variable label to end result.
variable labels doctor_and_nurse_rating 'Combination of doctor_rating and nurse_rating'.
SPSS Combine Categorical Variables - Other Data
We realize that many readers may find this syntax too difficult to rewrite for their own data files. So instead of rewriting it, just copy and paste it and make three basic adjustments before running it:
- replace “doctor_rating” by the name of the first variable you'd like to combine. Note that you can do so by using the ctrl + h shortkey.
- replace “nurse_rating” by the name of the second variable you'd like to combine.
- replace “doctor_and_nurse_rating” by the variable name you'd like to use for the final result.
SPSS Combine Categorical Variables - System Missing Values
You may have noticed that the value labels of the combined variable don't look very nice if system missing values are present in the original values. An example of such a value label is doctor_rating = 3 (Neutral) nurse_rating = . (). A nicer result can be obtained without changing the basic syntax for combining categorical variables. Prior to running this syntax, simply RECODE system missing values. Use a value that's not yet present in the original variables and apply a value label to it. The syntax below shows how to do so.
recode doctor_rating nurse_rating (sysmis = 7).
*2. Apply value label to new value.
add value labels doctor_rating nurse_rating 7 'System missing'.
*3. Proceed with remaining syntax from here.
After doing so, the resulting value label will look as follows: doctor_rating = 3 (Neutral) nurse_rating = 7 (System missing). Further, note that the syntax we used made a couple of assumptions. Most real world data will satisfy those. We'll walk through them below.
SPSS Combine Categorical Variables - Assumptions
- Although the syntax combines two variables, it can be expanded to incorporate three or more variables.
- It is assumed that all values in the original variables consist of single digits. If two or three digit values are present, replace
f1
byn2
orn3
.Then3
format left pads numbers with zeroes and thus keeps their alphabetical order equal to their numerical order.
Further Reading
Those who'd like a closer look at some of the commands and functions we combined in this tutorial may want to consult string variables, STRING function, VALUELABEL, CONCAT, RTRIM and AUTORECODE.
THIS TUTORIAL HAS 26 COMMENTS:
By Linda Martell on July 15th, 2015
Good
By Stephen on February 3rd, 2016
How can I combine two different categorical variables into one on SPSS spreadsheet without the use of syntax?
By Ruben Geert van den Berg on February 3rd, 2016
I think you can't do it properly (that is, with value labels) via the menu. I may at some point build a menu tool for it but not on the short term.
More generally, working directly from the SPSS menu is a very bad idea anyway. Working from syntax is the last thing you'll ever regret when it comes to SPSS.
By Christine on May 31st, 2016
Hi, I'm looking at drug use. I have 20 string variables (s1-s20) where the drugs are listed, and 20 numeric variables (n1-n20) where the doses for the drugs are listed (s1 gives drug, n1 gives dose for drug in s1). It's about 400 different drugs, and they can be listed in any of the 20 string variables.
I would like a method to find the dose the patient use of a specific drug like this in plain language:
If s1 has drugA the variable drugA_dose=n1, this should then be looped through s1/n1 to s20/n20. In the end I have a variable drugA_dose with missing if the patient doesn't use the drug and the dose if the patient use the drug. Is this possible in syntax?
By Ruben Geert van den Berg on May 31st, 2016
Hi Christine!
The first thing that comes to mind is to convert your data from "wide" to "long" format with VARSTOCASES. You could then perhaps run something like
means dose by drug.
to find the mean/standard deviation of each dose per drug separately. Or SPLIT FILE on drug and run histograms on dose. Or sort cases on drug and then reconvert the data back to "wide" format with CASESTOVARS. Which way to go depends on what/how you'd like to report on these data.