SPSS Label Cleaning Tool
We sometimes receive data files with annoying prefixes or suffixes in variable and/or value labels. This tutorial presents a simple tool for removing these and for performing some other “cleaning” operations.
- Prerequisites and Installation
- Example I - Text Replacement over Variable and Value Labels
- Example II - Remove Suffix from Variable Labels
- Example III - Remove Prefix from Value Labels
Example Data File
All examples in this tutorial use dirty-labels.sav. As shown below, its labels are far from ideal.
Some variable labels have suffixes that are irrelevant to the final data.
All value labels are prefixed by the values that represent them.
Variable and value labels have underscores instead of spaces.
Our tool deals with precisely such issues. Let's try it.
Prerequisites and Installation
First off, this tool requires SPSS version 24 or higher. Second, the SPSS Python 3 essentials must be installed, which is normally the case with recent SPSS versions.
Next, click SPSS_TUTORIALS_CLEAN_LABELS.spe to download our tool. You can install it by dragging and dropping it into a data editor window. Alternatively, navigate to
as shown below.
In the dialog that opens, navigate to the downloaded .spe file and select it. SPSS now throws a message that “The extension was successfully installed under Transform - SPSS tutorials - Clean Labels”.
Example I - Text Replacement over Variable and Value Labels
Let's first replace all underscores by spaces in both variable and value labels. We'll open Transform - SPSS tutorials - Clean Labels and fill out the dialog as shown below.
Completing these steps results in the syntax below. Let's run it.
SPSS TUTORIALS CLEAN_LABELS VARIABLES=v1 v2 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v18 v19
v20 v21 v22 FIND='_' REPLACEBY=' '
/OPTIONS OPERATION=FIREPCONT PROCESS=BOTH ACTION=BOTH.
Results
First note that all underscores were replaced by spaces in all variable and value labels. This was done by creating and running
- VARIABLE LABELS and
- ADD VALUE LABELS
commands. We chose to have these commands printed to our output window as shown below.
SPSS already ran this syntax but you can also copy-paste it into a syntax window. Like so, the adjustments can be replicated on any SPSS version with or without our tool installed. If there's a lot of syntax, consider moving it into a separate file and running it with INSERT.
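As a rough illustration of what happens behind the scenes, the Python sketch below builds VARIABLE LABELS and ADD VALUE LABELS commands after a plain text replacement. It is not our tool's actual source code: the function names `quote` and `build_label_syntax` and the example labels are made up for this sketch.

```python
def quote(text):
    """Wrap text in single quotes, doubling embedded quotes as SPSS syntax requires."""
    return "'" + text.replace("'", "''") + "'"


def build_label_syntax(var_labels, value_labels, find, replace_by):
    """Return SPSS commands that set the cleaned variable and value labels.

    var_labels:   {variable name: variable label}
    value_labels: {variable name: {value: value label}}
    """
    commands = []
    for name, label in var_labels.items():
        cleaned = label.replace(find, replace_by)
        if cleaned != label:  # only emit commands for labels that actually change
            commands.append(f"VARIABLE LABELS {name} {quote(cleaned)}.")
    for name, labels in value_labels.items():
        changed = {
            value: lab.replace(find, replace_by)
            for value, lab in labels.items()
            if find in lab
        }
        if changed:
            pairs = " ".join(f"{value} {quote(lab)}" for value, lab in changed.items())
            commands.append(f"ADD VALUE LABELS {name} {pairs}.")
    return commands


# Hypothetical labels, loosely modelled on the example data.
syntax = build_label_syntax(
    var_labels={"v1": "Marital_status"},
    value_labels={"v1": {1: "1 = Never_married", 2: "2 = Married"}},
    find="_",
    replace_by=" ",
)
print("\n".join(syntax))
```

Only labels that contain the search text end up in the generated commands, which keeps the printed syntax short.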
Example II - Remove Suffix from Variable Labels
Some variable labels end with “ (proceed to question...” We'll remove these suffixes because they don't convey any interesting information and merely clutter up our output tables and charts.
Again, we start off at Transform - SPSS tutorials - Clean Labels and fill out the dialog as shown below.
Quick tip: you can shorten the resulting syntax by using
- TO for specifying a range of variables such as V5 TO V10;
- ALL for specifying all variables in the active dataset.
We did just that in the syntax below.
SPSS TUTORIALS CLEAN_LABELS VARIABLES=all FIND=' (proceed' REPLACEBY=' '
/OPTIONS OPERATION=FIOCSUC PROCESS=VARLABS ACTION=RUN.
Note that running this syntax removes “ (proceed” and all characters that follow this expression from all variable labels.
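The logic of this operation can be sketched in a few lines of Python. This is only an illustration, not our tool's source code: the function name `remove_suffix_from` and the example labels are hypothetical, and we assume that the first occurrence of the search text marks the start of the suffix.

```python
def remove_suffix_from(label, find):
    """Drop the first occurrence of `find` and every character after it."""
    head, _sep, _tail = label.partition(find)
    # If `find` is absent, partition returns the whole label as `head`.
    return head.rstrip()


labels = [
    "Age of respondent (proceed to question 3)",  # hypothetical label
    "Monthly income",                             # no suffix: stays as it is
]
cleaned = [remove_suffix_from(label, " (proceed") for label in labels]
print(cleaned)
```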
Example III - Remove Prefix from Value Labels
Another issue we sometimes encounter is value labels being prefixed with the values representing them as shown below.
Removing “= ” (mind the space) and all characters preceding it from all value labels fixes the problem. The syntax below, created from Transform - SPSS tutorials - Clean Labels, does just that.
SPSS TUTORIALS CLEAN_LABELS VARIABLES=all FIND='= ' REPLACEBY=' '
/OPTIONS OPERATION=FIOCPRE PROCESS=VALLABS ACTION=RUN.
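For those curious about the logic, here is a minimal Python sketch of prefix removal. It is an illustration rather than our tool's source code: the function name `remove_prefix_through` and the example labels are hypothetical, and we assume that only the first occurrence of the search text counts.

```python
def remove_prefix_through(label, find):
    """Drop the first occurrence of `find` and every character before it."""
    _head, sep, tail = label.partition(find)
    # `sep` is empty when `find` does not occur: leave such labels untouched.
    return tail if sep else label


value_labels = {1: "1 = Never married", 2: "2 = Married", 3: "Other"}
cleaned = {value: remove_prefix_through(lab, "= ") for value, lab in value_labels.items()}
print(cleaned)
```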
Result
After our third and final example, all value and variable labels are nice, short and clean.
So that'll wrap up the examples of our label cleaning tool.
Final Notes
I hope you'll find our tool as helpful as we do. This first version performs 4 cleaning operations that we recently needed for our daily work. We'll probably build in some more options when we (or you?) need them.
So if you've any suggestions or other remarks, please throw us a comment below. Other than that,
thanks for reading!
SPSS – Recode with Value Labels Tool
This tutorial presents a simple tool for recoding values along with their value labels into different values.
- Prerequisites & Installation
- Checking Results & Creating New Variables
- Example I - Reverse Code Variables
- Example II - Correct Order after AUTORECODE
- Example III - Convert 1-2 into 0-1 Coding
- Example IV - Correct Coding Errors with Native Syntax
Example Data
We'll use recode-with-value-labels.sav -partly shown below- for all examples.
These data contain several common problems:
Some variables must be reverse coded because they measure the opposite of the other variables within some scale.
Some ordinal variables are coded as string variables.
The tool presented in this tutorial is the fastest option to fix these and several other common issues.
Prerequisites & Installation
Our recoding tool requires SPSS version 24+ with the SPSS Python 3 essentials properly installed -usually the case with recent SPSS versions.
Next, download our tool from SPSS_TUTORIALS_RECODE_WITH_VALUE_LABELS.spe. You can install it by dragging & dropping it into a data editor window. Alternatively, navigate to
as shown below.
In the dialog that opens, navigate to the downloaded .spe file and select it. SPSS now throws a message that “The extension was successfully installed under Transform - SPSS tutorials - Recode with Value Labels”. You'll now find our tool under Transform - SPSS tutorials - Recode with Value Labels as shown below.
Checking Results & Creating New Variables
If you use our tool, you may want to verify that all results are correct. A basic way to do so is to compare some frequency distributions before and after recoding your variables. These will be identical (except for their order) if you show only value labels in your output.
Our tool modifies existing variables instead of creating new ones. If that's not to your liking, combine it with our SPSS Clone Variables Tool (shown below).
A very solid strategy is now to
- clone all variables you'd like to recode;
- recode the original (rather than the cloned) variables with our recoding tool;
- compare the recoded variables with their cloned counterparts with CROSSTABS;
- optionally: remove the cloned variables from your data when you're done.
Ok, so let's now see how our recoding tool solves some common data problems.
Example I - Reverse Code Variables
Conf01 to Conf06 are intended to measure self confidence. However, Conf04 and Conf06 indicate a lack of self confidence and correlate negatively with the other confidence items.
This issue is solved by reverse coding these items. After installing our tool, let's first navigate to Transform - SPSS tutorials - Recode with Value Labels. Next, we'll fill out the dialogs as shown below.
Excluding the user missing value of 8 (No answer) leaves this value and its value label unaltered.
Completing these steps results in the syntax below. Let's run it.
SPSS TUTORIALS RECODE_WITH_VALUE_LABELS VARIABLES=Conf04 Conf06 OLDVALUES=1 2 3 4 5 6 7 NEWVALUES=7
6 5 4 3 2 1
/OPTIONS LABELSUFFIX=" (R)" ACTION=RUN.
Result
Note that (R) is appended to the variable labels of our reverse coded variables;
The values and value labels have been reversed as well.
Our reverse coded items now correlate positively with all other confidence items as required for computing Cronbach’s alpha or a mean or sum score over this scale.
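The idea of recoding values along with their labels can be sketched in plain Python. This is an illustration only, not our tool's source code: the function name `recode_with_labels` is made up and the value labels are assumed, since the actual labels of Conf04 and Conf06 are only visible in the screenshots.

```python
def recode_with_labels(value_labels, old_values, new_values):
    """Return the value labels after recoding old_values into new_values.

    Values that are not listed in old_values keep their value and label.
    """
    if len(old_values) != len(new_values):
        raise ValueError("OLDVALUES and NEWVALUES must have the same length")
    mapping = dict(zip(old_values, new_values))
    return {mapping.get(value, value): label for value, label in value_labels.items()}


# Assumed labels for a 7-point item with 8 as a user missing value.
labels = {
    1: "Strongly disagree", 2: "Disagree", 3: "Slightly disagree", 4: "Neutral",
    5: "Slightly agree", 6: "Agree", 7: "Strongly agree", 8: "No answer",
}
reversed_labels = recode_with_labels(labels, [1, 2, 3, 4, 5, 6, 7], [7, 6, 5, 4, 3, 2, 1])
for value in sorted(reversed_labels):
    print(value, reversed_labels[value])
```

Because 8 is not listed among the old values, it keeps both its value and its label, just like the excluded user missing value in the example above.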
Example II - Correct Order after AUTORECODE
Another common issue is ordinal string variables in SPSS such as suc01 to suc06, which measure self-perceived successfulness. First off, let's convert them to labeled numeric variables with AUTORECODE. We'll create an AUTORECODE command for a single variable as shown below.
We can now easily add the remaining 5 variables to the resulting SPSS syntax as shown below. Let's run it.
AUTORECODE VARIABLES=suc01 to suc06 /* ADD ALL OLD VARIABLES HERE */
/INTO nsuc01 to nsuc06 /* ADD ALL NEW VARIABLES HERE */
/GROUP
/PRINT.
This syntax converts our string variables into numeric ones but the order of the answer categories is not as desired. For correcting this, we first copy-paste our new, numeric values into Notepad++ or Excel. This makes it easy to move them into the desired order as shown below.
Doing so makes clear that we need to
- convert 9 into 1,
- convert 3 into 2,
- convert 7 into 3,
- and so on...
The figure below shows how to do so with our recoding tool.
This results in the syntax below, which sets the correct order for our autorecoded numeric variables.
SPSS TUTORIALS RECODE_WITH_VALUE_LABELS VARIABLES=nsuc01 nsuc02 nsuc03 nsuc04 nsuc05 nsuc06
OLDVALUES=9 3 7 4 6 2 8 5 1 NEWVALUES=1 2 3 4 5 6 7 8 9
/OPTIONS ACTION=RUN.
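If you'd rather not sort the categories by hand, the OLDVALUES list can also be derived with a little Python. The sketch below assumes that AUTORECODE numbers the distinct string values in ascending sort order (its default) and that Python's plain string sort matches SPSS' sort order for your labels, which may not hold for mixed case or accented characters. The categories and the function name `autorecode_codes` are hypothetical.

```python
def autorecode_codes(labels):
    """Mimic AUTORECODE's default: number distinct values 1..n in ascending sort order."""
    return {label: code for code, label in enumerate(sorted(set(labels)), start=1)}


# Hypothetical answer categories in the order we would like to end up with.
desired_order = ["Very low", "Low", "Medium", "High", "Very high"]

codes = autorecode_codes(desired_order)              # codes assigned by AUTORECODE
old_values = [codes[label] for label in desired_order]
new_values = list(range(1, len(desired_order) + 1))

print("OLDVALUES=" + " ".join(map(str, old_values)))
print("NEWVALUES=" + " ".join(map(str, new_values)))
```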
Example III - Convert 1-2 into 0-1 Coding
In SPSS, we preferably use a 0-1 coding for dichotomous variables. Some reasons are that
- this facilitates interpreting b-coefficients for dummy variables in multiple regression;
- means for 0-1 coded variables correspond to proportions of “yes” answers which are easily interpretable.
The syntax below is easily created with our recoding tool and converts the 1-2 coding for all dichotomous variables in our data file into a 0-1 coding.
SPSS TUTORIALS RECODE_WITH_VALUE_LABELS VARIABLES=somed01 somed02 somed03 somed04 somed05 somed06
somed07 OLDVALUES=2 NEWVALUES=0
/OPTIONS ACTION=RUN.
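The point about means and proportions is easily verified with a handful of made-up answers. In the Python sketch below we assume that 1 means “yes” and 2 means “no”; the data are hypothetical.

```python
answers_12 = [1, 2, 2, 1, 1]  # hypothetical answers: 1 = yes, 2 = no

# Same recode as in the syntax above: 2 becomes 0, 1 is left alone.
answers_01 = [0 if answer == 2 else answer for answer in answers_12]

proportion_yes = answers_12.count(1) / len(answers_12)
mean_01 = sum(answers_01) / len(answers_01)

print(proportion_yes, mean_01)
```

Both numbers are identical, whereas the mean of the original 1-2 coded answers has no such direct interpretation.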
Example IV - Correct Coding Errors with Native Syntax
If we take a close look at our final variable, sat01, we see that it is coded 21 through 27. Depending on how we analyze it, we may want to convert it into a standard 7-point Likert scale. The screenshot below shows how it's done.
Note that we select “Create syntax and print it” for creating native syntax.
Result
As shown below, selecting the print option results in native SPSS syntax in your output window.
The syntax we thus copy-pasted from our output window is:
RECODE sat01 (21.0 = 1.0)(22.0 = 2.0)(23.0 = 3.0)(24.0 = 4.0)(25.0 = 5.0)(26.0 = 6.0)(27.0 = 7.0).
EXECUTE.
VALUE LABELS
/sat01 1.0 'Strongly disagree' 2.0 'Disagree' 3.0 'Slightly disagree' 4.0 'Neutral' 5.0 'Slightly agree' 6.0 'Agree' 7.0 'Strongly agree'.
Note that it consists of 3 very basic commands:
- RECODE adjusts the values themselves;
- EXECUTE can usually be removed from the syntax but it ensures that our RECODE is executed immediately;
- VALUE LABELS adjusts our value labels after our RECODE.
So why should you consider using the print option? Well, the default syntax created by our tool only runs on SPSS installations with the tool installed. So if a client or colleague needs to replicate your work, using native syntax ensures that everything will run on any SPSS installation.
Right, so that should do. I hope you'll find my tool helpful -I've been using it on tons of projects myself. If you've any questions or remarks, just throw me a comment below, ok?
Thanks for reading!
SPSS Clone Variables Tool
Some SPSS commands such as RECODE and ALTER TYPE can make irreversible changes to variables. Before using these, I like to clone the variables that I'm about to edit. This allows me to compare the edited to the original versions.
This tutorial presents a super easy tool for making exact clones of variables in SPSS. We'll use bank-clean.sav (partly shown below) for all examples.
Prerequisites & Installation
Installing this tool requires
- SPSS version 24 or higher with
- the SPSS Python 3 essentials installed.
Recent SPSS versions usually meet these requirements.
Download our tool from SPSS_TUTORIALS_CLONE_VARIABLES.spe. You can install it from
as shown below.
After completing these steps, you'll find SPSS tutorials - Clone Variables under Transform.
Clone Variables Example I
Let's first clone jtype -short for job type- as illustrated below.
Completing these steps results in the SPSS syntax below. Let's run it.
SPSS_TUTORIALS_CLONE_VARIABLES VARIABLES=jtype
/OPTIONS FIX="c" FIXTYPE=PREFIX ACTION=RUN.
Result
Note that SPSS has now added a new variable to our data: cjtype as shown below.
Except for its name, cjtype is an exact clone of jtype: it has the same
- variable type and format;
- value labels;
- user missing values;
- and so on...
There's one minor issue with our first example: the syntax we just pasted only runs on SPSS installations with our tool installed.
The solution for this is to have the tool print native syntax instead: this syntax is typically (much) longer but it does run on any SPSS installation. Our second example illustrates how to do just that.
Clone Variables Example II
Let's create native syntax for cloning a couple of different variables, including a string variable and a date variable.
This option has our tool print native syntax into our output window.
Because we chose to print (rather than run) syntax, this is one of the rare occasions at which we click Ok instead of Paste.
Result
Note that we now have native syntax for cloning several variables in our output window.
For actually running this syntax, we can simply copy-paste-run it in a syntax window. The entire syntax is shown below.
STRING clast_name (A30).
RECODE last_name (ELSE = COPY) INTO clast_name.
APPLY DICTIONARY FROM * /SOURCE VARIABLES = last_name /TARGET VARIABLES = clast_name.
RECODE gender (ELSE = COPY) INTO cgender.
APPLY DICTIONARY FROM * /SOURCE VARIABLES = gender /TARGET VARIABLES = cgender.
RECODE dob (ELSE = COPY) INTO cdob.
APPLY DICTIONARY FROM * /SOURCE VARIABLES = dob /TARGET VARIABLES = cdob.
RECODE educ (ELSE = COPY) INTO ceduc.
APPLY DICTIONARY FROM * /SOURCE VARIABLES = educ /TARGET VARIABLES = ceduc.
If our tool creates very long syntax, you could copy it into a separate file and run it from an INSERT command.
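The pattern in this native syntax is simple enough to reproduce with a short Python sketch. This is an illustration rather than our tool's source code: the function name `clone_syntax` is made up, and the caller has to state the width of each string variable, whereas the real tool looks this up in the data dictionary.

```python
def clone_syntax(variables, prefix="c"):
    """Return native SPSS syntax lines for cloning variables.

    variables: {variable name: string width, or None for numeric variables}
    """
    lines = []
    for name, width in variables.items():
        clone = prefix + name
        if width is not None:
            # String clones must be declared before RECODE can copy into them.
            lines.append(f"STRING {clone} (A{width}).")
        lines.append(f"RECODE {name} (ELSE = COPY) INTO {clone}.")
        lines.append(
            f"APPLY DICTIONARY FROM * /SOURCE VARIABLES = {name} /TARGET VARIABLES = {clone}."
        )
    return lines


syntax = clone_syntax({"last_name": 30, "gender": None})
print("\n".join(syntax))
```

RECODE copies the values into the new variable, while APPLY DICTIONARY copies dictionary information such as labels and formats from the original onto the clone.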
Right, I guess that should cover this simple but handy little tool. Hope you'll give it a try and hope you'll find it helpful. If you've any remarks, feel free to throw me a quick comment below.
Thanks for reading!
SPSS VARSTOCASES With Labels Tool
Running VARSTOCASES is often necessary for generating nice charts in SPSS. One of the many examples is a stacked bar chart for comparing multiple variables. Sadly, we lose our variable labels when running VARSTOCASES and we really do need those.
The solution is a simple tool that generates our VARSTOCASES for us and applies the variable labels of the input variables as value labels to the newly created index variable.
SPSS VARSTOCASES Problem - Example
Before proposing our solution, let's first take a look at the problem. We'll use course_evaluation.sav, part of which is shown below.
Now let's say we'd like to create the overview table shown below, perhaps followed up by a nice chart. The way to go here is VARSTOCASES as we'll demonstrate in a minute.
Desired End Result
Quick Data Check before VARSTOCASES
Whenever running VARSTOCASES, we always need to make sure our input variables have consistent value labels as explained in VARSTOCASES - wrong results. We'll check that by running the syntax below.
Inspecting Consistency of Value Labels Syntax
*1. Show both values and value labels in output tables.
set tnumbers both.
*2. Check if value labels are consistent over variables.
frequencies q1 to q6.
*Result: consistent labels, good to go.
Right, we now run a basic VARSTOCASES command followed by CROSSTABS for comparing the scores on our 6 input variables in a single table.
Creating an Overview Table Syntax
*1. Restructure q1 to q6 into a single variable q, indexed by question.
varstocases/make q from q1 to q6/index question (q).
*2. Show only labels in output for reporting.
set tnumbers labels tvars labels.
*3. Create table.
crosstabs question by q/cells row.
Result
Right. Now, the numbers in this table are correct but we find it very annoying that we lost the variable labels for q2 to q6 in the process. By default, the variable label for q1 is -incorrectly- applied to our newly created variable that now holds the values of q1 to q6. It's precisely these two problems that our tool takes care of.
VARSTOCASES with Labels Tool - How to Use It?
- This tool requires SPSS version 17 or higher with the SPSS Python Essentials properly installed.
- Download and install the VARSTOCASES with labels tool. Note that this is an SPSS custom dialog.
- Navigate to
- Enter the input variables. Note that you can use the TO and ALL keywords for doing so.
- Paste and run the syntax.
- directs your web browser to the tutorial you're currently reading.
VARSTOCASES with Labels Tool - Just Do It
If you already ran VARSTOCASES on course_evaluation.sav, then you'll need to reopen the original data. After installing the tool, you'll find it in the menu. Fill it out as shown below.
Clicking Paste results in the syntax below.
VARSTOCASES Syntax Generated by Tool
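*1. Close all open datasets and reopen the original data.
dataset close all.
new file.
get file 'course_evaluation.sav'.
*2. SPSS Python syntax pasted from tool.
begin program.
# Variables to restructure; TO and ALL are expanded below.
varSpec = 'q1 to q6'
import spssaux,spss
# Start of the VALUE LABELS command for the new index variable.
vallabcmd = 'value labels aspect'
sDict = spssaux.VariableDict(caseless = True)
varList = sDict.expand(varSpec)
# Add one 'variable name' 'variable label' pair per input variable.
for var in varList:
    vallabcmd += "\n'%s' '%s'"%(var.lower(),sDict[var].VariableLabel)
# Restructure the data; the index variable holds the lowercased variable names.
spss.Submit('''
varstocases/make score '' from q1 to q6/index aspect(score).
compute aspect = lower(aspect).
''')
# Apply the collected variable labels as value labels to the index variable.
spss.Submit(vallabcmd + '.')
end program.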
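To see what the VALUE LABELS command built by this program looks like, the sketch below reproduces the string building step in plain Python, so it runs without SPSS. The variable labels and the function name `index_value_labels` are hypothetical. Unlike the pasted program, this sketch also doubles any single quotes in labels, which is an addition of ours.

```python
def index_value_labels(index_var, var_labels):
    """Build a VALUE LABELS command that labels each lowercased variable name
    in the index variable with the corresponding variable label."""
    parts = [f"value labels {index_var}"]
    for name, label in var_labels.items():
        escaped = label.replace("'", "''")  # escape quotes for SPSS syntax
        parts.append("'%s' '%s'" % (name.lower(), escaped))
    return "\n".join(parts) + "."


# Hypothetical variable labels for two input variables.
command = index_value_labels("aspect", {"q1": "Teacher's clarity", "Q2": "Course content"})
print(command)
```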
Generating a Nice Overview Table
After running the previous example, we're good to go. We'll now create a nice overview table of q1 to q6 by running the syntax below.
*1. Set a variable label for the score variable created by the tool.
variable labels score "Answer Category".
*2. Generate table.
crosstabs aspect by score/cells row.
Result
Final Notes
Our final table can also be created with CTABLES but not all SPSS users have a license for it. Alternatively, TABLES can do the trick but it's available only from (rather challenging) syntax which is no longer documented.
In our daily work, we routinely use this tool for creating charts rather than tables. One example, the stacked bar chart, will be discussed in next week’s tutorial.