Introduction
Merging data with ADD FILES can result in nonsensical data. This occurs when variables or values have different meanings across files. By comparing dictionaries across files, inconsistently coded variables are quickly detected.
So What's the Problem?
When files are merged with ADD FILES, inconsistent dictionary information is discarded. This will happen, for example, if a variable v1 means "gender" in one file and "employment status" in another. In this case, values indicating gender will seem to indicate employment status or vice versa. For a demonstration, see SPSS Add Files - Cautionary Note.
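The effect can be imitated outside SPSS. The snippet below is only a toy model: plain Python dicts stand in for value labels, the names file_a and file_b are made up for illustration, and it assumes the merged file ends up with the labels of the first file while those of the second are discarded.

```python
# Toy model of the problem; no SPSS involved.
# Assumption: the merged file keeps the value labels of the first file only.
file_a = {"labels": {0: "female", 1: "male"}, "v1": [0, 1, 1]}
file_b = {"labels": {0: "unemployed", 1: "employed"}, "v1": [1, 0]}

merged_values = file_a["v1"] + file_b["v1"]  # cases from both files are stacked
merged_labels = file_a["labels"]             # file_b's labels are discarded

# What a reader of the merged file would see for each case.
shown = [merged_labels[value] for value in merged_values]
print(shown)
```

The last two cases come from file_b, where 1 meant "employed" and 0 meant "unemployed", yet they now show up as "male" and "female".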
So What's the Solution?
SPSS Compare Dictionaries Tool
- Put the files you'd like to merge in a single folder. Make sure there are no other .sav files in this folder.
- Close all open datasets.
- Make sure you have the SPSS Python Essentials installed.
- Download and install SPSS Dictionary Checker. Note that this is an SPSS custom dialog.
- Open the tool's dialog from the SPSS menu. Copy-paste the path to the data folder into the dialog and select whether you'd like a syntax file with a "save list" of variables to be written. Click Paste and run the pasted syntax.
- Clicking the tool's button will take you to this tutorial. We very much appreciate your feedback on it.
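Before running the tool, it may be worth confirming that the folder really holds two or more .sav files, since there's nothing to compare otherwise. The sketch below is not part of the tool; the function names list_sav_files and check_folder are hypothetical, and it only looks at file names directly inside the folder.

```python
import glob
import os


def list_sav_files(folder):
    """Return the sorted paths of all .sav files directly inside folder."""
    return sorted(glob.glob(os.path.join(folder, "*.sav")))


def check_folder(folder):
    """Return the .sav files in folder; raise ValueError if fewer than two."""
    files = list_sav_files(folder)
    if len(files) < 2:
        raise ValueError(
            "Found {} .sav file(s) in {!r}; at least 2 are needed "
            "for a comparison.".format(len(files), folder)
        )
    return files
```

Calling check_folder with the path you intend to paste into the dialog either returns the files that will be compared or tells you why the comparison can't run.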
Explanation of the Dictionary Overview
SPSS Compare Dictionaries Tool Result
- This command will always produce a new dataset presenting an overview of the dictionary comparison.
- Each row represents either a value or a variable, showing its value label or variable label in each of the source files.
- Empty cells indicate that either the variable wasn't present in one or more source files or no label was defined.
- Value inconsistencies (val_incon) is (the number of distinct labels - 1). Empty cells are not counted as distinct labels.
- Variable inconsistencies (var_incon) is the sum of all value inconsistencies for each variable.
- Variables are sorted in descending order according to variable inconsistencies. That is, the "worst" variables are moved to the top of the dataset.
- Variables with zero variable inconsistencies are removed from the overview by default. Fully consistent data files, therefore, will result in an empty new dataset.
- The command is case insensitive. All labels are converted to lower case before comparing them.
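The counting rules above can be sketched in a few lines of plain Python. This is an illustration of the rules as described, not the tool's actual code: the function names and the example labels are made up, variable labels are assumed to count as a row just like value labels, and a row with no labels at all is assumed to score 0 rather than -1.

```python
def normalize(label):
    """Lower-case a label; None and empty strings count as 'no label'."""
    if not label:
        return None
    return label.lower()


def val_incon(labels):
    """Number of distinct non-empty labels minus 1, never below 0."""
    distinct = {normalize(label) for label in labels} - {None}
    return max(len(distinct) - 1, 0)


def overview(variables):
    """variables maps a variable name to {row key: [label per source file]}.

    Returns (name, var_incon) pairs, worst first, consistent variables dropped.
    """
    totals = {
        name: sum(val_incon(labels) for labels in rows.values())
        for name, rows in variables.items()
    }
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(name, total) for name, total in ranked if total > 0]


# Hypothetical dictionaries of two source files.
variables = {
    "v1": {
        "variable label": ["Gender", "Employment status"],
        0: ["female", "unemployed"],
        1: ["male", "employed"],
    },
    "v2": {
        "variable label": ["Age", "AGE"],  # same label after lower-casing
        1: ["young", None],                # empty cell is not a distinct label
    },
}
print(overview(variables))  # [('v1', 3)]
```

Here v1 has two distinct labels on each of its three rows, so its var_incon is 1 + 1 + 1 = 3. v2 is fully consistent and drops out of the overview.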
Notes on the Syntax File
- The dictionary check may write a new syntax file containing all consistently coded variables.
- The file is called "savelist.sps" and will appear in the source data folder.
- If this file already exists, it will be overwritten.
- Before use, variables can be added to or removed from this "save list".
- In order to use it, first merge all files and then run this syntax file over the result. It will drop all variables that are not in the "save list".
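The contents of "savelist.sps" aren't shown here, so the following is only a guess at what such a file could look like. It builds SPSS syntax from a list of variable names using ADD FILES with a KEEP subcommand, a common way of dropping all variables that aren't listed. The function name build_savelist_syntax and the variable names are hypothetical.

```python
def build_savelist_syntax(variable_names):
    """Return SPSS syntax that keeps only the listed variables.

    Assumption: the active dataset is the merged file and every
    name in variable_names exists in it.
    """
    if not variable_names:
        raise ValueError("The save list is empty; no variables would be kept.")
    keep = " ".join(variable_names)
    return "ADD FILES FILE = * /KEEP = {}.\nEXECUTE.\n".format(keep)


# Hypothetical list of consistently coded variables.
syntax = build_savelist_syntax(["id", "v2", "v3"])
print(syntax)
```

Adding or removing a name in the list before building the syntax corresponds to editing the "save list" by hand.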
THIS TUTORIAL HAS 14 COMMENTS:
By Ruben Geert van den Berg on July 29th, 2015
I just retested the tool and it ran fine on my system. Are you sure there's multiple data files in the folder you specified? And Python is installed and running in version 22 but can you confirm that for version 21 as well? Because the error suggests that it isn't.
By Johannes on February 8th, 2016
I get an error, although everything is correctly installed and others of your custom dialogs work:
File "C:\PROGRA~1\IBM\SPSS\STATIS~1\22\Python\Lib\site-packages\spss\dataStep.py", line 1010, in insert
raise SpssError,error
spss.errMsg.SpssError: [errLevel 27] Invalid variable name.
By Ruben Geert van den Berg on February 8th, 2016
Some other users have reported problems with this tool too. Perhaps I'll rewrite it from scratch or remove it from the site altogether. However, I don't have the time to look into it right now. Sorry about that!
By Isabel Bradburn on October 27th, 2016
Hi Ruben, I imported the tool thru custom dialog and got the pop-up window, pasted syntax and ran it - but it didn't work. The outfile noted "Traceback File "" line 70 in , File "" line 22 in dictcheck, AttributeError: 'NoneType' object has no attribute 'lower'". Thoughts? Many thanks
By Ruben Geert van den Berg on October 28th, 2016
Hi Isabel!
My first guess is that there's zero or one .sav files in the target folder. In this case, there's nothing to compare. The tool should check for this but since it's rather old, I didn't include such a check. I'll probably rewrite it at some point but we're currently rebuilding the entire website so that may take a while.