In some cases, ALTER TYPE in SPSS version 24 seems to report incorrect altered values when converting a string to a numeric value. This results in messed up data while SPSS reports that everything is fine. Let's try and replicate the problem with the example below.
Example
We imported some .csv data in which some numeric values were flagged with an “a”. This indicates that these were estimated rather than measured values. Due to these flags, some variables were imported as string variables. The syntax below recreates a microsample from these data.
data list free/s1 (a2).
begin data
'' '1' '1a' '2' '2a' '3' '3a' '4' '4a' '5' '5a'
end data.
Result
Convert to Numeric with ALTER TYPE
We'll now convert our string variable to a numeric variable with the syntax below. Since our variable seems to hold only single-digit integers, we chose to convert it into the f1 format. As an extra safety check, we'll inspect which original values are converted into which new values.
alter type s1(f1)
/print alteredvalues.
Result
This table suggests that all values have been converted as desired. It seems SPSS made the right guess that string value 1a should be changed to 1. Furthermore, only empty string values result in system missings after the conversion. Mission accomplished?
Data View after ALTER TYPE
At first glance, everything looks great. Ok, we do have quite a few system missing values after the conversion but those were all empty string values. Right?
But let's take another quick look at our original data: our flagged values weren't converted as reported in the altered values table. They all result in system missing values.
With just 1 variable and 11 cases, we immediately see the problem. However, the actual data contained some 40,000 records and it was more or less by accident that I stumbled upon the issue.
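With larger data, one way to catch this without eyeballing the data is to keep a copy of the original string values and list those that were non-empty but ended up system missing. The syntax below is a minimal sketch of that idea, not a built-in SPSS feature: it must be run on the original string variable instead of the plain ALTER TYPE command, and the variable name s1_orig is just one we made up for this example.

```spss
* Keep a copy of the original string values (s1_orig is a made-up name).
string s1_orig (a2).
compute s1_orig = s1.

* Convert the original variable as before.
alter type s1 (f1).

* Show only original values that were not empty but became system missing.
temporary.
select if (s1_orig ne '' and missing(s1)).
frequencies s1_orig.
```

For our microsample, this table should list the five flagged values (1a through 5a). If no non-empty values were lost, no cases are selected and SPSS will say so instead of producing a table.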
Which SPSS Versions?
My students tried to replicate the issue in SPSS versions 18 and 22. Neither version reported incorrect values because the altered values tables were empty, for this example at least.
Perhaps the /PRINT ALTEREDVALUES subcommand was introduced in version 23 or 24 but the command syntax reference does not mention anything about it.
Thanks for reading!
THIS TUTORIAL HAS 6 COMMENTS:
By Jon Peck on July 11th, 2017
The actual data values produced by ALTER TYPE in this example are correct. Any string that cannot be correctly converted to numeric should be and is converted to sysmis. This is the same behavior you would see with the conversion transformation functions. However, the output table in V24, while it lists the input values correctly, shows the wrong output. I have reported this to Development. However, the data results are correct. Also, the V24 CSR does show the print subcommand and the alteredvalues keyword.
By Ruben Geert van den Berg on July 12th, 2017
Hi Jon!
When was the /PRINT ALTEREDVALUES subcommand introduced? Straight away with the entire command in SPSS 16?
My students told me it produced empty tables in versions 18 and 22. So that's a different problem than version 24. This suggested to me that the subcommand may have been added in version 23 or 24? However, the release history in the CSR doesn't say so.
By Ruben Geert van den Berg on July 12th, 2017
P.s. we discussed this before but I still think that ALTER TYPE should at least throw a warning if non empty string values result in system missings.
If I convert a variable with many distinct values, I don't want to visually inspect a huge table to see if everything went right. That's more work than necessary. Besides, if a variable holds more than 25 distinct values, I may not detect the problem at all because, if I'm correct, the ALTERED VALUES table never lists more than 25.
By Jon Peck on July 12th, 2017
I designed ALTER TYPE, and I believe that the PRINT subcommand with its options was an original feature. I noticed that the table seems to be incorrect only when a field to be converted to numeric starts with a digit but also contains nonnumeric characters. Try 'a1' for example.
By Ruben Geert van den Berg on July 12th, 2017
"I believe that the PRINT subcommand with its options was an original feature." The CSR suggests this too as the release history does not mention this being added.
Do you have access to any older SPSS versions (22 or 18)? According to some readers, the example I presented results in an empty ALTEREDVALUES table on those versions. Which is a different issue but an issue nonetheless.