Computers can only process numbers. They represent letters and punctuation marks as numbers as well. As explained in Unicode, one of the most basic systems for mapping all characters onto numbers is ASCII. SPSS can convert characters to ASCII (and other) single byte codepoints and back again. This is necessary for removing non printing characters from string variables.
Single byte codepoints are represented by decimal numbers 0 through 255. A codepoint can be converted to its corresponding character with string(codepoint,pib). For the reverse, use number(character,pib). For an overview of these codepoints and their corresponding characters, run the syntax below. For this and the following examples, we recommend switching to Unicode mode if you haven't already.
SPSS Syntax Example 1
input program.
string character (a1).
loop codepoint = 0 to 255.
compute character = string(codepoint,pib).
end case.
end loop.
end file.
end input program.
execute.
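As a rough cross-check outside SPSS, the same codepoint-to-character mapping can be sketched in Python. Python is an assumption here and not part of the original examples. Its built-in chr() and ord() play the roles of string(codepoint,pib) and number(character,pib) for the ASCII range. The helper name codepoint_table is hypothetical.

```python
# Sketch only: mirrors the SPSS codepoint overview for the ASCII range.
# codepoint_table is a hypothetical helper name, not an SPSS or library function.

def codepoint_table(first=32, last=126):
    """Return (codepoint, character) pairs for an inclusive codepoint range."""
    return [(cp, chr(cp)) for cp in range(first, last + 1)]

table = codepoint_table()
print(table[:3])           # first three printable ASCII characters
print(ord("A"), chr(97))   # character -> codepoint, codepoint -> character
```

The default range 32 through 126 covers the printable ASCII characters only. Codepoints below 32 are control characters, which is where the non printing characters discussed next come from.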
Removing Non Printing Characters
[Figure: non printing characters in string]
When reading in data from file formats other than .sav, non printing characters may appear in strings. This is the case with john_doe.sav. Before fixing this variable, let's first see what's in there. We can do so with the syntax below, using VECTOR and SUBSTR within a LOOP.
SPSS Syntax Example 2
*2. Open data file.
get file 'john_doe.sav'.
*3. Inspect codepoint of each character.
vector s(20).
loop #pos = 1 to char.length(name).
compute s(#pos) = number(substr(name,#pos,1),pib).
end loop.
execute.
The previous syntax creates new variables holding the codepoints for the characters in the string variable. We'll now take a look at the ASCII table (or the result of the first syntax example). We'll see that codepoints 9-13 and 28-31 are non printing characters present in the string. We can now delete the temporary helper variables and clean up the string variable with the syntax below.
SPSS Syntax Example 3
*1. Delete helper variables.
delete variables s1 to s20.
*2. Remove non printable characters.
do repeat char = 9 to 13 28 to 31.
compute name = replace(name,string(char,pib),'').
end repeat print.
execute.
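The cleanup logic can also be sketched in Python for readers who want to verify it outside SPSS. This is an illustration under assumptions: the function name strip_non_printing and the sample value are hypothetical, and the actual contents of john_doe.sav are not reproduced here. Only the codepoint ranges 9-13 and 28-31 come from the text above.

```python
# Sketch only: removes the codepoints named in the text (9-13 and 28-31).
NON_PRINTING = set(range(9, 14)) | set(range(28, 32))

def strip_non_printing(text):
    """Drop every character whose codepoint is in NON_PRINTING."""
    return "".join(ch for ch in text if ord(ch) not in NON_PRINTING)

# Hypothetical dirty value: tab (9), carriage return (13), line feed (10)
# and file separator (28) mixed into a name.
dirty = "John\tDoe\r\n\x1c"
print(repr(strip_non_printing(dirty)))  # 'JohnDoe'
```

Like the SPSS version, this removes the characters instead of replacing them with spaces, so words that were separated only by a tab end up joined.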
Sorting Letters in Strings
Converting codepoints to letters also makes it easy to sort letters within a string. Presuming only lower case letters, we can just loop through codepoints 97 TO 122, which represent a-z. The syntax below does so by combining SUBSTR, CONCAT and RTRIM.
SPSS Syntax Example 4
data list free/answers (a10).
begin data
abc cba dbabce ecabad abacadae edc
end data.
*2. Sort characters in string.
string sorted (a10).
loop #char = 97 to 122.
loop #pos = 1 to char.length(answers).
if char.substr(answers,#pos,1) = string(#char,pib) sorted = concat(rtrim(sorted),char.substr(answers,#pos,1)).
end loop.
end loop.
execute.
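The nested loop can be traced outside SPSS with a small Python sketch. Python and the function name sort_letters are assumptions made for illustration. The outer loop walks codepoints 97 through 122 and the inner loop scans every position of the string, appending each match, just as the CONCAT and RTRIM combination does.

```python
# Sketch only: same two-loop idea as the SPSS syntax, lower case letters only.

def sort_letters(answer):
    """Collect lower case letters in a-z order by looping over codepoints 97-122."""
    result = ""
    for codepoint in range(97, 123):   # 97 = 'a', 122 = 'z'
        for ch in answer:              # scan every position in the string
            if ch == chr(codepoint):
                result += ch
    return result

for answer in ["abc", "cba", "dbabce", "ecabad", "abacadae", "edc"]:
    print(answer, "->", sort_letters(answer))
```

Note that any character outside a-z never matches a codepoint in the outer loop and is therefore silently dropped, which is why the approach presumes lower case letters only.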