SPSS Tutorials


Convert Characters to Codepoints and Vice Versa

Summary

Computers can only process numbers, so they represent letters and punctuation marks as numbers as well. As explained in the Unicode tutorial, one of the most basic systems for mapping characters onto numbers is ASCII. SPSS can convert characters to ASCII (and other) single-byte codepoints and vice versa. This comes in handy, for instance, when removing non-printing characters from string variables.

Single-byte codepoints are represented by the decimal numbers 0 through 255. A codepoint can be converted to its corresponding character with string(codepoint,pib); for the reverse, use number(character,pib). To get an overview of all 256 codepoints and their corresponding characters, run the syntax below. For this and the following examples, we recommend switching to Unicode mode if you haven't done so already.

SPSS Syntax Example 1

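*Show all codepoints and corresponding characters.

*Build a new dataset from scratch: one case per codepoint.
input program.
string character(a1).
loop codepoint = 0 to 255.
*Convert the codepoint to its single-byte character.
compute character = string(codepoint,pib).
end case.
end loop.
end file.
end input program.
exe.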
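Both directions can also be checked on a single value. The sketch below builds a small one-case test dataset (dummy, code_a and letter are hypothetical names, not part of the tutorial's data) and converts 'A' to its codepoint and codepoint 97 back to a character. Since both lie in the ASCII range, code_a should come out as 65 and letter as a.

```spss
*Round trip between characters and codepoints on a one-case test dataset.
data list free/dummy.
begin data
1
end data.

*Character to codepoint: 'A' is ASCII codepoint 65.
compute code_a = number('A',pib).

*Codepoint to character: 97 is lowercase a.
string letter(a1).
compute letter = string(97,pib).
list.
```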

Removing Non-Printing Characters

Figure: Non-printing characters in a string variable.

When reading in data from file formats other than .sav, non-printing characters may end up in string variables. This is the case with the name variable in john_doe.sav. Before fixing this variable, let's first see what it actually contains. We can do so with the syntax below, which uses VECTOR and SUBSTR within a LOOP.

SPSS Syntax Example 2

*1. Set working directory to file location.

cd 'd:/downloaded'.

*2. Open data file.

get file 'john_doe.sav'.

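*3. Inspect codepoint of each character.
*s1 to s20 will hold the codepoints of characters 1 through 20 of name.
*This assumes name contains at most 20 characters.

vector s(20).
loop #pos = 1 to char.length(name).
compute s(#pos) = number(substr(name,#pos,1),pib).
end loop.
exe.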
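Rather than reading through s1 to s20 case by case, the codepoints can also be tabulated. A minimal sketch, assuming the helper variables s1 to s20 created above are still present: it restructures a copy of the data to one row per character with VARSTOCASES and runs FREQUENCIES on the result. The dataset names original and codes and the variable cp are hypothetical.

```spss
*Name the active dataset so we can return to it later.
dataset name original.

*Work on a copy so the wide layout of the original stays intact.
dataset copy codes.
dataset activate codes.

*One row per character position; empty positions (system-missing) are dropped.
varstocases /make cp from s1 to s20.

*Frequency table of all codepoints found in name.
frequencies cp.

*Return to the original dataset and discard the copy.
dataset activate original.
dataset close codes.
```

Any codepoint below 32 in the resulting table is an ASCII control character, which does not print.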

Syntax example 2 creates new variables s1 through s20 holding the codepoints of the characters in name. Comparing these with an ASCII table (or the output of the first syntax example) shows that the string contains codepoints 9 through 13 and 28 through 31, all of which are non-printing characters. We can now delete the temporary helper variables and clean up the string variable with the syntax below.

SPSS Syntax Example 3

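*1. Delete temporary variables after inspection.

delete variables s1 to s20.

*2. Remove non-printing characters.
*char takes the values 9 through 13 and 28 through 31 in turn.
*Each occurrence of the matching character is replaced by an empty string.
*PRINT on END REPEAT shows the generated commands in the output.

do repeat char = 9 to 13 28 to 31.
compute name = replace(name,string(char,pib),'').
end repeat print.
exe.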
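To confirm the cleanup worked, name can be rescanned for any remaining control characters. A minimal sketch, assuming codepoints 0 through 31 and 127 (the ASCII control characters) count as non-printing; has_ctrl is a hypothetical helper variable. If the cleanup caught everything, its frequency table shows only zeros.

```spss
*Flag names that still contain an ASCII control character (0-31 or 127).
compute has_ctrl = 0.
loop #pos = 1 to char.length(name).
compute #cp = number(char.substr(name,#pos,1),pib).
if #cp lt 32 or #cp = 127 has_ctrl = 1.
end loop.
frequencies has_ctrl.

*Remove the helper variable again.
delete variables has_ctrl.
```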

Sorting Letters in Strings

Converting codepoints to letters also makes it easy to sort the letters within a string. Assuming the string holds only lowercase letters, we can simply loop through codepoints 97 to 122, which represent a through z. The syntax below does so by combining CHAR.SUBSTR, CONCAT and RTRIM.

SPSS Syntax Example 4

*1. Create test data.

data list free/answers (a10).
begin data
abc cba dbabce ecabad abacadae edc
end data.

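*2. Sort characters in string.

string sorted(a10).
*Outer loop: codepoints 97 (a) through 122 (z), in alphabetical order.
loop #char = 97 to 122.
*Inner loop: append every occurrence of the current letter to sorted.
*RTRIM strips the trailing blanks that pad sorted before concatenating.
loop #pos = 1 to char.length(answers).
if char.substr(answers,#pos,1) = string(#char,pib) sorted = concat(rtrim(sorted),char.substr(answers,#pos,1)).
end loop.
end loop.
exe.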
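The same loop over codepoints can also collect each distinct letter just once. The sketch below is a variation, not part of the original example: instead of scanning every position, it uses CHAR.INDEX to test whether the current letter occurs in answers at all. The variable letters is a hypothetical name, and the same lowercase-only assumption applies.

```spss
*Collect each distinct lowercase letter once, in alphabetical order.
string letters(a10).
loop #char = 97 to 122.
*CHAR.INDEX returns 0 if the letter does not occur in answers.
if char.index(answers,string(#char,pib)) gt 0 letters = concat(rtrim(letters),string(#char,pib)).
end loop.
exe.
```

For the test data above, this yields abc, abc, abcde, abcde, abcde and cde.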
