SPSS Unicode mode is a setting in which all text is encoded as UTF-8 (Unicode Transformation Format, 8-bit). Note that this tutorial builds heavily on basic Unicode concepts.
SPSS Unicode Mode - What and Why
- Up to version 15, all character encoding in SPSS was based on code pages. SPSS using code pages is now referred to as code page mode.
- Starting with version 16, SPSS also supports Unicode (as UTF-8). SPSS using UTF-8 is referred to as Unicode mode. Note that this encoding applies not only to string variables but also to syntax files.
- From version 21 onwards, SPSS explicitly asks whether to use Unicode mode when the program is first started.
SPSS Unicode and Variable Widths
- The single most important thing to understand about UTF-8 in SPSS is that a single character may take up more than one byte: 1, 2 or 3 bytes for virtually all characters in common use, and 4 bytes for characters outside the Basic Multilingual Plane such as emoji. Code page mode is restricted to single-byte characters, so characters and bytes correspond one to one.
- In SPSS, a string variable's width is defined as the number of bytes (not characters) available for its values. To stay on the safe side, you can set the width to three times the number of characters the variable must hold.
- When a data file that was saved in code page mode is opened in Unicode mode, SPSS automatically triples all string variable widths to ensure that they are long enough.
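To make the byte versus character distinction concrete, here is a small Python sketch (an illustration, not SPSS syntax) that counts characters and UTF-8 bytes for a few sample values. It also checks the tripling rule for one code page, assuming Windows-1252 (a common Western European code page; your data files may use a different one). The helper names `utf8_width` and `cp1252_max_utf8_growth` are hypothetical.

```python
# Illustration only: models what an SPSS string width measures in Unicode
# mode (UTF-8 bytes) versus the number of characters a value holds.


def utf8_width(text: str) -> int:
    """Number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def cp1252_max_utf8_growth() -> int:
    """Largest number of UTF-8 bytes that a single Windows-1252 byte becomes."""
    worst = 1
    for b in range(256):
        try:
            ch = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # a few byte values are undefined in Windows-1252
        worst = max(worst, utf8_width(ch))
    return worst


for s in ["abc", "café", "€100", "日本", "😀"]:
    print(f"{s!r}: {len(s)} characters, {utf8_width(s)} bytes")

# Every character of this single-byte code page needs at most 3 bytes in
# UTF-8, so tripling a Windows-1252 width always leaves enough room.
print("worst case for Windows-1252:", cp1252_max_utf8_growth())
```

Here "café" needs 5 bytes for 4 characters and "日本" needs 6 bytes for 2 characters, while the emoji needs 4 bytes for a single character. That last case is why tripling is a rule of thumb for typed-in text rather than a guarantee for arbitrary Unicode.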
SPSS Unicode Mode and String Functions
- Basic string functions in SPSS (such as INDEX) operate on bytes (not characters). Since code page mode uses only single-byte characters, bytes and characters correspond there and basic string functions are safe to use. In Unicode mode, however, they only give character-correct results for string values that don't contain any multibyte characters.
- SPSS' character string functions (such as CHAR.INDEX) operate on characters (not bytes). They can always be used, in both Unicode mode and code page mode.
- Lastly, in Unicode mode trailing spaces are removed automatically, as if RTRIM were applied. Including RTRIM in your syntax anyway is the safest option, since your syntax then works correctly in both Unicode mode and code page mode.
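The difference between byte-based and character-based functions can be sketched in Python. The helpers `byte_index` and `char_index` below are hypothetical stand-ins for SPSS INDEX and CHAR.INDEX; they assume the SPSS convention of returning a 1-based position, or 0 if the search string is not found.

```python
# Illustration only: hypothetical helpers mimicking SPSS INDEX (byte
# positions) and CHAR.INDEX (character positions), both 1-based, 0 if absent.


def byte_index(haystack: str, needle: str) -> int:
    """1-based byte position of needle in the UTF-8 bytes of haystack."""
    pos = haystack.encode("utf-8").find(needle.encode("utf-8"))
    return pos + 1  # find() returns -1 when not found, so this yields 0


def char_index(haystack: str, needle: str) -> int:
    """1-based character position of needle in haystack."""
    return haystack.find(needle) + 1


for text in ["Paris, France", "Zürich, Schweiz"]:
    print(text, "bytes:", byte_index(text, ","), "chars:", char_index(text, ","))

city = "Zürich, Schweiz"
# Cutting by characters with a character position gives the intended result.
print(city[: char_index(city, ",") - 1])
# Cutting by characters with a byte position keeps one character too many,
# because "ü" occupies two bytes in UTF-8.
print(city[: byte_index(city, ",") - 1])
```

For "Paris, France" both positions are 6, but for "Zürich, Schweiz" the comma sits at byte 8 and character 7. Mixing the two (for instance, feeding an INDEX result into a character-based function) therefore yields "Zürich," instead of "Zürich".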
Switching Between Unicode and Code Page Mode
- If Unicode is to be used, switch on Unicode mode with
SET UNICODE ON.
Note that the Unicode setting can only be changed when there are no open datasets.
- To switch SPSS into code page mode, use
SET UNICODE OFF.
- To see whether SPSS is in Unicode mode or code page mode, use
SHOW UNICODE.