SAP ABAP Central: 2.2 ABAP Character Sets

Application Server ABAP supports only Unicode systems in the current release.

Unicode systems are ABAP systems that are based on Unicode character representation, use a code page for Unicode, and run on an appropriate underlying operating system and database.
Non-Unicode systems are ABAP systems with code pages for single-byte code and double-byte code. These systems are no longer supported in the current release.

Unicode (ISO/IEC 10646) with the character set UCS covers all existing characters. Various encoding formats are possible for the Unicode character set, for example UTF-8, in which a character occupies between one and four bytes, or UCS-2, in which a character always occupies two bytes.
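The difference between the formats can be made concrete with a small sketch. It is written in Python rather than ABAP, purely to illustrate the byte counts; the sample characters and the helper name `byte_lengths` are chosen for illustration and do not come from the ABAP documentation.

```python
# Byte lengths of single characters in UTF-8 and UTF-16.
# The sample characters and the helper name are illustrative assumptions.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # U+0041, U+00E9, U+20AC, U+1F600


def byte_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in samples:
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

Characters in the Basic Multilingual Plane take two bytes in UTF-16, which is the case a UCS-2 view handles. The last sample lies outside that plane and takes four bytes in both formats.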

UTF-16 is the system code page of a Unicode system.
The ABAP programming language supports the character representation UCS-2, which largely matches the UTF-16 representation and covers its characters, except for the characters in the surrogate area.

The restriction to UCS-2 in ABAP means that a character is always assumed to have a length of two bytes. This generally causes problems only if a character string is truncated in the middle of a character representation from the UTF-16 surrogate area, or if individual characters are compared against sets of characters in character string processing.
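The truncation problem can be sketched as follows. This is a Python illustration, not ABAP: it models a UCS-2 view by splitting a string into 16-bit code units, and the helper names `utf16_units`, `is_surrogate`, and `decodes_cleanly` are hypothetical.

```python
# Model a UCS-2 view of a UTF-16 string: every 16-bit unit counts as one character.
# All helper names here are illustrative assumptions.
text = "ab\U0001F600c"  # 'a', 'b', U+1F600 (outside the BMP), 'c'


def utf16_units(s):
    """Split a string into its 16-bit UTF-16 code units."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def is_surrogate(unit):
    """True if the code unit lies in the UTF-16 surrogate area."""
    return 0xD800 <= unit <= 0xDFFF


def decodes_cleanly(units):
    """True if the code units form valid UTF-16."""
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    try:
        raw.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


units = utf16_units(text)
print(len(text), "characters,", len(units), "two-byte units")

# Cutting after three units splits the surrogate pair of U+1F600.
truncated = units[:3]
print("last unit is a lone surrogate:", is_surrogate(truncated[-1]))
print("truncated data is valid UTF-16:", decodes_cleanly(truncated))
```

A length-based cut that counts two-byte units is safe only when it does not fall between the two halves of a surrogate pair.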

For use in a Unicode system, a program must be identified as a Unicode program. Non-Unicode programs cannot be used in a Unicode system and are hence fully obsolete.

Note: Before Unicode, SAP used various codes to represent characters of different scripts, such as the single-byte code pages ASCII and EBCDIC, or double-byte code pages:

ASCII (American Standard Code for Information Interchange) encodes every character with one byte, so a maximum of 256 characters can be represented. Strictly speaking, standard ASCII encodes each character using only 7 bits and can therefore represent only 128 characters; the extension to 8 bits was introduced with ISO-8859. Examples of common code pages are ISO-8859-1 for Western European scripts and ISO-8859-5 for Cyrillic scripts.
EBCDIC (Extended Binary Coded Decimal Interchange Code) also encodes each character using one byte and can therefore also represent 256 characters. For example, EBCDIC 0697/0500 is an IBM format that has been used on the AS/400 platform (now known as IBM System i) for Western European scripts.
Double-byte code pages require between one and two bytes per character. This enables the representation of up to 65,536 characters, of which only 10,000 to 15,000 are normally used. For example, the code page SJIS is used for Japanese and BIG5 for traditional Chinese.
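The byte counts described above can be checked with Python's built-in codecs, used here only as stand-ins for the SAP code pages. The mapping is an assumption: `cp500` is taken to approximate EBCDIC 0697/0500, and `shift_jis` and `big5` to approximate SJIS and BIG5. The helper name `encoded_length` is hypothetical.

```python
# Bytes per character in several legacy code pages (Python codecs as stand-ins).
samples = [
    ("iso8859_1", "\u00e9"),  # Latin small e with acute, Western European
    ("iso8859_5", "\u0416"),  # Cyrillic capital Zhe
    ("cp500", "\u00e9"),      # EBCDIC, Western European
    ("shift_jis", "A"),       # Latin letter in a double-byte code page
    ("shift_jis", "\u65e5"),  # Japanese kanji
    ("big5", "\u4e2d"),       # traditional Chinese character
]


def encoded_length(codec, ch):
    """Return the number of bytes one character occupies in the given codec."""
    return len(ch.encode(codec))


for codec, ch in samples:
    print(f"{codec:10} U+{ord(ch):04X}: {encoded_length(codec, ch)} byte(s)")

# The same letter has different byte values in ASCII-based and EBCDIC code pages.
print("A".encode("iso8859_1").hex(), "A".encode("cp500").hex())
```

The single-byte code pages always yield one byte, while the double-byte code pages yield one byte for Latin letters and two bytes for the East Asian characters.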

Using these character sets, each language could be handled individually in one AS ABAP. Problems generally occurred when texts from different, incompatible character sets were mixed in a central system. The exchange of data between systems with incompatible character sets was also a potential source of problems.
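A minimal sketch of this kind of problem, again in Python rather than ABAP: text written under one single-byte code page is read under another, and a character from a third script cannot be stored at all. The sample strings and variable names are illustrative assumptions.

```python
# Bytes written with one code page and read with another yield different text.
cyrillic = "\u0416\u0443\u043a"         # a short Cyrillic word
stored = cyrillic.encode("iso8859_5")   # as written by a Cyrillic system
misread = stored.decode("iso8859_1")    # as read by a Western European system

print(stored.hex(), "read as ISO-8859-1 gives", repr(misread))

# A single-byte code page cannot hold characters from an unrelated script.
try:
    "\u65e5".encode("iso8859_5")
    kanji_fits = True
except UnicodeEncodeError:
    kanji_fits = False
print("kanji fits into ISO-8859-5:", kanji_fits)

# A Unicode encoding holds all of these characters side by side.
mixed = cyrillic + " \u00e9 \u65e5"
round_trip = mixed.encode("utf-16-le").decode("utf-16-le")
print("mixed text survives a Unicode round trip:", round_trip == mixed)
```

The stored bytes are unchanged in the first case; only the interpretation differs, which is why such errors are easy to miss until the text is displayed.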

Switching to Unicode

Before Unicode support was introduced, many ABAP programmers assumed that one character corresponded to one byte. Therefore, before a non-Unicode system is converted to Unicode, ABAP programs must be changed wherever they make an explicit or implicit assumption about the internal length of a character. This mainly affects:

Character processing and byte string processing
Access to structures. This applies because flat structures in non-Unicode programs are handled like character-like data objects, and some programming techniques exploit this fact. The structure fragment view can be used to handle structures.

To convert a program into a Unicode program, the attribute "Unicode checks active" must be selected in the program attributes. The transaction UCCHECK supports the activation of this check for existing programs. If this attribute is set, the program is identified as a Unicode program.

In a Unicode system, only Unicode programs can be executed. Non-Unicode programs can only be executed in non-Unicode systems. Before converting to a Unicode system, the profile parameter abap/unicode_check should be set to "ON" so that only the execution of Unicode programs is permitted.

Notes:

The program RSUNISCAN_FINAL can be used instead of transaction UCCHECK.
In the current release, non-Unicode programs can no longer be used.
