- Unicode systems are ABAP systems that are based on the Unicode character representation, use a code page for Unicode, and run on an appropriate underlying operating system and database.
- Non-Unicode systems are ABAP systems with single-byte and double-byte code pages. These systems are no longer supported in the current release.
- UTF-16 is the system code page of a Unicode system.
- The ABAP programming language supports the UCS-2 character representation, which matches the UTF-16 representation for all characters except those in the surrogate area.
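The difference between UCS-2 and UTF-16 can be illustrated outside ABAP. The following is a minimal Python sketch, not ABAP code; the helper name `utf16_code_units` is hypothetical and the little-endian byte order is chosen arbitrarily. It counts the 16-bit code units that UTF-16 needs for a character. A character from the surrogate area needs two code units, which a UCS-2 view of the same data treats as two separate characters.

```python
def utf16_code_units(text: str) -> int:
    """Return the number of 16-bit code units needed to store text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


bmp_char = "\u00e4"            # a-umlaut, one code unit in UTF-16 and in UCS-2
surrogate_char = "\U0001d11e"  # musical symbol G clef, stored as a surrogate pair

print(utf16_code_units(bmp_char))        # 1
print(utf16_code_units(surrogate_char))  # 2
```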
For use in a Unicode system, a program must be identified as a Unicode program. Non-Unicode programs cannot be used in a Unicode system and are hence fully obsolete.
Note: Before Unicode, SAP used various encodings to represent the characters of different scripts: ASCII and EBCDIC as single-byte code pages, or double-byte code pages:
- ASCII (American Standard Code for Information Interchange) encodes every character with one byte, so a maximum of 256 characters can be represented. (Strictly speaking, standard ASCII uses only 7 bits per character and can therefore represent only 128 characters; the extension to 8 bits is introduced with ISO-8859.) Examples of common code pages are ISO-8859-1 for Western European scripts and ISO-8859-5 for Cyrillic scripts.
- EBCDIC (Extended Binary Coded Decimal Interchange Code) also encodes each character using one byte and can therefore also represent 256 characters. For example, EBCDIC 0697/0500 is an IBM format that has been used on the AS/400 platform (now known as IBM System i) for Western European scripts.
- Double-byte code pages require 1 or 2 bytes per character. This allows up to 65,536 characters to be represented, of which only 10,000 to 15,000 are normally used. For example, the code page SJIS is used for Japanese and BIG5 for Traditional Chinese scripts.
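The byte widths described above can be checked with a short Python sketch. It is only an illustration and does not involve an ABAP system. The codec names are those of the Python standard library; it is an assumption that `cp500` and `shift_jis` correspond closely enough to the EBCDIC 0697/0500 and SJIS code pages named here. The helper `encoded_length` is hypothetical.

```python
def encoded_length(char: str, codec: str) -> int:
    """Return how many bytes one character occupies in the given code page."""
    return len(char.encode(codec))


samples = [
    ("A", "ascii"),           # 7-bit ASCII
    ("\u00e4", "iso8859_1"),  # a-umlaut, Western European
    ("\u0416", "iso8859_5"),  # Cyrillic capital ZHE
    ("\u00e4", "cp500"),      # a-umlaut in an EBCDIC code page
    ("A", "shift_jis"),       # Latin letter in a double-byte code page
    ("\u65e5", "shift_jis"),  # Japanese kanji
    ("\u4e2d", "big5"),       # Traditional Chinese character
]

for char, codec in samples:
    print(codec, encoded_length(char, codec))
```

The single-byte code pages always yield 1, while the double-byte code pages yield 1 for the Latin letter and 2 for the East Asian characters.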
Switching to Unicode
Before Unicode support was introduced, many ABAP programmers assumed that one character corresponded to one byte. Therefore, before a non-Unicode system is converted to Unicode, ABAP programs must be changed wherever an explicit or implicit assumption is made about the internal length of a character. This mainly affects
- character processing and byte string processing
- access to structures, because flat structures in non-Unicode programs are handled like character-like data objects and some programming techniques exploit this fact. The structure fragment view can be used to handle structures.
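Why the assumption "one character equals one byte" breaks can be sketched in Python. This is not ABAP code and does not reproduce ABAP's offset/length semantics; the record layout and the helper `field_by_bytes` are hypothetical, and the little-endian byte order is chosen arbitrarily. The sketch cuts a field out of an encoded record using byte offsets, which works in a single-byte code page but fails once every character occupies two bytes.

```python
def field_by_bytes(raw: bytes, offset: int, length: int, codec: str) -> str:
    """Cut a field out of an encoded record using byte offsets."""
    return raw[offset:offset + length].decode(codec)


record = "AB123"  # two-character code followed by a three-character number

# Single-byte code page: character offsets and byte offsets coincide.
single = record.encode("iso8859_1")
print(field_by_bytes(single, 2, 3, "iso8859_1"))

# UTF-16: the same offsets cut through the middle of a character.
wide = record.encode("utf-16-le")
try:
    field_by_bytes(wide, 2, 3, "utf-16-le")
except UnicodeDecodeError:
    print("byte offsets no longer match character offsets")

# Scaling offset and length by the two-byte character width restores the field.
print(field_by_bytes(wide, 2 * 2, 3 * 2, "utf-16-le"))
```

Code that hard-codes such byte arithmetic is the kind of implicit assumption that has to be found and changed before a conversion.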
In a Unicode system, only Unicode programs can be executed. Non-Unicode programs can only be executed in non-Unicode systems. Before converting to a Unicode system, the profile parameter abap/unicode_check should be set to "ON" so that only the execution of Unicode programs is permitted.
Notes:
- The program RSUNISCAN_FINAL can be used instead of transaction UCCHECK.
- In the current release, non-Unicode programs can no longer be used.