Progress
Internationalization Guide
Techniques For Working With Multi-byte Characters
The following techniques might save you time and trouble:
Choosing the Appropriate Unit Of Measure
Several Progress 4GL elements, including the LENGTH function, the OVERLAY statement, the SUBSTRING function, and the SUBSTRING statement, let you specify the unit of measure as the character, the byte, or the column. If you choose the wrong unit of measure, you might split or overlay a multi-byte character. Consider the following example:
![]()
The example defines a character variable and sets it to a string of seven characters, the fourth of which is double byte. It then overlays a string of four single-byte characters on the original string, starting at position one and continuing for four positions. Because the unit of measure is the byte (specified by RAW), the fourth byte of the second string, the character z, overlays the fourth byte of the original string, which is the lead byte of the double-byte character.
Figure 8–5 shows how the z in the second string overlays the lead byte of the double-byte character in the original string.
Figure 8–5: A Single-byte Character Overlaying a Lead Byte
![]()
All that remains of the multi-byte character is its trail byte, as shown in Figure 8–6.
Figure 8–6: Result Of a Single-byte Character Overlaying a Lead Byte
![]()
To fix this error, change the unit of measure to characters. The corrected program is as follows:
![]()
The corrected program produces the string shown in Figure 8–7.
Figure 8–7: String Produced By an OVERLAY Statement Whose Unit Of Measure Is the Character
![]()
Testing Character Strings For Multi-byte Characters
To determine whether a character string contains multi-byte characters, use the LENGTH function, which returns the number of characters, bytes, or columns in a string. The syntax is:
LENGTH ( { string [ , type ] | raw-expression } )
string
A character expression. The specified string can contain double-byte characters.
type
A character expression that indicates whether you want the length of a string in character units, bytes, or columns. A double-byte character registers as one character unit. By default, the unit of measurement is character units.
There are three valid types: CHARACTER, RAW, and COLUMN. The expression "CHARACTER" indicates that the length is measured in characters, including double-byte characters. The expression "RAW" indicates that the length is measured in bytes. The expression "COLUMN" indicates that the length is measured in columns. If you specify the type as a constant expression, Progress validates the type specification at compile time. If you specify the type as a variable expression, Progress validates the type specification at run time.
raw-expression
A function or variable name that returns a raw value.
To use the technique, call LENGTH twice: once with the CHARACTER option, which returns the length in characters, and once with the RAW option, which returns the length in bytes. Then compare the two lengths. If they are equal, the string contains only single-byte characters. Otherwise, the string contains at least one multi-byte character.
The following examples illustrate the technique:
The first example tests a character string consisting of one double-byte character. Since the length of the string in characters (1) does not match the length in bytes (2), the example displays "Multi-byte characters in the string":
![]()
The second example tests a character string consisting of three single-byte characters. Since the length of the string in characters (3) matches the length in bytes (3), this example displays "No multi-byte characters in the string".
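The comparison of character length with byte length can be sketched in Python as follows. This is a model of the technique, not Progress 4GL. It assumes the Shift-JIS code page. The functions length and contains_multibyte are hypothetical names, and the COLUMN unit is left out because it depends on display width rather than on the encoding.

```python
# Illustration only: compares character length with byte length.
# Assumption: the string is stored in the Shift-JIS code page.
CODE_PAGE = "shift_jis"


def length(text: str, unit: str = "CHARACTER") -> int:
    """Return the length of `text` in characters or in bytes (RAW)."""
    if unit == "CHARACTER":
        return len(text)
    if unit == "RAW":
        return len(text.encode(CODE_PAGE))
    raise ValueError("unit must be 'CHARACTER' or 'RAW' in this sketch")


def contains_multibyte(text: str) -> bool:
    """True if at least one character occupies more than one byte."""
    return length(text, "CHARACTER") != length(text, "RAW")


# One double-byte character: 1 character, 2 bytes.
print(length("日", "CHARACTER"), length("日", "RAW"))    # 1 2
print(contains_multibyte("日"))                          # True

# Three single-byte characters: 3 characters, 3 bytes.
print(length("abc", "CHARACTER"), length("abc", "RAW"))  # 3 3
print(contains_multibyte("abc"))                         # False
```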
Testing For a Lead-Byte Value
The next technique involves testing a byte for a lead-byte value. Lead bytes (and trail bytes) often have special values to distinguish them. Table 8–5 lists the lead-byte and trail-byte values for the multi-byte code pages that Progress supports.
NOTE: You cannot always assume that a byte with a lead-byte value is a lead byte, or that a byte with a trail-byte value is a trail byte. This is because the possible values for trail bytes overlap those of lead bytes and single bytes. For example, the value 164 can correspond to a lead byte or to a trail byte. To determine which it is, you must inspect the string.
To determine if a byte has a lead-byte value, use the IS-LEAD-BYTE function, which evaluates a character expression and returns TRUE if the first byte of the first character of the character string has a value within the range permitted for lead bytes. Otherwise, IS-LEAD-BYTE returns FALSE. IS-LEAD-BYTE has the following syntax:
IS-LEAD-BYTE ( string )
string
A character expression (a constant, field name, variable name, or any combination of these) whose value is a character.
In the following example, IS-LEAD-BYTE examines a string whose first character is single byte. Because the first byte of the first character does not have a value within the range permitted for lead bytes, IS-LEAD-BYTE returns FALSE, and the example displays "Lead: no":
![]()
The following example is identical to the preceding example except that the first character of the string is double byte. Because the first byte of the first character has a value within the range permitted for lead bytes, IS-LEAD-BYTE returns TRUE, and the example displays "Lead: yes":
![]()
Copyright © 2004 Progress Software Corporation www.progress.com Voice: (781) 280-4000 Fax: (781) 280-4095