Progress
Internationalization Guide


Collating Multi-byte Characters

When you sort multi-byte characters, you face a question that you do not face when sorting single-byte characters: in what order should the different character types sort? That is, should all one-byte characters sort before all two-byte characters? Should all two-byte characters sort before all three-byte characters? And how should the user-defined characters of the SHIFT–JIS code page sort?

The default collation table that Progress provides for the double-byte Asian languages (Chinese, Japanese, and Korean) sorts all single-byte characters before all double-byte characters. Table 8–9 shows how Progress sorts Japanese characters.

Table 8–9: Japanese Collation Order By Character Type

Character Type                       Range of Values
Single Byte (ASCII)                  0–127
Single Byte (half-width Katakana)    160–223
Lead Byte (range 1)                  129–159
Lead Byte (range 2)                  224–239
User-defined (Gaiji)                 240–252

NOTE: You can modify the sort order of lead bytes, though not the sort order of trail bytes. For more information on modifying the sort order of lead bytes, see the comments in the BASIC collation table for the SHIFT–JIS code page in the japanese.dat file in the DLC/prolang/convmap directory.
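
The ordering in Table 8–9 can be sketched as a ranking function on the first byte of each character. This is an illustration only, not Progress code: it assumes the rows of the table are listed in collation order, and the names TYPE_RANGES and type_rank are hypothetical. Byte values that the table does not list are rejected rather than guessed at.

```python
# Sketch of Table 8-9: rank a character by the type of its first byte.
# Assumption: the table rows appear in collation order (first row sorts first).
# TYPE_RANGES and type_rank are hypothetical names, not part of Progress.

TYPE_RANGES = [
    (0, 127, "single byte (ASCII)"),
    (160, 223, "single byte (half-width Katakana)"),
    (129, 159, "lead byte (range 1)"),
    (224, 239, "lead byte (range 2)"),
    (240, 252, "user-defined (Gaiji)"),
]


def type_rank(first_byte: int) -> int:
    """Return the row position in Table 8-9 of the type that first_byte belongs to."""
    for rank, (low, high, _name) in enumerate(TYPE_RANGES):
        if low <= first_byte <= high:
            return rank
    raise ValueError(f"byte value {first_byte} is not listed in Table 8-9")


# One first byte from each character type, deliberately out of order.
first_bytes = [0x88, 0x41, 0xF0, 0xB1, 0xE0]
ordered = sorted(first_bytes, key=type_rank)
print([hex(b) for b in ordered])  # ['0x41', '0xb1', '0x88', '0xe0', '0xf0']
```

Note that the half-width Katakana byte (0xB1) sorts before the lead byte 0x88 even though its numeric value is higher, which is the point of ranking by character type rather than by raw byte value.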

Sort Order Of Trail Bytes

For a given lead byte, trail bytes sort in binary order. For example, if one double-byte character has a lead-byte value of 159 and a trail-byte value of 100, and another double-byte character has a lead-byte value of 159 and a trail-byte value of 170, the character with byte values 159 and 100 sorts before the character with byte values 159 and 170. Figure 8–8 illustrates this.

Figure 8–8: Sorting Double-byte Characters
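
The trail-byte rule can also be sketched in a few lines. The example below models each double-byte character as a (lead_byte, trail_byte) tuple, which is an assumption made for illustration, and the function name sort_same_lead is hypothetical. It covers only the case described in this section, where all characters share one lead byte.

```python
# Sketch: order double-byte characters that share a lead byte.
# Each character is modeled as a (lead_byte, trail_byte) tuple; this
# representation and the function name are assumptions for illustration.

def sort_same_lead(chars):
    """Sort characters with a common lead byte by the binary value of the trail byte."""
    leads = {lead for lead, _trail in chars}
    if len(leads) > 1:
        raise ValueError("all characters must share the same lead byte")
    return sorted(chars, key=lambda ch: ch[1])


# The example from the text: lead byte 159 with trail bytes 170 and 100.
chars = [(159, 170), (159, 100)]
print(sort_same_lead(chars))  # [(159, 100), (159, 170)]
```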


Copyright © 2004 Progress Software Corporation