Progress
Internationalization Guide


Default Word-break Behavior Of Characters In Multi-byte Code Pages

Table 8–10 describes the default word-break behavior of characters in multi-byte code pages.

NOTE: Table 8–10 assumes that word-break tables are Version 9 Type 3. For more information on word-break tables, see the "Word-break Tables" section in "Understanding Character Processing Tables."

Table 8–10: Default Word-break Behavior Of Characters In Multi-byte Code Pages

| If the Code Page Is... | And the Characters Are... | The Characters Behave (By Default)... |
|---|---|---|
| Double byte | Single byte | Depending on whether they are alphabetic or nonalphabetic. This is specified in the code page’s character-attribute table. To change the default word-break behavior, supply a word-break table input file. |
| Double byte | Double byte | As separate words. |
| UTF-8 | Single byte | Depending on whether they are alphabetic or nonalphabetic. This is specified in the code page’s character-attribute table. To change the default word-break behavior, supply a word-break table input file. |
| UTF-8 | Two-byte UTF-8 | Corresponding to the USE_IT word-delimiter attribute. |
| UTF-8 | Three-byte UTF-8 | As separate words. |

NOTE: The default word-break behavior can be changed only for single-byte characters.
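
The lookup in Table 8–10 can be sketched as a small decision function. This is an illustrative Python sketch only, not Progress code or a Progress API: the names `Behavior`, `default_word_break`, `can_override`, and `utf8_default`, and the code-page labels "double-byte" and "utf-8", are hypothetical. It assumes the caller already knows how many bytes the character occupies in the code page. Byte lengths that the table does not list (for example, four-byte UTF-8) are reported as not covered rather than guessed.

```python
from enum import Enum


class Behavior(Enum):
    # Hypothetical labels for the three outcomes listed in Table 8-10.
    BY_CHARACTER_ATTRIBUTE = "alphabetic or nonalphabetic, per the character-attribute table"
    USE_IT = "per the USE_IT word-delimiter attribute"
    SEPARATE_WORD = "as a separate word"


def default_word_break(code_page_kind: str, char_bytes: int) -> Behavior:
    """Return the default behavior that Table 8-10 lists for one character.

    code_page_kind: "double-byte" or "utf-8" (labels assumed for this sketch).
    char_bytes: number of bytes the character occupies in that code page.
    """
    if code_page_kind not in ("double-byte", "utf-8"):
        raise ValueError(f"unknown code page kind: {code_page_kind!r}")
    if char_bytes == 1:
        # Single-byte characters follow the character-attribute table.
        return Behavior.BY_CHARACTER_ATTRIBUTE
    if char_bytes == 2:
        if code_page_kind == "double-byte":
            return Behavior.SEPARATE_WORD
        return Behavior.USE_IT
    if char_bytes == 3 and code_page_kind == "utf-8":
        return Behavior.SEPARATE_WORD
    # Anything else is outside the table, so the sketch refuses to guess.
    raise ValueError("combination not covered by Table 8-10")


def can_override(char_bytes: int) -> bool:
    """Per the NOTE, only single-byte characters can have their default changed."""
    return char_bytes == 1


def utf8_default(ch: str) -> Behavior:
    """Convenience wrapper: measure the character's UTF-8 length, then look it up."""
    return default_word_break("utf-8", len(ch.encode("utf-8")))
```

For example, under these assumptions `utf8_default("a")` falls in the single-byte row, `utf8_default("é")` in the two-byte row, and `utf8_default("日")` in the three-byte row.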

For more information on character-attribute tables, see the "Character Attribute Tables" section. For more information on modifying word-break tables, see the "Creating and Modifying Word-break Tables" section. For more information on word-delimiter attributes, see the "Understanding Word-delimiter Attributes" section. All of these sections reside in "Understanding Character Processing Tables."


Copyright © 2004 Progress Software Corporation
www.progress.com
Voice: (781) 280-4000
Fax: (781) 280-4095