Progress Programming Handbook
Using Word-break Tables
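You can create word-break tables that specify word separators using a rich set of criteria. Working with word-break tables involves specifying word delimiter attributes, understanding the table syntax, compiling the table, associating the compiled table with a database, rebuilding the word indexes, and providing access to the compiled table.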
Specifying Word Delimiter Attributes
As mentioned previously, to break down the contents of a word-indexed field into individual words, Progress needs to know which characters delimit words and which do not. The distinction can be subtle and sometimes depends on context. For example, consider the function of the dot in the character strings in Table 9–5.
In the first character string, the dot functions as a decimal point and does not divide one word from another. Thus, you can query on the word “$25,125.95.” In the second character string, by contrast, the dot functions as a period, dividing the word “received” from the word “call.”
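To make the contextual role of the dot concrete, here is a minimal Python sketch. It is not Progress's implementation and does not use word-break table syntax. It assumes one simplified rule: a dot or comma stays inside a word only when it follows a word character and precedes a digit; everywhere else it delimits words. The function name split_words and the sample strings are hypothetical.

```python
def split_words(text):
    """Split text into words, treating '.' and ',' as context-dependent."""
    words, current = [], []
    for i, ch in enumerate(text):
        # Look one character ahead; "" at the end of the string.
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch.isalnum() or ch == "$":
            # Assumption: letters, digits, and '$' always belong to a word.
            current.append(ch)
        elif ch in ".," and current and nxt.isdigit():
            # Dot or comma inside a number, as in 25,125.95: keep it.
            current.append(ch)
        else:
            # Any other character ends the current word.
            if current:
                words.append("".join(current))
                current = []
    if current:
        words.append("".join(current))
    return words


print(split_words("Balance is $25,125.95"))
# ['Balance', 'is', '$25,125.95']
print(split_words("Shipment not received.call customer"))
# ['Shipment', 'not', 'received', 'call', 'customer']
```

In the first string the dot is followed by a digit and stays part of the word; in the second it is followed by a letter and separates two words.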
To help define word delimiters systematically while allowing for contextual variation, Progress provides eight word delimiter attributes, which you can use in word-break tables. The eight word delimiter attributes appear in Table 9–6.
Understanding the Syntax of Word-break Tables
Word delimiter attributes form the heart of word-break tables, and you specify them using the following syntax:
symbolic-name
The name of a symbol.
For example: DOLLAR-SIGN
symbol-value
The value of the symbol.
For example: '$'
NOTE: Although some versions of Progress let you compile word-break tables that omit all items within the second pair of square brackets, Progress Software Corporation (PSC) recommends that you always include these items. If the source-code version of a compiled word-break table lacks these items, and the associated database is not so large as to make this unfeasible, PSC recommends that you add these items to the table, recompile the table, reassociate the table with the database, and rebuild the indexes.
codepage-name
The name, not surrounded by quotes, of the code page the word-break table is associated with. The maximum length is 20 characters.
For example: UTF-8
wordrules-name
The name, not surrounded by quotes, of the compiled word-break table. The maximum length is 20 characters.
For example: utf8sample
table-type
The number 2.
NOTE: Some versions of Progress allow a table type of 1. Although this is still supported, Progress Software Corporation (PSC) recommends, if feasible, that you change the table type to 2, recompile the word-break table, reassociate it with the database, and rebuild the indexes.
char-literal
A character within single quotes or a symbolic-name, which represents a character in the code page.
For example: '#'
hex-literal
A hexadecimal value or a symbolic-name, which represents a character in the code page.
For example: 0xAC
decimal-literal
A decimal value or a symbolic-name, which represents a character in the code page.
For example: 39
word-delimiter-attribute
In what context the character is a word delimiter. You can use one of the following:
Examples of Word-break Tables
The following is an example of a word-break table for Unicode:
As the preceding example illustrates, word-break tables can contain comments delimited as follows:
For more examples, see the word-break tables that Progress provides in source-code form. They reside in the DLC/prolang/convmap directory and have the file extension .wbt.
NOTE: Progress supplies a word-break table for each code page it supports.
Compiling Word-break Tables
After you create or modify a word-break table, you must compile it with the PROUTIL utility. The syntax is as follows:
src-file
The name of the word-break table file to be compiled.
rule-num
A number between 1 and 255 inclusive that identifies this word-break table within your Progress installation.
The PROUTIL utility names the compiled version of the word-break table proword.rule-num. For example, if rule-num is 34, PROUTIL names the compiled version proword.34.
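The naming rule described above can be sketched in a few lines of Python. This is only an illustration for scripts that need to predict the output file name. The helper compiled_table_name is hypothetical and is not part of PROUTIL; the range check reflects the 1 to 255 limit stated for rule-num.

```python
def compiled_table_name(rule_num):
    """Return the file name of a compiled word-break table for rule_num."""
    # rule-num must be a number between 1 and 255 inclusive.
    if not isinstance(rule_num, int) or not 1 <= rule_num <= 255:
        raise ValueError("rule-num must be an integer between 1 and 255")
    return f"proword.{rule_num}"


print(compiled_table_name(34))  # proword.34
```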
Associating Compiled Word-break Tables with Databases
After you compile a word-break table, you must associate the compiled version with a database using the PROUTIL utility. The syntax is as follows:
database
The name of the database.
rule-num
The value of rule-num you specified when you compiled the word-break table.
To associate the database with the default word-break rules, set rule-num to zero.
NOTE: Setting rule-num to zero associates the database with the default word-break rules for the current code page. For more information on code pages, see the Progress Internationalization Guide.
Rebuilding Word Indexes
For word indexing to work as expected, the word-break table Progress uses to write the word indexes (to add, modify, or delete a record that contains a word index) and the word-break table Progress uses to read word indexes (to process a query that contains the CONTAINS operator) must be identical. To ensure this, when you associate the compiled version of a word-break table with a database, Progress writes cyclical redundancy check (CRC) values from the compiled word-break table into the database. When you connect to the database, Progress compares the CRC values in the database to the CRC value in the compiled version of the word-break table. If they do not match, Progress displays an error message and terminates the connection attempt.
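The consistency check described above can be pictured with a short Python sketch. It rests on assumptions: the text does not say which CRC algorithm Progress uses or how the values are stored, so zlib.crc32 stands in for the real computation, and the names table_crc and check_connection are hypothetical.

```python
import zlib


def table_crc(compiled_bytes):
    """Compute a 32-bit CRC over the bytes of a compiled word-break table."""
    # Stand-in algorithm; the CRC Progress actually uses is not specified here.
    return zlib.crc32(compiled_bytes) & 0xFFFFFFFF


def check_connection(stored_crc, compiled_bytes):
    """Mimic the connect-time check: refuse the connection on a CRC mismatch."""
    if table_crc(compiled_bytes) != stored_crc:
        raise ConnectionError("word-break table CRC does not match the database")
    return True


# The CRC is recorded when the table is associated with the database.
table = b"hypothetical compiled word-break table"
stored = table_crc(table)

check_connection(stored, table)  # same table: the connection proceeds
```

If the compiled table changes after the association (for example, it is recompiled from edited source), the computed value no longer matches the stored one and check_connection raises an error, mirroring the terminated connection attempt.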
If a connection attempt fails and you want to avoid rebuilding the indexes, you can try associating the database with the default word-break rules.
NOTE: This might invalidate the word indexes and require you to rebuild them anyway.
To rebuild the indexes, you can use the PROUTIL utility with the IDXBUILD or IDXFIX qualifier.
The syntax of PROUTIL with the IDXBUILD qualifier is:
Operating System: UNIX, Windows
Syntax:
proutil db-name -C idxbuild [ all ]
    [ -T dir-name ] [ -TB blocksize ]
    [ -TM n ] [ -B n ]
The syntax of PROUTIL with the IDXFIX qualifier is:
For more information on the PROUTIL utility, see the Progress Database Administration Guide and Reference.
Providing Access to the Compiled Word-break Table
To allow database servers and shared-memory clients to access the compiled version of the word-break table, it must reside either in the Progress installation directory or in the location pointed to by the environment variable PROWDrule-num. For example, if the compiled word-break table has the name proword.34 and resides in the DLC/mydir/mysubdir directory, set the environment variable PROWD34 to DLC/mydir/mysubdir/proword.34.
NOTE: Although the name of the compiled version of the word-break table has a dot, the name of the corresponding environment variable does not.
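As an illustration of the naming convention above, the following Python sketch derives the environment variable name and value for a given rule number. The helper prowd_variable is hypothetical. Setting os.environ affects only the current process and the processes it starts, so a real deployment would more likely set the variable in the shell or service configuration.

```python
import os


def prowd_variable(rule_num, table_dir):
    """Return (name, value) of the environment variable for a compiled table."""
    # The variable name has no dot (PROWD34); the file name does (proword.34).
    name = f"PROWD{rule_num}"
    value = f"{table_dir.rstrip('/')}/proword.{rule_num}"
    return name, value


name, value = prowd_variable(34, "DLC/mydir/mysubdir")
os.environ[name] = value
print(name, value)  # PROWD34 DLC/mydir/mysubdir/proword.34
```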
Copyright © 2004 Progress Software Corporation www.progress.com Voice: (781) 280-4000 Fax: (781) 280-4095