Progress Programming Handbook


Word Indexing External Documents

To create a word index on an existing document, import the text into a Progress database, then index the text by line or by paragraph.

Indexing by Line

To index a set of documents by line, you might create a table called line with three fields: document_name, line_number, and line_text. Define the primary index on document_name and a word index on line_text. Next, write a text-loading Progress program that reads each document and creates a line record for each line in the document. To decrease the amount of storage required by the line table and to normalize the database, you might replace its document_name field with a document_number field, and create a document table to associate a document_name with each document_number.
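
As an illustration of the text-loading step, the following is a minimal sketch rather than a definitive loader. It assumes the line table described above already exists in the connected database with the fields document_name, line_number, and line_text. The file name handbook.txt and the variable names are hypothetical.

/* Sketch only: load one text file into the line table, one record per line. */
/* The file name is a hypothetical example.                                  */
DEFINE VARIABLE docname  AS CHARACTER NO-UNDO INITIAL "handbook.txt".
DEFINE VARIABLE textline AS CHARACTER NO-UNDO.
DEFINE VARIABLE linenum  AS INTEGER   NO-UNDO.

INPUT FROM VALUE(docname) NO-ECHO.
REPEAT:
  /* Clear the variable so that a blank line is stored as an empty string. */
  textline = "".
  /* Read one whole line; end of file raises ENDKEY and leaves the loop. */
  IMPORT UNFORMATTED textline.
  linenum = linenum + 1.
  CREATE line.
  ASSIGN
    line.document_name = docname
    line.line_number   = linenum
    line.line_text     = textline.
END.
INPUT CLOSE.

In the normalized design, the loader would assign a document_number instead of document_name and create the matching document record once per file.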

When a base document changes, you must update the line table so that the word index stays current. Because each line record stores a document identifier (document_name, or document_number in the normalized design), you can delete all lines with that identifier and then reload the document.
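
The delete step might look like the following sketch. It assumes the three-field line table, with document_name serving as the document identifier; the file name is hypothetical. After it runs, you would rerun your text-loading program for the same document.

/* Sketch only: remove every line record that belongs to one document. */
DEFINE VARIABLE docname AS CHARACTER NO-UNDO INITIAL "handbook.txt".

FOR EACH line WHERE line.document_name = docname EXCLUSIVE-LOCK:
  DELETE line.
END.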

The following program queries the line table using the word index:

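/* Prompt for search words, then display every matching line record. */
DEFINE VARIABLE words AS CHARACTER FORMAT "x(60)"
  LABEL "To find document lines, enter search words" NO-UNDO.

REPEAT:
  UPDATE words.
  /* CONTAINS resolves the search through the word index on line_text. */
  FOR EACH line WHERE line.line_text CONTAINS words NO-LOCK:
    DISPLAY line.
  END.
END.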

The example prompts for a string of words, then displays each line that matches the search criteria.
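
The search string does not have to come from a prompt. The sketch below passes a fixed expression to CONTAINS. It assumes the usual CONTAINS conventions, in which an ampersand (&) means AND, a vertical bar (|) means OR, parentheses group terms, and a trailing asterisk matches any word that starts with the given letters. Check these against the CONTAINS reference for your Progress version. The search terms themselves are arbitrary examples.

/* Sketch only: find lines that contain a word starting with "index" */
/* and also contain either "word" or "paragraph".                    */
FOR EACH line WHERE line.line_text CONTAINS "index* & (word | paragraph)" NO-LOCK:
  DISPLAY line.line_number line.line_text.
END.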

Indexing by Paragraph

Instead of indexing by line, you can index by paragraph. The technique resembles line indexing, but the text from a paragraph can be much longer. You can use paragraph indexes the same way you use line indexes. You can also index by chapter, by page, and by other units of text. The only difference is how your text-loading program parses the document into character fields. Otherwise, your word search code, as in the line table example, can be identical.
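
To show how only the parsing changes, here is a hedged sketch of a paragraph loader. It assumes a hypothetical paragraph table with the fields document_name, paragraph_number, and paragraph_text, a word index on paragraph_text, and paragraphs separated by blank lines in the source file. It also assumes that each paragraph fits within the size limits of a character field.

/* Sketch only: load one text file into a hypothetical paragraph table. */
DEFINE VARIABLE docname  AS CHARACTER NO-UNDO INITIAL "handbook.txt".
DEFINE VARIABLE textline AS CHARACTER NO-UNDO.
DEFINE VARIABLE paratext AS CHARACTER NO-UNDO.
DEFINE VARIABLE paranum  AS INTEGER   NO-UNDO.

INPUT FROM VALUE(docname) NO-ECHO.
REPEAT:
  textline = "".
  /* End of file raises ENDKEY and leaves the loop. */
  IMPORT UNFORMATTED textline.
  IF TRIM(textline) <> "" THEN
    /* Append the line to the current paragraph, separated by a space. */
    paratext = paratext + (IF paratext = "" THEN "" ELSE " ") + textline.
  ELSE
    /* A blank line closes the current paragraph. */
    RUN save-paragraph.
END.
INPUT CLOSE.

/* Store the last paragraph if the file does not end with a blank line. */
RUN save-paragraph.

PROCEDURE save-paragraph:
  IF paratext = "" THEN RETURN.
  paranum = paranum + 1.
  CREATE paragraph.
  ASSIGN
    paragraph.document_name    = docname
    paragraph.paragraph_number = paranum
    paragraph.paragraph_text   = paratext.
  paratext = "".
END PROCEDURE.

A query against this table would use the same CONTAINS form as the line example, with paragraph and paragraph_text in place of line and line_text.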


Copyright © 2004 Progress Software Corporation
www.progress.com
Voice: (781) 280-4000
Fax: (781) 280-4095