cs::WordIndex - a keyword index
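use cs::WordIndex;
my $I = cs::WordIndex->new("indexfile.gz");   # attach to a (possibly compressed) index file
$I->ProcessFile("textfile");                  # add the text in textfile to the index
$I->Save();                                   # write the index back to indexfile.gz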
The cs::WordIndex module defines a keyword index object, with methods for adding files to the index and searching the index.
Both the index files and the text files may be compressed.
The main index format consists of one line per keyword, of the form
keyword hits...
where hits is a space-separated list of "file/lines" pairs: "file" is a pathname relative to the directory of the index file, and "lines" is a comma-separated list of the line numbers on which the keyword appears. Adjacent line numbers may be coalesced into ranges of the form "n-m".
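To make the format concrete, here is a minimal sketch of reading and writing the "lines" field. It does not use cs::WordIndex itself; the helper names (expand_lines, coalesce_lines, parse_index_line) are hypothetical. It assumes the separator between "file" and "lines" is a literal slash and that filenames contain no whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a "lines" field such as "3,7-9,12"
# into a flat list of line numbers.
sub expand_lines {
    my ($spec) = @_;
    my @lines;
    for my $part (split /,/, $spec) {
        if ($part =~ /^(\d+)-(\d+)$/) {
            push @lines, $1 .. $2;      # a coalesced range "n-m"
        } else {
            push @lines, $part;         # a single line number
        }
    }
    return @lines;
}

# Hypothetical helper: the reverse direction, coalescing runs of
# adjacent line numbers into "n-m" ranges.
sub coalesce_lines {
    my @sorted = sort { $a <=> $b } @_;
    my @parts;
    while (@sorted) {
        my $start = my $end = shift @sorted;
        while (@sorted && $sorted[0] <= $end + 1) {
            $end = shift @sorted;       # extend the current run
        }
        push @parts, $start == $end ? $start : "$start-$end";
    }
    return join ',', @parts;
}

# Hypothetical helper: parse one index line into
# (keyword, { file => [line numbers] }).
sub parse_index_line {
    my ($text) = @_;
    my ($keyword, @hits) = split ' ', $text;
    my %where;
    for my $hit (@hits) {
        # Split on the last slash, since the file part may contain slashes.
        my ($file, $spec) = $hit =~ m{^(.*)/([^/]+)$} or next;
        push @{ $where{$file} }, expand_lines($spec);
    }
    return ($keyword, \%where);
}

my ($word, $where) = parse_index_line("index doc/intro.txt/3,7-9 README/12");
print coalesce_lines(3, 7, 8, 9, 12), "\n";   # prints 3,7-9,12
```

Splitting each hit on its last slash keeps relative pathnames such as doc/intro.txt intact.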
Take search results, as collated by the CollateResults method, and fetch the lines involved. Returns an arrayref of results, each an arrayref of the form [file, lineno, line, words...].
Create a new WordIndex object attached to the file named indexfile (if specified). The optional parameter bigmode, if specified and true, turns on deferred update mode, in which index updates are queued and applied when Save() is called.
Read the text available in the file named filename and add it to the index.
Read the text available on the input stream FILE and add it to the index. filename is used in error reports.
Add the keyword word to the index as at line lineno of the file filename.
Return a hashref of the occurrences of the specified word in the indexed files. The keys of the hashref are filenames and the values are arrayrefs containing line numbers. The optional parameter uniq, if supplied and true, ensures the line number arrayrefs contain no duplicates.
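The shape of the returned structure, and the effect of uniq, can be illustrated with a small stand-alone sketch. The in-memory layout (%index) and the function word_hits are assumptions made for illustration; they are not the module's internals.

```perl
use strict;
use warnings;

# Toy in-memory index (an assumed layout):
# word => { filename => [line numbers] }.
my %index = (
    perl => { 'notes.txt' => [4, 4, 10], 'todo.txt' => [2] },
);

# Hypothetical lookup mirroring the description above: returns a
# hashref of filename => arrayref of line numbers for one word.
sub word_hits {
    my ($index, $word, $uniq) = @_;
    my $entry = $index->{$word} || {};
    my %hits;
    for my $file (keys %$entry) {
        my @lines = @{ $entry->{$file} };
        if ($uniq) {
            my %seen;
            @lines = grep { !$seen{$_}++ } @lines;   # drop repeats, keep order
        }
        $hits{$file} = \@lines;
    }
    return \%hits;
}

my $hits = word_hits(\%index, 'perl', 1);
print join(',', @{ $hits->{'notes.txt'} }), "\n";   # prints 4,10
```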
Save the index to the file named filename, or to the file specified when this index object was created if filename is omitted.
Add the contents of the specified index file to this index.
Remove all mention of the specified filename from the index.
Return the word indices for all words matching the supplied regexp, as a hashref mapping each word to its word index as returned by the WordIndex method.
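As a rough illustration of this kind of lookup, the sketch below filters a toy in-memory index by regexp. The function search_re and the %index layout are hypothetical, and the match is assumed to be unanchored unless the pattern itself anchors it.

```perl
use strict;
use warnings;

# Toy in-memory index (an assumed layout):
# word => { filename => [line numbers] }.
my %index = (
    perl   => { 'notes.txt' => [4, 10] },
    pearl  => { 'notes.txt' => [7] },
    python => { 'todo.txt'  => [2] },
);

# Hypothetical regexp search: returns a hashref mapping each
# matching word to its word index.
sub search_re {
    my ($index, $re) = @_;
    my %found;
    for my $word (keys %$index) {
        $found{$word} = $index->{$word} if $word =~ $re;
    }
    return \%found;
}

my $found = search_re(\%index, qr/^pe/);
print join(' ', sort keys %$found), "\n";   # prints pearl perl
```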
This method is intended as an aid to rating results or reporting results.
For each search result supplied (a hashref mapping word to word index, as returned by the SearchRE method), populate the supplied resulthash with a map of filename to linemap, where each linemap is a hashref mapping line number to an arrayref of the words found on that line.
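The inversion described here, from word-keyed results to a file and line keyed map, can be sketched as follows. The function collate is a hypothetical stand-in written for illustration, not the module's CollateResults method, and it takes a single search result rather than a list.

```perl
use strict;
use warnings;

# Hypothetical collation: turn { word => { file => [lines] } } into
# { file => { lineno => [words] } }, accumulating into $resulthash.
sub collate {
    my ($result, $resulthash) = @_;
    for my $word (sort keys %$result) {
        my $hits = $result->{$word};
        for my $file (keys %$hits) {
            for my $lineno (@{ $hits->{$file} }) {
                push @{ $resulthash->{$file}{$lineno} }, $word;
            }
        }
    }
    return $resulthash;
}

my %collated;
collate({ cat => { 'a.txt' => [1, 3] }, dog => { 'a.txt' => [3] } }, \%collated);
print join(' ', @{ $collated{'a.txt'}{3} }), "\n";   # prints cat dog
```

A line that matches several words ends up with several entries in its word list, which is what makes the collated form convenient for rating or reporting results.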
Cameron Simpson <cs@zip.com.au> 24apr2002