Search::FreeText

Search::FreeText is a free text indexing module for medium-to-large text corpuses.

Search::FreeText Ranking & Summary

  • License: Perl Artistic License
  • Price: FREE
  • Publisher Name: Stuart Watt
  • Publisher web site: http://search.cpan.org/~snkwatt/Search-FreeText-0.05/FreeText.pm


Search::FreeText Description

Search::FreeText is a free text indexing module for medium-to-large text corpuses.

SYNOPSIS

    my $text = new Search::FreeText(-db => ...);

    $text->open_index();
    $text->clear_index();
    $text->index_document(1, "Hello world");
    $text->index_document(2, "World in motion");
    $text->index_document(3, "Cruel crazy beautiful world");
    $text->index_document(4, "Hey crazy");
    $text->close_index();

    $text->open_index();
    foreach ($text->search("Crazy", 10)) {
        print "$_->[0], $_->[1]\n";
    };
    $text->close_index();

This module provides free text searching in a relatively open manner. It allows a persistent inverted file index to be constructed and managed (within limits), and then to be searched fairly efficiently. The module depends on a DBM module of some kind to manage the inverted file (DB_File is usually the best choice, as it is quite fast, quite scalable, and accepts the long values that are needed for performance).

The free text searching algorithm used is the BM25 weighting scheme described in Robertson, S. E., Walker, S., Beaulieu, M. M., Gatford, M., and Payne, A. (1995). Okapi at TREC-4, in NIST Special Publication 500-236, the Fourth Text Retrieval Conference (TREC-4), pages 73-96.

Much of the module depends on an open lexical analysis system, which is implemented by Search::FreeText::LexicalAnalysis. This is where all the word splitting and stemming is handled (Lingua::Stem is used for the stemming).

Using the module is quite simple: you open an index, add documents to it as strings while it is open, each with a key of your own choosing, and then close it. You search the corpus with a query string and get back a list of matches, each an array holding your own document key and a relevance measure. The keys might be database table keys, URLs, file names, or anything similar. This makes Search::FreeText a useful package for implementing fairly efficient, high quality search systems.
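To make the idea of BM25 ranking over an inverted file more concrete, here is a minimal sketch in plain Perl. It is not the module's own code. It assumes an in-memory hash in place of the DBM-backed index, a naive tokenizer with no stemming (the module itself uses Lingua::Stem), the commonly quoted defaults k1 = 1.2 and b = 0.75, and a smoothed IDF variant rather than the exact formula from the TREC-4 paper. The names `tokenize`, `index_document`, `search`, `%postings`, and `%doc_length` are all hypothetical and chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the persistent inverted file.
# %postings maps term => { document key => term frequency }.
# %doc_length maps document key => number of terms in that document.
my (%postings, %doc_length);

# Naive lexical analysis: lowercase and split on non-word characters.
# No stemming is done here (an assumption made for brevity).
sub tokenize {
    my ($text) = @_;
    return grep { length } split /\W+/, lc $text;
}

# Add one document to the index under a caller-chosen key.
sub index_document {
    my ($key, $text) = @_;
    my @terms = tokenize($text);
    $doc_length{$key} = scalar @terms;
    $postings{$_}{$key}++ for @terms;
    return;
}

# Score documents against a query with a BM25-style weighting and
# return up to $limit matches as [key, relevance] pairs, best first.
sub search {
    my ($query, $limit) = @_;
    my ($k1, $b_norm) = (1.2, 0.75);    # assumed default parameters

    my $n_docs = keys %doc_length;
    return () unless $n_docs;

    my $total = 0;
    $total += $_ for values %doc_length;
    my $avg_len = $total / $n_docs;

    my %score;
    for my $term (tokenize($query)) {
        my $docs = $postings{$term} or next;
        my $df   = keys %$docs;
        # Smoothed IDF variant that stays positive for common terms.
        my $idf  = log(1 + ($n_docs - $df + 0.5) / ($df + 0.5));
        while (my ($key, $tf) = each %$docs) {
            # Length normalization: longer documents are penalized.
            my $norm = $k1 * (1 - $b_norm + $b_norm * $doc_length{$key} / $avg_len);
            $score{$key} += $idf * $tf * ($k1 + 1) / ($tf + $norm);
        }
    }

    my @hits = map  { [ $_, $score{$_} ] }
               sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
    splice @hits, $limit if defined $limit && @hits > $limit;
    return @hits;
}

# Usage, mirroring the documents from the synopsis.
index_document(1, "Hello world");
index_document(2, "World in motion");
index_document(3, "Cruel crazy beautiful world");
index_document(4, "Hey crazy");

printf "%s, %.3f\n", @$_ for search("Crazy", 10);
```

With these four documents, the query "Crazy" matches documents 3 and 4, and document 4 ranks first because it is shorter, so the matching term makes up a larger share of it. A persistent version along the lines the description suggests would keep the postings in a DBM file rather than a plain hash.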
Requirements:

  • Perl

