[Zope] [ANN] TextIndexNG 1.05 Beta 1 released

Andreas Jung andreas@andreas-jung.com
Sat, 14 Sep 2002 09:49:20 -0700


After a longer period of time, I am pleased to announce the release of

                          TextIndexNG 1.05 beta1 

TextIndexNG is a pluggable index for the ZCatalog that enhances the
fulltext indexing capabilities of Zope by providing the following
features:

    * support for document converters (HTML, PDF, WinWord, PowerPoint,
      Postscript). Custom converters can be easily added

    * stemmer support for 12 languages

    * optional support for right truncation

    * similarity search for English (soundex, metaphone)

    * NEAR search

    * phrase search

    * pluggable query parsers (two parsers included)

    * stop words support

    * new test tab for interactive testing

    * faster than Zope's old TextIndex

    * full unicode support (new)

    * normalization support (new)

    * new similarity algorithm: double metaphone (new)

    * new TXNGSplitter

    * new vocabulary browser
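
To illustrate what right truncation means in practice, here is a minimal
Python sketch of a prefix lookup against a sorted vocabulary. It is not
TextIndexNG code: the function name right_truncation and the sample
vocabulary are hypothetical, and the lexicons in TextIndexNG may
implement the lookup differently.

```python
from bisect import bisect_left


def right_truncation(vocabulary, prefix):
    """Return all words in a sorted vocabulary that start with prefix.

    Hypothetical helper for illustration only, not part of TextIndexNG.
    """
    # Binary search for the first position where a match could occur.
    start = bisect_left(vocabulary, prefix)
    matches = []
    for word in vocabulary[start:]:
        # In a sorted list all matches are contiguous, so stop at the
        # first word that no longer starts with the prefix.
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches


# Made-up sample vocabulary.
vocabulary = sorted(["index", "indexing", "indices", "zcatalog", "zope"])
print(right_truncation(vocabulary, "ind"))
# prints ['index', 'indexing', 'indices']
```

A query such as "ind*" would then expand to every word returned by the
lookup before the index itself is searched.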

Changes:

    * added full wildcard support for CLLexicon and StandardLexicon

    * rewrote Stemmer module (now fully unicode compliant)

    * unittests code cleanup

    * query evaluation refactored

    * Parser API changed to return a parse tree instead of a Python
      expression

    * new parse tree evaluator added

    * PyQueryParser: now accepts a minus sign as prefix of a word to
      indicate NOT. Searching for "foo -bar" will be recognized as
      "foo AND NOT bar". In addition the syntax for "ANDNOT" has been
      changed to "AND NOT".

    * stopword handling through registry

    * added double metaphone algorithm for similarity search

    * Splitter handling changed: The new TXNGSplitter has been added. It
      supports both strings and unicode strings and supersedes the
      functionality of all other existing splitters for Zope.
      TXNGSplitter is the only splitter that will be used by
      TextIndexNG. The "index numbers" option has been removed from both
      the splitter and the ZMI. In addition, the splitter now accepts an
      optional set of characters that are recognized as valid inside
      words. This allows you to index common words like "C++" or
      "python-22.lib" when you specify "+.-" as valid word characters.
     
    * Python C extensions now compile under Windows (a binary
      distribution will be available for Windows)

    * normalizer support added

    * full unicode support

    * the add form for TextIndexNG now uses the registries to obtain
      information about registered components instead of hardcoded
      values.

    * fixed problem with changed API of the Interface packages (backport
      from Zope 3 to Zope 2.6)

    * added vocabulary browser

    * lots of code cleanup

    * bug fixes...

    * added statistics tab to ZMI

    * fixed serious bug in TXNGSplitter due to missing encoding
      parameter

    * minor ZMI adjustments

    * using converters no longer raises an exception when a converter
      could not be found for the mime-type of a document
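
The effect of the new "valid word characters" option of the splitter
can be sketched in a few lines of Python. This is only an approximation
built on the standard re module, not the TXNGSplitter implementation
(which is a C extension); the function name split_words and its
extra_chars parameter are hypothetical.

```python
import re


def split_words(text, extra_chars=""):
    """Split text into lowercase words.

    Characters listed in extra_chars are treated as valid inside
    words. Hypothetical helper, not the TXNGSplitter API.
    """
    # \w matches letters, digits and the underscore; re.escape makes
    # the extra characters safe to use inside a character class.
    pattern = re.compile(r"[\w" + re.escape(extra_chars) + r"]+")
    return [word.lower() for word in pattern.findall(text)]


text = "Link C++ against python-22.lib"

print(split_words(text))
# prints ['link', 'c', 'against', 'python', '22', 'lib']

print(split_words(text, "+.-"))
# prints ['link', 'c++', 'against', 'python-22.lib']
```

Note that in this sketch a "." declared as a word character also stays
attached to a word at the end of a sentence, so the set of extra
characters should be chosen with the indexed content in mind.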


Requirements:

    * Zope 2.5 or Zope CVS trunk checkout

Documentation:

    * http://www.zope.org/Members/ajung/TextIndexNG/wiki

Download:

    * http://www.zope.org/Members/ajung/TextIndexNG/ or

    * http://sourceforge.net/project/showfiles.php?group_id=50052


 
 ---------------------------------------------------------------------
   -    Andreas Jung                     http://www.andreas-jung.com   -
  -   EMail: andreas at andreas-jung.com                              -
   -            "Life is too short to (re)write parsers"               -
 ---------------------------------------------------------------------