[Zope] [ANN] TextIndexNG 1.05alpha1 released

Andreas Jung <lists@andreas-jung.com>
Thu, 20 Jun 2002 07:17:03 -0400


I am pleased to announce the release of TextIndexNG 1.05alpha1.

TextIndexNG is a pluggable index for the ZCatalog that enhances
the fulltext indexing capabilities of Zope by providing
the following features:

    * support for document converters (HTML, PDF, WinWord, PowerPoint,
      Postscript). Custom converters can be easily added

    * stemmer support for 12 languages

    * optional support for right truncation

    * similarity search (soundex and metaphone support, English only)

    * NEAR search

    * phrase search

    * pluggable query parsers (two parsers included)

    * stop words support

    * new test tab for interactive testing

    * faster than Zope's old TextIndex

    * full unicode support (new)

    * normalization support (new)

    * new similarity algorithm: double metaphone (new)
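
As an illustration of the right truncation feature listed above: a query
such as "index*" should match every indexed word that begins with "index".
The following is a minimal sketch of that idea over a sorted word list. It
is not TextIndexNG's implementation; the function name right_truncation
and the sample lexicon are assumptions made for this example.

```python
from bisect import bisect_left


def right_truncation(sorted_words, prefix):
    """Return all words in the sorted list that start with prefix."""
    # Binary search for the first position where a match could occur.
    start = bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        # Words sharing a prefix are contiguous in a sorted list,
        # so the first non-match ends the scan.
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches


# Hypothetical lexicon, used only for illustration.
lexicon = sorted(["index", "indexing", "indices", "zcatalog", "zope"])
print(right_truncation(lexicon, "index"))
```

Because the word list is sorted, the lookup only has to touch the matching
words plus one more, rather than scanning the whole lexicon.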

Requirements:

    * Zope 2.5 or Zope CVS trunk checkout

Documentation:

    * http://www.zope.org/Members/ajung/TextIndexNG/wiki

Download:

    * http://www.zope.org/Members/ajung/TextIndexNG/ or

    * http://sourceforge.net/project/showfiles.php?group_id=50052

Changes:

    * added full wildcard support for CLLexicon and StandardLexicon

    * rewrote Stemmer module (now fully unicode compliant)

    * unittests code cleanup

    * query evaluation refactored

    * Parser API changed to return a parse tree instead of a Python
      expression

    * new parse tree evaluator added

    * PyQueryParser: now accepts a minus sign as a prefix of a word to
      indicate NOT. Searching for "foo -bar" will be recognized as
      "foo AND NOT bar". In addition, the syntax for "ANDNOT" has been
      changed to "AND NOT".

    * stopword handling through registry

    * added double metaphone algorithm for similarity search

    * Splitter handling changed: The new TXNGSplitter has been
      added. It supports both strings and unicode strings and supersedes
      the functionality of all other existing splitters for Zope.
      TXNGSplitter is the only splitter that will be used by
      TextIndexNG. The "index numbers" option has been removed both
      from the splitter and the ZMI. In addition, the splitter now
      accepts an optional set of characters that are recognized as
      valid inside words. This allows you to index common words like
      "C++" or "python-22.lib" when you specify "+.-" as valid word
      characters.

    * Python C extensions now compile under Windows (a binary
      distribution will be available for Windows)

    * normalizer support added

    * full unicode support

    * the add form for TextIndexNG now uses the registries to obtain
      information about registered components instead of hardcoded
      values.

    * lots of code cleanup

    * bug fixes...
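
Two of the changes above, the minus-sign prefix in PyQueryParser and the
extra word characters accepted by the splitter, can be illustrated with a
rough sketch. This is not the TextIndexNG code: the real parser returns a
parse tree, whereas this example only rewrites the query string, and the
names split_words and expand_minus_prefix are hypothetical helpers
introduced for this example.

```python
import re


def split_words(text, extra_chars=""):
    """Split text into words; extra_chars also count as word characters."""
    # \w covers letters, digits and underscore; the extra characters
    # are escaped so they are safe inside the character class.
    pattern = r"[\w" + re.escape(extra_chars) + r"]+"
    return re.findall(pattern, text)


def expand_minus_prefix(query):
    """Rewrite a query like 'foo -bar' as 'foo AND NOT bar'."""
    parts = []
    for token in query.split():
        if token.startswith("-") and len(token) > 1:
            # A leading excluded term has nothing to AND with.
            operator = "AND NOT " if parts else "NOT "
            parts.append(operator + token[1:])
        else:
            parts.append(token)
    return " ".join(parts)


print(split_words("C++ and python-22.lib"))
print(split_words("C++ and python-22.lib", "+.-"))
print(expand_minus_prefix("foo -bar"))
```

Without the extra characters, "C++" and "python-22.lib" are broken into
fragments; with "+.-" they survive as single words. One consequence of
this simple approach is that sentence punctuation such as a trailing
period would also be kept as part of a word.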

Note:

   I will not be reachable during July because of a longer vacation.
   Please report any problems or bugs to the tracker on the SourceForge
   project page.

    ---------------------------------------------------------------------
   -    Andreas Jung                     http://www.andreas-jung.com   -
  -   EMail: andreas at andreas-jung.com                              -
   -            "Life is too short to (re)write parsers"               -
    ---------------------------------------------------------------------