[Zope] [ANN] TextIndexNG 1.05final released
Andreas Jung
andreas@andreas-jung.com
Sun, 13 Oct 2002 11:10:27 +0200
I am pleased to announce the release of
TextIndexNG 1.05 FINAL
TextIndexNG is a new pluggable index for the ZCatalog and is the most
feature-complete solution for fulltext indexing under Zope. TextIndexNG
enhances the fulltext indexing capabilities of Zope by providing the
following features:
* support for document converters (HTML, PDF, WinWord, PowerPoint,
Postscript). Custom converters can be easily added
* stemmer support for 12 languages
* optional support for right truncation
* similarity search for English (soundex, metaphone)
* NEAR search
* phrase search
* pluggable query parsers (two parsers included)
* stop words support
* new test tab for interactive testing
* faster than Zope's old TextIndex
* full unicode support (new)
* normalization support (new)
* new similarity algorithm: double metaphone (new)
* new TXNGSplitter
* new vocabulary browser
Changes:
* added full wildcard support for CLLexicon and StandardLexicon
* rewrote Stemmer module (now fully unicode compliant)
* unit test code cleanup
* query evaluation refactored
* Parser API changed to return a parse tree instead of a Python
expression
* new parse tree evaluator added
* PyQueryParser: now accepts a minus sign as prefix of a word to
indicate NOT. Searching for "foo -bar" will be recognized as
"foo AND NOT bar". In addition the syntax for "ANDNOT" has been
changed to "AND NOT".
* stopword handling through registry
* added double metaphone algorithm for similarity search
* Splitter handling changed: The new TXNGSplitter has been added. It
supports both strings and unicode strings and supersedes the
functionality of all other existing splitters for Zope. TXNGSplitter
is the only splitter that will be used by TextIndexNG. The
"index numbers" option has been removed both from the splitter and
the ZMI. In addition the splitter now accepts an optional set of
characters that are recognized to be valid inside words. This allows
you to index common words like "C++" or "python-22.lib" when you
specify "+.-" as valid word characters.
* Python C extensions now compile under Windows (a binary
distribution will be available for Windows)
* normalizer support added
* full unicode support
* the add form for TextIndexNG now uses the registries to obtain
information about registered components instead of hardcoded
values.
* fixed problem with changed API of the Interface packages
(backport from Zope 3 to Zope 2.6)
* added vocabulary browser
* lots of code cleanup
* bug fixes...
* added statistics tab to ZMI
* fixed serious bug in TXNGSplitter due to missing encoding parameter
* minor ZMI adjustments
* using converters no longer raises an exception when a converter
could not be found for the mime-type of a document
* fixed document converters, which did not work due to a changed API call
* added Finnish stemmer
* improved CMF support: TextIndexNG is now able to index foreign file
formats stored as "Portal File" using the DocumentConverters.
"Portal File" objects are indexed if the index name is
"SearchableText". This is a big improvement since you can now
search through text objects and Word, PDF etc. files inside your CMF
site with the "SearchableText" index.
* added stopword files for ten languages
* minor fixes inside the TXNGSplitter
* changed default encodings from iso-8859-1 to iso-8859-15
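
To illustrate two of the changes above (the extra word characters accepted
by the splitter and the minus-sign prefix in PyQueryParser), here is a
small standalone Python sketch. It does not use TextIndexNG itself: the
function names split_words and rewrite_minus_prefix are hypothetical, the
code is written for a current Python 3, and the real TXNGSplitter and
PyQueryParser (which returns a parse tree, not a string) may well behave
differently in the details.

```python
import re


def split_words(text, extra_chars=""):
    """Split text into lowercase words.

    Hypothetical stand-in for a splitter: characters listed in
    extra_chars are treated as valid inside words, in addition to
    the usual letters, digits and underscore.
    """
    # Build a character class such as [\w\+\.\-]+ from the extra characters.
    pattern = r"[\w" + re.escape(extra_chars) + r"]+"
    return [word.lower() for word in re.findall(pattern, text)]


def rewrite_minus_prefix(query):
    """Rewrite 'foo -bar' style queries into an explicit boolean form.

    Hypothetical illustration only: terms are joined with AND, and a
    term prefixed with a minus sign becomes 'NOT term'.
    """
    terms = []
    for token in query.split():
        if token.startswith("-") and len(token) > 1:
            terms.append("NOT " + token[1:])
        else:
            terms.append(token)
    return " AND ".join(terms)


if __name__ == "__main__":
    sample = "Link C++ against python-22.lib"
    print(split_words(sample))          # default word characters only
    print(split_words(sample, "+.-"))   # with "+.-" as valid word characters
    print(rewrite_minus_prefix("foo -bar"))
```

Without the extra characters, "C++" is reduced to "c" and "python-22.lib"
falls apart into three separate words. With "+.-" both are kept as single
words, which is the effect described for the new splitter option.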
Requirements:
* Zope 2.5 or Zope CVS trunk checkout
Documentation:
* http://www.zope.org/Members/ajung/TextIndexNG/wiki
Download:
* http://www.zope.org/Members/ajung/TextIndexNG/ or
* http://sourceforge.net/project/showfiles.php?group_id=50052