I am pleased to announce the second alpha release of TextIndexNG V 1.0.

TextIndexNG is a pluggable index for the ZCatalog that enhances the full text indexing capabilities of Zope by providing the following features:

* support for document converters (HTML, PDF, WinWord, PowerPoint, PostScript). Custom converters can be added easily
* stemmer support for 12 languages
* optional support for right truncation
* similarity search (soundex and metaphone support, English only)
* NEAR search
* phrase search
* pluggable query parsers (two parsers included)
* stop words support
* new test tab for interactive testing
* 2-3 times faster than Zope's TextIndex (more to come ;-) )
* compatible with ASCII, ISO-8859-1

Requirements:

* Zope 2.5 or Zope CVS trunk checkout

Documentation:

* http://www.zope.org/Members/ajung/TextIndexNG/wiki

Download:

* http://www.zope.org/Members/ajung/TextIndexNG/

Installation:

* The installation is described in the Wiki (see above)

Changes since alpha 1:

* Registry implementation completely rewritten. Interfaces for pluggable components are strictly enforced.
* every pluggable component now resides in a registry
* broken similarity search fixed
* additional options for interactiveDemo.py
* integrated Matt Hamilton's compressed lists code
* changed license of TextIndexNG to ZPL
* the queryparser extension module should now build out-of-the-box at least on all Unix platforms (flex, bison required)
* minor code cleanup and speedup in the extension modules

Contact:

* Andreas Jung, Email: andreas at andreas-jung.com
I plan to support Unicode in the future, but there are some unresolved issues I must work on first. In general, nearly all components are prepared to handle Unicode. However, I have not had time to test everything, so version 1.0 will only support ASCII and ISO-8859-1.

Andreas

----- Original Message -----
From: "Milos Prudek" <milos.prudek@tiscali.cz>
To: "Andreas Jung" <andreas@andreas-jung.com>; "zope" <zope@zope.org>
Sent: Friday, May 10, 2002 12:46
Subject: Re: [Zope] [ANN] TextIndexNG 1.0 alpha 2 released
* compatible with ASCII, ISO-8859-1
What about other encodings, particularly the East European ISO-8859-2 and Win-1250?
-- Milos Prudek
Milos,

because the limitation came from the QueryParser module (built with flex/yacc), I have rewritten the parser as a native Python module. The new parser has the advantage that it runs on all platforms and accepts *any* encoding, based on your locale settings. It will also accept Unicode strings (but they are currently not handled very well inside TXNG). I hope this makes TXNG attractive for you.

Andreas

----- Original Message -----
From: "Milos Prudek" <milos.prudek@tiscali.cz>
To: "Andreas Jung" <andreas@andreas-jung.com>; "zope" <zope@zope.org>
Sent: Friday, May 10, 2002 12:46
Subject: Re: [Zope] [ANN] TextIndexNG 1.0 alpha 2 released
* compatible with ASCII, ISO-8859-1
What about other encodings, particularly the East European ISO-8859-2 and Win-1250?
-- Milos Prudek
Andreas Jung wrote:
Milos,
because the limitation came from the QueryParser module (built with flex/yacc), I have rewritten the parser as a native Python module. The new parser has the advantage that it runs on all platforms and accepts *any* encoding, based on your locale settings. It will also accept Unicode strings (but they are currently not handled very well inside TXNG). I hope this makes TXNG attractive for you.
Great news! It definitely does.

There is one thing that may be a problem, though. Although ISO-8859-2 is the international standard for many East European languages (such as Czech and Slovak), 95% of pages published on the World Wide Web in these languages use the Win-1250 encoding. Even Linux servers publish web pages in Win-1250 [while having an ISO-8859-2 or even ISO-8859-1 locale set], because older Microsoft browsers cannot display the ISO-8859-2 encoding. Mozilla and other browsers on Linux, however, can display Win-1250 (even if the client host is set to ISO-8859-2), even in web forms.

Since TXNG relies on the locale setting, this would mean that the Linux (server) locale must be set to Win-1250. I'm not sure whether it is possible to set a Linux locale to Win-1250, and even if it were possible, it could bring other problems.

Do you think that this is an issue for TXNG, or that people should switch to more recent MSIE browsers?

Thank you for your time!

-- Milos Prudek
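
To make the concern above concrete, here is a minimal sketch of why text stored as Win-1250 cannot simply be treated as ISO-8859-2. It is not TextIndexNG code: it uses modern Python 3 and the standard `iso8859_2` and `cp1250` codecs, and the sample word and the helper name `differing_chars` are chosen purely for illustration.

```python
# Illustration only: compares how one Czech word is encoded in the two
# character sets discussed in this thread. Not part of TextIndexNG.

WORD = "\u017elu\u0165ou\u010dk\u00fd"  # the Czech word "žluťoučký"


def differing_chars(text):
    """Return the characters whose single-byte encoding differs
    between ISO-8859-2 and Windows-1250 (Win-1250)."""
    return [ch for ch in text
            if ch.encode("iso8859_2") != ch.encode("cp1250")]


iso_bytes = WORD.encode("iso8859_2")
win_bytes = WORD.encode("cp1250")

# The two byte strings have the same length but are not identical.
print(iso_bytes == win_bytes)

# Only some of the accented letters are affected.
print(differing_chars(WORD))

# Reading Win-1250 bytes under an ISO-8859-2 assumption does not raise an
# error; it silently yields different characters, so indexed words and
# query words would no longer match.
misread = win_bytes.decode("iso8859_2")
print(misread == WORD)
```

Most letters share the same byte value in both encodings, so the mismatch only affects some words. That makes a wrong locale assumption easy to overlook in testing.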