Re: [Zope-dev] Catalog improvements

28 Nov 2001


      ----- Original Message -----
From: "Chris Withers" <chrisw@nipltd.com>
To: "Matt Hamilton" <matth@netsight.co.uk>
Cc: "Casey Duncan" <c.duncan@nlada.org>; "Steve Alexander"
<steve@cat-box.net>; "Wolfram Kerber" <wk@gallileus.de>; <zope-dev@zope.org>
Sent: Wednesday, November 28, 2001 09:27
Subject: Re: [Zope-dev] Catalog improvements
...
Matt Hamilton wrote:
...
I would like in on that too :)  About a year or so ago I was working on
a full-text indexing system for indexing several gigabytes of text
(mailing list archives).  Most of it was written in C and uses quite a
lot of cool algorithms from various information retrieval papers and
books.  I have been hoping to have the time to take parts of it and
work it into the new PluginIndex architecture.  The existing code uses
BerkeleyDB files to hold the index structures, but I would like to use
ZODB instead to give it a bit more modularity.

Hi Matt,

Are any of these algorithms publicly available? I'd be _very_ interested
in them :-)

I think the "MG" software from the book "Managing Gigabytes" is GPLed
and is currently released as mg-1.21. Judging from the book's table of
contents, it seems to be a very detailed source on text processing and
gives a lot of information about different index types. However, I miss
some explanation of more recent data structures such as suffix arrays
or suffix trees, which have several advantages for text processing
compared to B-Trees.
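
To make the suffix array idea concrete, here is a minimal Python sketch.
It is only an illustration, not code from MG or from the book, and the
names build_suffix_array and find_all are made up for this example. The
construction is the naive one (sort all suffixes), which would be far too
slow for gigabytes of text; a real index would use a dedicated
construction algorithm.

```python
def build_suffix_array(text):
    """Return the start offsets of all suffixes of text, in sorted order."""
    # Naive construction for illustration only: sorts full suffix strings.
    return sorted(range(len(text)), key=lambda i: text[i:])


def _bound(text, sa, pattern, upper):
    """Binary search for the first (or one past the last) matching suffix."""
    lo, hi = 0, len(sa)
    m = len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        # Compare only the first len(pattern) characters of the suffix.
        prefix = text[sa[mid]:sa[mid] + m]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_all(text, sa, pattern):
    """Return the sorted offsets at which pattern occurs in text."""
    first = _bound(text, sa, pattern, upper=False)
    last = _bound(text, sa, pattern, upper=True)
    return sorted(sa[first:last])


text = "banana"
sa = build_suffix_array(text)
print(find_all(text, sa, "ana"))
```

The point of the structure is that any substring can be looked up this
way, not only whole words: all suffixes starting with the pattern sit
next to each other in the sorted array, so two binary searches find the
complete range of matches.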

Andreas

    ---------------------------------------------------------------------
   -    Andreas Jung                            Zope Corporation       -
  -   EMail: andreas@zope.com                http://www.zope.com      -
 -  "Python Powered"                       http://www.python.org     -
  -   "Makers of Zope"                       http://www.zope.org      -
   -                  "Life is a fulltime occupation"                  -
    ---------------------------------------------------------------------