----- Original Message -----
From: "Matt Hamilton" <matth@netsight.co.uk>
To: "Andreas Jung" <andreas@zope.com>
Cc: "Chris Withers" <chrisw@nipltd.com>; "Casey Duncan" <c.duncan@nlada.org>; "Steve Alexander" <steve@cat-box.net>; "Wolfram Kerber" <wk@gallileus.de>; <zope-dev@zope.org>
Sent: Wednesday, November 28, 2001 09:55
Subject: Re: [Zope-dev] Catalog improvements

On Wed, 28 Nov 2001, Andreas Jung wrote:
> I think the software "MG" from the book "Managing Gigabytes" is GPLed and currently released as mg-1.21. Walking through the TOC of the book, it seems to be a very detailed source on text processing and gives a lot of information about different index types. But I miss some explanation of current data structures like suffix arrays or suffix trees, which have several advantages for text processing compared to B-Trees.

Suffix Trees/Tries take up a *lot* of space. But they are very fast, and useful for searching for substrings.

Usually four times the amount of the data to be indexed ;-)

Andreas
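
For readers unfamiliar with the structures discussed above, here is a minimal sketch of substring search with a suffix array, in Python. It is not taken from MG or from the Zope catalog; the function names `build_suffix_array` and `find_all` are made up for illustration, and the construction is the naive sort-all-suffixes approach rather than one of the faster published algorithms. On the space point: if each entry were stored as a 32-bit integer per indexed byte, the array alone would be about four times the size of the text (a Python list of ints, as used here, costs considerably more).

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order.

    Naive construction: sorts the suffixes directly, which is fine for a
    demonstration but far too slow and memory-hungry for large inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, suffix_array, pattern):
    """Return every position where pattern occurs in text, in ascending order.

    All suffixes that start with pattern sit next to each other in the
    suffix array, so two binary searches locate the matching block.
    """
    m = len(pattern)

    # First binary search: leftmost suffix whose first m characters
    # are >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Second binary search: leftmost suffix whose first m characters
    # are > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_all(text, sa, "ana"))
```

Each lookup needs only a logarithmic number of comparisons against the text, which is where the speed for substring queries comes from; the price is one stored position per character of the indexed data.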