[Zope-dev] Request for a Pluggin Index (NameIndex)
Matt Hamilton
matth@netsight.co.uk
Tue, 5 Jun 2001 17:03:14 +0100 (BST)
On Tue, 5 Jun 2001, Chris Withers wrote:
> > Looks like you should write your own index type. Zope 2.4
> > comes with an PlugableIndex interface to allow third-party
> > indexes to be integrated into the Catalog.
>
> Yeah, I know all that, and I'm very much looking forward to playing with
> this. :-)
> However, the email was an invitation for anyone who's interested and
> currently has time on their hands (yeah, I know, there's lots of us like
> that ;-) to have a go at writing the index type for me...
I would like to help if I had time :) I think the most efficient way of
doing what you want is to construct an index based on a 'suffix trie'.
This essentially allows matching of arbitrary substrings very quickly;
the only problem is that it takes up a fair amount of space. The upside
is that it can be updated and incrementally added to quite easily
(unlike many inverted list implementations).
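To make the idea concrete, here is a minimal sketch of a suffix trie used
as a substring index. It is not tied to the Zope pluggable index
interface; the class name SuffixTrie and its add/search methods are
hypothetical, and it assumes case-insensitive matching over short
strings such as names. It stores every suffix character by character, so
it shows both the fast lookup and the space cost mentioned above.

```python
class SuffixTrie:
    """Naive suffix trie: maps every substring of the indexed text to doc ids.

    Hypothetical illustration only; not the Zope index API.
    """

    def __init__(self):
        self.root = self._new_node()

    @staticmethod
    def _new_node():
        # Each node keeps its child nodes and the ids of all documents
        # whose text contains the substring spelled out by the path so far.
        return {"children": {}, "docs": set()}

    def add(self, doc_id, text):
        """Incrementally index one document by inserting all its suffixes."""
        text = text.lower()
        for start in range(len(text)):
            node = self.root
            for ch in text[start:]:
                node = node["children"].setdefault(ch, self._new_node())
                node["docs"].add(doc_id)

    def search(self, substring):
        """Return the ids of documents containing the given substring."""
        node = self.root
        for ch in substring.lower():
            node = node["children"].get(ch)
            if node is None:
                return set()
        return set(node["docs"])


index = SuffixTrie()
index.add(1, "Hamilton")
index.add(2, "Withers")
print(index.search("ilt"))
print(index.search("i"))
```

A lookup walks one node per character of the query, regardless of how
many documents are indexed. The cost is that a string of length n inserts
on the order of n*n/2 characters' worth of nodes in the worst case, which
is the space problem; a compacted suffix tree would reduce that, at the
price of a more involved implementation.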
I confess I have not had the chance to look at the pluggable index types
in 2.4, but would really like to, as I would like to port over some
indexing code I was working on for another project that allows compressed
storage of inverted lists for indexes. On average you can store a 32-bit
document id/ref in around 4 bits, which means you save a lot of space and
can afford to keep stopwords in the lexicon (as an example, try searching
for 'to be or not to be' in an index that removes stopwords :). Not only
do you save space, but because of the way the inverted list is read and
decompressed, you also save time on disk access for large indexes, as
there is less data to physically read.
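As a rough illustration of how a document id can shrink to a few bits,
here is a sketch of one common approach: store the gaps between sorted
document ids and encode each gap with an Elias gamma code. This is an
assumption about the general technique, not the scheme the code mentioned
above actually uses. The function names are hypothetical, document ids
are assumed to be positive and strictly increasing, and the bits are kept
in a string of '0'/'1' characters for readability rather than packed into
bytes.

```python
def gamma_encode(n):
    """Elias gamma code of an integer n >= 1, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("gamma codes are defined for integers >= 1")
    binary = bin(n)[2:]
    # (len - 1) zeros announce the length, then the binary digits follow.
    return "0" * (len(binary) - 1) + binary


def compress_postings(doc_ids):
    """Encode a strictly increasing list of positive doc ids as gamma-coded gaps."""
    bits = []
    previous = 0
    for doc_id in doc_ids:
        gap = doc_id - previous
        if gap < 1:
            raise ValueError("doc ids must be positive and strictly increasing")
        bits.append(gamma_encode(gap))
        previous = doc_id
    return "".join(bits)


def decompress_postings(bits):
    """Decode the bit string produced by compress_postings back into doc ids."""
    doc_ids = []
    previous = 0
    pos = 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":  # count the length prefix
            zeros += 1
            pos += 1
        gap = int(bits[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        previous += gap
        doc_ids.append(previous)
    return doc_ids


postings = [3, 4, 7, 8, 9, 15]
encoded = compress_postings(postings)
print(len(encoded), "bits for", len(postings), "doc ids")
```

The six ids above encode to 14 bits in total instead of 6 * 32 = 192.
Frequent terms such as stopwords appear in most documents, so their gaps
are small and their codes are short, which is why keeping them in the
lexicon becomes affordable. The list is decoded sequentially from the
front, so the reader only ever touches the compressed bytes.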
-Matt
--
Matt Hamilton matth@netsight.co.uk
Netsight Internet Solutions, Ltd. Business Vision on the Internet
http://www.netsight.co.uk +44 (0)117 9090901
Web Hosting | Web Design | Domain Names | Co-location | DB Integration