On Tue, 5 Jun 2001, Chris Withers wrote:
> > Looks like you should write your own index type. Zope 2.4 comes with a PluggableIndex interface to allow third-party indexes to be integrated into the Catalog.
>
> Yeah, I know all that, and I'm very much looking forward to playing with this. :-) However, the email was an invitation for anyone who's interested and currently has time on their hands (yeah, I know, there's lots of us like that ;-) to have a go at writing the index type for me...

I would like to help if I had time :)

I think the most efficient way of doing what you want is to construct an index based on a 'suffix trie'. This essentially allows matching of arbitrary substrings very quickly; the only problem is that it takes up a fair amount of space. The upside is that it can be updated and incrementally added to quite easily (unlike many inverted list implementations).

I confess I have not had the chance to look at the pluggable index types in 2.4, but I would really like to, as I want to port over some indexing code I was working on for another project that allows compressed storage of inverted lists for indexes. On average you can store a 32-bit document id/ref in around 4 bits, which means you save a lot of space and can keep stopwords in the lexicon (as an example, try searching for 'to be or not to be' in an index that removes stopwords :). Not only do you save space, but because of the way the inverted list is read and decompressed, you also save time on disk access for large indexes, as there is less to physically read.

-Matt

--
Matt Hamilton matth@netsight.co.uk
Netsight Internet Solutions, Ltd.
Business Vision on the Internet
http://www.netsight.co.uk
+44 (0)117 9090901
Web Hosting | Web Design | Domain Names | Co-location | DB Integration
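
For anyone wanting to experiment with the suffix trie idea above, here is a rough sketch in plain Python. It is only an illustration of the data structure, not a Zope pluggable index: the class and method names (SuffixTrie, add, search) are made up for this example and do not correspond to anything in the Catalog. It is also the naive version, which stores every suffix of every string in full, so it shows the space cost as well as the easy incremental updates.

```python
class _Node:
    """One trie node: child links plus the ids of documents reaching it."""
    __slots__ = ("children", "docs")

    def __init__(self):
        self.children = {}
        self.docs = set()


class SuffixTrie:
    """Naive suffix trie (hypothetical names, illustration only).

    Every suffix of every added text is inserted, so any substring of a
    text is a path starting at the root.
    """

    def __init__(self):
        self.root = _Node()

    def add(self, doc_id, text):
        # Incremental update: walk each suffix and tag the nodes on its
        # path with doc_id. Nothing already stored has to be rebuilt.
        self.root.docs.add(doc_id)
        for start in range(len(text)):
            node = self.root
            for ch in text[start:]:
                node = node.children.setdefault(ch, _Node())
                node.docs.add(doc_id)

    def search(self, substring):
        # Follow the substring from the root; the cost depends on the
        # length of the query, not on how much text is indexed.
        node = self.root
        for ch in substring:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.docs)


index = SuffixTrie()
index.add(1, "pluggable index")
index.add(2, "inverted list")
print(sorted(index.search("ver")))   # [2]
print(sorted(index.search("in")))    # [1, 2]

# Adding a document later needs no rebuild of the existing entries.
index.add(3, "suffix trie")
print(sorted(index.search("ix")))    # [3]
```

Because each text of length n contributes on the order of n*n/2 characters of path, this form is only practical for short values; a real index would probably want a compacted suffix tree or some limit on suffix length.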