On Tue, 28 Sep 1999, Michel Pelletier wrote:
Greetings,
I finally got sick of paging through endless archive messages, so I implemented an experimental searchable list archive:
http://www.zope.org:12080/archives/Catalog/S
will present you with a single text search box. This is a very trivial interface; it will be expanded upon.
Please try to use this over the next few days and see if it helps answer your questions.
I used the fsimport script to import the entirety of the pipermail archive, and then cataloged it with the 'Find objects' Catalog tab. In the process, I fixed a silly design flaw, which improved the mass indexing speed of the Catalog by at least 200% and greatly reduced the memory overhead and thrashing. The dataset of documents is 56MB; the total dataset plus indexes is 64MB. Not bad. It took 6 minutes to index the entire dataset with a 10,000-word subtransaction threshold, and the process footprint grew to 85MB. Catalog has come a long way in terms of speed and memory usage.
Further improvements are to parse the documents into rfc822 Messages (probably with a ZClass), index all interesting attributes (date, author, etc.), and implement a simple ZPublisher.Client script that Mailman calls to 'push' a message up to the server, instantiate a new message object, and incrementally index it in the Catalog.
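As a rough illustration of the "parse into messages and index the interesting attributes" step, here is a minimal sketch using the `email` package from the modern Python standard library in place of the `rfc822` module of the time. The function name `extract_index_fields` and the returned field names are hypothetical, chosen only for this example; they are not part of Catalog or of any Zope API.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def extract_index_fields(raw_text):
    """Parse one archived message and return the attributes worth indexing.

    Hypothetical helper: the field names below are illustrative only.
    """
    msg = message_from_string(raw_text)

    # Split "Real Name <addr@host>" into its two halves.
    name, address = parseaddr(msg.get("From", ""))

    # Date headers in old archives are often malformed, so fail softly.
    date_header = msg.get("Date")
    try:
        date = parsedate_to_datetime(date_header) if date_header else None
    except (TypeError, ValueError):
        date = None

    return {
        "author": name or address,
        "author_email": address,
        "date": date,
        "subject": msg.get("Subject", ""),
        "message_id": msg.get("Message-ID", ""),
        # Only single-part bodies are returned here; multipart is left empty.
        "body": "" if msg.is_multipart() else msg.get_payload(),
    }
```

Each dictionary could then be handed to whatever indexing call the archive uses, one message at a time, which is the shape an incremental 'push' from the list server would need.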
Hmmm! Whaddayaknow! This is exactly what I've been working on! I've been planning out a product called MessageBase to do this. I'm sketching out the Message class right now. I'm planning on it having full MIME support. (One of the things I have gotten done so far is an improved version of Python's mimetools module that is actually compliant with the MIME RFCs.)
-The Dragon De Monsyne
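
To give a feel for what MIME support means for a Message class, here is a minimal sketch that walks the parts of a multipart message. It uses the `email` package from the modern Python standard library, not the improved mimetools module mentioned above, and the function name `list_mime_parts` is hypothetical.

```python
from email import message_from_string


def list_mime_parts(raw_text):
    """Return (content_type, filename) for every leaf part of a message.

    Hypothetical helper for illustration; filename is None when the part
    carries no filename parameter.
    """
    msg = message_from_string(raw_text)
    parts = []
    for part in msg.walk():
        # Skip multipart containers; only leaf parts carry actual content.
        if part.is_multipart():
            continue
        parts.append((part.get_content_type(), part.get_filename()))
    return parts
```

A plain single-part message yields one entry, so the same loop can serve both simple and multipart mail when deciding which parts to index.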