[Zope] Experimental searchable mail list archive
The Dragon De Monsyne
dragondm@integral.org
Tue, 5 Oct 1999 05:06:53 -0500 (CDT)
On Tue, 28 Sep 1999, Michel Pelletier wrote:
> Greetings,
>
> I finally got sick of paging through endless archive messages, so I
> implemented an experimental searchable list archive:
>
> http://www.zope.org:12080/archives/Catalog/S
>
> will present you with a single text search box. This is a very trivial
> interface; it will be expanded upon.
>
> Please try to use this over the next few days and see if it helps answer
> your questions.
>
> I used the fsimport script to import the entire pipermail
> archive, and then cataloged it with the Catalog's 'Find objects' tab. In
> the process, I fixed a silly design flaw; the fix sped up Catalog's mass
> indexing by at least 200% and greatly reduced memory overhead and
> thrashing. The document dataset is 56MB; documents plus indexes total
> 64MB. Not bad. Indexing the entire dataset took 6 minutes with a
> 10000-word subtransaction threshold, and the process footprint grew to
> 85MB. Catalog has come a long way in terms of speed and memory usage.
>
> Planned improvements are to parse the documents into rfc822 Messages
> (probably with a ZClass), index all interesting attributes (date,
> author, etc.), and implement a simple ZPublisher.Client script that
> Mailman calls to 'push' a message up to the server, instantiate a new
> message object, and incrementally index it in the Catalog.
>
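The parse-then-index step described above can be sketched roughly as follows. This is not Zope code: it assumes Python 3's standard `email` package as a stand-in for the rfc822 module, and `parse_message` and `SimpleIndex` are hypothetical names, with `SimpleIndex` being a toy in-memory word index standing in for the Catalog. The example address is made up.

```python
import re
from collections import defaultdict
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def parse_message(raw):
    """Parse one raw RFC 822 message and pull out the fields worth indexing."""
    msg = message_from_string(raw)
    name, addr = parseaddr(msg.get("From", ""))
    date_header = msg.get("Date")
    date = parsedate_to_datetime(date_header) if date_header else None
    # Only single-part bodies are indexed in this sketch; multipart is skipped.
    body = "" if msg.is_multipart() else msg.get_payload()
    return {
        "author": name or addr,
        "address": addr,
        "subject": msg.get("Subject", ""),
        "date": date,
        "body": body,
    }


class SimpleIndex:
    """Hypothetical stand-in for a Catalog text index: word -> set of doc ids."""

    def __init__(self):
        self.words = defaultdict(set)
        self.docs = {}

    def index(self, doc_id, record):
        # Incremental indexing: each message is added as it arrives,
        # without rebuilding the whole index.
        self.docs[doc_id] = record
        text = record["subject"] + " " + record["body"]
        for word in re.findall(r"\w+", text.lower()):
            self.words[word].add(doc_id)

    def search(self, word):
        return sorted(self.words.get(word.lower(), set()))


# Usage: the kind of message a Mailman hook might push to the server.
raw = (
    "From: Michel Pelletier <michel@example.com>\n"
    "Subject: Searchable archive\n"
    "Date: Tue, 28 Sep 1999 10:00:00 -0400\n"
    "\n"
    "Catalog indexing is faster now.\n"
)
idx = SimpleIndex()
idx.index(1, parse_message(raw))
print(idx.search("catalog"))  # [1]
```

Keeping the parsed attributes (author, date) in a separate record, instead of only indexing raw text, is what would allow searches by date or author later.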
Hmmm! Whaddayaknow! This is exactly what I've been working on!
I've been planning a product called MessageBase to do this. I'm
sketching out the Message class right now, and I plan for it to have full
MIME support. (One of the things I have finished so far is an improved
version of Python's mimetools module that actually complies with the
MIME RFCs.)
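The improved mimetools module mentioned above isn't shown. As a rough illustration of what full MIME support involves, here is a sketch using Python 3's standard `email` package (a modern substitute, not that module) that walks a multipart message and lists its leaf parts. `list_parts` and the sample message are hypothetical.

```python
from email import message_from_string, policy


def list_parts(raw):
    """Return (content type, filename) for every non-container part of a MIME message."""
    msg = message_from_string(raw, policy=policy.default)
    parts = []
    for part in msg.walk():
        # multipart/* parts only hold other parts; keep just the leaves.
        if part.is_multipart():
            continue
        parts.append((part.get_content_type(), part.get_filename()))
    return parts


SAMPLE = (
    "MIME-Version: 1.0\n"
    "Subject: MIME test\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "Hello, list.\n"
    "--XYZ\n"
    "Content-Type: text/x-python\n"
    'Content-Disposition: attachment; filename="example.py"\n'
    "\n"
    "print('hi')\n"
    "--XYZ--\n"
)

print(list_parts(SAMPLE))
# [('text/plain', None), ('text/x-python', 'example.py')]
```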
-The Dragon De Monsyne