Building A MailMan Search Interface
I've come across an itch. I'm tired of having to go through the mailing list archives to find what I need. The search interface at egroups is a bit slow and cumbersome. The one at ntlpd is much nicer, but I'd like to have my own so I can point an archiver/search interface at any Mailman mailing list.

So I decided I'd like to make a generic Mailman search interface in Zope. I've got a cronable retrieval script working that grabs archives. The next step is pretty crucial, and I thought I'd ask around for advice.

My question then becomes one of storage and parsing. I'm looking for suggestions on how to do this in an efficient and speedy manner. I'm willing to use Linux/*nix-specific stuff if it helps performance.

Options:

The first question, which applies to most of these approaches, is whether to store mails as individual items or in the default format of the text archive.

flat file: This gives me a couple of parsing/searching options, like using grep, the C regexp library, or the Camel library (from Helix Code's Evolution). Any other options for this format? Downside: this introduces some minor hurdles with presentation.

zodb - btree folder: For parsing/indexing this basically forces me to use ZCatalog, which I don't think will scale to this amount of raw text without lots of RAM. I could be wrong (I haven't gone through the Catalog code), but this is my working understanding of it. If I store the emails as archives I could probably whip up a reasonably speedy external method that would search through them. One benefit would be the ease of the presentation logic, but this is secondary to a speedy system.

rdbms (probably postgres - maybe mysql): I'd prefer Postgres since I'll probably be doing some other work with it. But LIKE '%term%' is probably one of the most expensive operations you can use on a db, and it's pretty limited in syntax. If I had a spare Oracle system then I'd drop it in in a heartbeat and use Oracle's interMedia Text for searching, but I'd hate to tie this to a very expensive closed system. MySQL seems to excel at speed (perhaps because it was designed for it :) but again the limitations of SQL search syntax pop up. If anyone knows of any good ways to search through text in a db I'd love to hear about 'em.

Right now I'm leaning slightly towards flat file storage, but I'd love to hear some suggestions.

Cheers
Kapil
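P.S. To make the flat file option a bit more concrete, here is a rough sketch of what I have in mind, not a finished design. It assumes the downloaded archive is a plain mbox-style text file, that messages are simple non-multipart text, and it uses Python's standard mailbox and re modules. The function name search_archive and its arguments are made up for illustration.

```python
import mailbox
import re


def search_archive(path, pattern):
    """Return (subject, sender) pairs for messages matching a regexp.

    Hypothetical helper: scans every message in the mbox file at `path`
    and checks the subject and the plain-text body, case-insensitively.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            payload = msg.get_payload()
            # Assumption: only plain-text bodies are searched;
            # multipart messages are matched on subject alone.
            body = payload if isinstance(payload, str) else ""
            subject = msg.get("Subject", "")
            if regex.search(subject) or regex.search(body):
                hits.append((subject, msg.get("From", "")))
    finally:
        box.close()
    return hits
```

This is a linear scan over the whole archive on every query, so it is only a baseline to measure the other options against. Splitting the archive into individual messages this way would also make the presentation side easier.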