[Zope] ZCatalog Queries...

Kapil Thangavelu kthangavelu@earthlink.net
Wed, 30 Aug 2000 15:02:25 -0700


I read this email with some interest, since I had an off-list conversation
with some of the people at nipltd regarding a mailing list project. I
chose to do it outside of Zope because of lingering doubts about Zope's
suitability (more specifically ZCatalog's) for the task.


Chris Withers wrote:
> 
> Chris McDonough wrote:
> >
> > > It'd be nice if ZCatalog had a good general purpose
> > > interface, and was a
> > > bit more robust.
> > > (the BTree implementation which has been mentioned a few times springs
> > > to mind here ;-)
> >
> > Can you be more specific?
> 
> Andy can fill you in on the specifics.
> 
> >  What's insufficient about the current
> > implementation?
> 
> It doesn't scale well, especially for things where you have lots of new
> data arriving (this is the BTree problem, I think...)
> 
> It has no published and well defined query syntax (there's patches here,
> bits there, but no definitive document on how to use it, how to batch
> with it, how to perform complex and structured queries, particularly
> with TextIndex'es)
> 
> Don't get me wrong, it is very cool, but only kindof 70% there :S
> (and I get the impression that doing the remaining 30% properly would
> require a rewrite...)

more of the same...
 
I think ZCatalog is great for simple property indexing, but it has
some significant drawbacks for its most common use, which is mass text
indexing and searching. As an object catalog it's fine, but as mass
text indexing/searching machinery I think it bites. I think Zope could
really use either a companion to ZCatalog for text searching or a
replacement. If I index 50M, I'd like my search not to take unreasonably
long, give erroneous results, or thrash my machine because I chose to
index it all at once, yet these are all things that I've experienced
with ZCatalog since I started using Zope. For sure it's gotten much
better, but it still can't handle my use cases. So I've accepted that
ZCatalog is neither a site-wide nor a scalable tool.

OK, so the question (at least to me) then becomes what to implement. I'm
not totally sure... I've taken a look at, and am leaning towards,
Evolution's text indexing (libibex).

I consulted some of the Evolution developers, and this is the response I
got on 8/5/00 (US date format):

>>>>>>
You probably don't want to use libibex as it is now - all indexes
are stored in memory for example, and it uses a lot of memory
as well (uses gtree's).  It will handle 100MB of mail fine enough
though on modern hardware, if you have the spare memory.

At some point we are going to change the backend to use a disk
based storage, using the search engine used in nautilus (can't
remember what its called), although we will probably make a wrapper
so the api should remain.
>>>>>>

Which takes me back to square one. I am convinced of the need for
reliable full-text search on a site or a part of a site, but I'm unsure
of what the best way to do it would be.
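
For what it's worth, the wrapper idea from the reply above (keep the
API, swap the in-memory backend for a disk-based one) could look roughly
like the sketch below. This is only an illustration under my own
assumptions: it is not libibex's or ZCatalog's API, the names TextIndex
and open_disk_index are made up, and Python's standard shelve module
stands in for whatever disk storage one would really use.

```python
import re
import shelve


class TextIndex:
    """Tiny inverted index: word -> set of document ids (hypothetical API)."""

    def __init__(self, store):
        # `store` is any mapping-like object. A plain dict keeps all
        # postings in memory; a shelve.Shelf keeps them on disk.
        self.store = store

    def index(self, doc_id, text):
        # Record doc_id under every distinct lowercased word in the text.
        for word in set(re.findall(r"\w+", text.lower())):
            postings = self.store.get(word, set())
            postings.add(doc_id)
            # Reassign so a disk-backed store actually persists the change.
            self.store[word] = postings

    def search(self, query):
        # Return ids of documents containing every word in the query.
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = set(self.store.get(words[0], set()))
        for word in words[1:]:
            result &= self.store.get(word, set())
        return result


def open_disk_index(path):
    # Same interface as the in-memory index, but postings live in a
    # shelve database file at `path`.
    return TextIndex(shelve.open(path))


# In-memory use; open_disk_index("/some/path") would be used the same way.
idx = TextIndex({})
idx.index("msg1", "ZCatalog text indexing")
idx.index("msg2", "Mailing list text search")
print(idx.search("text search"))
```

The calling code never sees which backend is in use, which is the whole
point of the wrapper. Nothing here addresses ranking, phrase queries, or
efficient bulk loading, so it says little about how a real engine would
cope with 50M of text.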
 
my2cents

kapil

> As an example, we've been trying to do Zope-based versions of the
> mailing list archives for a coupla months now and the Catalog keeps
> exploding in different ways (huge resource consumption, even for only
> 30K messages or so, no matter what storage is used)
> 
> Then there's the ubiquitous 'KeyError's and other associated weirdness,
> all of which leaves me feeling a lot less than totally confident in the
> Catalog ;-)
> 
> comments very welcome,
> 
> Chris