[Zope-dev] Searching/Indexing/ZODB/SQL/BerkleyDB

Casey Duncan c.duncan@nlada.org
Wed, 28 Nov 2001 10:10:07 -0500


On Wednesday 28 November 2001 09:37 am, Chris Withers allegedly wrote:
> Casey Duncan wrote:
> > > > I would be willing to help both in coding and getting the code put
> > > > into the Zope core.
> > >
> > > <raises hand> me too!
>
> Me three! :-)
>
> Just to put my take on all of this...
>
> As some of you may know, I've been looking at indexing for a while now in
> one way or another...
>
> > > I'm interested in this too, and I'm keen to get a solution that will
> > > work with just the ZODB, without needing all of Zope.
> >
> > Yes, I second, third and fourth that motion. I have a bunch of ideas
> > kicking around for ZODB-level indexing. Let's talk more.
>
> I don't believe this is a good idea anymore, especially once you get into
> any significant amount of data.
> ZODB simply doesn't seem to scale to indexing very well. You have all no
> doubt experienced this with ZCatalog TextIndexes... I have a more flexible
> and pluggable indexer written for ZODB (not only Zope! ;-) but it didn't
> scale to anything like what I needed :-(

I'm not sure I want to store the indexes in the ZODB, just index ZODB data at 
a low level.

>
> FileStorage goes through RAM at a rate of knots. Jim has a patch for this,
> but I haven't had a chance to stress-test it yet.
> bsddb2Storage currently hammers the disk, which means it performs worse
> than FileStorage when indexing ;-)

Yup, I think I have a solution, but it'll involve some coding ;^)

>
> I'm currently working on a MySQL-based full-text indexer with phrase
> matching, and potentially wildcards some time soon. For me, once this is
> cracked, FieldIndexes and the like are trivial in SQL, and I intend to
> encapsulate the whole thing in a Python class for ease of use. This is what
> I think might be the best solution: relational databases do tables well,
> and that's what indexing is all about: tables.

I would rather avoid a relational database unless I have to.
Perhaps the index pluggability could be made to support different backends
(the way ZODB supports FileStorage et al.).
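One way to read the pluggable-backend suggestion: the index logic talks only to a small storage interface, and a dict-, BerkeleyDB- or SQL-backed class supplies the postings, much as ZODB code is indifferent to which storage sits underneath. The sketch below is a hypothetical illustration under that assumption. `IndexBackend`, `MemoryBackend`, `SimpleFieldIndex` and their methods are made-up names, not Zope or ZODB APIs, and only the in-memory backend is shown.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class IndexBackend(ABC):
    """Hypothetical storage interface that an index is written against."""

    @abstractmethod
    def add(self, key, doc_id): ...

    @abstractmethod
    def remove(self, key, doc_id): ...

    @abstractmethod
    def lookup(self, key): ...


class MemoryBackend(IndexBackend):
    """Keeps postings in a dict; a BerkeleyDB- or SQL-backed class could
    implement the same three methods instead."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, key, doc_id):
        self._postings[key].add(doc_id)

    def remove(self, key, doc_id):
        self._postings[key].discard(doc_id)
        if not self._postings[key]:
            del self._postings[key]  # drop empty posting sets

    def lookup(self, key):
        return set(self._postings.get(key, ()))


class SimpleFieldIndex:
    """Indexes one attribute per document; knows nothing about storage."""

    def __init__(self, attribute, backend):
        self.attribute = attribute
        self.backend = backend
        self._values = {}  # doc_id -> value currently indexed for it

    def index_doc(self, doc_id, obj):
        self.unindex_doc(doc_id)  # re-indexing replaces the old value
        value = getattr(obj, self.attribute, None)
        if value is not None:
            self.backend.add(value, doc_id)
            self._values[doc_id] = value

    def unindex_doc(self, doc_id):
        if doc_id in self._values:
            self.backend.remove(self._values.pop(doc_id), doc_id)

    def search(self, value):
        return self.backend.lookup(value)
```

Swapping storage then means passing a different backend object to `SimpleFieldIndex`, with no change to the indexing code itself.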

>
> That said, I wasn't aware of Matt's work until very recently. I'd love
> to see an indexer that didn't require an RDB (or BerkeleyDB :-P) and scaled
> to gigabytes of data...

Yup, me too.
>
> > Perhaps we should arrange an
> > "indexing and catalog" chat on #zope.
>
> ...definitely. When shall we set a time and date?

OK, I'm available all this week, but less so for the two weeks after that.
Let's find a good time.

>
> cheers,
>
> Chris

/---------------------------------------------------\
  Casey Duncan, Sr. Web Developer
  National Legal Aid and Defender Association
  c.duncan@nlada.org
\---------------------------------------------------/