[Zope] ZCatalog speed?

Michel Pelletier michel@digicool.com
Tue, 14 Sep 1999 02:04:39 -0400


Stuart Woolford wrote:
> 
> On Tue, 14 Sep 1999, Michel Pelletier wrote:
> > Stuart Woolford wrote:
> > >
> > > >  If
> > > > > it's finding you're talking about, try increasing the sub-transaction
> > > > > threshold (on the status screen) by an order of magnitude or two.  This
> > > > > will cause Zope to commit sub-transactions less frequently.  1000, the
> > > > > default, is probably too low, but since this is the first version of Zope
> > > > with a catalog in it, it's not gotten any real world use.  We'll
> > > > probably jack it up to at least 10,000 for 2.1.
> > >
> > > you are right, I'm finding docs into the zcatalogue, not searching it (yet).
> >
> > Did increasing the threshold help?
> 
> Well, I upped it to 10000, and also converted all the docs to ZClasses, and
> index specific properties instead of a html body.
> 
> The down side is it now takes 40 minutes to generate 8800 items (I've still got
> to optimise this, I'm sure it can be improved), but the finding into the
> ZCatalogue is not great - 3 minutes, with indexing taking another 4 minutes.

Yes, but this is the first index.  In the next revision, I'll have
implemented an optimization where, when you run find the second time, it
sniffs the modification time of each object and only bothers to re-index
the objects that changed since the last index.  Trivial, but big wins
for large bodies of unchanging documents.  I hope we've still got some CVS
testers out there.
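
To make the idea concrete, here is a minimal sketch of modification-time
based re-indexing.  It is not the ZCatalog code; the class name, the
(uid, mtime, obj) tuples and the index_func callback are all hypothetical
stand-ins chosen for illustration.

```python
class IncrementalIndexer:
    """Re-index only objects whose modification time has advanced.

    Hypothetical helper, not part of ZCatalog.
    """

    def __init__(self, index_func):
        # index_func(uid, obj) does the actual (expensive) indexing work.
        self._index_func = index_func
        # Maps uid -> modification time seen when the object was last indexed.
        self._last_indexed = {}

    def reindex(self, objects):
        """Index new or changed objects; return the uids that were indexed.

        `objects` is an iterable of (uid, mtime, obj) tuples.
        """
        changed = []
        for uid, mtime, obj in objects:
            previous = self._last_indexed.get(uid)
            # Skip objects that have not been modified since the last pass.
            if previous is None or mtime > previous:
                self._index_func(uid, obj)
                self._last_indexed[uid] = mtime
                changed.append(uid)
        return changed
```

On a second pass over an unchanged set of documents, reindex() touches
nothing, which is where the win for large static collections would come
from.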

A further optimization Jim pointed out today is a bit more advanced,
using multiple sorted indexes with merges.  This should reduce a lot of
the IO thrashing that mass indexing does.
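
One possible reading of "multiple sorted indexes with merges", sketched
below, is to collect postings into small sorted runs and then merge the
runs in a single sequential pass.  This is only an illustration of the
general technique, not what Jim described in detail; build_runs,
merge_runs and the (word, doc_id) posting format are assumptions.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def build_runs(postings, run_size):
    """Split (word, doc_id) postings into sorted runs of at most run_size."""
    runs = []
    buffer = []
    for posting in postings:
        buffer.append(posting)
        if len(buffer) >= run_size:
            # Sort each run in memory before setting it aside.
            runs.append(sorted(buffer))
            buffer = []
    if buffer:
        runs.append(sorted(buffer))
    return runs


def merge_runs(runs):
    """Merge sorted runs into {word: sorted list of unique doc ids}."""
    index = {}
    # heapq.merge walks all runs in order without re-sorting everything.
    merged = heapq.merge(*runs)
    for word, group in groupby(merged, key=itemgetter(0)):
        doc_ids = []
        for _, doc_id in group:
            # Postings arrive sorted, so duplicates are adjacent.
            if not doc_ids or doc_ids[-1] != doc_id:
                doc_ids.append(doc_id)
        index[word] = doc_ids
    return index
```

Because each word's postings are written once, in order, the final index
is built with sequential rather than random updates.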

In terms of ZCatalog as it stands now, mass indexing is its weakness.
3-4 minutes isn't bad, though.  It would be nice to know a total count of
how many 'unique entities' (stemmed words) a catalog has seen over a
period of time or, even better, a log of total words indexed in the last n
transaction commits.
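
As a rough sketch of the kind of bookkeeping meant here, the class below
tracks unique words seen and words indexed per commit.  Nothing like it
exists in ZCatalog as far as this message says; IndexStats and its
methods are hypothetical, and the words are assumed to be stemmed
already.

```python
from collections import deque


class IndexStats:
    """Track unique words seen and words indexed per commit (hypothetical)."""

    def __init__(self, history=100):
        self._seen = set()
        # Word counts for the most recent `history` commits only.
        self._per_commit = deque(maxlen=history)

    def record_commit(self, words):
        """Record the (already stemmed) words indexed in one commit."""
        words = list(words)
        self._seen.update(words)
        self._per_commit.append(len(words))

    def unique_entities(self):
        """Total number of distinct words seen so far."""
        return len(self._seen)

    def words_in_last(self, n):
        """Total words indexed over the last n recorded commits."""
        if n <= 0:
            return 0
        return sum(list(self._per_commit)[-n:])
```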
> 
> I've noticed one 'feature' - using a basic ZSearch, I have a text indexed
> 'name' field (the name of a book, FWIW), when I search (for 'computer', for
> example) I only get the search word back here, not the whole name, is this a
> bug or a feature? I've not looked closely yet, so it can quite probably be
> fixed..

I'm sorry, I don't understand your problem.  Can you rephrase it?

-Michel