[Zope] ZCatalog speed?

Michel Pelletier michel@digicool.com
Mon, 13 Sep 1999 09:37:27 -0400


Stuart Woolford wrote:
>
> on my 500MHz P-II with 196MB of memory it takes:
> 
> a - 22 minutes to create 8800 documents (smallish) in 1200 folders within Zope,
> not too fast :( but not exactly a user-interaction-limiting factor :)

About 7 documents per second (8800 documents in 1320 seconds) isn't too
bad, I don't think.  It would be interesting to see how fast you could
dump them to the filesystem.
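For a rough point of comparison, here is a minimal Python sketch that times writing the same number of small documents straight to disk.  The flat folder layout, the 2 KB document size, and the function name are assumptions for illustration only; this is not anything Zope itself does, and the numbers will vary a lot with the filesystem.

```python
import os
import tempfile
import time


def dump_to_filesystem(root, n_folders, n_docs, doc_size=2048):
    """Write n_docs small text files spread over n_folders directories under root.

    Hypothetical helper: the layout and the 2 KB "smallish" document size are
    assumptions.  Returns (elapsed_seconds, documents_per_second).
    """
    body = "x" * doc_size
    start = time.perf_counter()
    for i in range(n_folders):
        os.mkdir(os.path.join(root, f"folder{i}"))
    for d in range(n_docs):
        # Distribute documents round-robin across the folders.
        folder = os.path.join(root, f"folder{d % n_folders}")
        with open(os.path.join(folder, f"doc{d}.txt"), "w") as fh:
            fh.write(body)
    elapsed = time.perf_counter() - start
    return elapsed, n_docs / elapsed


# Baseline from the numbers quoted above: 8800 documents in 22 minutes in Zope.
zope_rate = 8800 / (22 * 60)
print(f"Zope: {zope_rate:.1f} docs/sec")  # → Zope: 6.7 docs/sec

with tempfile.TemporaryDirectory() as root:
    elapsed, rate = dump_to_filesystem(root, n_folders=1200, n_docs=8800)
    print(f"filesystem: {rate:.0f} docs/sec ({elapsed:.2f} s)")
```

The filesystem figure is only a lower bound on the work Zope does, since Zope also has to build persistent objects and record everything in a transaction.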

> b - too long to then try to do a search based add to a ZCatalog,
> i.e. Netscape times out after only around 8 minutes, and the search has not
> finished!

Let me make sure we have the same terminology.  'Finding' objects into
the catalog involves using the Find tab to search recursively down from
the catalog.  'Searching' means typing search criteria into an already
loaded catalog and getting results.  It sounds like you're talking about
'finding'.  If it's taking 8 minutes to do a *search*, that's a bug.  If
it's finding you're talking about, try increasing the subtransaction
threshold (on the status screen) by one or two orders of magnitude.
This will cause Zope to commit subtransactions less frequently.  The
default of 1000 is probably too low, but since this is the first version
of Zope with a catalog in it, it hasn't gotten any real-world use yet.
We'll probably jack it up to at least 10,000 for 2.1.
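A minimal Python sketch of 'finding' with a subtransaction threshold, under loudly stated assumptions: the folder tree is modeled as nested dicts, and `SketchCatalog`, `add`, and `_commit_subtransaction` are hypothetical stand-ins, not the ZCatalog API.  They only count how often an intermediate subtransaction commit would happen while the recursive walk catalogs each document.

```python
class SketchCatalog:
    """Toy catalog that counts subtransaction commits (hypothetical, not ZCatalog)."""

    def __init__(self, threshold=1000):
        self.threshold = threshold  # objects to catalog before a subtransaction commit
        self.index = {}             # path -> document text
        self._pending = 0           # objects cataloged since the last commit
        self.subcommits = 0         # number of intermediate commits performed

    def add(self, path, text):
        self.index[path] = text
        self._pending += 1
        if self._pending >= self.threshold:
            self._commit_subtransaction()

    def _commit_subtransaction(self):
        # In Zope this would write pending changes out and free memory;
        # the sketch just counts the event.
        self.subcommits += 1
        self._pending = 0


def find_objects(folder, catalog, path=""):
    """Recursively walk a folder tree (nested dicts) and catalog every document."""
    for name, item in sorted(folder.items()):
        item_path = f"{path}/{name}"
        if isinstance(item, dict):
            find_objects(item, catalog, item_path)
        else:
            catalog.add(item_path, item)


def make_tree(n_folders, n_docs):
    """Build a hypothetical tree of n_folders folders holding n_docs documents."""
    tree = {f"folder{i}": {} for i in range(n_folders)}
    for d in range(n_docs):
        tree[f"folder{d % n_folders}"][f"doc{d}"] = f"text of document {d}"
    return tree


tree = make_tree(1200, 8800)
for threshold in (1000, 10000):
    cat = SketchCatalog(threshold)
    find_objects(tree, cat)
    print(threshold, len(cat.index), cat.subcommits)
# 1000 8800 8
# 10000 8800 0
```

With 8800 documents, a threshold of 1000 triggers 8 intermediate commits, while 10,000 triggers none, which is the effect of raising the threshold by an order of magnitude.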

> BTW, Zope's python process and postgresql take about 50% of the CPU each, and
> there is basically zero disk thrashing during this process (although zope does
> get up around 50MB of memory use..)

Yes, mass indexing is inefficient at the moment.  I recently received
'Managing Gigabytes', which was recommended by someone on the list.  It
has some very cool stuff in it that we might put into the catalog to
speed up indexing and searching (although as far as I can tell, searches
with ZCatalog are *damn* fast) and to reduce memory and object database
consumption with slicker algorithms and compression.  It also has some
cool stuff about wildcard/globbing searches, at the expense of some
extra memory.
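'Managing Gigabytes' spends a lot of time on compressing inverted files.  As a minimal sketch of one technique it describes, the Python below stores a sorted list of document ids as gaps between successive ids ("d-gaps") and encodes each gap with an Elias gamma code, so small gaps cost only a few bits.  Bits are kept as a '0'/'1' string purely for readability (a real index would pack them into bytes), and none of these function names come from ZCatalog.

```python
from itertools import accumulate


def gamma_encode(x):
    """Elias gamma code of a positive integer, as a string of '0'/'1' characters."""
    if x < 1:
        raise ValueError("gamma codes are defined for integers >= 1")
    binary = bin(x)[2:]                       # e.g. 5 -> '101'
    return "0" * (len(binary) - 1) + binary   # unary length prefix, then the bits


def gamma_decode_all(bits):
    """Decode a concatenation of gamma codes back into a list of integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":                 # count the unary length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def compress_postings(doc_ids):
    """Encode strictly increasing doc ids (>= 1) as gamma-coded d-gaps."""
    doc_ids = list(doc_ids)
    gaps = [b - a for a, b in zip([0] + doc_ids[:-1], doc_ids)]
    # A zero gap (duplicate id) makes gamma_encode raise ValueError.
    return "".join(gamma_encode(g) for g in gaps)


def decompress_postings(bits):
    """Turn gamma-coded gaps back into absolute doc ids via a running sum."""
    return list(accumulate(gamma_decode_all(bits)))


bits = compress_postings([3, 7, 8, 20])    # gaps 3, 4, 1, 12
print(len(bits), "bits vs", 4 * 32, "bits as 32-bit integers")  # 16 bits vs 128 bits ...
print(decompress_postings(bits))           # [3, 7, 8, 20]
```

Frequent words have dense posting lists, hence small gaps and very short codes, which is where most of the savings come from.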

Note that mass indexing will get faster as we improve the algorithm,
but in reality indexing always takes time.  Once your 'corpus' of
documents is created, it would be much, much faster to incrementally
index new and changed documents into the catalog than to mass index
everything over again.
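To illustrate why incremental indexing wins, here is a minimal sketch of a toy inverted index.  The class and method names are hypothetical and this is not how ZCatalog is implemented; the point is only that if the index remembers which words each document was indexed under, re-indexing a changed document touches just the postings for words that were added or removed, instead of rebuilding everything.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case word set; a deliberately simple stand-in for a real splitter."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class IncrementalIndex:
    """Toy inverted index supporting cheap per-document updates (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of document ids
        self.doc_words = {}               # document id -> words it is indexed under

    def index_document(self, uid, text):
        """Add or re-index one document; return how many postings entries changed."""
        new_words = tokenize(text)
        old_words = self.doc_words.get(uid, set())
        for word in old_words - new_words:        # words no longer in the document
            self.postings[word].discard(uid)
            if not self.postings[word]:
                del self.postings[word]
        for word in new_words - old_words:        # words newly in the document
            self.postings[word].add(uid)
        self.doc_words[uid] = new_words
        return len(old_words ^ new_words)

    def unindex_document(self, uid):
        """Remove a document entirely."""
        self.index_document(uid, "")
        del self.doc_words[uid]

    def search(self, word):
        return sorted(self.postings.get(word.lower(), ()))


idx = IncrementalIndex()
idx.index_document("/a", "Zope catalog speed")
idx.index_document("/b", "catalog indexing takes time")
print(idx.index_document("/a", "Zope catalog is fast"))  # 3 ("speed" out, "is"/"fast" in)
print(idx.search("catalog"))                              # ['/a', '/b']
```

The cost of an update is proportional to the size of the changed document, not the size of the whole corpus.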

-Michel