[Zope] ZCatalog speed?
Stuart Woolford
stuartw@newmail.net
Tue, 14 Sep 1999 10:44:06 +1200
On Tue, 14 Sep 1999, Michel Pelletier wrote:
> Stuart Woolford wrote:
> >
> > On my 500MHz P-II with 196MB of memory it takes:
> >
> > a - 22 minutes to create 8800 (smallish) documents in 1200 folders within Zope;
> > not too fast :( but not exactly a user-interaction-limiting factor :)
>
> 7 documents per second isn't too bad, I think. It would be
> interesting to see how fast you could dump them to the filesystem.
I can write documents to the FS at around 10 times that speed, but I'm not
complaining; I think it is not too bad.
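
For reference, the rates being compared work out as follows. This is only arithmetic on the figures quoted in this thread; the "10 times" factor is the rough estimate given above, not a measurement.

```python
# Figures quoted in this thread.
documents = 8800
minutes_in_zope = 22

# Creation rate inside Zope, in documents per second.
docs_per_second = documents / (minutes_in_zope * 60)
print(f"Zope: {docs_per_second:.1f} documents/second")

# "Around 10 times that speed" for plain filesystem writes (approximate).
fs_docs_per_second = docs_per_second * 10
fs_minutes = documents / fs_docs_per_second / 60
print(f"Filesystem: about {fs_docs_per_second:.0f} documents/second, "
      f"so roughly {fs_minutes:.1f} minutes for the same set")
```

So the quoted "7 documents per second" is a slight round-up of about 6.7.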
>
> > b - too long to then do a search-based add to a ZCatalog,
> > i.e. Netscape times out after only around 8 minutes, and the search has not
> > finished!
>
> Let me make sure we have the same terminology. 'Finding' objects into
> the catalog involves using the find tab to search recursively down from
> the catalog. 'Searching' means typing search criteria into an already
> loaded catalog and getting results. It sounds like you're talking about
> 'finding'. If it's taking 8 minutes to do a *search*, that's a bug. If
> it's finding you're talking about, try increasing the subtransaction
> threshold (on the status screen) by an order of magnitude or two. This
> will cause Zope to commit subtransactions less frequently. 1000, the
> default, is probably too low, but since this is the first version of Zope
> with a catalog in it, it hasn't had any real-world use. We'll
> probably jack it up to at least 10,000 for 2.1.
You are right, I'm finding docs into the ZCatalog, not searching it (yet).
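
To illustrate why the threshold matters, here is a minimal sketch of the general pattern: commit once every N items instead of after each one. It is plain Python, not Zope's API. The names `index_with_threshold`, `index_one`, `commit` and `count_commits` are made up for this example, and counting whole documents against the threshold is an assumption; the real setting may count something else.

```python
def index_with_threshold(objects, threshold, index_one, commit):
    """Index every object, calling commit() once per `threshold` objects.

    Returns the number of commits performed. `index_one` and `commit`
    are hypothetical callbacks standing in for the real work.
    """
    pending = 0
    commits = 0
    for obj in objects:
        index_one(obj)
        pending += 1
        if pending >= threshold:
            commit()
            commits += 1
            pending = 0
    if pending:
        # Flush whatever is left after the last full batch.
        commit()
        commits += 1
    return commits


def count_commits(n_objects, threshold):
    """Run the loop with do-nothing callbacks and report the commit count."""
    indexed = []
    return index_with_threshold(range(n_objects), threshold,
                                indexed.append, lambda: None)


print(count_commits(8800, 1000))    # threshold at the quoted default
print(count_commits(8800, 10000))   # threshold raised by an order of magnitude
```

Under these assumptions, 8800 documents need 9 commits at a threshold of 1000 and a single commit at 10,000, which is the reduction in commit overhead described above.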
>
> > BTW, Zope's python process and postgresql take about 50% of the CPU each, and
> > there is basically zero disk thrashing during this process (although Zope does
> > get up to around 50MB of memory use..)
>
> Yes, mass indexing is inefficient at the moment. I recently received
> 'Managing Gigabytes', which was recommended by someone on the list. It
> has some very cool stuff in it that we might put into the catalog to
> speed up indexing and searching (although as far as I can tell, searches
> with ZCatalog are *damn* fast), and reduce memory and object database
> consumption with slicker algorithms and compression. It also has some
> cool stuff about wildcard/globbing searches at the expense of some extra
> memory.
I was thinking that a 50% share was not too bad for a non-native-compiled..
pretty much on target, I would say.
>
> Note that the time it takes to mass index will improve as we improve the
> algorithm, but in reality indexing always takes time. Once your
> 'corpus' of documents is created, it would be much, much faster to
> incrementally index new and changed documents into the catalog than to
> mass index everything over again.
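
A toy sketch of the difference between mass indexing and incremental indexing, using a tiny in-memory inverted index. This is not how ZCatalog is implemented; the `TinyIndex` class and its methods are invented for illustration, and the `operations` counter is only a stand-in for indexing cost.

```python
from collections import defaultdict


class TinyIndex:
    """Hypothetical word -> document-id index, for illustration only."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of document ids
        self.words_by_doc = {}            # document id -> set of its words
        self.operations = 0               # number of documents (re)indexed

    def unindex(self, doc_id):
        """Remove every posting for one document, if it was indexed."""
        for word in self.words_by_doc.pop(doc_id, ()):
            self.postings[word].discard(doc_id)
            if not self.postings[word]:
                del self.postings[word]

    def index(self, doc_id, text):
        """Index a new document, or re-index a changed one."""
        self.unindex(doc_id)
        words = set(text.lower().split())
        for word in words:
            self.postings[word].add(doc_id)
        self.words_by_doc[doc_id] = words
        self.operations += 1

    def search(self, word):
        return sorted(self.postings.get(word.lower(), ()))


corpus = {1: "zope catalog speed", 2: "postgresql database", 3: "zope folders"}

idx = TinyIndex()
for doc_id, text in corpus.items():   # mass index: touches every document
    idx.index(doc_id, text)
mass_operations = idx.operations

idx.operations = 0
idx.index(2, "zope and postgresql")   # incremental: touches only document 2
incremental_operations = idx.operations

print(mass_operations, incremental_operations)
print(idx.search("zope"), idx.search("database"))
```

The cost of the incremental update depends only on the documents that changed, not on the size of the corpus, which is the point made above.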
One VERY interesting thing I have noticed:
around 5 minutes into the add, watching top on the Unix system, I see that the
python process splits (it's around 11MB at this stage), then a little after I
get another postmaster (the database) process appearing, and from then on we
have a 4-way split of CPU instead of 2-way. I don't see any reason for Zope to
split off a new process (it has no other connections while doing this). Is
this a bug, perhaps?
>
> -Michel
--
------------------------------------------------------------
Stuart Woolford, stuartw@newmail.net
Unix Consultant.
Software Developer.
Supra Club of New Zealand.
------------------------------------------------------------