Stuart Woolford wrote:
On my 500MHz P-II with 196MB of memory it takes:
a - 22 minutes to create 8800 (smallish) documents in 1200 folders within Zope. Not too fast :( but not exactly a user-interaction-limiting factor :)
Seven documents per second isn't too bad, I think. It would be interesting to see how fast you could dump them to the filesystem.
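The "seven per second" figure comes straight from the quoted numbers. Here is a quick check, using only the 8800 documents and 22 minutes given above:

```python
# Back-of-the-envelope throughput for the numbers quoted above.
documents = 8800
seconds = 22 * 60          # 22 minutes expressed in seconds
rate = documents / seconds
print(f"{rate:.1f} documents per second")   # prints "6.7 documents per second"
```

So "7 per second" is a slight round-up of about 6.7.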
b - too long to then try to do a search-based add to a ZCatalog, i.e. Netscape times out after only around 8 minutes, and the search has not finished!
Let me make sure we're using the same terminology. 'Finding' objects into the catalog means using the Find tab to search recursively down from the catalog. 'Searching' means typing search criteria into an already-loaded catalog and getting results. It sounds like you're talking about 'finding'. If it's taking 8 minutes to do a *search*, that's a bug. If it's finding you're talking about, try increasing the subtransaction threshold (on the status screen) by an order of magnitude or two. This will cause Zope to commit subtransactions less frequently. The default of 1000 is probably too low, but since this is the first version of Zope with a catalog in it, it hasn't gotten any real-world use yet. We'll probably jack it up to at least 10,000 for 2.1.
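To see why the threshold matters, consider a toy simulation. It does not use Zope's API. The function `find_and_catalog` and its `threshold` parameter are hypothetical, and incrementing a counter merely stands in for whatever work Zope does when it commits a subtransaction. The sketch counts how often that overhead is paid for the 8800 documents above at the current default and at the proposed value:

```python
# Hypothetical simulation of subtransaction batching during a catalog "find".
# Nothing here is Zope's real API; the counter below only stands in for the
# cost of committing a subtransaction.

def find_and_catalog(documents, threshold):
    """Catalog every document, committing a subtransaction every `threshold` objects.

    Returns (subtransaction_commits, objects_left_for_final_commit).
    """
    pending = []          # changes accumulated since the last commit
    subtransactions = 0   # how many times the commit overhead was paid
    for doc in documents:
        pending.append(doc)               # stand-in for indexing one object
        if len(pending) >= threshold:
            subtransactions += 1          # stand-in for a subtransaction commit
            pending.clear()
    # Whatever remains is written by the final, real transaction commit.
    return subtransactions, len(pending)


docs = range(8800)
for threshold in (1000, 10000):
    subs, leftover = find_and_catalog(docs, threshold)
    print(f"threshold={threshold}: {subs} subtransaction commits, "
          f"{leftover} objects in the final commit")
```

With a threshold of 1000 the loop pays the commit overhead 8 times. With 10,000 it pays it none at all before the final commit. The likely trade-off, which the `pending` list mimics, is that more uncommitted changes are held at once between commits.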
BTW, Zope's Python process and PostgreSQL take about 50% of the CPU each, and there is basically zero disk thrashing during this process (although Zope does get up to around 50MB of memory use...)
Yes, mass indexing is inefficient at the moment. I recently received 'Managing Gigabytes', which was recommended by someone on the list. It has some very cool stuff in it that we might put into the catalog to speed up indexing and searching (although as far as I can tell, searches with ZCatalog are *damn* fast), and to reduce memory and object database consumption with slicker algorithms and compression. It also has some cool stuff about wildcard/globbing searches at the expense of some extra memory.
Note that the time it takes to mass index will improve as we improve the algorithm, but in reality indexing always takes time. Once your 'corpus' of documents is created, it would be much, much faster to incrementally index new and changed documents into the catalog than to mass index everything over again.
-Michel
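The advantage of incremental indexing over a full re-index can be seen with a toy inverted index. This is a minimal sketch, not ZCatalog's implementation. The class `InvertedIndex` and its methods are hypothetical, and splitting text on whitespace is an assumed simplification. Re-indexing one document only touches the words that document gained or lost, so the cost of an update does not grow with the size of the corpus:

```python
from collections import defaultdict


class InvertedIndex:
    """Toy word -> document-id index (hypothetical; not ZCatalog's code)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of documents containing it
        self.doc_words = {}               # doc id -> words it is currently indexed under

    @staticmethod
    def _words(text):
        # Assumed simplification: lowercase and split on whitespace.
        return set(text.lower().split())

    def index_document(self, doc_id, text):
        """Add or re-index one document, updating only the words that changed."""
        old = self.doc_words.get(doc_id, set())
        new = self._words(text)
        for word in old - new:            # words the document no longer contains
            self.postings[word].discard(doc_id)
            if not self.postings[word]:
                del self.postings[word]
        for word in new - old:            # words the document has gained
            self.postings[word].add(doc_id)
        self.doc_words[doc_id] = new

    def unindex_document(self, doc_id):
        """Remove a document entirely."""
        self.index_document(doc_id, "")
        del self.doc_words[doc_id]

    def search(self, word):
        return sorted(self.postings.get(word.lower(), ()))


index = InvertedIndex()
corpus = {1: "zope catalog search", 2: "postgresql storage", 3: "catalog indexing"}
for doc_id, text in corpus.items():      # one-time mass index of the corpus
    index.index_document(doc_id, text)

index.index_document(2, "postgresql catalog storage")  # document 2 was edited
index.index_document(4, "new catalog entry")           # document 4 was added
print(index.search("catalog"))                         # prints [1, 2, 3, 4]
```

Only the edited and added documents are processed after the initial load. A mass re-index would instead walk all four documents again.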