Re: [Zope] Re: ZCatalog alpha
Alex Rice <alrice@swcp.com> writes:
On Fri, 25 Jun 1999 15:20:22 -0400, Michel Pelletier <michel@digicool.com> said:
Michel> Remember, this is *alpha* software. It has limitations, and
Michel> there are several optimizations/features we have planned for it
Michel> which are not included in this release. We would, however, like
Michel> the tired to be well kicked.
Kicking the tired? That's cruel and unusual :-)
Ugh. Well maybe at the time I felt like a kicked tired person. Paul called this a 'Fulton-typo', a typo that makes sense anyways and is funnier than the original sentence.
re: ZCatalog This rocks. It's very fast.
Yep.
However, I seem to be limited by the amount of memory used by manage_catalogFoundItems. Its memory usage seems to balloon out of control, and never goes back to normal until I restart Zope.
Is this a memory leak, or just a consequence of the awesome indexing the catalog is doing?
It *could* be a memory leak, but remember, for every unique word that you text index, Zope creates several data structures. One item on my to-do list for the catalog is to improve the 'Status' view so that you can look inside the indexes and see how many words they are storing, and maybe some memory consumption statistics. Don't be surprised, however, if the indexes grow to two or three times the size of the data that is indexed. That is the nature of indexing.

You may have noted from the code that ZCatalog uses 'UnIndex' and 'UnTextIndex'es. These are different from the usual Index and TextIndex modules that have come with Zope since 1.10.0. The 'Un' indexes are fully symmetrical, meaning that not only do they have an index that maps from word to document id (and score), they also have an inverted index that maps from document id to word.

The reason for this is that the Catalog cannot always track when an object changes (the only way it can do that is if the object notifies the catalog that it is about to change). 'Un' indexes therefore need to keep a little extra information around, so that when an object comes in to have itself Cataloged again, the Index can first find out what the old values of the object were (using the inverted index), unindex those, and then reindex the object. Pretty heavy stuff, although the use of fully symmetrical indexes really simplified the design of the 'Un' indexes over those of the more memory efficient Indexes.
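To make the two-way bookkeeping concrete, here is a minimal sketch in Python. It is not the ZCatalog code: the class name SymmetricTextIndex and its methods are made up for illustration, it ignores scores, and it uses plain in-memory dicts rather than persistent structures. It only shows why keeping the document-to-words mapping lets an index clean up after an object that changed without notifying anyone.

```python
from collections import defaultdict


class SymmetricTextIndex:
    """Toy two-way index (hypothetical, not Zope's UnTextIndex)."""

    def __init__(self):
        # word -> set of document ids; this is what searches consult.
        self._word_to_docs = defaultdict(set)
        # document id -> set of words; this is the extra bookkeeping
        # that makes unindexing possible without the old object.
        self._doc_to_words = {}

    def unindex(self, doc_id):
        # Look up the words recorded for this document and remove
        # the document from each of their entries.
        for word in self._doc_to_words.pop(doc_id, ()):
            docs = self._word_to_docs[word]
            docs.discard(doc_id)
            if not docs:
                del self._word_to_docs[word]

    def index(self, doc_id, text):
        # The index was never told the document changed, so always
        # drop whatever was recorded before indexing the new text.
        self.unindex(doc_id)
        words = set(text.lower().split())
        self._doc_to_words[doc_id] = words
        for word in words:
            self._word_to_docs[word].add(doc_id)

    def search(self, word):
        # Return a copy so callers cannot mutate the index.
        return set(self._word_to_docs.get(word.lower(), ()))


idx = SymmetricTextIndex()
idx.index(1, "kick the tires")
idx.index(1, "kick the tired")  # recatalog the same document with new text
print(idx.search("tires"))      # the stale word no longer matches
print(idx.search("tired"))
```

Note that every word is stored twice, once in each mapping, which is one reason an index of this shape can end up larger than the text it covers.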
Just for kicks I loaded several MB of texts into DTML Docs and this is where I found the memory problem. My setup is
The several MB of text would probably translate into 2 or 3 times that in memory growth. However, the Persistence machinery will quickly start to deactivate unused portions of the index. Try tweaking your Cache settings to see if you can get Zope to be more aggressive about deactivating objects.
-Michel
Zope 2.0a3+ZServer+odb3+pcgi Python 1.5.2 Linux 2.2.9/x86
Alex Rice | alrice@swcp.com | http://www.swcp.com/~alrice Current Location: N. Rio Grande Bioregion, Southwestern USA
_______________________________________________ Zope maillist - Zope@zope.org http://www.zope.org/mailman/listinfo/zope
(For developer-specific issues, use the companion list, zope-dev@zope.org - http://www.zope.org/mailman/listinfo/zope-dev )
On 26 Jun 1999 15:20:57 -0400, michel@digicool.com said:
michel> The several MB of text would probably translate into 2 or 3
michel> times that in memory growth. However, the Persistence machinery
michel> will quickly start to deactivate unused portions of the index.
michel> Try tweaking your Cache settings to see if you can get Zope to
michel> be more aggressive about deactivating objects.

OK, here are some more data points, just in case they're useful:

My z2.py process starts at about 5-6MB memory usage.
After indexing about 2MB of text, it goes up to 40MB memory usage.
Manually running the GC, it goes down to 38MB. (manage_pack and manage_cache_size are not implemented yet, I think.)
Stopping, then starting Zope, it goes back down to 5-6MB.
After searching via the catalog it's at about 8MB and stays right around there.

Thanks for ZCatalog. It's going to be immensely useful.

Alex Rice | alrice@swcp.com | http://www.swcp.com/~alrice Current Location: N. Rio Grande Bioregion, Southwestern USA