[Zope-dev] Catalog performance - SOLVED
Nguyen Quan Son
sonnq at tinhvan.com
Thu Sep 11 10:24:25 EDT 2003
I've added catalog metadata as Seb suggested and it works fine.
Thank you very much.
Nguyen Quan Son
> Nguyen Quan Son wrote:
> > Hi,
> > I have a problem with performance and memory consumption when trying to do some statistics, using the following code:
> > ...
> > docs = container.portal_catalog(meta_type='Document', ...)
> > for doc in docs:
> >     obj = doc.getObject()
> >     value = obj.attr
> > ...
> >
> > With about 10,000 documents this Python script takes 10 minutes and more than 500MB of memory; after that I had to restart Zope.
> > I am running Zope 2.6.1 + Plone 1.0 on Windows 2000 (Xeon P4, 1GB RAM).
> > What's wrong with this code? Any suggestion is appreciated.
From: "John Barratt" <jlb at ball.langarson.com.au>
To: <zope-dev at zope.org>
Sent: Wednesday, September 10, 2003 6:41 PM
Subject: Re: [Zope-dev] Catalog performance
>
> If you can't use catalog metadata as Seb suggests (e.g. you are actually
> accessing many attributes, large values, etc.) and if indeed memory is
> the problem (which seems likely), then you can ghostify the objects that
> were ghosts to begin with, and it will save memory (unless all those
> objects are already in the cache).
>
> The problem with this strategy, though, is that the doc.getObject() method
> used in your code activates the object, so you won't know whether it
> was already a ghost. To get around this you can bypass that
> method and do something like:
>
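> # Needs trusted code (e.g. an External Method): restricted Python Scripts reject names starting with '_'
> docs = container.portal_catalog(meta_type='Document', ...)
> for doc in docs:
>     # Traverse to the object directly instead of calling getObject()
>     obj = doc.aq_parent.unrestrictedTraverse(doc.getPath())
>     # _p_changed is None while the object is still a ghost (state not loaded)
>     was_ghost = obj._p_changed is None
>     value = obj.attr  # reading an attribute loads the object's state
>     if was_ghost:
>         # Turn it back into a ghost so it does not linger in the cache
>         obj._p_deactivate()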
>
> You can test this by running your code on a freshly restarted server
> and checking the number of objects in the cache. The number shouldn't
> change much after running the code above, but it will increase
> dramatically if you just use 'obj = doc.getObject()' instead, or don't
> deactivate the objects. Fewer objects in your cache should in turn keep
> your memory usage down and stop your computer paging during the request,
> which should speed things up considerably!
>
> Another option would be to reduce the size of your cache so that the
> amount of memory your Zope instance consumes doesn't cause your computer
> to swap. The code changes above will still help, though: they keep the
> 'right' objects in your cache, which in turn helps the performance of
> subsequent requests.
>
> Cheers,
>
> JB.
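JB's cache-count check can be illustrated without a Zope server. The sketch below is a hypothetical toy model, not the ZODB or `persistent` API: `ToyPersistent`, `ToyJar`, `collect` and `non_ghost_count` are invented for illustration. It mimics only the ZODB convention that `_p_changed` is `None` while an object is a ghost, and that `_p_deactivate()` turns an unmodified object back into a ghost.

```python
# Hypothetical toy model of ZODB ghosts; NOT the real persistent/ZODB API.
# It mimics one convention: _p_changed is None while an object is a ghost.
# Toy objects are never modified, so _p_changed is only ever None or False.


class ToyPersistent:
    def __init__(self, jar, oid):
        self._jar = jar
        self._oid = oid
        self._state = None  # None means "ghost": state not loaded yet

    @property
    def _p_changed(self):
        return None if self._state is None else False

    def _p_deactivate(self):
        # Drop the loaded state, turning the object back into a ghost.
        self._state = None

    @property
    def attr(self):
        # Reading an attribute activates (loads) a ghost, as in ZODB.
        if self._state is None:
            self._state = self._jar.load(self._oid)
        return self._state["attr"]


class ToyJar:
    """Stands in for a database connection and its object cache."""

    def __init__(self, records):
        self._records = records
        self.objects = {oid: ToyPersistent(self, oid) for oid in records}
        self.loads = 0

    def load(self, oid):
        self.loads += 1
        return dict(self._records[oid])

    def non_ghost_count(self):
        return sum(1 for obj in self.objects.values() if obj._p_changed is not None)


def collect(jar, deactivate):
    values = []
    for obj in jar.objects.values():
        was_ghost = obj._p_changed is None
        values.append(obj.attr)
        if deactivate and was_ghost:
            obj._p_deactivate()
    return values


records = {oid: {"attr": oid * 2} for oid in range(10000)}

warm = ToyJar(records)
_ = warm.objects[0].attr  # object 0 was already active before the loop
values = collect(warm, deactivate=True)
print(warm.non_ghost_count())  # 1

cold = ToyJar(records)
collect(cold, deactivate=False)
print(cold.non_ghost_count())  # 10000
```

The object that was already active before the loop stays active, which is the point of the `was_ghost` check: deactivating everything would also evict objects that were already in the cache for other reasons.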
From: "Seb Bacon" <seb at jamkit.com>
To: <zope-dev at zope.org>
Sent: Wednesday, September 10, 2003 6:18 PM
Subject: [Zope-dev] Re: Catalog performance
>
> With getObject(), you're loading entire objects into memory in order to
> grab a single attribute. This is very wasteful. Try putting the
> attribute into the metadata for the catalog and grabbing it from there.
> Then you can do:
>
> for doc in docs:
>     value = doc.attr
>
> seb
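Seb's suggestion can be sketched the same way. Again this is a hypothetical toy, not the ZCatalog API: `ToyCatalog`, `ToyBrain` and `object_loads` are invented names, and the query simply filters on stored metadata rather than on real indexes. What it does mirror is that metadata values are copied into the catalog when an object is cataloged, so reading `doc.attr` from a result never touches the stored object, while `getObject()` loads it.

```python
# Hypothetical toy model of catalog metadata; NOT the real ZCatalog API.


class ToyBrain:
    """A catalog result that carries copies of the metadata columns."""

    def __init__(self, catalog, rid, metadata):
        self._catalog = catalog
        self._rid = rid
        for name, value in metadata.items():
            setattr(self, name, value)

    def getObject(self):
        # Fetches the full stored object: the expensive path.
        return self._catalog.load_object(self._rid)


class ToyCatalog:
    def __init__(self, objects, columns):
        self._objects = objects  # rid -> full object (a dict in this toy)
        self.object_loads = 0
        # "Cataloging": copy the metadata columns once, up front.
        self._metadata = {
            rid: {name: obj.get(name) for name in columns}
            for rid, obj in objects.items()
        }

    def load_object(self, rid):
        self.object_loads += 1
        return self._objects[rid]

    def __call__(self, **query):
        # Toy query: filter on the stored metadata (no real indexes here).
        return [
            ToyBrain(self, rid, meta)
            for rid, meta in self._metadata.items()
            if all(meta.get(key) == value for key, value in query.items())
        ]


objects = {
    rid: {"meta_type": "Document", "attr": rid, "body": "x" * 100}
    for rid in range(10000)
}
catalog = ToyCatalog(objects, columns=["meta_type", "attr"])
docs = catalog(meta_type="Document")

slow = [doc.getObject()["attr"] for doc in docs]
loads_with_getobject = catalog.object_loads  # 10000

catalog.object_loads = 0
fast = [doc.attr for doc in docs]
loads_with_metadata = catalog.object_loads  # 0
```

Because a real catalog records metadata values at cataloging time, objects that were cataloged before a new column was added need to be reindexed before `doc.attr` returns their values.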