[Zope-dev] Catalog performance
John Barratt
jlb at ball.langarson.com.au
Wed Sep 10 22:41:15 EDT 2003
Max M wrote:
> Nguyen Quan Son wrote:
> > Hi,
> > I have a problem with performance and memory consumption when trying
> > to do some statistics, using following code:
> > ...
> > docs = container.portal_catalog(meta_type='Document', ...)
> > for doc in docs:
> >     obj = doc.getObject()
> >     value = obj.attr
> > ...
> >
> > With about 10.000 documents this Python script takes 10 minutes and
> > more than 500MB of memory, after that I had to restart Zope. I
> > am running Zope 2.6.1 + Plone 1.0 on Windows 2000, Xeon P4 with 1GB RAM.
> > What's wrong with this code? Any suggestion is appreciated.
> > Nguyen Quan Son.
>
> Most likely you are filling the memory of your server so that you are
> swapping to disk.
>
> Try cutting the query into smaller pieces so that the memory doesn't get
> filled up.
If you can't use catalog metadata as Seb suggests (e.g. you are actually
accessing many attributes, large values, etc.) and if memory is indeed
the problem (which seems likely), then you can re-ghostify the objects
that were ghosts to begin with. This will save memory (unless all those
objects are already in the cache).
The problem with this strategy, though, is that the doc.getObject()
method used in your code activates the object, so you won't know whether
it was a ghost beforehand. To get around this you can bypass that
method and do something like:
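docs = container.portal_catalog(meta_type='Document', ...)
for doc in docs:
    # Traverse to the object directly instead of calling doc.getObject()
    obj = doc.aq_parent.unrestrictedTraverse(doc.getPath())
    # _p_changed is None while the object is still a ghost
    was_ghost = obj._p_changed is None
    value = obj.attr
    # Only turn back into a ghost what was a ghost before we touched it
    if was_ghost:
        obj._p_deactivate()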
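If the work done on each object can raise an exception, it may be worth
wrapping the check in a small helper so the deactivation still happens.
Below is a minimal sketch of that idea. It runs without Zope:
StandInPersistent is a hypothetical stand-in that only mimics the
_p_changed / _p_deactivate() behaviour described above, and
preserving_ghost is a name made up for this example, not part of Zope or
ZODB.

```python
from contextlib import contextmanager


class StandInPersistent:
    """Hypothetical stand-in for a persistent object (not the ZODB class)."""

    def __init__(self, value):
        self._value = value
        self._p_changed = None  # None marks a ghost

    @property
    def attr(self):
        # Reading state "activates" the stand-in, as attribute access would
        if self._p_changed is None:
            self._p_changed = False
        return self._value

    def _p_deactivate(self):
        # Only unmodified objects go back to being ghosts
        if not self._p_changed:
            self._p_changed = None


@contextmanager
def preserving_ghost(obj):
    """Re-ghostify obj on exit if it was a ghost on entry (assumed helper)."""
    was_ghost = obj._p_changed is None
    try:
        yield obj
    finally:
        if was_ghost:
            obj._p_deactivate()


objs = [StandInPersistent(i) for i in range(5)]
objs[0].attr  # pretend this one was already active in the cache

total = 0
for obj in objs:
    with preserving_ghost(obj) as o:
        total += o.attr

# Only the object that was active beforehand is still active
active = sum(1 for o in objs if o._p_changed is not None)
```

In a real script the body of the 'with' block would be whatever you do
with obj.attr; the objects that were already in the cache stay there,
and everything else goes back to being a ghost.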
You can test this by running your code on a freshly restarted server
and checking the number of objects in the cache. The number shouldn't
change much after running the code above, but it will increase
dramatically if you just use 'obj = doc.getObject()' instead, or leave
out the deactivation step. A lower number of objects in your cache
should in turn keep your memory usage down, stop your computer from
paging during the request, and hence speed things up considerably!
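To get a feel for what that experiment should show, here is a toy
simulation of it with no Zope involved. StandInDoc, active_count and run
are hypothetical names invented for this sketch, and the list of
stand-ins only imitates a catalog result set whose objects all start out
as ghosts. On a real server you would read the object count from the
cache itself rather than count by hand.

```python
class StandInDoc:
    """Hypothetical stand-in for a persistent document (not a ZODB class)."""

    def __init__(self):
        self._p_changed = None  # every document starts as a ghost
        self._attr = 1

    @property
    def attr(self):
        # Reading state activates the stand-in
        if self._p_changed is None:
            self._p_changed = False
        return self._attr

    def _p_deactivate(self):
        # Only unmodified objects go back to being ghosts
        if not self._p_changed:
            self._p_changed = None


def active_count(docs):
    """Number of non-ghost objects, i.e. what would sit in the cache."""
    return sum(1 for d in docs if d._p_changed is not None)


def run(docs, deactivate):
    """Read attr from every doc, optionally re-ghostifying former ghosts."""
    for obj in docs:
        was_ghost = obj._p_changed is None
        value = obj.attr
        if deactivate and was_ghost:
            obj._p_deactivate()
    return active_count(docs)


naive = run([StandInDoc() for _ in range(10000)], deactivate=False)
careful = run([StandInDoc() for _ in range(10000)], deactivate=True)
print("active after naive loop:", naive)
print("active after deactivating loop:", careful)
```

The first loop leaves every object active, while the second leaves the
count where it started, which is the difference you should see in the
cache figures on the real server.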
Another option would be to reduce the size of your cache so that the
amount of memory your Zope instance consumes doesn't cause your computer
to swap. The code changes above have a further benefit, though: they
help keep the 'right' objects in your cache, which in turn helps the
performance of subsequent requests.
Cheers,
JB.