[Grok-dev] Re: Performance of OrderedContainer

Thu Jun 19 21:08:09 EDT 2008

On Jun 19, 2008, at 5:18 PM, Sebastian Ware wrote:

> 18 jun 2008 kl. 10.11 skrev Martijn Faassen:
>
>> We also need to investigate whether blist could be made to make  
>> sense i this role after all.
>
> According to the rejection notice in the PEP about Blists, the  
> performance gain might not be all that huge. I don't know anything  
> about this, but I just want to raise the issue so we don't turn  
> OrderedContainer into a little beast...
>
>  http://www.python.org/dev/peps/pep-3128/

While worth noting, this is only barely pertinent.

zc.blist is certainly not of interest to a ZODB ordered container  
because of its raw list performance.  The PersistentList uses Python's  
C implementation, and will be significantly faster for standard list  
operations.  The Python 3000 PEP proposal was also after performance  
gains that are not a current goal of the zc.blist implementation.

The blist pattern is of significant interest to users of the ZODB  
because it allows small changes in a large sequence to only write to a  
small number of persistent objects.  That is the pertinent advantage  
of my implementation.  It is also one of the big advantages to the  
ZODB BTrees, and a big reason why they are recommended over  
PersistentMappings for large-sized mappings.

Writing large objects, especially when ZEO is involved, has bad  
performance characteristics in the ZODB.  Once you have a single large  
object like a PersistentList or a PersistentMapping with many members,  
making a single change means that the entire collection must be  
written (and sent over the network, for ZEO).

The other advantage of the blist pattern, and of my implementation and  
the Python 3000 implementation, is cheap copies.  That may or may not  
be pertinent to this discussion.

As you might expect, the rejected Python 3000 blist does not directly  
help with the ZODB because good ZODB behavior was not a design goal.

The downside that is pertinent to both the Py3000 version and this one  
is the complexity of the implementation.  This complexity is hidden to  
the user, since the API is the same as a list.

FWIW, I think that a pluggable ordering, such as the one in Martin's  
Plone folder API, is a great option for a one-size-fits-all mapping.   
Non-ordered mappings never get an overhead, but can be easily  
converted to an ordered mapping.  A collection of very many mappings  
that are guaranteed to be relatively small, or medium sized and not  
change very often, could use PersistentList for their ordering  
implementation. Mappings that may get many members could use zc.blist,  
or another approach or implementation that also divides up members in  
buckets efficiently.

One-size-fits-all discussions for frameworks are always hard.  Trade- 
offs between speed, complexity, and storage are hard enough when you  
are relatively sure of the use cases.  It's nice that our component  
architecture makes it easier to stitch things together, at least.

Gary