[ZODB-Dev] Re: [Bug] Another FileStorage packing bug
Dieter Maurer
dieter at handshake.de
Thu Jan 15 13:05:51 EST 2004
Hi Jeremy,
Jeremy Hylton wrote at 2004-1-14 17:49 -0500:
>On Tue, 2004-01-13 at 05:51, Dieter Maurer wrote:
>> Instead is was extremely bad runtime behaviour in connection
>> with "_resolve_backpointer" and "_data_find".
>> With the ZODB 3.1 implementation (at least) it is easy
>> to get quadratic runtime behaviour with respect to the number
>> of objects contained in a transaction.
>
>If I follow correctly, the problem is that for each backpointer in a
>transaction, you try to find the transaction for that backpointer. You
>always do a linear search through the file, starting at the end, to find
>the transaction for that backpointer. Even if there are N objects all
>in the same transaction, you do the search N times.
You are right: there are two potentially expensive operations:
* searching the transactions
* searching the object record within the transaction
My patch addresses only the second one.
> ...
>I'm not sure I understand what your patch is doing. Could you explain
>the strategy? I would think a chief problem is _txn_find(), but it
>looks like you're not changing that.
The patch maintains the mapping "oid -> pos" for transactions
in which objects for backpointers have been looked up in a cache.
>Or is the problem that the transactions themselves contain many objects
>and the search in _data_find() for the record matching a particular oid
>is expensive.
Right.
We may add another cache for the "transaction -> pos" mapping
to make "_find_txn" faster.
>Either way a cache seems like a decent solution. I wonder, though, if
>we should rethink the algorithms for these cases. For pack and undo,
>we're pretty much doing a sequential operation on all the objects. So
>it might be clearer to rework the APIs in some way to exploit the
>regular access patterns. On the other hand, a cache works without
>having to rethink the APIs.
I think you refactored the old packing code (which used a linear
copying -- but under hairy and sometimes wrong assumptions)
in order to get the implementation more stable.
Thus, it might be more safe, to use a cache then to go back
to a linear copying algorithm.
--
Dieter
More information about the ZODB-Dev
mailing list