On Sat, Aug 28, 2010 at 12:17 PM, Hanno Schlichting <hanno@hannosch.eu> wrote:
Hi.
I've recently stumbled on some behavior in zope.keyreference that was, at least to me, unexpected.
Specifically, zope.keyreference.persistent, I assume.
For a persistent object it generates a unique key using:
hash((database_name, oid))
No, it generates a hash this way.
where hash is Python's built-in hash function.
Reading the documentation, I assumed that a keyreference for the same object (as identified by database name and oid) would be stable and always produce the same result. That isn't always true: if you look up persisted keyreference data, upgrade your software versions, and compare it to a fresh calculation, the values can differ.
Python's hash function is only stable within the same Python version and 32/64-bit combination. The same input produces different results in a 32-bit Python 2.6 and a 64-bit Python 2.6, because each uses the full native integer width, so a 64-bit Python generates keys outside the 32-bit integer range. As a simple example, "hash(('main', 1)) > 2**32" is True in a 64-bit Python and False in a 32-bit Python.
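A quick way to see the platform dependence is to check the width CPython uses for hash values. The sketch below assumes Python 3.2 or later, where `sys.hash_info` is available (the thread itself concerned Python 2), and `describe_hash` is a hypothetical helper, not part of zope.keyreference. It only checks that the key fits in a signed integer of `sys.hash_info.width` bits, which is 32 on 32-bit builds and 64 on 64-bit builds, so the same tuple can hash outside the 32-bit range on one build but never on the other. Note also that since Python 3.3, `str` hashes are randomized per process unless `PYTHONHASHSEED` is set, so on modern Pythons `hash(('main', 1))` can differ even between two runs of the same interpreter.

```python
import sys


def describe_hash(database_name, oid):
    """Hypothetical helper: the key scheme described above plus the hash width."""
    # Python's built-in hash of a (database_name, oid) tuple.
    key = hash((database_name, oid))
    # Bits used for hash values: 32 on 32-bit builds, 64 on 64-bit builds.
    width = sys.hash_info.width
    return key, width


key, width = describe_hash("main", 1)
# The key always fits in a signed integer of the platform's hash width,
# so a 64-bit build can return values a 32-bit build never produces.
assert -2 ** (width - 1) <= key < 2 ** (width - 1)
print(width, key)
```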
The internal hash implementation seems to have been pretty stable across recent Python versions up to 3.1, so the algorithm produces the same results for all 32-bit versions of Python 2.x through 3.1, and likewise for all 64-bit versions. But as far as I understand, this isn't guaranteed for future versions.
Does anyone else see a problem with this? Should keyreference use a different hash algorithm?
Potentially, yes. In current practice, I don't think so. When a key reference is used as a BTree key, its comparison function, rather than its hash, is used. If a key reference hash were used as a persistent key, this would definitely be a problem. Note that in a dictionary or PersistentMapping, the hash isn't saved persistently. The object is saved as a collection of items, and the hashes are recomputed on unpickling.

I'm in favor of someone coming up with a stable hash to avoid future pitfalls.

It's sad that Python's hash isn't stable across Python versions and architectures. Is this documented? If so, it's a misfeature. If not, perhaps it should be reported as a bug.

Jim

--
Jim Fulton
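As a minimal sketch of the kind of stable hash suggested above (not what zope.keyreference actually does), the hypothetical `stable_key_hash` below derives the key from a SHA-1 digest of the database name and oid instead of Python's built-in `hash`. It assumes the oid is the 8-byte string ZODB stores in `_p_oid`, and it truncates the digest to a signed 32-bit integer so that 32-bit and 64-bit builds, and different Python versions, agree. It also avoids -1, because CPython treats a `__hash__` result of -1 as an error marker and silently replaces it with -2.

```python
import hashlib
import struct


def stable_key_hash(database_name, oid):
    """Hypothetical platform-independent replacement for hash((database_name, oid)).

    database_name: str, e.g. 'main'
    oid: bytes, assumed to be ZODB's 8-byte object id
    Returns a signed 32-bit int that does not depend on the Python version,
    the pointer width, or PYTHONHASHSEED.
    """
    name = database_name.encode("utf-8")
    # Length-prefix the name so different (name, oid) pairs never
    # serialize to the same byte string.
    payload = struct.pack(">I", len(name)) + name + oid
    digest = hashlib.sha1(payload).digest()
    # Interpret the first 4 digest bytes as a big-endian signed 32-bit int.
    value = struct.unpack(">i", digest[:4])[0]
    # CPython would turn a __hash__ result of -1 into -2, so do it explicitly.
    return -2 if value == -1 else value


oid = struct.pack(">Q", 1)  # ZODB-style 8-byte oid for object 1
print(stable_key_hash("main", oid))
```

Because the value comes from a fixed digest of fixed bytes, it is the same wherever it is computed. The trade-off is that it is slower than the built-in `hash`, and switching an existing deployment to a new scheme would mean recomputing any keys that were already stored.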