[Zope-dev] zope.keyreference hashes vs. 32/64bit
Jim Fulton
jim at zope.com
Sat Aug 28 14:47:42 EDT 2010
On Sat, Aug 28, 2010 at 12:17 PM, Hanno Schlichting <hanno at hannosch.eu> wrote:
> Hi.
>
> I've recently stumbled on some (at least to me) unexpected behavior with
> zope.keyreference.
Specifically, zope.keyreference.persistent, I assume.
> For a persistent object it generates a unique key
> using:
>
> hash((database_name, oid))
No, it generates a hash this way.
>
> where hash is Python's built-in hash function.
>
> Reading the documentation I assumed that a keyreference for the same
> object (as identified by database name and oid) should be stable and
> always produce the same result. This isn't always true: if you look
> up persisted keyreference data after upgrading your software versions
> and compare it to a new calculation, the two can differ.
>
> Python's hash function is only stable inside the same Python version
> and 32/64 bit combination. The same input in a 32bit Python 2.6 and
> 64bit Python 2.6 produces different results, as both try to use the
> maximum available integer space and thus a 64bit Python generates keys
> above the 32-bit integer range. As a simple example "hash(('main', 1)) > 2**32"
> is True in a 64bit Python and False in a 32bit Python.
>
> The internal hash implementation seems to have been pretty stable in
> all the latest Python versions up to 3.1. So the algorithm produces
> the same results for all 32bit versions of Python 2.x to 3.1 and 64bit
> respectively. But as far as I understand this isn't guaranteed to be
> the case for future versions.
>
> Does anyone else see a problem with this? Should keyreference use a
> different hash algorithm?
Potentially, yes. In current practice, I don't think so.
When a key reference is used as a BTree key, its comparison function,
rather than its hash, is used.
If a key reference hash was used as a persistent key, then this would
definitely be a problem.
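To illustrate the distinction, here is a minimal sketch using only the standard library. SketchKeyRef is a hypothetical stand-in, not the zope.keyreference implementation, and sorted() stands in for an ordered container such as a BTree. It suggests that ordering by comparison never needs to call __hash__:

```python
from functools import total_ordering


@total_ordering
class SketchKeyRef:
    """Hypothetical stand-in for a key reference (illustration only)."""

    hash_calls = 0  # counts how often __hash__ is invoked

    def __init__(self, database_name, oid):
        self.database_name = database_name
        self.oid = oid

    def _key(self):
        return (self.database_name, self.oid)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

    def __hash__(self):
        SketchKeyRef.hash_calls += 1
        return hash(self._key())


refs = [SketchKeyRef("main", 3), SketchKeyRef("main", 1), SketchKeyRef("aux", 2)]

# Ordering relies on the comparison methods only, as an ordered tree would.
ordered_keys = [ref._key() for ref in sorted(refs)]
```

Because the order depends only on (database_name, oid), it is the same on any platform, whatever hash() returns there.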
Note that in a dictionary or PersistentMapping, the hash isn't
saved persistently. The object is saved as a collection of items and the
hashes are recomputed on unpickling.
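A quick way to check this for a plain dictionary is to look at what the pickle stream actually contains. The following is a sketch using the standard pickle and pickletools modules; the key tuple is just an assumed stand-in for (database_name, oid), and a PersistentMapping is not exercised here:

```python
import pickle
import pickletools

key = ("main", 1)  # assumed stand-in for (database_name, oid)
original = {key: "some object"}

data = pickle.dumps(original, protocol=2)

# Collect every argument stored in the pickle stream.
stored_args = [arg for _op, arg, _pos in pickletools.genops(data)
               if arg is not None]

# Loading rebuilds the dict by re-inserting the items, which rehashes
# the keys with whatever hash() the loading interpreter provides.
restored = pickle.loads(data)
```

The stream holds the key parts and the value, but not the hash of the key, so a dictionary pickled on one platform can be loaded on another with a different hash().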
I'm in favor of someone coming up with a stable hash to
avoid future pitfalls.
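One possible shape for such a hash, purely as a sketch: derive it from a cryptographic digest of the database name and oid, and truncate to a signed 32-bit integer so the value is the same on 32-bit and 64-bit builds. The function name stable_hash, the choice of SHA-1, and the NUL separator are all assumptions made for illustration, not anything zope.keyreference provides; the oid is assumed to be an 8-byte string as in ZODB.

```python
import hashlib
import struct


def stable_hash(database_name, oid):
    """Hypothetical helper: platform-independent hash of (database_name, oid).

    database_name is text, oid is a bytes object (8 bytes in ZODB).
    """
    # A digest of the raw bytes does not depend on the interpreter's hash().
    digest = hashlib.sha1(database_name.encode("utf-8") + b"\0" + oid).digest()
    # Interpret the first 4 bytes as a big-endian signed 32-bit integer.
    return struct.unpack(">i", digest[:4])[0]


example_oid = b"\x00" * 7 + b"\x01"
example = stable_hash("main", example_oid)
```

Truncating to 32 bits keeps the result within the integer range of every build, at the cost of more collisions than a 64-bit value would have.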
It's sad that Python's hash isn't stable across Python versions
and architectures. Is this documented? If so, it's a misfeature.
If not, perhaps it should be reported as a bug.
Jim
--
Jim Fulton