[Zope-dev] Re: 64-bit BTrees

Jim Fulton jim at zope.com
Mon Apr 17 15:27:00 EDT 2006


Tres Seaver wrote:
> Jim Fulton wrote:
> 
>>Tres Seaver wrote:
>>
>>
>>>Fred Drake wrote:
>>>
>>>
>>>>I have a need for 64-bit BTrees (at least for IOBTree and OIBTree),
>>>>and I'm not the first.  I've created a feature development branch for
>>>>this, and checked in my initial implementation.
>>>>
>>>>I've modified the existing code to use PY_LONG_LONG instead of int for
>>>>the key and/or value type; there's no longer a 32-bit version in the
>>>>modified code.  Any Python int or long that can fit in 64 bits is
>>>>accepted; ValueError is raised for values that require 65 bits (or
>>>>more).  Keys and values that can be reported as Python ints are, and
>>>>longs are only returned when the value cannot be converted to a Python
>>>>int.
>>>>
>>>>This can have a substantial effect on memory consumption, since keys
>>>>and/or values now take twice the space.  There may be performance
>>>>issues as well, but those have not been tested.
>>>>
>>>>There are new unit tests, but more are likely needed.
>>>>
>>>>If you're interested in getting the code from Subversion, it's
>>>>available at:
>>>>
>>>>   svn://svn.zope.org/repos/main/ZODB/branches/fdrake-64bits/
>>>>
>>>>Ideally, this or some variation on this could be folded back into the
>>>>main development for ZODB.  If this is objectionable, making 64-bit
>>>>btrees available would require introducing new versions of the btrees
>>>>(possibly named LLBTree, LOBTree, and OLBTree).
>>>
>>>
>>>
>>>I think coming up with new types is the only reasonable thing to do,
>>>given the prevalence of persistent BTrees out in the wild.  Changing the
>>>runtime behavior (footprint, performance) of those objects is probably
>>>not something which most users are going to want, at least not without
>>>carefully considering the implications.
>>
>>
>>It really depends on what the impact is.  It would be nice to get a feel
>>for whether this really impacts memory or performance for real
>>applications.  This adds 4 bytes per key or value.  That isn't much,
>>especially in a typical Zope application.  Similarly, it's hard to say
>>what the difference in C integer operations will be.  I can easily
>>imagine it being negligible (or being significant :).
>>
>>OTOH, adding a new type could be a huge PITA. We'd like to use these
>>with existing catalog and index code, all of which uses IIBTrees.  If
>>the performance impacts are modest, I'd much rather declare IIBTrees
>>to use 64-bit rather than 32-bit integers.
>>
>>I suppose an alternative would be to add a mechanism to configure
>>IIBTrees to use either 32-bit or 64-bit integers at run-time.
> 
> 
> Who uses IOBTree / OIBTree / IIBTree?
> 
>   - Catalogs map RIDs to UIDs as IOBTrees (one record per
>     indexed object)
> 
>   - Most indexes (those derived from Unindex) map RID to indexed value
>     as an IOBTree (one record per object with a value meaningful to that
>     index) and map values to RIDs as OOBTrees (where the second O is
>     usually an IITreeSet).
> 
>   - ZCTextIndex uses IIBTrees to map word IDs to RIDs, in various ways,
>     and makes use of IOBTrees as well.
> 
>   - Relationship "indexes" (typically not stored within catalogs)
>     usually have an IIBTree which is the mapping of the edges as pairs
>     of internal node IDs (one per explicit relationship), with OIBTrees
>     to map the user-supplied node value to a node ID.
> 
> I would guess that if you could do a census of all the OIDs in all the
> Datas.fs in the world, a significant majority of them would be instances
> of classes declared in IOBTree / IIBTree (certainly the bulk of
> *transaction* records are going to be tied up with them).

OK.  I think we are miscommunicating.  Using 64 bits for the IIBTree
types would not in any way invalidate existing pickles.
64-bit IIBTree types can be unpickled from existing data.
Of course, someone who created 64-bit BTree instances
that had values outside the 32-bit range would have trouble reading
these values with 32-bit IIBTrees.
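
The compatibility argument above can be sketched in a few lines of Python.
This is only an illustration, not the BTrees C code: IntBucket, its bits
parameter, and its dump/load helpers are hypothetical names, and the sketch
assumes the pickled state holds ordinary Python integers rather than
fixed-width C values.

```python
import pickle


class IntBucket:
    """Hypothetical stand-in for an integer-keyed bucket (not the real BTrees code)."""

    def __init__(self, bits):
        self.bits = bits
        self.lo = -(1 << (bits - 1))
        self.hi = (1 << (bits - 1)) - 1
        self.items = {}

    def _check(self, n):
        # Mirrors the range check described in the thread: reject values
        # that need more bits than the C type provides.
        if not (self.lo <= n <= self.hi):
            raise ValueError("%d does not fit in %d bits" % (n, self.bits))
        return n

    def __setitem__(self, key, value):
        self.items[self._check(key)] = self._check(value)

    def dump(self):
        # Assumption: the state is plain Python integers, so the pickle
        # itself records nothing about the C integer width.
        return pickle.dumps(tuple(sorted(self.items.items())))

    @classmethod
    def load(cls, data, bits):
        bucket = cls(bits)
        for key, value in pickle.loads(data):
            bucket[key] = value
        return bucket


# Data written by a 32-bit bucket loads into a 64-bit bucket unchanged.
old = IntBucket(32)
old[1] = 2**31 - 1
widened = IntBucket.load(old.dump(), bits=64)

# Data written by a 64-bit bucket only fails to load into a 32-bit bucket
# when it actually contains values outside the 32-bit range.
new = IntBucket(64)
new[1] = 2**40
try:
    IntBucket.load(new.dump(), bits=32)
    narrowed_ok = True
except ValueError:
    narrowed_ok = False
```

Under that assumption, widening is safe for every existing record, and
only the narrowing direction can fail.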

The fact that IIBTree is so widely used is exactly the reason
I want to use 64 bits for the existing types rather than having to
introduce a new type.
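
For the cost side of that trade-off, the figure of 4 extra bytes per
integer key or value mentioned earlier in the thread can be turned into a
rough estimate. This is a back-of-the-envelope sketch only: extra_bytes is
a hypothetical helper, the one million entries are an invented example
size, and allocator overhead and unused bucket slots are ignored.

```python
import struct

# Standard (platform-independent) sizes of 32-bit and 64-bit C integers.
INT32_SIZE = struct.calcsize("=i")
INT64_SIZE = struct.calcsize("=q")


def extra_bytes(entries, int_keys=True, int_values=True):
    """Extra raw storage when integer keys and/or values grow from 32 to 64 bits."""
    widened_fields = int(int_keys) + int(int_values)
    return entries * (INT64_SIZE - INT32_SIZE) * widened_fields


# Integer keys and integer values (IIBTree-like): both fields grow.
iibtree_extra = extra_bytes(1_000_000)

# Integer keys, object values (IOBTree-like): only the keys grow.
iobtree_extra = extra_bytes(1_000_000, int_values=False)
```

With these assumptions, a million-entry tree grows by about 8 MB of raw
integer storage when both keys and values are integers, and by about 4 MB
when only the keys are.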

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

