I have a need for 64-bit BTrees (at least for IOBTree and OIBTree), and I'm not the first. I've created a feature development branch for this, and checked in my initial implementation. I've modified the existing code to use PY_LONG_LONG instead of int for the key and/or value type; there's no longer a 32-bit version in the modified code. Any Python int or long that can fit in 64 bits is accepted; ValueError is raised for values that require 65 bits (or more). Keys and values that can be reported as Python ints are, and longs are only returned when the value cannot be converted to a Python int. This can have a substantial effect on memory consumption, since keys and/or values now take twice the space. There may be performance issues as well, but those have not been tested. There are new unit tests, but more are likely needed. If you're interested in getting the code from Subversion, it's available at: svn://svn.zope.org/repos/main/ZODB/branches/fdrake-64bits/ Ideally, this or some variation on this could be folded back into the main development for ZODB. If this is objectionable, making 64-bit btrees available would require introducing new versions of the btrees (possibly named LLBTree, LOBTree, and OLBTree). I welcome comments. -Fred -- Fred L. Drake, Jr. <fdrake at gmail.com> "Don't let schooling interfere with your education." -- Mark Twain
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Fred Drake wrote:
I have a need for 64-bit BTrees (at least for IOBTree and OIBTree), and I'm not the first. I've created a feature development branch for this, and checked in my initial implementation.
I've modified the existing code to use PY_LONG_LONG instead of int for the key and/or value type; there's no longer a 32-bit version in the modified code. Any Python int or long that can fit in 64 bits is accepted; ValueError is raised for values that require 65 bits (or more). Keys and values that can be reported as Python ints are, and longs are only returned when the value cannot be converted to a Python int.
This can have a substantial effect on memory consumption, since keys and/or values now take twice the space. There may be performance issues as well, but those have not been tested.
There are new unit tests, but more are likely needed.
If you're interested in getting the code from Subversion, it's available at:
svn://svn.zope.org/repos/main/ZODB/branches/fdrake-64bits/
Ideally, this or some variation on this could be folded back into the main development for ZODB. If this is objectionable, making 64-bit btrees available would require introducing new versions of the btrees (possibly named LLBTree, LOBTree, and OLBTree).
I think coming up with new types is the only reasonable thing to do, given the prevalence of persistent BTrees out in the wild. Changing the runtime behavior (footprint, performance) of those objects is probably not something which most users are going to want, at least not without carefully considering the implications. Tres. - -- =================================================================== Tres Seaver +1 202-558-7113 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFEPpyu+gerLs4ltQ4RAmh1AJ9/dLigNMrQgIFNASKWbpvboapywwCePV22 /3d8kFGTjipAVCsy5fnuLa4= =xe6v -----END PGP SIGNATURE-----
Tres Seaver wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Fred Drake wrote:
I have a need for 64-bit BTrees (at least for IOBTree and OIBTree), and I'm not the first. I've created a feature development branch for this, and checked in my initial implementation.
I've modified the existing code to use PY_LONG_LONG instead of int for the key and/or value type; there's no longer a 32-bit version in the modified code. Any Python int or long that can fit in 64 bits is accepted; ValueError is raised for values that require 65 bits (or more). Keys and values that can be reported as Python ints are, and longs are only returned when the value cannot be converted to a Python int.
This can have a substantial effect on memory consumption, since keys and/or values now take twice the space. There may be performance issues as well, but those have not been tested.
There are new unit tests, but more are likely needed.
If you're interested in getting the code from Subversion, it's available at:
svn://svn.zope.org/repos/main/ZODB/branches/fdrake-64bits/
Ideally, this or some variation on this could be folded back into the main development for ZODB. If this is objectionable, making 64-bit btrees available would require introducing new versions of the btrees (possibly named LLBTree, LOBTree, and OLBTree).
I think coming up with new types is the only reasonable thing to do, given the prevalence of persistent BTrees out in the wild. Changing the runtime behavior (footprint, performance) of those objects is probably not something which most users are going to want, at least not without carefully considering the implications.
It really depends on what the impact is. It would be nice to get a feel for whether this really impacts memory or performance for real applications. This adds 4-bytes per key or value. That isn't much, especially in a typical Zope application. Similarly, it's hard to say what the difference in C integer operations will be. I can easily imagine it being negligible (or being significant :). OTOH, adding a new type could be a huge PITA. We'd like to use these with existing catalog and index code, all of which uses IIBTrees. If the performance impacts are modest, I'd much rather declare IIBTrees to use 64-bit rather than 32-bit integers. I suppose an alternative would be to add a mechanism to configure IIBTrees to use either 32-bit or 64-bit integers at run-time. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Jim Fulton wrote:
Tres Seaver wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Fred Drake wrote:
I have a need for 64-bit BTrees (at least for IOBTree and OIBTree), and I'm not the first. I've created a feature development branch for this, and checked in my initial implementation.
I've modified the existing code to use PY_LONG_LONG instead of int for the key and/or value type; there's no longer a 32-bit version in the modified code. Any Python int or long that can fit in 64 bits is accepted; ValueError is raised for values that require 65 bits (or more). Keys and values that can be reported as Python ints are, and longs are only returned when the value cannot be converted to a Python int.
This can have a substantial effect on memory consumption, since keys and/or values now take twice the space. There may be performance issues as well, but those have not been tested.
There are new unit tests, but more are likely needed.
If you're interested in getting the code from Subversion, it's available at:
svn://svn.zope.org/repos/main/ZODB/branches/fdrake-64bits/
Ideally, this or some variation on this could be folded back into the main development for ZODB. If this is objectionable, making 64-bit btrees available would require introducing new versions of the btrees (possibly named LLBTree, LOBTree, and OLBTree).
I think coming up with new types is the only reasonable thing to do, given the prevalence of persistent BTrees out in the wild. Changing the runtime behavior (footprint, performance) of those objects is probably not something which most users are going to want, at least not without carefully considering the implications.
It really depends on what the impact is. It would be nice to get a feel for whether this really impacts memory or performance for real applications. This adds 4-bytes per key or value. That isn't much, especially in a typical Zope application. Similarly, it's hard to say what the difference in C integer operations will be. I can easily imagine it being negligible (or being significant :).
OTOH, adding a new type could be a huge PITA. We'd like to use these with existing catalog and index code, all of which uses IIBTrees. If the performance impacts are modest, I'd much rather declare IIBTrees to use 64-bit rather than 32-bit integers.
I suppose an alternative would be to add a mechanism to configure IIBTrees to use either 32-bit or 64-bit integers at run-time.
Who uses IOBTree / OIBTree / IIBTree? - Catalogs map RIDs to UIDs as IOBTrees (one record per indexed object) - Most indexes (those derived from Unindex) map RID to indexed value as an IOBTree (one record per object with a value meaningful to that index) and map values to RIDs as OOBTrees (where the second O is usually an IITreeSet). - ZCTextIndex uses IIBTrees to map word IDs to RIDs, in various ways, and make use of IOBTrees as wel.. - Relationship "indexes" (typically not stored within catalogs) usually have an IIBTree which is the mapping of the edges as pairs of internal node IDs (one per explicit relationship), with OIBTrees to map the user-supplied node value to a node ID. I would guess that if you could do a census of all the OIDs in all the Datas.fs in the world, a significant majority of them would be instances of classes declared in IOBTree / IIBTree (certainly the bulk of *transaction* records are going to be tied up with them). Tres. - -- =================================================================== Tres Seaver +1 202-558-7113 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFEPtfe+gerLs4ltQ4RAkDoAJ9998Bj5yMqVpKoQOn/s3Hf5GZkBwCcC4uY kXTqmBsu6vMYx4fzAOWF5uo= =yVVq -----END PGP SIGNATURE-----
[Tres Seaver]
... I would guess that if you could do a census of all the OIDs in all the Datas.fs in the world, a significant majority of them would be instances of classes declared in IOBTree / IIBTree (certainly the bulk of *transaction* records are going to be tied up with them).
Provided it still works, people can use ZODB's analyze.py to figure that out. But supposing "I" flavors of BTrees are the only objects that exist, what follows from that? It's not clear. I can guarantee that multiunion() will run slower, because its workhorse radix sort will need 8 (instead of 4) passes. Beyond that, it requires someone to try it. I'm reminded that when the MEMS Exchange wrote Durus (a kind of "ZODB lite" ;-): http://www.mems-exchange.org/software/durus/ ) they left their entire BTree implementation coded in Python -- it was "fast enough" that way. The difference between ZODB's BTree C code pushing 4 or 8 bytes around at a time may well be insignificant overall. If done carefully, pickle sizes probably won't change: cPickle has a large number of ways to pickle integers, and picks the smallest one needed to hold an integer's actual value. Provided the internal getstate() functions are careful to avoid Python longs when possible, bigger pickles won't happen unless more than 32 bits are actually needed to hold an integer. There's also that ZODB's current "I" trees are badly broken on 64-bit boxes (except for Win64) in at least this way: http://collector.zope.org/Zope/1592 That problem would go away by magic. looks-like-a-case-of-measure-twice-cut-once-ly y'rs - tim
Tres Seaver wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Jim Fulton wrote:
Tres Seaver wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Fred Drake wrote:
I have a need for 64-bit BTrees (at least for IOBTree and OIBTree), and I'm not the first. I've created a feature development branch for this, and checked in my initial implementation.
I've modified the existing code to use PY_LONG_LONG instead of int for the key and/or value type; there's no longer a 32-bit version in the modified code. Any Python int or long that can fit in 64 bits is accepted; ValueError is raised for values that require 65 bits (or more). Keys and values that can be reported as Python ints are, and longs are only returned when the value cannot be converted to a Python int.
This can have a substantial effect on memory consumption, since keys and/or values now take twice the space. There may be performance issues as well, but those have not been tested.
There are new unit tests, but more are likely needed.
If you're interested in getting the code from Subversion, it's available at:
svn://svn.zope.org/repos/main/ZODB/branches/fdrake-64bits/
Ideally, this or some variation on this could be folded back into the main development for ZODB. If this is objectionable, making 64-bit btrees available would require introducing new versions of the btrees (possibly named LLBTree, LOBTree, and OLBTree).
I think coming up with new types is the only reasonable thing to do, given the prevalence of persistent BTrees out in the wild. Changing the runtime behavior (footprint, performance) of those objects is probably not something which most users are going to want, at least not without carefully considering the implications.
It really depends on what the impact is. It would be nice to get a feel for whether this really impacts memory or performance for real applications. This adds 4-bytes per key or value. That isn't much, especially in a typical Zope application. Similarly, it's hard to say what the difference in C integer operations will be. I can easily imagine it being negligible (or being significant :).
OTOH, adding a new type could be a huge PITA. We'd like to use these with existing catalog and index code, all of which uses IIBTrees. If the performance impacts are modest, I'd much rather declare IIBTrees to use 64-bit rather than 32-bit integers.
I suppose an alternative would be to add a mechanism to configure IIBTrees to use either 32-bit or 64-bit integers at run-time.
Who uses IOBTree / OIBTree / IIBTree?
- Catalogs map RIDs to UIDs as IOBTrees (one record per indexed object)
- Most indexes (those derived from Unindex) map RID to indexed value as an IOBTree (one record per object with a value meaningful to that index) and map values to RIDs as OOBTrees (where the second O is usually an IITreeSet).
- ZCTextIndex uses IIBTrees to map word IDs to RIDs, in various ways, and make use of IOBTrees as wel..
- Relationship "indexes" (typically not stored within catalogs) usually have an IIBTree which is the mapping of the edges as pairs of internal node IDs (one per explicit relationship), with OIBTrees to map the user-supplied node value to a node ID.
I would guess that if you could do a census of all the OIDs in all the Datas.fs in the world, a significant majority of them would be instances of classes declared in IOBTree / IIBTree (certainly the bulk of *transaction* records are going to be tied up with them).
OK. I think we are misscommunicating. Using 64 bits for IIBTrees types would not in any way invalidate existing pickles. 64-bit IIBTrees types can be unpickled from existing data. Of course, someone who created 64-bit BTrees type instances that had values outside the 32-bit range would have trouble reading these values with 32-bit IIBTrees, The fact that IIBTrees is so widely used is exatly the reason I want to use 64-bits for the existing types rather than having to introduce a new type. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org
On 4/17/06, Jim Fulton <jim@zope.com> wrote:
The fact that IIBTrees is so widely used is exatly the reason I want to use 64-bits for the existing types rather than having to introduce a new type.
Oops, I was checking in the separated version of 64-bit BTrees while this was landing in my inbox. ;-/ Once we determine which approach we're going with, I should make an alpha release of ZODB 3.7 and knit that into the Zope 3 trunk so we can get more testing in context. -Fred -- Fred L. Drake, Jr. <fdrake at gmail.com> "Don't let schooling interfere with your education." -- Mark Twain
Fred Drake wrote:
(possibly named LLBTree, LOBTree, and OLBTree).
For my half-penny's worth, this is the way I'd like to see it go. Explicit is better than implicit and all that. If you need more than 32-bits, you can explicitly use them. The implicit change to make them all 64-bits which results in some unknown slowdown for all "I" BTree users seems a bit too scary to bite off... Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
Previously Chris Withers wrote:
The implicit change to make them all 64-bits which results in some unknown slowdown for all "I" BTree users seems a bit too scary to bite off...
Has anyone done any benchmarks to prove that 64-bits is slower or faster? It would be interesting to see benchmarks on modern 32bit and a 64bit systems. Until then it's all hand-waiving. Wichert. -- Wichert Akkerman <wichert@wiggy.net> It is simple to make things. http://www.wiggy.net/ It is hard to make things simple.
Wichert Akkerman wrote:
Previously Chris Withers wrote:
The implicit change to make them all 64-bits which results in some unknown slowdown for all "I" BTree users seems a bit too scary to bite off...
Has anyone done any benchmarks to prove that 64-bits is slower or faster? It would be interesting to see benchmarks on modern 32bit and a 64bit systems. Until then it's all hand-waiving.
One data point: on my AMD 64 box, when I run 32 bit Linux, I get about 30,000 pystones, while 64 bit Linux yields 50,000 pystones. This doesn't mean 64 bit code is necessarily faster, but it does suggest that 64 bit CPUs prefer 64 bit code. Shane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Wichert Akkerman wrote:
Previously Chris Withers wrote:
The implicit change to make them all 64-bits which results in some unknown slowdown for all "I" BTree users seems a bit too scary to bite off...
Has anyone done any benchmarks to prove that 64-bits is slower or faster? It would be interesting to see benchmarks on modern 32bit and a 64bit systems. Until then it's all hand-waiving.
Sure, but the "burden of proof" needs to be on the party proposing the cnange. "First, do no harm" is a good principle in such cases. Tres. - -- =================================================================== Tres Seaver +1 202-558-7113 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFERpCv+gerLs4ltQ4RAuoIAJ9fKveUfI54DtYWnD9pWDudhqf5LACgmZu1 GFD9FgvQDR9BdvIyTpIm2tI= =QgmO -----END PGP SIGNATURE-----
Fred Drake wrote:
I have a need for 64-bit BTrees (at least for IOBTree and OIBTree), and I'm not the first. I've created a feature development branch for this, and checked in my initial implementation.
I've modified the existing code to use PY_LONG_LONG instead of int for the key and/or value type; there's no longer a 32-bit version in the modified code. Any Python int or long that can fit in 64 bits is accepted; ValueError is raised for values that require 65 bits (or more). Keys and values that can be reported as Python ints are, and longs are only returned when the value cannot be converted to a Python int.
This can have a substantial effect on memory consumption, since keys and/or values now take twice the space. There may be performance issues as well, but those have not been tested.
There are new unit tests, but more are likely needed.
If you're interested in getting the code from Subversion, it's available at:
svn://svn.zope.org/repos/main/ZODB/branches/fdrake-64bits/
Ideally, this or some variation on this could be folded back into the main development for ZODB. If this is objectionable, making 64-bit btrees available would require introducing new versions of the btrees (possibly named LLBTree, LOBTree, and OLBTree).
-1 to the L*BTree flavours. The 'long' type is disappearing from Python anyways, why should the implementation detail worry us regarding BTrees? +1 to making the I*BTree flavours 64bit-aware. If Jim is right, existing pickles will be read back just fine and as long as the integers stay under 32 bit, they won't even be bigger. The only real implication therefore is the pushing around twice as many bits in memory. Dieter estimates 20% to 35% slowdown for the C algorithms (whatever that means), Tim seems to think it won't have such a big effect. I guess we'll only know after some benchmarks. Philipp
Philipp von Weitershausen wrote:
in memory. Dieter estimates 20% to 35% slowdown for the C algorithms (whatever that means), Tim seems to think it won't have such a big effect. I guess we'll only know after some benchmarks.
Can we please not make any definite decisions until this issue has been resolved, some of us do actually care about performance ;-) Yes, ideally, I'd like to see just IIBTree's but only if there are not perfomance implications. I think BTrees sit low enough in the stack that it's perfectly justifiable to have both an I BTree and an L BTree. If having two isn't acceptable, then why do we have an I and O BTree's, not to mention the special ones used for in-memory ZODB indexes? Surely we should just have one BTree class? cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
Chris Withers wrote:
Philipp von Weitershausen wrote:
in memory. Dieter estimates 20% to 35% slowdown for the C algorithms (whatever that means), Tim seems to think it won't have such a big effect. I guess we'll only know after some benchmarks.
Can we please not make any definite decisions until this issue has been resolved, some of us do actually care about performance ;-)
Nobody made any decisions. I just expressed my opinion. In the above quoted paragraph I merely said that the only real implication seems to be a possible performance loss (pickle size and pickle compatibility seems not to be a problem). Since we don't know how much of a performance loss that would be, I suggested Fred would do some benchmarks so that the decision we would come to eventually would be an informed one and not be based on wild guesses.
Yes, ideally, I'd like to see just IIBTree's but only if there are not perfomance implications. I think BTrees sit low enough in the stack that it's perfectly justifiable to have both an I BTree and an L BTree.
If having two isn't acceptable, then why do we have an I and O BTree's, not to mention the special ones used for in-memory ZODB indexes? Surely we should just have one BTree class?
I see your point. Note that this "one" BTree class you're talking about already exists: OOBTree :). The I*BTree flavours are just there for optimization because things like the catalog deal with integer IDs a lot. Now they have to deal with larger integers than before; I'd rather not create yet another special BTree flavour. Philipp
On 4/13/06, Fred Drake <fdrake@gmail.com> wrote:
I've created a feature development branch for this, and checked in my initial implementation.
I've made another branch for this, with a different twist. I'm not sure it'll be interesting, but I think it'll solve my immediate need until I can get around to reasonable testing of the performance implications. The "fdrake-optional-64bits" branch will compile using the C "int" type for "I" keys and values by default, and using the "PY_LONG_LONG" type if ZODB_64BIT_INTS is defined. This allows 64-bit BTrees by building ZODB like this: python setup.py build_ext -D ZODB_64BIT_INTS build BTrees.IIBTree.using64bits will be True if ZODB_64BIT_INTS is used. -Fred -- Fred L. Drake, Jr. <fdrake at gmail.com> "Don't let schooling interfere with your education." -- Mark Twain
participants (8)
-
Chris Withers -
Fred Drake -
Jim Fulton -
Philipp von Weitershausen -
Shane Hathaway -
Tim Peters -
Tres Seaver -
Wichert Akkerman