Is object lookup in a container O(log n)?
Hi, I just wanted to make sure of something before I make a change that will stick more than 1000 objects in one container... when you're looking up an object by ID, that's a O(log n) function, right? Kevin
Kevin Dangoor wrote:
Hi,
I just wanted to make sure of something before I make a change that will stick more than 1000 objects in one container... when you're looking up an object by ID, that's a O(log n) function, right?
It depends on the container. Folders use Python dictionaries, which use hashing. This is much faster than O(log n), if the container is already loaded in memory. One problem with using Python dictionaries is that for *really* big (hundreds of thousands of objects) containers, you have to have have alot of information in memory and reloading the container into memory if O(n). We plan, in the future, to provide folder-like objects suited to very large collections. These will be based on BTrees and will be O(log n) for lookup. Catalog uses BTrees and is O(log n) for searches. Jim -- Jim Fulton mailto:jim@digicool.com Technical Director (888) 344-4332 Python Powered! Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
----- Original Message ----- From: "Jim Fulton" <jim@digicool.com> To: "Kevin Dangoor" <kid@kendermedia.com> Cc: <zope-dev@zope.org> Sent: Tuesday, February 08, 2000 9:53 AM Subject: Re: [Zope-dev] Is object lookup in a container O(log n)?
Folders use Python dictionaries, which use hashing. This is much faster than O(log n), if the container is already loaded in memory. One problem with using Python dictionaries is that for *really* big (hundreds of thousands of objects) containers, you have to have have alot of information in memory and reloading the container into memory if O(n).
Ahh. I hadn't realized that Python dictionaries use hashing. That makes sense for most applications. I assume an ObjectManager ZClass will be the same, performance-wise, as a Folder. Do you have any idea where the threshold is as far as the hash table is concerned? For my own application, I'd be looking at 1000-2000 objects right now. For others (since this is code I'm thinking of releasing), they could be looking at much more. I can consider doing something like breaking off the first two characters of the identifiers and creating containers with those characters as an id and splitting the objects up in those folders.
We plan, in the future, to provide folder-like objects suited to very large collections. These will be based on BTrees and will be O(log n) for lookup. Catalog uses BTrees and is O(log n) for searches.
Sounds good... Kevin
On Tue, 8 Feb 2000 10:49:42 -0500, "Kevin Dangoor" <kid@kendermedia.com> wrote:
I assume an ObjectManager ZClass will be the same, performance-wise, as a Folder. Do you have any idea where the threshold is as far as the hash table is concerned?
Dictionaries increase the size of their table to prevent it getting too full. Performance degenerates quite smoothly.
For my own application, I'd be looking at 1000-2000 objects right now. For others (since this is code I'm thinking of releasing), they could be looking at much more.
Hashing will not be a problem here. However something else might be.... When the container is activated, the object database will have to load all 2000 objects.
I can consider doing something like breaking off the first two characters of the identifiers and creating containers with those characters as an id and splitting the objects up in those folders.
This lets ZODB load the objects more gradually. If a different grouping scheme would improve locality, then that would be better still.
We plan, in the future, to provide folder-like objects suited to very large collections. These will be based on BTrees and will be O(log n) for lookup. Catalog uses BTrees and is O(log n) for searches.
Sounds good...
MMmmmmm. Toby Dickenson tdickenson@geminidataloggers.com
participants (3)
-
Jim Fulton -
Kevin Dangoor -
Toby Dickenson