[ZODB-Dev] [fsIndex] surprizing documentation -- inefficiency?
Jim Fulton
jim at zope.com
Mon Jan 21 11:35:43 EST 2008
On Jan 21, 2008, at 6:09 AM, Dieter Maurer wrote:
> "ZODB.fsIndex" tells us in its source code documentation that it
> splits
> the 8 byte oid into a 6 byte prefix and a two byte suffix and
> represents the index by an "OOBTree(prefix -> fsBucket(suffix ->
> position))"
>
> It explains that is uses "fsBucket" (instead of a full tree) because
> the "suffix -> position" would contain at most 256 entries.
>
> This explanation surprises me a bit: why should the bucket contain
> only 256 rather than 256 * 256 (= 64.000) entries?
You are right. The documentation is incorrect. I just fixed it on the
trunk.
> If the assumption is wrong (i.e. the "fsBucket" can contain up to
> 64.000 entries), is the implementation inefficient (because of that)?
I don't think so. This is an in-memory data structure, so there isn't
a problem with a big data structure repeatedly being written to the
file. Also, because new keys are normally computed sequentially, they
will be added at the end of the bucket, so we won't incur the cost of
shifting lots of data around.
It's possible that this is more expensive than one might want during
packing, when inserts are likely to be a bit more random. It might be
interesting to do a memory and cpu comparison using buckets and trees.
Jim
--
Jim Fulton
Zope Corporation
More information about the ZODB-Dev
mailing list