[Zope-dev] BTrees and Persistence
Tim Peters
tim.peters at gmail.com
Wed Jun 1 18:43:20 EDT 2005
[Yair Benita]
> I recently started to use ZODB and python as my chosen database
> solution.
> I am having a few problems with retaining changes in BTrees. I have
> read the documentation and am aware of the _p_changed attribute. Still,
> here is what I observe:
>
> ############################
> # consider this simplified example
> # let's ignore the open database method for now.
> T = OOBTree()
>
> # my trees contain integers as keys and sets as values
> T.update({1:set([1,2,3]), 2:set([5,6,7])})
>
> # I would really really like this to work
> T[1].add(6)
> T._p_changed = True
> get_transaction().commit()
>
> # but it doesn't.
> # changes are not saved when I close the database and reopen it
That's right. A Python set is not itself a persistent object, and so
doesn't magically inform the persistence system when it mutates. A
BTree is a persistent object, but under the covers it's actually a
(potentially very large) graph of distinct persistent objects. That's
what makes it scalable. Setting T._p_changed told the root object of
this graph that its state changed, and that doesn't do you any good
(in fact, the root object did not change). There is no direct way to
access the interior BTree and Bucket nodes in a BTree graph, so you
need to be trickier to make this work.
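A toy model may make the "graph of nodes" point concrete. The classes below (Node, Bucket, ToyBTree) are hypothetical stand-ins, not ZODB's real implementation. Each node keeps its own _p_changed flag, and a pretend commit() writes only the flagged nodes. The hash-based routing of keys to buckets is also an assumption made for brevity; a real BTree routes by key order and has interior BTree nodes as well as Buckets.

```python
# Toy model (NOT ZODB's implementation): a BTree as a graph of separately
# stored nodes, each carrying its own _p_changed flag.

class Node:
    """Stand-in for one persistent object with its own dirty flag."""
    def __init__(self):
        self._p_changed = False


class Bucket(Node):
    """A leaf node holding some of the tree's key/value pairs."""
    def __init__(self):
        super().__init__()
        self.items = {}


class ToyBTree(Node):
    """The root node; the buckets are the 'invisible' interior nodes."""
    def __init__(self, nbuckets=2):
        super().__init__()
        self.buckets = [Bucket() for _ in range(nbuckets)]

    def _bucket_for(self, key):
        # Hypothetical routing by hash; a real BTree routes by key order.
        return self.buckets[hash(key) % len(self.buckets)]

    def __getitem__(self, key):
        return self._bucket_for(key).items[key]

    def __setitem__(self, key, value):
        bucket = self._bucket_for(key)
        bucket.items[key] = value
        bucket._p_changed = True   # storing a key dirties its bucket, not the root

    def commit(self):
        """Return the nodes a commit would write, then clear their flags."""
        dirty = [n for n in [self] + self.buckets if n._p_changed]
        for n in dirty:
            n._p_changed = False
        return dirty


T = ToyBTree()
T[1] = {1, 2, 3}
T[2] = {5, 6, 7}
T.commit()

T[1].add(6)
T._p_changed = True            # flags only the root node
print(T.commit() == [T])       # True: the bucket holding T[1] is not written

T[1].add(4)
T[1] = T[1]                    # re-storing the key flags the bucket
written = T.commit()
print(T._bucket_for(1) in written, T in written)   # True False
```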
> # This works:
> T[1].add(6)
> T.update({1:T[1]})
The conventional idiom should also work:
    T[1] = T[1]
That manages to (in effect) set _p_changed on the invisible (to you)
interior Bucket node holding T[1]. Sometimes you'll see code like
this:
    some_object.some_attr = some_object.some_attr
That's the same trick. For example, if p is an instance of some
persistent class, and p.list is a Python list, then
    p.list.append(42)
doesn't mark p as changed, but adding
    p.list = p.list
does mark it changed. Whether that's more or less obscure than
    p._p_changed = True
is somewhat in the eye of the beholder.
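Here is a small sketch of why the reassignment trick works. ToyPersistent is hypothetical: it only mimics the relevant behaviour of a persistent class (assigning an ordinary attribute marks the object changed) and is not the real persistent.Persistent machinery.

```python
# Toy sketch: like a persistent class, assigning any (non-_p_) attribute marks
# the object changed. NOT the real persistent.Persistent implementation.

class ToyPersistent:
    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if not name.startswith("_p_"):
            # An ordinary attribute was assigned: flag the whole object.
            object.__setattr__(self, "_p_changed", True)


p = ToyPersistent()
p.list = []
p._p_changed = False      # pretend p was just committed

p.list.append(42)         # mutates the list in place; p's __setattr__ never runs
print(p._p_changed)       # False

p.list = p.list           # an assignment on p itself
print(p._p_changed)       # True
```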
> The thing is my sets tend to be very big
Then you definitely don't want to use a non-persistent type for this.
The entire state of a non-persistent object gets stored all over again
when _any_ part of it changes. That's an easy way to change a
linear-time algorithm into a quadratic-time one.
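The linear-to-quadratic claim is easy to check with a little arithmetic. If each of n commits adds one element and re-stores the whole set, the total stored is 1 + 2 + ... + n = n(n+1)/2 elements. The hypothetical elements_written() function below counts that, and contrasts it with an idealized bucketed structure that rewrites at most bucket_size elements per commit (a simplification; real buckets split and vary in size).

```python
def elements_written(n, bucket_size=None):
    """Total elements stored over n commits, each adding one element.

    bucket_size=None models a non-persistent set re-stored whole on every
    commit; an integer models an idealized bucketed structure where a commit
    rewrites only the one bucket that changed (at most bucket_size elements).
    """
    total = 0
    for size in range(1, n + 1):      # size = number of elements after this add
        if bucket_size is None:
            total += size             # the whole set is written again
        else:
            total += min(size, bucket_size)
    return total


print(elements_written(1000))                   # 500500, i.e. n(n+1)/2
print(elements_written(1000, bucket_size=100))  # 95050
```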
> and I am not sure but I think that using T.update({1:T[1]}) will slow me
> down since a dictionary is first created with a copy of the set which is
> very big
No; Python never, ever makes a copy of anything unless you explicitly
ask for a copy. T.update({1: T[1]}) is just a little slower than
T[1]=T[1], and both ways just move a few pointers around, independent
of how large len(T[1]) may be.
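A quick way to convince yourself that no copy is made: in the sketch below a plain dict stands in for the OOBTree (an assumption, so the example runs without ZODB), and the identity test `is` shows that the temporary dict built for update() holds the very same set object.

```python
# A plain dict stands in for the OOBTree so this runs without ZODB;
# the point is only what Python does when it evaluates {1: T[1]}.
T = {1: set(range(100_000))}

s = T[1]
d = {1: T[1]}          # the temporary dict holds a reference, not a copy
print(d[1] is s)       # True: the very same set object

T.update(d)            # stores that same reference back under key 1
print(T[1] is s)       # True
```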
> and then the OOBTree is updated. Or am I wrong here?
As above. What really kills you here is that the _commit_ time is
proportional to len(T[1]), because the entire state of a
non-persistent object is stored to disk whenever any part of it
changes. That's why people recommend using an IITreeSet instead.
Like a BTree, that's actually a (potentially very large) graph of
independent persistent objects. Do not use an IISet; use an IITreeSet
here. An IISet is a single persistent object, and has the same
problem as a plain Python set in that the entire state needs to be
stored whenever any piece changes. That isn't true of an IITreeSet.
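To see why a bucketed set helps, here is a toy model loosely in the spirit of an IITreeSet. ToyTreeSet and SetBucket are hypothetical; the real IITreeSet uses its own splitting policy and interior nodes, and this toy ignores the cost of rewriting the parent when a bucket splits. Each bucket keeps its own changed flag, and commit() reports how many keys a commit would store.

```python
import bisect


class SetBucket:
    """One separately stored node: a sorted list of keys plus a dirty flag."""
    def __init__(self, keys=None):
        self.keys = keys if keys is not None else []
        self.changed = True        # a brand-new bucket must be written


class ToyTreeSet:
    """Toy set spread over many buckets (hypothetical; not IITreeSet's code)."""
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.buckets = [SetBucket()]   # kept in key order

    def __len__(self):
        return sum(len(b.keys) for b in self.buckets)

    def add(self, x):
        # Choose the last bucket whose first key is <= x (bucket 0 otherwise).
        firsts = [b.keys[0] for b in self.buckets[1:]]
        i = bisect.bisect_right(firsts, x)
        b = self.buckets[i]
        j = bisect.bisect_left(b.keys, x)
        if j < len(b.keys) and b.keys[j] == x:
            return                     # already present: nothing changes
        b.keys.insert(j, x)
        b.changed = True               # only this bucket is dirtied
        if len(b.keys) > self.capacity:
            half = len(b.keys) // 2
            # Split an overfull bucket into two new (hence dirty) buckets.
            self.buckets[i:i + 1] = [SetBucket(b.keys[:half]),
                                     SetBucket(b.keys[half:])]

    def commit(self):
        """Return how many keys a commit would store, then clear the flags."""
        written = sum(len(b.keys) for b in self.buckets if b.changed)
        for b in self.buckets:
            b.changed = False
        return written


ts = ToyTreeSet(capacity=64)
for x in range(10_000):
    ts.add(x)
ts.commit()

ts.add(-1)                 # one more element, landing in the first bucket
print(ts.commit())         # 33: only that bucket's keys are stored again
print(len(ts))             # 10001
```

By contrast, a single-object set such as an IISet (or a plain Python set) would store all 10,001 elements again for that one add.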