[ZODB-Dev] Problems with Transactions and FileStorage size
Casey Duncan
casey@zope.com
Fri, 19 Jul 2002 08:51:56 -0400

You could use a packless storage, but a better solution would be to not use a
simple list for this case. Any time the list changes the teeniest bit, the
whole thing must be written out again, leading to large transactions and db
bloat. Packless storage won't fix the former, which is a major performance
issue (along with write conflicts if it's multi-threaded).
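
To put a rough number on the bloat, here is a minimal sketch using only the
standard pickle module. It does not touch ZODB at all; the dict below is just
a hypothetical stand-in for the state of an object holding a 1000-element
list, and the pickle protocol is an arbitrary choice, so treat the result as
an order-of-magnitude estimate rather than what FileStorage actually writes.

```python
import pickle

# Hypothetical stand-in for the state of a persistent object that holds a
# plain list: the list is pickled as part of its parent's record.
state = {"a": list(range(1000))}


def record_size(obj_state):
    """Return the number of bytes one pickled copy of the state occupies."""
    return len(pickle.dumps(obj_state, protocol=1))


size_before = record_size(state)

# Change a single element, as change() in the quoted program does.
state["a"][0] += 1
size_after = record_size(state)

commits = 1000
# Each commit stores a fresh copy of the whole record, so the growth is
# roughly commits * record size (real storage adds per-transaction overhead).
estimated_growth = commits * size_after

print("one record: %d bytes" % size_after)
print("estimated growth after %d commits: %d bytes" % (commits, estimated_growth))
```

The point is only that a one-element change costs a full copy of the list on
every commit.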

A similar problem happens with Zope ObjectManagers (aka Foldoids) because
they store a list of object ids and types. When you have a big folder,
changing objects in it becomes much more expensive and can lead to huge
transactions.

A BTree is a much more efficient data structure that is built to avoid this
issue. It is also built to handle write conflicts. However, a BTree is more
like a dictionary than a list. One could imagine a BTree (an IOBTree to be
precise) where the keys are simply sequential integer indexes and the values
are the elements, or maybe your application could just be refactored around
using the BTree interface.
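
A minimal sketch of the integer-keyed idea, assuming the BTrees package is
installed. The dict fallback is my addition so the snippet can be tried
without it (you get the same mapping interface but none of the storage
benefits), and the variable names are only illustrative.

```python
try:
    from BTrees.IOBTree import IOBTree
except ImportError:
    # Fallback so the sketch runs without BTrees installed; a plain dict
    # offers the same mapping calls used below, but no bucket-level storage.
    IOBTree = dict

# Keys are sequential integer indexes, values are the elements.
items = IOBTree()
for index, value in enumerate(range(1000)):
    items[index] = value

# Changing one element goes through the mapping interface, so with a real
# IOBTree only the bucket holding key 0 needs to be stored again.
items[0] += 1

first_five = [items[i] for i in range(5)]
print(first_five)  # [1, 1, 2, 3, 4]
```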

One could imagine a straightforward class using BTrees that has an interface
identical to a list. It is likely such a thing already exists, although I'm
not aware of one. Managing the keys on deletion would be the trickiest bit to
deal with efficiently, but I'm sure it could be done. You could store the
length using BTrees.Length.Length.
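
A rough sketch of what such a class might look like. BTreeList and its
methods are hypothetical names, not an existing API, and the fallbacks in the
except branch are stand-ins I added so it runs without ZODB installed. It
deliberately covers only the easy part (append, pop from the end, indexing,
iteration) and leaves out deletion from the middle, which is the tricky bit
mentioned above.

```python
try:
    from persistent import Persistent
    from BTrees.IOBTree import IOBTree
    from BTrees.Length import Length
except ImportError:
    # Stand-ins so the sketch runs without ZODB; no persistence in this case.
    Persistent = object
    IOBTree = dict

    class Length(object):
        def __init__(self, v=0):
            self.value = v

        def change(self, delta):
            self.value += delta

        def __call__(self):
            return self.value


class BTreeList(Persistent):
    """Hypothetical list-like wrapper around an IOBTree (partial interface)."""

    def __init__(self, iterable=()):
        self._data = IOBTree()
        self._length = Length()
        for item in iterable:
            self.append(item)

    def __len__(self):
        return self._length()

    def _index(self, i):
        # Normalize negative indexes and reject out-of-range ones.
        n = len(self)
        if i < 0:
            i += n
        if not 0 <= i < n:
            raise IndexError("BTreeList index out of range")
        return i

    def __getitem__(self, i):
        return self._data[self._index(i)]

    def __setitem__(self, i, value):
        self._data[self._index(i)] = value

    def append(self, value):
        self._data[len(self)] = value
        self._length.change(1)

    def pop(self):
        # Removing the last element needs no renumbering of keys.
        i = self._index(-1)
        value = self._data[i]
        del self._data[i]
        self._length.change(-1)
        return value

    def __iter__(self):
        for i in range(len(self)):
            yield self._data[i]


numbers = BTreeList(range(1000))
numbers[0] += 1
print(len(numbers), numbers[0], numbers[-1])
```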

Another option would be to create a linked list structure where each element
has a reference to the next one in the "list". If random access is not a big
concern, this would be a simpler solution and would still eliminate the bloat
problem. Of course each element would need to be a persistent class instance.
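
Something along these lines, as a sketch only. Node and LinkedList are
hypothetical names, and the fallback to object is there just so the snippet
runs without the persistent package. With real Persistent instances each node
is stored as its own record, so changing one node's value does not rewrite
the others.

```python
try:
    from persistent import Persistent
except ImportError:
    # Stand-in so the sketch runs without ZODB; no persistence in this case.
    Persistent = object


class Node(Persistent):
    """One element of the list, holding a reference to the next element."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


class LinkedList(Persistent):
    """Hypothetical singly linked list with sequential access only."""

    def __init__(self):
        self.head = None

    def push(self, value):
        # Insert at the front: one new node plus the changed list header.
        self.head = Node(value, self.head)

    def nodes(self):
        node = self.head
        while node is not None:
            yield node
            node = node.next_node

    def values(self):
        return [node.value for node in self.nodes()]


chain = LinkedList()
for v in (3, 2, 1):
    chain.push(v)

# Modify a single element; only that node's attribute is assigned.
chain.head.value += 1
print(chain.values())  # [2, 2, 3]
```

The price is that reaching element n means walking n links, so this only
makes sense if you mostly iterate from the front.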
hth,
-Casey
On Friday 19 July 2002 08:12 am, Heiko Hees wrote:
> Hi,
>
> i am looking for a switch to prevent logging of transactions, since
> this seems to heavily grow file size.
>
> if i run the following program (first run generates an object with an
> array, second run changes the array and commits 1000 times) file size
> grows as follows:
>
> heiko@julie:~/tests/persistent$ ls -al a*
> -rw-r--r-- 1 heiko heiko 3158 Jul 19 14:08 a
> -rw-r--r-- 1 heiko heiko 3 Jul 19 14:08 a.lock
> -rw-r--r-- 1 heiko heiko 2966 Jul 19 14:08 a.tmp
> heiko@julie:~/tests/persistent$ ./dbsizeTest.py a
> heiko@julie:~/tests/persistent$ ls -al a*
> -rw-r--r-- 1 heiko heiko 2856903 Jul 19 14:08 a
> -rw-r--r-- 1 heiko heiko 3 Jul 19 14:08 a.lock
> -rw-r--r-- 1 heiko heiko 2823 Jul 19 14:08 a.tmp
>
> does anyone have a hint other than running db.pack()?
>
> heiko
>
> the program:
>
> #!/usr/bin/python
> import ZODB, sys, time
> from Persistence import Persistent
> from ZODB import FileStorage, DB
>
> class X(Persistent):
>     def __init__(self):
>         self.a = []
>         for i in range(1000):
>             self.a.append(i)
>         self._p_changed = 1
>
>     def change(self):
>         self.a[0] += 1
>         self._p_changed = 1
>
>
> db = DB( FileStorage.FileStorage(sys.argv[1]) )
> connection = db.open()
> root = connection.root()
>
>
> if not root.has_key('x'):
>     # first run
>     root['x'] = X()
>     get_transaction().commit()
> else:
>     # second run
>     for i in range(1000):
>         root['x'].change()
>         get_transaction().commit()
>
> connection.close()
>
> --
> brainbot technologies ag
> schwalbacherstr. 74 65183 wiesbaden . germany
> vox +49 611 238505-0 fax ++49 611 238505-1
> http://brainbot.com/ mailto:heiko@brainbot.com