Versioned connections from ZODB
If I get a versioned connection from the ZODB:

    conn = Zope.DB.open(version="myVersion")
    root = conn.root()
    app = root['Application']
    # do some stuff
    get_transaction().commit()
    conn.close()

Are the changes now in a version? How do I get those changes rolled into the "trunk" version of the ZODB? I guess all objects changed in the version will now be "locked" to that version until I apply the version changes to the trunk?

Thanks
Etienne
ZODB versions are deprecated, unsupported, buggy and hard to use. Don't use them.

Florent

Etienne Labuschagne <elabuschagne@gmail.com> wrote:
> If I get a versioned connection from the ZODB:
>
>     conn = Zope.DB.open(version="myVersion")
>     root = conn.root()
>     app = root['Application']
>     # do some stuff
>     get_transaction().commit()
>     conn.close()
>
> Are the changes now in a version? How do I get those changes rolled into the "trunk" version of the ZODB? I guess all objects changed in the version will now be "locked" to that version until I apply the version changes to the trunk?
-- Florent Guillaume, Nuxeo (Paris, France) CTO, Director of R&D +33 1 40 33 71 59 http://nuxeo.com fg@nuxeo.com
On 7/11/05, Florent Guillaume <fg@nuxeo.com> wrote:
> ZODB versions are deprecated, unsupported, buggy and hard to use. Don't use them.
>
> Florent
And as I understand it, so are temporary connections. That leaves me with getting a "normal" ZODB connection from the pool, which I don't want to do.

I really need a "temporary" connection that I can discard. This connection can have a much smaller cache than the normal connections, as it makes very little difference to the speed of data loading. Second prize is a connection that will only be used by a specific process and never by other processes. Versions solve this for me: I can check out a connection and keep it aside only for data loading. But this means that I waste precious memory on a connection that does not really need to cache as many objects as the other connections should. In my case, this translates to using 1GB of RAM on one connection that gets used once a day.

Please believe me that I really need a "special" connection. For those who really want to know why, below is an attempt at an explanation:

In the application that I have written, I want to be able to get connections that are not part of the normal connection pool. Once my process is finished, I can store these connections for later use, or discard them. Currently my application uses the normal connections in the pool. The problem is that this process "contaminates" the cache of the connections with objects that are not used in "normal" client application use (I use a thick client). This means that the client applications are extremely slow the next day, and it takes a long time before the cache contains the often-used objects again. Hence I DON'T want to use the pool connections for my once-a-day data loading process.

My ZODB contains about 700,000 objects. A connection caches about 60,000 objects to give satisfactory client speed. Starting up the client before the cache is initialized takes about 5 minutes; once the cache is populated, a client starts up in seconds. Data loading invalidates all of this, and is worse than a "clean" cache in that it takes a long time for the "new" objects to be flushed from the cache and replaced by the often-used objects again. Data loading does not need such a big cache, since it mostly loads data into the ZODB. Unfortunately, the loaded objects also end up in the cache.

Why do I need so many objects in the cache? Some searches cannot be done with a mere ZCatalog search and have to run through a subset of all the objects. These tend to fit nicely in the cache.
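[The "cache contamination" described above is ordinary LRU eviction, and can be sketched without ZODB at all. The `LRUCache` class below is a hypothetical stand-in for a connection's object cache, not ZODB's actual pickle-cache code:]

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache standing in for a ZODB connection's object cache.
    (Illustrative only -- not ZODB's real pickle cache implementation.)"""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def load(self, oid):
        # A hit moves the object to the most-recently-used end.
        if oid in self._data:
            self._data.move_to_end(oid)
            return self._data[oid]
        # A miss "loads from the storage" and may evict the LRU object.
        self._data[oid] = "object-%s" % oid
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)
        return self._data[oid]

cache = LRUCache(capacity=5)

# The client's "hot" working set fills the cache.
for oid in ["a", "b", "c", "d", "e"]:
    cache.load(oid)

# A bulk data load streams thousands of one-shot objects through the
# same cache, evicting the entire hot set -- the next client start
# must then reload everything from the storage.
for i in range(1000):
    cache.load(i)

print(sorted(oid in cache._data for oid in "abcde"))  # hot set is gone
```

[This is exactly why a separate small-cache connection for bulk loading is attractive: the one-shot objects then evict each other instead of the hot set.]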
Etienne Labuschagne wrote:
> On 7/11/05, Florent Guillaume <fg@nuxeo.com> wrote:
> > ZODB versions are deprecated, unsupported, buggy and hard to use. Don't use them.
>
> And as I understand, so are temporary connections too. That leaves me with getting a "normal" ZODB connection from the pool which I don't want to do.
<snip>
Your query would be better served on the zodb-dev list, where Tim Peters hangs out; he can probably explain how to get what you want without guessing. If I had to guess, I would suggest constructing your connection programmatically, where you can specify the object cache size for instance, and then closing / discarding the connection when you are done.

Tres.

--
===================================================================
Tres Seaver                                +1 202-558-7113
tseaver@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
[Etienne Labuschagne]
... I really need a "temporary" connection that I can discard. This connection can have a much smaller cache than the normal connections as it makes very little difference in the speed of data loading. Second prize is a connection that will only be used by a specific process and never used for other processes. Versions solves this for me.
Maybe like death would solve my problem with overdue taxes <wink>.

Connection pools are associated with DB instances, so if you want connections with different characteristics, create another DB instance. E.g., in the ZODB 3.2 line:

    otherdb = ZODB.DB(storage, cache_size=100, pool_size=2)

Then connections obtained via otherdb.open() will hang if two threads already have connections from `otherdb` (that's the effect of `pool_size`), and will have ZODB memory caches that strive to keep no more than 100 objects in memory across transaction boundaries (the effect of `cache_size`).

This is easiest if you're using ZEO (ClientStorage), because doing otherdb.close() also calls close() on the DB's storage. If you, e.g., share a FileStorage directly across multiple DBs, closing any one of the DBs will close the FileStorage across all the DBs using that FileStorage. ZEO makes it easy to open multiple ClientStorages "on top of" a single FileStorage, which can be closed independently. If you never close otherdb, this isn't an issue.

This answer assumes you're using ZODB directly. I don't know the details of how to spell it from within a Zope application (if that's what you need -- unsure).
Tim,
. . . Versions solves this for me.
Maybe like death would solve my problem with overdue taxes <wink>.
I did get the versioned connections to work (so far), BUT I will definitely take your word on it and seek another solution :)
<snip> Like, e.g., in the ZODB 3.2 line,
otherdb = ZODB.DB(storage, cache_size=100, pool_size=2)
Then connections obtained via otherdb.open() will hang if two threads already have connections from `otherdb` (that's the effect of `pool_size`), and will have ZODB memory caches that strive to keep no more than 100 objects in memory across transaction boundaries (the effect of `cache_size`). <snip>
To double check:

    otherdb = ZODB.DB(existingdb._storage, cache_size=100, pool_size=2)

is OK? It seems that you can create more than one DB instance that shares one storage object. I hit upon the idea of creating another DB instance and sharing the storage object myself yesterday, but wasn't sure what the repercussions would be. Your post answers most of my questions.

I have one left, though: if I do decide to share the storage object (and not go ZEO for whatever reason), will the caches between the two DB objects not get out of sync? In other words, will one DB object know to invalidate objects in its caches should that object be changed through another DB instance? I know ZEO does this for you, but I'd like to know what the case would be for two DBs in one process.

My other option is to create the connections "by hand" (that way I can control the cache size easily) and keep my own little pool of connections with a modified close method that does not put my connections back into the "normal" pool. But I'm afraid I may end up with a new can of worms that way.
This answer assumes you're using ZODB directly. I don't know details of how to spell it from within a Zope application (if that's what you need -- unsure).
I use the ZODB directly, but from within Zope. The connections are used in long-running processes that are not necessarily browser-triggered. Some of them are scheduled events that are started up in their own threads; hence the need to get new connections to the ZODB. I have quite a bit of experience working safely with multiple threads and the ZODB, so I'm sure I have that part right. My problem had more to do with "cache contamination" and reserving "special" connections for specific processes.

Thanks for the reply
Etienne
[Etienne Labuschagne]
. . . Versions solves this for me.
[Tim Peters]
Maybe like death would solve my problem with overdue taxes <wink>.
[Etienne]
I did get the versioned connections to work (so far), BUT I will definitely take your word on it and seek another solution :)
If that works for you, don't let nay-sayers scare you away. I don't think there are any reports of version bugs open in the Zope collector at present -- but that could just mean that everyone stays away from them now.
<snip> Like, e.g., in the ZODB 3.2 line,
otherdb = ZODB.DB(storage, cache_size=100, pool_size=2)
Then connections obtained via otherdb.open() will hang if two threads already have connections from `otherdb` (that's the effect of `pool_size`), and will have ZODB memory caches that strive to keep no more than 100 objects in memory across transaction boundaries (the effect of `cache_size`). <snip>
To double check:

    otherdb = ZODB.DB(existingdb._storage, cache_size=100, pool_size=2)

is OK? It seems that you can create more than one DB instance that shares one storage object.
The code won't stop you from doing that, but as I said last time, I'd use ZEO and use a fresh ClientStorage for each DB. ZEO was designed to support this kind of use; nothing else was.
I hit upon the idea of creating another DB instance and sharing the storage object myself yesterday, but wasn't sure what the repurcussions will be.
Neither am I, if you don't use ZEO. Normally I'd spend time digging into the code trying to find answers, but I don't have time for that today. It's possible that if you asked on the zodb-dev list, Jim Fulton or Jeremy Hylton would know more answers off the tops of their heads. Sorry, but I don't.
Your post answers most of my questions.
At least the ZEO part did <wink>.
I have one left, though: if I do decide to share the storage object (and not go ZEO for whatever reason), will the caches between the two DB objects not get out of sync? In other words, will one DB object know to invalidate objects in it's caches should that object be changed through another DB instance? I know ZEO does this for you, but I'd like to know what the case would be for two DBs in one process.
See above: ZEO should work fine. If you try to do it without ZEO, I'm not sure what will happen. I pointed out one "obvious" bad consequence of trying to share a storage last time (that closing any DB will close the storage across all DBs sharing that storage). In general, invalidations get sent out by a DB to all (& only) the connections obtained from that DB. So yes, if you're not using ZEO (which goes on to broadcast invalidations to all connected clients), caches can get out of sync across DBs. But I don't know whether that matters to you either. For example, perhaps you're willing to create a new DB whenever you need a temporary connection, and what you do with it then is read-only and finishes quickly, or ... I just don't know.
My other option is to create the connections "by hand" (that way I can control the cache size easily) and keep my own little pool of connections with a modified close method that does not put my connections back into the "normal" pool. But I'm afraid I may end up with a new can of worms that way.
I'd definitely advise against that. The Connection constructor isn't meant to be called outside of ZODB internals. Note that you can't even call it without passing a db, and there's an intricate dance between Connection and DB methods that's mostly undocumented and hard to get right. ....
I use the ZODB directly, but from within Zope. The connections are used in long-running processes that are not necessarily browser-triggered. Some of them are scheduled events that are started up in their own threads; hence the need to get new connections to the ZODB. I have quite a bit of experience working safely with multiple threads and the ZODB, so I'm sure I have that part right. My problem had more to do with "cache contamination" and reserving "special" connections for specific processes.
Since there's no machinery aiming specifically at that, I'm afraid it's bound to be painful one way or another -- except that, using ZEO, it sounds quite straightforward.
Tim Peters wrote:
[Etienne Labuschagne]
I did get the versioned connections to work (so far), BUT, I will definately take your word on it and seek another solution :)
If that works for you, don't let nay-sayers scare you away. I don't think there are any reports of version bugs open in the Zope collector at present -- but that could just mean that everyone stays away from them now.
The community has grown averse to using versions because they interact poorly with content catalogs (by locking the individual BTree buckets in the catalog's indexes). If Etienne's need doesn't involve touching the catalog, or if he can afford to do catalog-munging updates only within a version, then versions will work as designed. They are still a nice way to experiment with customizing ZPT, etc. (which won't typically touch the catalog).

Tres.
participants (4)

- Etienne Labuschagne
- Florent Guillaume
- Tim Peters
- Tres Seaver