how bad are per-request-write-transactions
Hi,

How bad are per-request write transactions in a non-ZEO environment? I.e. each request on a folder or its subobjects will cause a write transaction (somewhat like a non-fs counter, but worse, as it happens for all subobjects).

And if this is really bad, are there any workarounds except for writing to the filesystem?

Cheers

Ivo

--
Drs. I.R. van der Wijk            -=-   Brouwersgracht 132
Amaze Internet Services V.O.F.          1013 HA Amsterdam, NL
                                  -=-   Tel: +31-20-4688336
Linux/Web/Zope/SQL/MMBase               Fax: +31-20-4688337
Network Solutions                       Web: http://www.amaze.nl/
Consultancy                             Email: ivo@amaze.nl
                                  -=-
This will kill performance, especially concurrent use of the site. It will also cause large amounts of database bloat. Do you need real-time numbers, or is a delay (such as 24 hours) acceptable?

If you can stand a delay, another approach would be to write a script which scans the z2.log file (or another log that you generate on page hits) each night and, in a single transaction, updates a counter on each object hit.

If you use the z2.log, no additional writing to the FS is needed, and you get the benefit of easy access to the counts directly from the objects, without degrading performance or db bloat.

-Casey

Ivo van der Wijk wrote:
Hi,
How bad are per-request transactions in a non-ZEO environment? I.e. each request on a folder or its subobjects will cause a write transaction (somewhat like a non-fs counter, but worse as it happens for all subobjects)
And if this is really bad, are there any workarounds except for writing to the filesystem?
Cheers
Ivo
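Casey's nightly log-scan idea can be sketched roughly as follows. This is a hypothetical illustration, not part of Zope: the regex assumes z2.log records a Common Log Format-style request line, and the function only does the tallying; a real script would then traverse to each object and increment its counter, committing once at the end.

```python
import re

# Match the quoted request line of a Common Log Format-ish hit log,
# e.g. ... "GET /folder/doc HTTP/1.0" 200 1234
LOG_LINE = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP/[\d.]+"')

def tally_hits(log_lines):
    """Return a {path: hit_count} mapping from an iterable of log lines."""
    counts = {}
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m:
            path = m.group('path')
            counts[path] = counts.get(path, 0) + 1
    return counts
```

With the counts aggregated up front, all object counters can be updated inside a single ZODB transaction, which is the point of the approach.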
I developed a profiler service for a production site about 8 months ago. I essentially did what you are asking. I needed to see how customers were using the various navigational elements and other services provided within the site layout. The logging service could not give me a sense of the context.

To make a long story short, I had a method in the standard_html_header that kicked off the evaluation process. I essentially created a mirror of the site (containers/sub-containers/methods) for each hit, for each day, for each month, etc. This provided me with a way to see specific site activity in real time. Each object that was evaluated (for each day) had two tinyTable instances. One recorded each hit as a record (IP, referrer, username, time) while the other tallied the numbers per hit (per unique IP).

This was all running on a Sun on a terrible network, and I saw little or no performance difference; the ZODB growth was as you might expect from adding the additional folder objects and tinyTable instances. It wasn't a high-profile site (about 3000 hits per week). I ran the service for three months with no problems. The key was that the hits recorded in the tinyTables did not create a ZODB transaction.

Hope this helps

Eric

----- Original Message -----
From: "Casey Duncan" <casey@zope.com>
To: "Ivo van der Wijk" <ivo@amaze.nl>
Cc: <zope-dev@zope.org>
Sent: Tuesday, April 16, 2002 10:04 AM
Subject: Re: [Zope-dev] how bad are per-request-write-transactions
This will kill performance, especially concurrent use of the site. It will also cause large amounts of database bloat. Do you need real time numbers, or is a delay (such as 24 hours) acceptable?
If you can stand a delay, another approach would be to write a script which scans the z2.log file (or another log that you generate on page hits) each night and in a single transaction updates a counter on each object hit.
If you use the z2.log, no additional writing is needed to the FS, and you get the benefit of easy access to the counts directly from the objects, without degrading performance or db bloat.
-Casey
Ivo van der Wijk wrote:
Hi,
How bad are per-request transactions in a non-ZEO environment? I.e. each request on a folder or its subobjects will cause a write transaction (somewhat like a non-fs counter, but worse as it happens for all subobjects)
And if this is really bad, are there any workarounds except for writing to the filesystem?
Cheers
Ivo
_______________________________________________ Zope-Dev maillist - Zope-Dev@zope.org http://lists.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope )
I don't agree that high write is always forbidden. I think there are plenty of cases where this can work. It simply becomes unworkable much sooner than other data systems (e.g. a relational database or FS-based solution).

For instance, think about bloat for a second. Let's be crazy and say it takes 100 bytes to store an integer representing a count. Let's say you write once a second. That's under 7Mb a day (per counter). Combined with hourly packing, that might be well within limits.

Let's take the next step and say that you can live with a little volatility in the data. You write an object that caches ten seconds' worth of writes. Whenever a write comes in at the over-ten-second mark, you write the _v_ attribute to the persistent attribute. There's an order of magnitude improvement.

Finally, you store all your counters in a non-versioned storage. Now you have *no* bloat problem. :^)

Regarding performance, maybe his application isn't doing 50 requests/second and he'd be willing to trade the slight performance hit and bloat for a decrease in system complexity.

All of the above has downsides as well. My point, though, is that we shouldn't automatically dismiss the zodb as inappropriate for *all* high-write situations. In fact, with Andreas and Matt Hamilton's TextIndexNG, you might even be able to write to catalogued applications at a faster rate than one document per minute. :^)

--Paul

Casey Duncan wrote:
This will kill performance, especially concurrent use of the site. It will also cause large amounts of database bloat. Do you need real time numbers, or is a delay (such as 24 hours) acceptable?
If you can stand a delay, another approach would be to write a script which scans the z2.log file (or another log that you generate on page hits) each night and in a single transaction updates a counter on each object hit.
If you use the z2.log, no additional writing is needed to the FS, and you get the benefit of easy access to the counts directly from the objects, without degrading performance or db bloat.
-Casey
Ivo van der Wijk wrote:
Hi,
How bad are per-request transactions in a non-ZEO environment? I.e. each request on a folder or its subobjects will cause a write transaction (somewhat like a non-fs counter, but worse as it happens for all subobjects)
And if this is really bad, are there any workarounds except for writing to the filesystem?
Cheers
Ivo
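Paul's buffer-in-a-volatile-attribute idea can be sketched like this. This is only an illustration with hypothetical names: a real version would subclass Persistence.Persistent, and, as later replies in this thread point out, each thread's ZODB cache holds its own _v_ copy, so buffered counts can be lost when the object is invalidated or deactivated.

```python
import time

class BufferedCounter:
    """Accumulate hits in a volatile buffer; fold them into the
    persistent count only every `interval` seconds, so that only
    the flush triggers a ZODB write."""
    interval = 10

    def __init__(self):
        self._count = 0               # persistent attribute (written on flush)
        self._v_count = 0             # volatile buffer (no write on increment)
        self._v_last_flush = time.time()

    def hit(self):
        self._v_count += 1
        if time.time() - self._v_last_flush >= self.interval:
            self.flush()

    def flush(self):
        # In real Zope, assigning to self._count marks the object changed
        # and causes exactly one write transaction for the whole batch.
        self._count += self._v_count
        self._v_count = 0
        self._v_last_flush = time.time()

    def total(self):
        return self._count + self._v_count
```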
On Wed, Apr 17, 2002 at 07:54:04AM -0400, Paul Everitt wrote:
I don't agree that high write is always forbidden. I think there are plenty of cases where this can work. It simply becomes unworkable much sooner than other data systems (e.g. a relational database or FS-based solution).
First of all, what the product does, besides keeping account of the number of objects and their total size, is measure traffic / number of hits. This way, zopehosters (like us) can measure the traffic generated by customers and optionally block further traffic.
For instance, think about bloat for a second. Let's be crazy and say it takes 100 bytes to store an integer representing a count. Let's say you write once a second. That's under 7Mb a day (per counter). Combined with hourly packing, that might be well within limits.
I'd still rather avoid such numbers.
Let's take the next step and say that you can live with a little volatility in the data. You write an object that caches ten seconds worth of writes. Whenever a write comes in at the over-ten-second mark, you write the _v_ attribute to the persistent attribute. There's an order of magnitude improvement.
This sounds like a good solution. Actually, I only have to write to the persisted attribute if the object is persisted itself (i.e. in setstate); until then, I'll just add self._counter and self._v_counter, right?
Finally, you store all your counters in a non-versioned storage. Now you have *no* bloat problem. :^)
How would this work? (Wouldn't this be a solution for some of the HelpSystem issues as well?)

Cheers,

Ivo

--
Drs. I.R. van der Wijk            -=-   Brouwersgracht 132
Amaze Internet Services V.O.F.          1013 HA Amsterdam, NL
                                  -=-   Tel: +31-20-4688336
Linux/Web/Zope/SQL/MMBase               Fax: +31-20-4688337
Network Solutions                       Web: http://www.amaze.nl/
Consultancy                             Email: ivo@amaze.nl
                                  -=-
Ivo van der Wijk writes:
On Wed, Apr 17, 2002 at 07:54:04AM -0400, Paul Everitt wrote:
Let's take the next step and say that you can live with a little volatility in the data. You write an object that caches ten seconds worth of writes. Whenever a write comes in at the over-ten-second mark, you write the _v_ attribute to the persistent attribute. There's an order of magnitude improvement.
This sounds like a good solution. Actually, I only have to write to the persisted attribute if the object is persisted itself (i.e. in setstate); until then, I'll just add self._counter and self._v_counter, right?

I would expect you to lose lots of counts with this approach (at least if you are not careful): each connection has its own "_v_counter" copy, and these count independently of one another. If one of the parent objects is written, the copies of it in the other connections are invalidated and later reloaded from the ZODB. I expect the original "_v_counter" values are lost.

Dieter
On Wed, 17 Apr 2002 07:54:04 -0400, Paul Everitt <paul@zope.com> wrote:
Whenever a write comes in at the over-ten-second mark, you write the _v_ attribute to the persistent attribute.
That would be bad. _v_ attributes are lost when the object is deactivated and removed from the ZODB memory cache.... It would lose the majority of counts for all the most frequently accessed counters. For this approach, the right implementation is to store the incremental count somewhere other than a persistent object.
Finally, you store all your counters in a non-versioned storage. Now you have *no* bloat problem. :^)
Is there one which is also low maintenance?

Toby Dickenson
tdickenson@geminidataloggers.com
Toby Dickenson <tdickenson@geminidataloggers.com> wrote:
Whenever a write comes in at the over-ten-second mark, you write the _v_ attribute to the persistent attribute.
That would be bad. _v_ attributes are lost when the object is deactivated and removed from the ZODB memory cache.... It would lose the majority of counts for all the most frequently accessed counters.
This reminds me of a question I had: given that (from what I understand) _v_ attributes only live in the object cache of a given Zope, what happens with ZEO if a product uses _v_ attributes (with a timeout) as a cache to avoid going to a slower database? Here I'm thinking specifically of LDAPUserFolder.

Suppose a user hits ZEOclient1, which caches his info in _v_*. Then the same user hits ZEOclient2 and changes some info in LDAP. This new info is cached on ZEOclient2. When the user goes back to ZEOclient1, he will see old data. Or am I misunderstanding something?

My question really relates to any use of _v_ as a cache that can survive one publisher transaction. Should _v_ never be used like that?

Florent

--
Florent Guillaume, Nuxeo (Paris, France)   +33 1 40 33 79 87
http://nuxeo.com   mailto:fg@nuxeo.com
Florent Guillaume wrote:
Or am I misunderstanding something? My question really relates to any use of _v_ as a cache that can survive one publisher transaction. Should _v_ never be used like that?
There's a case to be made for attributes that are not persisted (like _v_ attributes) but are cleared at transaction boundaries.

--
Steve Alexander
On Thu, 18 Apr 2002 16:23:15 +0000 (UTC), Florent Guillaume <fg@nuxeo.com> wrote:
This reminds me of a question I had: given that (from what I understand) _v_ attributes only live in the object cache of a given Zope,
True, and more accurate than I think you expected.... The issue is that one Zope has more than one ZODB object cache, even without ZEO. There is one per worker thread. Each cache has independent _v_ attributes.
Here I'm thinking specifically of LDAPUserFolder. Suppose a user hits ZEOclient1, which caches in _v_* his info. Then the same user hits ZEOclient2 and changes some info in LDAP. This new info is cached on ZEOclient2. When the user goes back to ZEOclient1, he will see old data.
Yes. I've never looked at LDAPUserFolder, so this may be irrelevant, but is it possible for LDAPUserFolder to validate that the cached _v_ information is still fresh? If that validation is quicker than fetching a new copy, then this is still an overall win.
Should _v_ never be used like that ?
If data consistency is an absolute requirement, then you *have* to hit some shared storage on every transaction.

Toby Dickenson
tdickenson@geminidataloggers.com
I've never looked at LDAPUserFolder, so this may be irrelevant, but is it possible for LDAPUserFolder to validate that the cached _v_ information is still fresh? If that validation is quicker than fetching a new copy, then this is still an overall win.
Yes, it does have a very rough way of validating the cache: there's a timeout on the cached objects. And yes, it's a *big* performance win.

Jens
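The timeout-based cache Jens describes can be sketched like this. All names here are illustrative, not LDAPUserFolder's actual API: entries older than `timeout` seconds are treated as stale and refetched from the slow backend, which bounds how long the _v_ cache can serve old data.

```python
import time

class TTLCache:
    """Volatile cache with a freshness timeout: a rough stand-in for
    caching slow lookups (e.g. LDAP queries) in a _v_ attribute."""

    def __init__(self, fetch, timeout=600):
        self.fetch = fetch        # the slow lookup function
        self.timeout = timeout    # seconds an entry stays fresh
        self._v_cache = {}        # per-thread in real Zope: key -> (value, when)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._v_cache.get(key)
        if entry is not None:
            value, when = entry
            if now - when < self.timeout:
                return value      # fresh enough; skip the slow fetch
        value = self.fetch(key)   # stale or missing: refetch
        self._v_cache[key] = (value, now)
        return value
```

The trade-off is exactly the one discussed above: within the timeout window, another client (or ZEO node) can change the backend and this cache will still serve the old value.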
Toby Dickenson <tdickenson@geminidataloggers.com> wrote:
This reminds me of a question I had: given that (from what I understand) _v_ attributes only live in the object cache of a given Zope,
True, and more accurate than I think you expected....
The issue is that one Zope has more than one ZODB object cache, even without ZEO. There is one per worker thread. Each cache has independent _v_ attributes.
Ah, yes, I knew this in the back of my mind.
I've never looked at LDAPUserFolder, so this may be irrelevant, but is it possible for LDAPUserFolder to validate that the cached _v_ information is still fresh? If that validation is quicker than fetching a new copy, then this is still an overall win.
I'm not sure there's a fast way for what LDAPUserFolder needs. Actually the LDAP server itself does have a cache, so things should be fast enough without caching in Zope. I'll have to try it.
If data consistency is an absolute requirement, then you *have* to hit some shared storage on every transaction.
Ok. I'll investigate clearing the _v_ caches at the end of the transaction, using the REQUEST._hold hack mentioned earlier. Anyway, there are other problems with LDAP, seeing that there's no way to undo a transaction on error...

Florent

--
Florent Guillaume, Nuxeo (Paris, France)   +33 1 40 33 79 87
http://nuxeo.com   mailto:fg@nuxeo.com
On Thu, 18 Apr 2002 17:35:17 +0000 (UTC), Florent Guillaume <fg@nuxeo.com> wrote:
I'll investigate clearing the _v_ caches at the end of the transaction, using the REQUEST._hold hack mentioned earlier.
Below is the class I use for this. Just call attribute_cleaner(self, '_v_my_attribute') before assigning to _v_my_attribute, and it will be cleared at the end of the ZODB transaction. I wonder if this is generally useful enough to go in the ZODB distribution?

class attribute_cleaner:

    def __init__(self, client, attr):
        self.client = client
        self.attr = attr
        get_transaction().register(self)

    def ClearCache(self, *args):
        try:
            delattr(self.client, self.attr)
        except AttributeError:
            pass
        except KeyError:
            pass

    tpc_finish = tpc_abort = abort = abort_sub = ClearCache

    def tpc_begin(self, transaction, subtransaction=None):
        pass

    def commit(self, object, transaction):
        pass

    def tpc_vote(self, transaction):
        pass

    def commit_sub(self, transaction):
        pass

Toby Dickenson
tdickenson@geminidataloggers.com
Paul Everitt wrote:
I don't agree that high write is always forbidden. I think there are plenty of cases where this can work. It simply becomes unworkable much sooner than other data systems (e.g. a relational database or FS-based solution).
I agree, but I am loath to approve of any solution which demands a write for every read of an object.
For instance, think about bloat for a second. Let's be crazy and say it takes 100 bytes to store an integer representing a count. Let's say you write once a second. That's under 7Mb a day (per counter). Combined with hourly packing, that might be well within limits.
Yes, but the counter is not the only thing written. The whole containing object is written out to the storage. Now that doesn't include binary data (such as image and file data), but it does include any primitive data stored in the object's attributes (strings, lists, dicts, etc).

Hourly packing seems like a blunderbuss solution to the bloat problem. You can't tell me that won't kill performance...
Let's take the next step and say that you can live with a little volatility in the data. You write an object that caches ten seconds worth of writes. Whenever a write comes in at the over-ten-second mark, you write the _v_ attribute to the persistent attribute. There's an order of magnitude improvement.
Only if you run single threaded. For multi-threaded Zope apps (the default), you would need to use a transient object which introduces its own complexities.
Finally, you store all your counters in a non-versioned storage. Now you have *no* bloat problem. :^)
Right, the transient object or something else that writes to disk. Now you have to make sure the counters can be related to the object robustly. Bookkeeping... This is certainly a possibility, I would hesitate to argue for it on the notion it is less complex though.
Regarding performance, maybe his application isn't doing 50 requests/second and he'd be willing to trade the slight performance hit and bloat for a decrease in system complexity.
That could be a good trade, I just wanted to make sure the issues were known.
All of the above has downsides as well. My point, though, is that we shouldn't automatically dismiss the zodb as inappropriate for *all* high-write situations. In fact, with Andreas and Matt Hamilton's TextIndexNG, you might even be able to write to catalogued applications at a faster rate than one document per minute. :^)
Of course not, but the obvious and easiest solution (just incrementing a counter on the objects on every read) is probably not the best solution. -Casey
On Wed, 2002-04-17 at 11:44, Casey Duncan wrote:
Paul Everitt wrote:
I don't agree that high write is always forbidden. I think there are plenty of cases where this can work. It simply becomes unworkable much sooner than other data systems (e.g. a relational database or FS-based solution).
I agree, but I am loath to approve of any solution which demands a write for every read of an object.
Even if the pertinent objects are only read once a minute? That's pretty severe.
For instance, think about bloat for a second. Let's be crazy and say it takes 100 bytes to store an integer representing a count. Let's say you write once a second. That's under 7Mb a day (per counter). Combined with hourly packing, that might be well within limits.
Yes, but the counter is not the only thing written. The whole containing object is written out to the storage. Now that doesn't include binary data (such as image and file data), but it does include any primitive data stored in the object's attributes (strings, lists, dicts, etc).
That's only if you do it as a property. It doesn't have to be done that way. Shane and I discussed a counter that existed as a central datastructure. Objects that were being counted would simply have methods to increment the count and display the count. This data structure would likely be some kind of tree, to avoid itself being completely written on every change.
Hourly packing seems like a blunderbuss solution to the bloat problem. You can't tell me that won't kill performance...
Again, some people might not care if, once an hour, there is a 20-second performance penalty. The tradeoff might be worth it. But I was being hypothetical here. It's better to get to a once-a-day pack, which people should do anyway.

Of course, blunderbuss is in the eye of the beholder. Writing a cron job to wake up every N seconds, scan a log, and update the count of pages seems a bit blunderbuss-y to me as well. :^)
Let's take the next step and say that you can live with a little volatility in the data. You write an object that caches ten seconds worth of writes. Whenever a write comes in at the over-ten-second mark, you write the _v_ attribute to the persistent attribute. There's an order of magnitude improvement.
Only if you run single threaded. For multi-threaded Zope apps (the default), you would need to use a transient object which introduces its own complexities.
Correct. The ideal is a data structure built for this kind of problem. Fortunately this isn't unknown territory.
Finally, you store all your counters in a non-versioned storage. Now you have *no* bloat problem. :^)
Right, the transient object or something else that writes to disk. Now you have to make sure the counters can be related to the object robustly. Bookkeeping... This is certainly a possibility, I would hesitate to argue for it on the notion it is less complex though.
Hmm, I thought this was a fairly common pattern courtesy of the catalog. An object changes. Something else is told to update itself.
Regarding performance, maybe his application isn't doing 50 requests/second and he'd be willing to trade the slight performance hit and bloat for a decrease in system complexity.
That could be a good trade, I just wanted to make sure the issues were known.
Completely agreed. My disagreement is portraying the counter problem as impossible with the zodb. I think some people, as evidenced by some of the responses, are willing to live with the tradeoffs. Other people will find managing a log file on disk to be a more manageable solution.
All of the above has downsides as well. My point, though, is that we shouldn't automatically dismiss the zodb as inappropriate for *all* high-write situations. In fact, with Andreas and Matt Hamilton's TextIndexNG, you might even be able to write to catalogued applications at a faster rate than one document per minute. :^)
Of course not, but the obvious and easiest solution (just incrementing a counter on the objects on every read) is probably not the best solution.
If people can live within the limitations (e.g. they have a small number of infrequently-changing things to count), then it's unlikely to be much of a problem. All in all, an interesting discussion from which not much is likely to change, as _I'm_ certainly not going to implement what I describe. :^) --Paul
That's only if you do it as a property. It doesn't have to be done that way. Shane and I discussed a counter that existed as a central datastructure. Objects that were being counted would simply have methods to increment the count and display the count.
FWIW, this already mostly exists in Zope as the (tiny) BTrees.Length.Length class. It's an awfully nifty little piece of code. Anybody who is interested should read it and try to understand it, because it's subtly mindbending and ingenious, and it is a prime example of why we love Jim. ;-)
Completely agreed. My disagreement is portraying the counter problem as impossible with the zodb. I think some people, as evidenced by some of the responses, are willing to live with the tradeoffs. Other people will find managing a log file on disk to be a more manageable solution.
It would be best to make a dual-mode undoing and nonundoing storage on a per-object basis. But a half step would be to make it easier to use mounted storages a la http://dev.zope.org/Wikis/DevSite/Proposals/StorageAndConnectionTypeRegistries.
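For reference, the heart of the BTrees.Length.Length trick Chris mentions is ZODB conflict resolution: when two transactions increment the same counter concurrently, _p_resolveConflict merges the two deltas instead of raising a ConflictError. A minimal standalone sketch (the real class subclasses Persistent, and the three arguments are pickled object states; for a counter the state is just the integer):

```python
class Length:
    """Conflict-resolving counter in the style of BTrees.Length.Length."""

    def __init__(self, v=0):
        self.value = v

    def change(self, delta):
        self.value += delta

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # old_state: the value both transactions started from.
        # saved_state: what the other transaction already committed.
        # new_state: what this transaction tried to commit.
        # Each side's delta is (state - old_state); apply both deltas.
        return saved_state + new_state - old_state
```

So if two transactions start from 5 and one commits +3 while the other commits +1, the resolved value is 5 + 3 + 1 = 9, and neither transaction fails.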
Chris McDonough wrote:
It would be best to make a dual-mode undoing and nonundoing storage on a per-object basis.
...if anyone achieves this, I will have plenty of beer to send to them. Chris - please, pretty please :-)
"CM" == Chris McDonough <chrism@zope.com> writes:
Completely agreed. My disagreement is portraying the counter problem as impossible with the zodb. I think some people, as evidenced by some of the responses, are willing to live with the tradeoffs. Other people will find managing a log file on disk to be a more manageable solution.
CM> It would be best to make make a dual-mode undoing and nonundoing CM> storage on a per-object basis. I'd really like to do this for ZODB4, but it seems hard to get it into FileStorage, without adding automatic incremental packing to FileStorage. Example: Object A is marked as save enough revisions to do a single undo. When a transaction updates A and makes older revisions unnecessary, there's no obvious way to remove them without doing a pack. We could write a garbage collector that removed unneeded things (as opposed to packing everything to a particular date), but it doesn't seem very useful if it needs to be run manually. Also, how would you specifiy the object's packing policy? I'm thinking an _p_revision_control attribute or something like that. If the attribute exists on an object, it sets a particular policy for that object. Do individual transactions need to play in this game, too? I'm imagining a use case where an object is marked as "no revisions" but you want to be able to undo a particular transaction. I'm not sure if that means : - you can undo the transaction, but the "no revisions" object keeps its current state. - you can undo the transaction, and because the transaction is specially marked as undoable, there actually is a revision - you can't undo the transaction The first choice seems appropriate for a counter (I think), but I'm not sure if it makes sense for all possible revision-less objects. Jeremy
Jeremy Hylton wrote:
"CM" == Chris McDonough <chrism@zope.com> writes:
Completely agreed. My disagreement is portraying the counter problem as impossible with the zodb. I think some people, as evidenced by some of the responses, are willing to live with the tradeoffs. Other people will find managing a log file on disk to be a more manageable solution.
CM> It would be best to make a dual-mode undoing and nonundoing
CM> storage on a per-object basis.
I'd really like to do this for ZODB4, but it seems hard to get it into FileStorage without adding automatic incremental packing to FileStorage.

Example: Object A is marked as "save enough revisions to do a single undo." When a transaction updates A and makes older revisions unnecessary, there's no obvious way to remove them without doing a pack. We could write a garbage collector that removed unneeded things (as opposed to packing everything to a particular date), but it doesn't seem very useful if it needs to be run manually.

Also, how would you specify the object's packing policy? I'm thinking a _p_revision_control attribute or something like that. If the attribute exists on an object, it sets a particular policy for that object.

Do individual transactions need to play in this game, too? I'm imagining a use case where an object is marked as "no revisions" but you want to be able to undo a particular transaction. I'm not sure if that means:

- you can undo the transaction, but the "no revisions" object keeps its current state.
- you can undo the transaction, and because the transaction is specially marked as undoable, there actually is a revision.
- you can't undo the transaction.

The first choice seems appropriate for a counter (I think), but I'm not sure if it makes sense for all possible revision-less objects.

Jeremy
Example: Object A is marked as "save enough revisions to do a single undo." When a transaction updates A and makes older revisions unnecessary, there's no obvious way to remove them without doing a pack. We could write a garbage collector that removed unneeded things (as opposed to packing everything to a particular date), but it doesn't seem very useful if it needs to be run manually.
One idea I've been floating in my head is the idea of a "forked" storage, where some objects are stored in an undoable storage and others are stored in a non-undoable storage. I could try to explain it in English, but pseudocode is easier:

class ForkedStorage:

    def __init__(self, undoable_storage, non_undoable_storage):
        self.undoable = undoable_storage
        self.non_undoable = non_undoable_storage

    def store(self, oid, data, serial):
        if not serial or serial == '\0' * 8:
            # For new objects, choose a storage.
            want_undo = self.wantUndoableStorage(data)
            if want_undo:
                storage = self.undoable
            else:
                storage = self.non_undoable
        else:
            # For existing objects, use the storage chosen previously.
            if self.undoable.load(oid):
                storage = self.undoable
            else:
                storage = self.non_undoable
        storage.store(oid, data, serial)

    def load(self, oid):
        data, serial = self.undoable.load(oid)
        if not data:
            data, serial = self.non_undoable.load(oid)
        if not data:
            raise POSException, 'data not found'
        return data, serial

    def wantUndoableStorage(self, data):
        u = cPickle.Unpickler()
        module, name = u.loads(data)
        class_ = getattr(__import__(module), name)
        if getattr(class_, '_p_undoable', 1):
            return 1
        else:
            return 0

Only a simple idea. :-)
Also, how would you specify the object's packing policy? I'm thinking a _p_revision_control attribute or something like that. If the attribute exists on an object, it sets a particular policy for that object.
Do individual transactions need to play in this game, too? I'm imagining a use case where an object is marked as "no revisions" but you want to be able to undo a particular transaction. I'm not sure if that means:
- you can undo the transaction, but the "no revisions" object keeps its current state.
- you can undo the transaction, and because the transaction is specially marked as undoable, there actually is a revision
- you can't undo the transaction
The first choice seems appropriate for a counter (I think), but I'm not sure if it makes sense for all possible revision-less objects.
The first choice also makes sense for a catalog. Here's another possible variation: transactions that involve *only* non-undoable objects are non-undoable; all other transactions are undoable and revert the revision of non-undoable objects as well. Shane
ForkedStorage, I like it simply for the coolness of the name. :^) But it sparked a different kind of idea, leveraging a pattern that might emerge in Zope 3.

Let's say we had a queue in Zope. We could asynchronously send changes into the queue. Later, based on some policy (e.g. idle time, clock ticks, etc.), those changes would be enacted/committed.

Imagine the queue itself is in a different storage, likely non-versioned. Imagine that the queue is processed every N seconds. It takes all the work to do and performs it, but in a subtransaction. Thus you might send the queue ten increments to a counter, but only one will be committed to the main storage.

To make programmers have to think less about the queue (send in the object reference, the method to use, and the parameters), you could make it look like a special form of subtransactions. That is, you say:

    tm.beginQueuingTransactions()
    self.incrementCounter()
    self.title = 'Simple change'
    self.body = upload_file
    tm.endQueuingTransactions()

At the transaction level, all enclosed changes are queued for later commit. You don't have to think any differently than with regular object state management.

This pattern applies better when you have a lot of document cataloging to be done. A separate process can wake up, make a ZEO connection, and process the queue. I don't think that indexing documents *has* to be a transactional part of every document save. Under this cron-style approach, you also pay less of a conflict-error penalty, as you can increase the backoff period. There's no web browser on the other end, impatiently waiting for their flaming logo. :^)

Ahh well, fun to talk about. Maybe this time next year we can repeat the conversation. :^)

--Paul

Shane Hathaway wrote:
Jeremy Hylton wrote:
> "CM" == Chris McDonough <chrism@zope.com> writes:
Completely agreed. My disagreement is portraying the counter problem as impossible with the zodb. I think some people, as evidenced by some of the responses, are willing to live with the tradeoffs. Other people will find managing a log file on disk to be a more manageable solution.
CM> It would be best to make a dual-mode undoing and nonundoing
CM> storage on a per-object basis.
I'd really like to do this for ZODB4, but it seems hard to get it into FileStorage, without adding automatic incremental packing to FileStorage.
Example: Object A is marked as "save enough revisions to do a single undo." When a transaction updates A and makes older revisions unnecessary, there's no obvious way to remove them without doing a pack. We could write a garbage collector that removed unneeded things (as opposed to packing everything to a particular date), but it doesn't seem very useful if it needs to be run manually.
One idea I've been floating in my head is the idea of a "forked" storage, where some objects are stored in an undoable storage and others are stored in a non-undoable storage. I could try to explain it in English but pseudocode is easier:
  class ForkedStorage:

      def __init__(self, undoable_storage, non_undoable_storage):
          self.undoable = undoable_storage
          self.non_undoable = non_undoable_storage

      def store(self, oid, data, serial):
          if not serial or serial == '\0' * 8:
              # For new objects, choose a storage.
              want_undo = self.wantUndoableStorage(data)
              if want_undo:
                  storage = self.undoable
              else:
                  storage = self.non_undoable
          else:
              # For existing objects, use the storage chosen previously.
              if self.undoable.load(oid):
                  storage = self.undoable
              else:
                  storage = self.non_undoable
          storage.store(oid, data, serial)

      def load(self, oid):
          data, serial = self.undoable.load(oid)
          if not data:
              data, serial = self.non_undoable.load(oid)
          if not data:
              raise POSException, 'data not found'
          return data, serial

      def wantUndoableStorage(self, data):
          # Peek at the class reference at the front of the pickle.
          u = cPickle.Unpickler(StringIO(data))
          module, name = u.load()
          class_ = getattr(__import__(module), name)
          if getattr(class_, '_p_undoable', 1):
              return 1
          else:
              return 0
Only a simple idea. :-)
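To make the routing idea concrete, here is a toy, runnable rendition of the pseudocode above in modern Python. The dict-backed `DictStorage` and the `Document`/`Counter` classes are invented stand-ins for illustration, not the ZODB storage API; the point is only the per-class routing on a `_p_undoable` flag:

```python
import pickle

class DictStorage:
    """Toy stand-in for a real ZODB storage: oid -> pickled state."""
    def __init__(self):
        self.records = {}
    def store(self, oid, data):
        self.records[oid] = data
    def load(self, oid):
        return self.records.get(oid)

class ForkedStorage:
    """Route each object to one of two backing storages, keyed on a
    _p_undoable class attribute (defaulting to true)."""
    def __init__(self, undoable, non_undoable):
        self.undoable = undoable
        self.non_undoable = non_undoable

    def store(self, oid, obj):
        data = pickle.dumps(obj)
        if self.undoable.load(oid) is not None:
            storage = self.undoable       # existing object: keep its home
        elif self.non_undoable.load(oid) is not None:
            storage = self.non_undoable
        elif getattr(type(obj), '_p_undoable', True):
            storage = self.undoable       # new object: choose by class
        else:
            storage = self.non_undoable
        storage.store(oid, data)

    def load(self, oid):
        data = self.undoable.load(oid)
        if data is None:
            data = self.non_undoable.load(oid)
        if data is None:
            raise KeyError(oid)
        return pickle.loads(data)

class Document:
    pass                                  # undoable by default

class Counter:
    _p_undoable = False                   # opts out of undo/revisions
    def __init__(self):
        self.count = 0
```

With this sketch, documents land in the undoable storage and counters in the non-undoable one, while `load` remains transparent to callers.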
Also, how would you specify the object's packing policy? I'm thinking a _p_revision_control attribute or something like that. If the attribute exists on an object, it sets a particular policy for that object.
Do individual transactions need to play in this game, too? I'm imagining a use case where an object is marked as "no revisions" but you want to be able to undo a particular transaction. I'm not sure if that means:
- you can undo the transaction, but the "no revisions" object keeps its current state.
- you can undo the transaction, and because the transaction is specially marked as undoable, there actually is a revision
- you can't undo the transaction
The first choice seems appropriate for a counter (I think), but I'm not sure if it makes sense for all possible revision-less objects.
The first choice also makes sense for a catalog. Here's another possible variation: transactions that involve *only* non-undoable objects are non-undoable; all other transactions are undoable and revert the revision of non-undoable objects as well.
Shane
_______________________________________________ Zope-Dev maillist - Zope-Dev@zope.org http://lists.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope )
Paul Everitt wrote:
Let's say we had a queue in Zope. We could asynchronously send changes into the queue. Later, based on some policy (e.g. idle time, clock ticks, etc.), those changes would be enacted/committed.
Imagine the queue itself is in a different storage, likely non-versioned. Imagine that the queue is processed every N seconds. It takes all the work to do and performs it, but in a subtransaction.
Thus you might send the queue ten increments to a counter, but only one will be committed to the main storage.
To make programmers have to think less about the queue (send in the object reference, the method to use, and the parameters), you could make it look like a special form of subtransactions. That is, you say:
  tm.beginQueuingTransactions()
  self.incrementCounter()
  self.title = 'Simple change'
  self.body = upload_file
  tm.endQueuingTransactions()
At the transaction level, all enclosed changes are queued for later commit. You don't have to think any differently than with regular object state management.
Wow, on the surface, that would be very easy to do. Transaction.register() might dump to a long-lived queue instead of the single-transaction queue.
This pattern applies better when you have a lot of document cataloging to be done. A separate process can wake up, make a ZEO connection, and process the queue. I don't think that indexing documents *has* to be a transactional part of every document save.
Right. Here's another way to think about it: we could use a catalog lookalike which, instead of updating indexes directly, asks a special ZEO client to perform the reindexing. The special client might decide to batch updates.
Under this cron-style approach, you also pay less of a conflict-error penalty, as you can increase the backoff period. There's no web browser on the other end, impatiently waiting for their flaming logo. :^)
A variant on your idea is that when the transaction is finishing, if there are any regular objects to commit, the long-lived queue gets committed too. That would be beneficial for counters, logs, and objects like Python Scripts which have to cache the compiled code in ZODB, but not as beneficial for catalogs.

Ok, thinking further... how about a Zope object called a "peer delegate" which can act like other Zope objects, but which actually calls out to another ZEO client to do the work? It could be very interesting... it might use some standard RPC or RMI mechanism. We would want to be careful to make it simple.
Ahh well, fun to talk about. Maybe this time next year we can repeat the conversation. :^)
I hope we'll be talking about what we did instead of what we'll do. :-) The change to transactions seems simple.

Another thought: the long-lived queue might be committed only when there are regular objects to commit *and* a certain amount of time has passed since the last commit of the long-lived queue. That might work well for catalogs. Cool!

Shane
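The coalescing behaviour discussed in this exchange — many queued changes to one object, but only one actual commit — can be pictured with a toy model. All names here (`CoalescingQueue`, `HitCounter`) are invented for illustration and are not a Zope or ZODB API:

```python
class CoalescingQueue:
    """Toy model of the queued-commit idea: pending work is keyed by
    (oid, method), so ten queued increments collapse into one commit."""
    def __init__(self):
        self.pending = {}          # (oid, method) -> list of queued args
        self.committed = 0         # number of commits actually performed

    def send(self, oid, method, arg):
        # Called during the request; cheap, no write transaction here.
        self.pending.setdefault((oid, method), []).append(arg)

    def process(self, objects):
        """Apply all pending work in one batch (one 'transaction')."""
        for (oid, method), args in self.pending.items():
            obj = objects[oid]
            for arg in args:
                getattr(obj, method)(arg)
            self.committed += 1    # one commit per object, not per change
        self.pending.clear()

class HitCounter:
    def __init__(self):
        self.value = 0
    def add(self, n):
        self.value += n
```

Ten calls to `send` followed by one `process` leave the counter at ten while only a single "commit" has occurred, which is the storage-bloat win Paul describes.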
Shane Hathaway writes:
> ... The change to transactions seems simple. Another thought: the long-lived queue might be committed only when there are regular objects to commit *and* a certain amount of time has passed since the last commit of the long-lived queue. That might work well for catalogs. Cool!

Of course, you are aware that this decoupling of activities significantly increases the probability of observing inconsistencies:
 * added or modified documents not (yet) found by a catalog search
 * real inconsistencies due to restarts/failures

The first example, above, can be tackled by documentation and education. For the second, you need cleanup operations that either go back to a previous consistent state or finish what was on the long-lived queue. Not that easy, I fear. At least, it calls for a formal project...

Dieter
On Fri, 19 Apr 2002 07:54:42 -0400, Paul Everitt <paul@zope.com> wrote:
This pattern applies better when you have a lot of document cataloging to be done. A separate process can wake up, make a ZEO connection, and process the queue. I don't think that indexing documents *has* to be a transactional part of every document save.
I've used something similar to that in a previous project that didn't get beyond the prototype stage.
Under this cron-style approach, you also pay less of a conflict-error penalty, as you can increase the backoff period.
You don't need a 'backoff period' as such; you just move any jobs that have suffered a conflict further back in the work queue. In some cases you can almost eliminate ConflictErrors by making the background process single-threaded. Toby Dickenson tdickenson@geminidataloggers.com
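Toby's requeue-on-conflict scheme might look like this in outline. `ConflictError` here is a local stand-in for ZODB's exception, and a real worker would cap retries so a job that always conflicts cannot loop forever:

```python
from collections import deque

class ConflictError(Exception):
    """Stand-in for ZODB's ConflictError."""

def run_jobs(jobs):
    """Single-threaded worker: on a conflict, push the job to the back
    of the queue instead of sleeping out a backoff period."""
    queue = deque(jobs)
    done = []
    while queue:
        job = queue.popleft()
        try:
            done.append(job())
        except ConflictError:
            queue.append(job)      # retry later, after the other jobs
    return done
```

A job that conflicts once simply runs again after the rest of the queue has drained, so the worker never sits idle waiting for a retry timer.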
Yo,

I have been following this thread for quite some time now, and call me stupid if you must, but why don't you just keep the data in the session and write it all out when the session gets cleaned up? For the original problem (keeping statistics of site usage) this will be more than enough.

I did a webmining project using this in 2000 (ok, it was jsp and not zope, but the approach is still valid; moreover, since 2.5(?) you have a built-in SESSION object you can use).

have fun,

Sloot.
This is a pretty good idea... the default RAM-based storage that is used for sessions (TemporaryStorage) tries hard to resist conflicts. It is also nonundoing and does its own reference counting, so it needn't be packed unless it contains cyclic data structures (there is no UI to pack the default mounted storage anyway, so the problem is kind of moot). The TransientObject code (the SESSION object is an instance of TransientObject) can make use of ZODB conflict resolution in many cases.

However, conflicts are still a problem with TemporaryStorage because it is a ZODB Storage implementation and it uses the same "optimistic concurrency control" as FileStorage et al. But I imagine for most applications, the "out of the box" configuration would work just fine for things like counters and whatnot.

Someone could probably implement a limited-functionality session data storage that did not rely on ZODB or any other database; that might be even better for this kind of thing.

Romain Slootmaekers wrote:
Yo,
I have been following this thread for quite some time now, and call me stupid if you must, but why don't you just keep the data in the session and write it all out when the session gets cleaned up?
For the original problem (keeping statistics of site usage) this will be more than enough.
I did a webmining project using this in 2000 (ok, it was jsp and not zope, but the approach is still valid; moreover, since 2.5(?) you have a built-in SESSION object you can use)
have fun,
Sloot.
-- Chris McDonough Zope Corporation http://www.zope.org http://www.zope.com "Killing hundreds of birds with thousands of stones"
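Romain's session-accumulation approach — count in RAM per session, write once at session cleanup — can be sketched roughly as below. `Session` here is a toy stand-in, not Zope's TransientObject API, and a plain dict stands in for the durable store:

```python
class Session:
    """Toy session object: accumulate hit counts in RAM, then flush
    them to durable storage in a single write when the session ends."""
    def __init__(self, store):
        self.store = store         # dict standing in for persistent objects
        self.hits = {}             # in-RAM tallies; no write transaction

    def record_hit(self, path):
        self.hits[path] = self.hits.get(path, 0) + 1

    def cleanup(self):
        # One write transaction per session, not one per request.
        for path, n in self.hits.items():
            self.store[path] = self.store.get(path, 0) + n
        self.hits.clear()
```

Each request only touches RAM; the durable store sees one batched update per session, which is the trade-off the thread converges on (bounded bloat, slightly stale counts).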
Jeremy Hylton wrote:
"CM" == Chris McDonough <chrism@zope.com> writes:
Completely agreed. My disagreement is portraying the counter problem as impossible with the zodb. I think some people, as evidenced by some of the responses, are willing to live with the tradeoffs. Other people will find managing a log file on disk to be a more manageable solution.
CM> It would be best to make a dual-mode undoing and nonundoing
CM> storage on a per-object basis.
I'd really like to do this for ZODB4, but it seems hard to get it into FileStorage without adding automatic incremental packing to FileStorage.
This might be possible without incremental packing, if the object will be of a fixed size. I'm thinking of a simple counter here, something like:

  class Counter(object):

      __slots__ = ['__count']

      def __init__(self):
          self.__count = 0

      def increment(self):
          self.__count += 1

      def getValue(self):
          return self.__count

Now, imagine that Counter was somehow Persistent too. (There would need to be a few more _p_... declarations in __slots__, and possibly some changes in the persistence machinery to allow for slots-based instances as well as __dict__-based ones.) I would naively expect a pickle of a Counter instance to always remain the same size. Therefore, it could be updated in place. Of course, this would break various other nice behaviours of FileStorage.

Another variation on the same theme: have a fixed-size "external reference" instead of the object's pickle. The fixed-size reference points to a separate some_object.pickle file which contains the pickle for that one object. The some_object.pickle file gets overwritten on each update.

--
Steve Alexander
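Steve's fixed-size, update-in-place idea can be demonstrated outside ZODB with a plain file of fixed-width records. This is a sketch only — FileStorage's actual record format is different — but it shows why a fixed-size pickle avoids the need to pack: the file never grows, no matter how often the value changes:

```python
import struct

# Fixed 8-byte big-endian unsigned counter record.
RECORD = struct.Struct('>Q')

def write_counter(path, offset, value):
    """Overwrite the fixed-size record in place; the file never grows."""
    with open(path, 'r+b') as f:
        f.seek(offset)
        f.write(RECORD.pack(value))

def read_counter(path, offset):
    with open(path, 'rb') as f:
        f.seek(offset)
        return RECORD.unpack(f.read(RECORD.size))[0]
```

Repeated updates rewrite the same eight bytes rather than appending a new revision, which is exactly the property an append-only undoing storage gives up.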
On Wed, 17 Apr 2002 23:01:04 -0400, "Chris McDonough" <chrism@zope.com> wrote:
It would be best to make a dual-mode undoing and nonundoing storage on a per-object basis. But a half step would be to make it easier to use mounted storages ala http://dev.zope.org/Wikis/DevSite/Proposals/StorageAndConnectionTypeRegistries.
A dual mode storage, or simply dual storages? Storing counter objects *only* in a non-undo storage would be more pleasant if ZODB supported cross-storage object references. Toby Dickenson tdickenson@geminidataloggers.com
A dual mode storage, or simply dual storages?
The former as a long-term goal, the latter as a short-term goal. The proposal I mentioned would make it easier to build tools that allow you to mount storages.
Storing counter objects *only* in a non-undo storage would be more pleasant if ZODB supported cross-storage object references.
Yup. I don't think this is anywhere on the radar, though... -- Chris McDonough Zope Corporation http://www.zope.org http://www.zope.com "Killing hundreds of birds with thousands of stones"
Chris McDonough wrote:
Storing counter objects *only* in a non-undo storage would be more pleasant if ZODB supported cross-storage object references.
Yup. I don't think this is anywhere on the radar, though...
How hard would they be to add? cheers, Chris
On Fri, 19 Apr 2002 08:18:47 -0400, Chris McDonough <chrism@zope.com> wrote:
Storing counter objects *only* in a non-undo storage would be more pleasant if ZODB supported cross-storage object references.
Yup. I don't think this is anywhere on the radar, though...
Hmmmm. cross-storage 'symbolic links' would help too. I think we could implement that using the same trickery as mounted storages. Toby Dickenson tdickenson@geminidataloggers.com
participants (14)

- Casey Duncan
- Chris McDonough
- Chris Withers
- Dieter Maurer
- Eric Roby
- Florent Guillaume
- Ivo van der Wijk
- Jens Vagelpohl
- jeremy@zope.com
- Paul Everitt
- Romain Slootmaekers
- Shane Hathaway
- Steve Alexander
- Toby Dickenson