On Wed, 2002-04-17 at 11:44, Casey Duncan wrote:
Paul Everitt wrote:
I don't agree that high write is always forbidden. I think there are plenty of cases where this can work. It simply becomes unworkable much sooner than other data systems (e.g. a relational database or FS-based solution).
I agree, but I am loath to approve of any solution that demands a write for every read of an object.
Even if the pertinent objects are only read once a minute? That's pretty severe.
For instance, think about bloat for a second. Let's be crazy and say it takes 100 bytes to store an integer representing a count. Let's say you write once a second. That's about 8.6Mb a day (per counter). Combined with hourly packing, that might be well within limits.
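(To check my own arithmetic — the 100-byte record size is an assumed figure for illustration, not a measured ZODB record size:)

```python
# Back-of-the-envelope bloat estimate for an append-only storage.
# 100 bytes per write is an assumption, not a measured record size.
RECORD_BYTES = 100
WRITES_PER_SECOND = 1
SECONDS_PER_DAY = 60 * 60 * 24

daily_bloat = RECORD_BYTES * WRITES_PER_SECOND * SECONDS_PER_DAY
print(daily_bloat)        # 8640000 bytes
print(daily_bloat / 1e6)  # 8.64 Mb per counter per day
```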
Yes, but the counter is not the only thing written. The whole containing object is written out to the storage. Now that doesn't include binary data (such as image and file data), but it does include any primitive data stored in the object's attributes (strings, lists, dicts, etc).
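(The effect is easy to see with plain pickling, which is roughly what happens per object record — the class and sizes here are purely illustrative:)

```python
import pickle

class Document:
    def __init__(self):
        self.body = 'x' * 10000  # primitive data stored on the object
        self.hits = 0            # the counter we "just" want to bump

doc = Document()
before = len(pickle.dumps(doc))
doc.hits += 1                    # a one-integer change...
after = len(pickle.dumps(doc))
# ...but the whole ~10Kb record gets written again, body and all.
print(before, after)
```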
That's only if you do it as a property. It doesn't have to be done that way. Shane and I discussed a counter that existed as a central datastructure. Objects that were being counted would simply have methods to increment the count and display the count. This data structure would likely be some kind of tree, to avoid itself being completely written on every change.
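(A rough sketch of what I mean — the names are hypothetical, and a real Zope version would subclass Persistent and keep the counts in a BTree, e.g. BTrees.OIBTree, so that an increment rewrites only one small bucket rather than the whole mapping. A plain dict stands in for the BTree here:)

```python
# Hypothetical central hit-counter keyed by object path.
# In a real implementation the dict would be a BTree so that
# only the touched bucket is written on each change.
class HitCounter:
    def __init__(self):
        self._counts = {}  # stand-in for a BTree

    def hit(self, path):
        """Increment the count for the object at `path`."""
        self._counts[path] = self._counts.get(path, 0) + 1

    def count(self, path):
        """Return the current count (0 if never hit)."""
        return self._counts.get(path, 0)

counter = HitCounter()
counter.hit('/docs/index')
counter.hit('/docs/index')
print(counter.count('/docs/index'))  # 2
```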
Hourly packing seems like a blunderbuss solution to the bloat problem. You can't tell me that won't kill performance...
Again, some people might not care if, once an hour, there is a 20 second performance penalty. The tradeoff might be worth it. But I was being hypothetical here. It's better to get it to a once-a-day pack, which people should do anyway. Of course blunderbuss is in the eye of the beholder. Writing a cron job to wake up every N seconds, scan a log, and update the count of pages seems a bit blunderbuss-y to me as well. :^)
Let's take the next step and say that you can live with a little volatility in the data. You write an object that caches ten seconds worth of writes. Whenever a write comes in past the ten-second mark, you flush the _v_ attribute into the persistent attribute. There's an order of magnitude improvement.
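(A sketch of the idea, assuming a single-threaded server and hypothetical names. In Zope, _v_ attributes are never stored, so only the periodic flush produces a transaction record:)

```python
import time

class BatchedCounter:
    """Accumulate hits in volatile state; flush to the persistent
    count at most once every `interval` seconds."""

    def __init__(self, interval=10.0):
        self.count = 0                         # the persistent attribute
        self._interval = interval
        self._v_pending = 0                    # volatile, never stored
        self._v_last_flush = time.monotonic()

    def hit(self):
        self._v_pending += 1
        now = time.monotonic()
        if now - self._v_last_flush >= self._interval:
            # This assignment is the only write the storage ever sees.
            self.count += self._v_pending
            self._v_pending = 0
            self._v_last_flush = now

    def current(self):
        # Readers see the flushed count plus whatever is pending.
        return self.count + self._v_pending

c = BatchedCounter(interval=10.0)
for _ in range(5):
    c.hit()
print(c.current())  # 5 -- but c.count is still 0; nothing flushed yet
```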
Only if you run single threaded. For multi-threaded Zope apps (the default), you would need to use a transient object which introduces its own complexities.
Correct. The ideal is a data structure built for this kind of problem. Fortunately this isn't unknown territory.
Finally, you store all your counters in a non-versioned storage. Now you have *no* bloat problem. :^)
Right, the transient object or something else that writes to disk. Now you have to make sure the counters can be related to the object robustly. Bookkeeping... This is certainly a possibility, though I would hesitate to argue for it on the grounds that it is less complex.
Hmm, I thought this was a fairly common pattern courtesy of the catalog. An object changes. Something else is told to update itself.
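(Stripped to its bones, that pattern is just a subscriber registry — hypothetical names here; the catalog does this with considerably more machinery:)

```python
# Minimal change-notification pattern: when an object changes,
# registered handlers (a catalog, a counter store, ...) are told
# to update themselves.
subscribers = []

def subscribe(handler):
    subscribers.append(handler)

def object_changed(path):
    for handler in subscribers:
        handler(path)

seen = []
subscribe(lambda path: seen.append(path))
object_changed('/docs/index')
print(seen)  # ['/docs/index']
```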
Regarding performance, maybe his application isn't doing 50 requests/second and he'd be willing to trade the slight performance hit and bloat for a decrease in system complexity.
That could be a good trade, I just wanted to make sure the issues were known.
Completely agreed. My disagreement is with portraying the counter problem as impossible with the ZODB. I think some people, as evidenced by some of the responses, are willing to live with the tradeoffs. Other people will find managing a log file on disk to be a more manageable solution.
All of the above has downsides as well. My point, though, is that we shouldn't automatically dismiss the ZODB as inappropriate for *all* high-write situations. In fact, with Andreas and Matt Hamilton's TextIndexNG, you might even be able to write to catalogued applications at a faster rate than one document per minute. :^)
Of course not, but the obvious and easiest solution (just incrementing a counter on the objects on every read) is probably not the best solution.
If people can live within the limitations (e.g. they have a small number of infrequently-changing things to count), then it's unlikely to be much of a problem. All in all, an interesting discussion from which not much is likely to change, as _I'm_ certainly not going to implement what I describe. :^) --Paul