On Wed, 2002-04-17 at 11:44, Casey Duncan wrote:
Paul Everitt wrote:
I don't agree that high write is always forbidden. I think there are plenty of cases where this can work. It simply becomes unworkable much sooner than other data systems (e.g. a relational database or FS-based solution).
I agree, but I am loath to approve of any solution that demands a write for every read of an object.
Even if the pertinent objects are only read once a minute? That's pretty severe.
For instance, think about bloat for a second. Let's be crazy and say it takes 100 bytes to store an integer representing a count. Let's say you write once a second. That's about 8.6Mb a day (per counter). Combined with hourly packing, that might be well within limits.
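(To check my own arithmetic — the 100-byte record size is an assumed figure for illustration, not a measured ZODB record size:)

```python
# Back-of-the-envelope bloat estimate for an append-only storage.
# 100 bytes per write is an assumption, not a measured record size.
RECORD_BYTES = 100
WRITES_PER_SECOND = 1
SECONDS_PER_DAY = 60 * 60 * 24

daily_bloat = RECORD_BYTES * WRITES_PER_SECOND * SECONDS_PER_DAY
print(daily_bloat)        # 8640000 bytes
print(daily_bloat / 1e6)  # 8.64 Mb per counter per day
```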
Yes, but the counter is not the only thing written. The whole containing object is written out to the storage. Now that doesn't include binary data (such as image and file data), but it does include any primitive data stored in the object's attributes (strings, lists, dicts, etc).
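(The effect is easy to see with plain pickling, which is roughly what happens per object record — the class and sizes here are purely illustrative:)

```python
import pickle

class Document:
    def __init__(self):
        self.body = 'x' * 10000  # primitive data stored on the object
        self.hits = 0            # the counter we "just" want to bump

doc = Document()
before = len(pickle.dumps(doc))
doc.hits += 1                    # a one-integer change...
after = len(pickle.dumps(doc))
# ...but the whole ~10Kb record gets written again, body and all.
print(before, after)
```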
That's only if you do it as a property. It doesn't have to be done that way. Shane and I discussed a counter that existed as a central datastructure. Objects that were being counted would simply have methods to increment the count and display the count. This data structure would likely be some kind of tree, to avoid itself being completely written on every change.
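(A rough sketch of what I mean — the names are hypothetical, and a real Zope version would subclass Persistent and keep the counts in a BTree, e.g. BTrees.OIBTree, so that an increment rewrites only one small bucket rather than the whole mapping. A plain dict stands in for the BTree here:)

```python
# Hypothetical central hit-counter keyed by object path.
# In a real implementation the dict would be a BTree so that
# only the touched bucket is written on each change.
class HitCounter:
    def __init__(self):
        self._counts = {}  # stand-in for a BTree

    def hit(self, path):
        """Increment the count for the object at `path`."""
        self._counts[path] = self._counts.get(path, 0) + 1

    def count(self, path):
        """Return the current count (0 if never hit)."""
        return self._counts.get(path, 0)

counter = HitCounter()
counter.hit('/docs/index')
counter.hit('/docs/index')
print(counter.count('/docs/index'))  # 2
```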
Hourly packing seems like a blunderbuss solution to the bloat problem. You can't tell me that won't kill performance...
Again, some people might not care if, once an hour, there is a 20 second performance penalty. The tradeoff might be worth it. But I was being hypothetical here. It's better to get it to a once-a-day pack, which people should do anyway. Of course blunderbuss is in the eye of the beholder. Writing a cron job to wake up every N seconds, scan a log, and update the count of pages seems a bit blunderbuss-y to me as well. :^)
Let's take the next step and say that you can live with a little volatility in the data. You write an object that caches ten seconds worth of writes. Whenever a write comes in past the ten-second mark, you flush the _v_ attribute into the persistent attribute. There's an order of magnitude improvement.
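(A sketch of the idea, assuming a single-threaded server and hypothetical names. In Zope, _v_ attributes are never stored, so only the periodic flush produces a transaction record:)

```python
import time

class BatchedCounter:
    """Accumulate hits in volatile state; flush to the persistent
    count at most once every `interval` seconds."""

    def __init__(self, interval=10.0):
        self.count = 0                         # the persistent attribute
        self._interval = interval
        self._v_pending = 0                    # volatile, never stored
        self._v_last_flush = time.monotonic()

    def hit(self):
        self._v_pending += 1
        now = time.monotonic()
        if now - self._v_last_flush >= self._interval:
            # This assignment is the only write the storage ever sees.
            self.count += self._v_pending
            self._v_pending = 0
            self._v_last_flush = now

    def current(self):
        # Readers see the flushed count plus whatever is pending.
        return self.count + self._v_pending

c = BatchedCounter(interval=10.0)
for _ in range(5):
    c.hit()
print(c.current())  # 5 -- but c.count is still 0; nothing flushed yet
```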
Only if you run single threaded. For multi-threaded Zope apps (the default), you would need to use a transient object which introduces its own complexities.
Correct. The ideal is a data structure built for this kind of problem. Fortunately this isn't unknown territory.
Finally, you store all your counters in a non-versioned storage. Now you have *no* bloat problem. :^)
Right, the transient object or something else that writes to disk. Now you have to make sure the counters can be related to the object robustly. Bookkeeping... This is certainly a possibility, though I would hesitate to argue for it on the grounds that it is less complex.
Hmm, I thought this was a fairly common pattern courtesy of the catalog. An object changes. Something else is told to update itself.
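(Stripped to its bones, that pattern is just a subscriber registry — hypothetical names here; the catalog does this with considerably more machinery:)

```python
# Minimal change-notification pattern: when an object changes,
# registered handlers (a catalog, a counter store, ...) are told
# to update themselves.
subscribers = []

def subscribe(handler):
    subscribers.append(handler)

def object_changed(path):
    for handler in subscribers:
        handler(path)

seen = []
subscribe(lambda path: seen.append(path))
object_changed('/docs/index')
print(seen)  # ['/docs/index']
```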
Regarding performance, maybe his application isn't doing 50 requests/second and he'd be willing to trade the slight performance hit and bloat for a decrease in system complexity.
That could be a good trade, I just wanted to make sure the issues were known.
Completely agreed. My disagreement is with portraying the counter problem as impossible with the ZODB. I think some people, as evidenced by some of the responses, are willing to live with the tradeoffs. Other people will find managing a log file on disk to be a more manageable solution.
All of the above has downsides as well. My point, though, is that we shouldn't automatically dismiss the ZODB as inappropriate for *all* high-write situations. In fact, with Andreas and Matt Hamilton's TextIndexNG, you might even be able to write to catalogued applications at a faster rate than one document per minute. :^)
Of course not, but the obvious and easiest solution (just incrementing a counter on the objects on every read) is probably not the best solution.
If people can live within the limitations (e.g. they have a small number of infrequently-changing things to count), then it's unlikely to be much of a problem. All in all, an interesting discussion from which not much is likely to change, as _I'm_ certainly not going to implement what I describe. :^) --Paul