Re: [Zope-dev] Bulletproof ZCatalog proposal
On Thursday 07 June 2001 12:17, Phillip J. Eby wrote:
At 09:34 AM 6/7/01 -0400, Shane Hathaway wrote:
One thing I didn't make clear in the proposal is that I'm interested in repurposing ZCatalog as a general ZODB indexing mechanism and essentially moving it down from the application layer to the database layer. Catalogs would be updated automatically (in a lazy fashion to avoid performance penalties). There would potentially be a large number of catalogs, though, so the number of conflicts would increase significantly.
But I think I have a solution for all of the issues in conflict resolution. If you, or anyone else, is also interested in this, please show support (if only by saying "please do this"!) :-) I can't work on it unless I have a reason to.
Sounds cool. At one time Ty and I thought we might do something like this by adding to a linked list of "pending updates" which the catalog would use to alter its search results, until the list got "too long", at which point it would post the actual updates.
That sounds like what I have in mind.
The only catch was that this would still produce conflicts at the head end of the linked list. :( Of course, that was in the days before ZODB conflict resolution. Nowadays, you could probably implement it as a simple sequence object with the conflict resolution method implemented. But then there'd be the question of how to resolve conflicts if more than one thread or ZEO client decided to apply the queued changes... a 100 conflicts vs 1 situation. Ugh.
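A rough Python sketch of the "simple sequence object with the conflict resolution method implemented" idea, assuming the queue is only ever appended to. It is a plain class rather than a real persistent.Persistent subclass. The names `PendingUpdates` and `items` are hypothetical, and `ValueError` stands in for ZODB's ConflictError. Only the `_p_resolveConflict(oldState, savedState, newState)` hook and its argument order come from ZODB. It does not address the harder question of which thread or ZEO client gets to apply the queued changes.

```python
class PendingUpdates:
    """Toy append-only queue of (operation, uid) catalog updates.

    Assumption: a real version would subclass persistent.Persistent so
    that the storage calls _p_resolveConflict with three saved states.
    """

    def __init__(self):
        self.items = []

    def append(self, operation, uid):
        self.items.append((operation, uid))

    def __getstate__(self):
        # The states passed to _p_resolveConflict have this shape.
        return {"items": list(self.items)}

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        old = old_state["items"]
        saved = saved_state["items"]    # what the other transaction committed
        new = new_state["items"]        # what this transaction wants to commit
        # Only the pure-append case is merged: both sides must extend
        # the common ancestor without changing it.
        if saved[:len(old)] != old or new[:len(old)] != old:
            raise ValueError("queue was not only appended to; cannot merge")
        # Keep the committed entries and add ours after them.
        return {"items": saved + new[len(old):]}


base = {"items": [("catalog", "/a")]}
theirs = {"items": [("catalog", "/a"), ("catalog", "/b")]}
ours = {"items": [("catalog", "/a"), ("uncatalog", "/c")]}
merged = PendingUpdates()._p_resolveConflict(base, theirs, ours)
```

Two concurrent appends merge cleanly here, but two clients that both try to flush the queue would each truncate it, and the sketch refuses to merge that case.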
I was thinking the queue would only last the duration of a transaction and that the queue would be thread-specific.
Anyway, I'd be really interested in seeing "a solution for all the issues in conflict resolution".
I was thinking that certain types of objects would be committed by the transaction manager before all others. In this case, the catalog (or a special object in the catalog) would be committed first. It would resolve all conflicts in the contained indices before they occur by replaying the changes in the persisted queues from the transaction history, then setting the _p_serial attributes to convince the storage that the conflicts have already been resolved.
Will it help with arguments, too? How about world peace? ;)
Okay, maybe we haven't solved that. :-)

Shane
At 02:07 PM 6/7/01 -0400, Shane Hathaway wrote:
On Thursday 07 June 2001 12:17, Phillip J. Eby wrote:
The only catch was that this would still produce conflicts at the head end of the linked list. :( Of course, that was in the days before ZODB conflict resolution. Nowadays, you could probably implement it as a simple sequence object with the conflict resolution method implemented. But then there'd be the question of how to resolve conflicts if more than one thread or ZEO client decided to apply the queued changes... a 100 conflicts vs 1 situation. Ugh.
I was thinking the queue would only last the duration of a transaction and that the queue would be thread-specific.
That's not hard to do - you could just toss an object into the transaction queue to do it during transaction commit, similar to what ZPatterns does in a more general way. (Note that ZPatterns users can already get the benefits of deferring catalog changes until the end of a (sub)transaction, by using triggers to implement automatic cataloging).
Anyway, I'd be really interested in seeing "a solution for all the issues in conflict resolution".
I was thinking that certain types of objects would be committed by the transaction manager before all others. In this case, the catalog (or a special object in the catalog) would be committed first. It would resolve all conflicts in the contained indices before they occur by replaying the changes in the persisted queues from the transaction history, then setting the _p_serial attributes to convince the storage that the conflicts have already been resolved.
Hm. Sounds to me like what you actually want is for the transaction manager to do this *after* everything else, rather than before. Thus, you would catch any changes which occur *during* transaction commit - such as commit-phase cataloging (as some folks do with ZPatterns currently). That is, in ZPatterns one can specify triggers such as:

WHEN OBJECT DELETED, CHANGED CALL someCatalog.manage_uncatalog(self.absolute_url(1))
WHEN OBJECT ADDED, CHANGED CALL someCatalog.manage_catalog(self,self.absolute_url(1))

Which will be executed during transaction commit if the conditions apply. If you have the catalog process its queue *before* other objects commit, it will not have received these calls yet. This will end up making the catalog try to commit itself a second time, which will fail with a conflict error - 100% of the time. (And retry won't help, because the transaction is conflicting with itself.)

Of course, this issue could be fixed by minor adjustment to the ZODB API and implementation such that it is permissible to store() an object more than once during a transaction's commit phase without creating a ConflictError.
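The ordering problem can be shown with a toy model in Python. Nothing here is ZODB code: `ToyStorage`, `commit`, and the local `ConflictError` class are hypothetical stand-ins, and the only behavior assumed is the one described above, namely that a second store() of the same object within one commit is rejected.

```python
class ConflictError(Exception):
    """Stand-in for ZODB's ConflictError (assumption, not the real class)."""


class ToyStorage:
    """Rejects a second store() of the same oid within one commit."""

    def __init__(self):
        self.stored = set()

    def store(self, oid):
        if oid in self.stored:
            raise ConflictError(oid)
        self.stored.add(oid)


def commit(order, triggers):
    """Commit oids in the given order.

    triggers maps an oid to another oid that a commit-phase trigger
    modifies when the first one is committed.
    """
    storage = ToyStorage()
    pending = list(order)
    while pending:
        oid = pending.pop(0)
        storage.store(oid)
        touched = triggers.get(oid)
        if touched is not None and touched not in pending:
            # The trigger changed an object that is no longer waiting
            # to be committed, so it has to be stored (again).
            pending.append(touched)
    return storage.stored


triggers = {"document": "catalog"}   # committing the document re-catalogs it

# Catalog last: the trigger's change is picked up by the single store().
catalog_last = commit(["document", "catalog"], triggers)

# Catalog first: it is stored, then changed by the trigger, then stored again.
try:
    commit(["catalog", "document"], triggers)
    catalog_first_failed = False
except ConflictError:
    catalog_first_failed = True
```

In this model the catalog-first order fails every time and a retry replays the same sequence, which matches the "conflicting with itself" case.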
"Phillip J. Eby" wrote:
That is, in ZPatterns one can specify triggers such as:
WHEN OBJECT DELETED, CHANGED CALL someCatalog.manage_uncatalog(self.absolute_url(1))
WHEN OBJECT ADDED, CHANGED CALL someCatalog.manage_catalog(self,self.absolute_url(1))
After I read this again I realized what you were saying. This capability of ZPatterns is very brittle, don't you think? If the catalog is updated manually before the special ZPatterns object is added to the queue, the behavior is undefined AFAIK--either the later changes to the catalog will be ignored, will cause a conflict, or some objects will be written twice in the same transaction.

However, if we could specify transaction commit priorities, and the ZPatterns update came first, auto-indexing came second, and everything else followed, I think it would work. Or perhaps ZPatterns should be able to register things that occur *before* the two-phase commit protocol.

Shane
At 07:07 PM 6/7/01 -0400, Shane Hathaway wrote:
"Phillip J. Eby" wrote:
That is, in ZPatterns one can specify triggers such as:
WHEN OBJECT DELETED, CHANGED CALL someCatalog.manage_uncatalog(self.absolute_url(1))
WHEN OBJECT ADDED, CHANGED CALL someCatalog.manage_catalog(self,self.absolute_url(1))
After I read this again I realized what you were saying. This capability of ZPatterns is very brittle, don't you think?
Yep. That's why I've previously described ZPatterns as a "hack". :)
If the catalog is updated manually before the special ZPatterns object is added to the queue, the behavior is undefined AFAIK--either the later changes to the catalog will be ignored, will cause a conflict, or some objects will be written twice in the same transaction.
True. But this behavior is avoidable through the use of subtransaction commits, in the event that someone has to have transactions which update a ZCatalog directly. Usually, when someone is using catalogs with ZPatterns, they use triggers to do all the updates and don't touch the catalog manually. Note that I'm not saying this isn't still a hack. But it's the best I could do without either a fix for the multi-commit issue in ZODB or some kind of priority scheme.
However, if we could specify transaction commit priorities, and the ZPatterns update came first, auto-indexing came second, and everything else followed, I think it would work. Or perhaps ZPatterns should be able to register things that occur *before* the two-phase commit protocol.
Yep. One of the last two times I spoke with Jim in person (either the January DC visit or IPC 8, I forget which), he said something about it maybe being a good idea to have some kind of priority system like that. I'd love to see something like it exist, if it would make some of ZPatterns' hackery unnecessary.

The implementation could consist of two subscription queues: ruleAgents and indexingAgents. ZCatalog would register in indexingAgents, and ZPatterns objects would register in one or the other, usually ruleAgents. (I can think of some circumstances where it would be nice to use the indexingAgents queue, but right now ZPatterns apps have to work around this by defining their rules in execution priority order.)

Upon being told to perform a transaction or subtransaction commit, the transaction would notify all the ruleAgents, and then all the indexingAgents. Objects could still subscribe to either queue while this notifying is taking place. (So that triggered actions could cause indexish objects to register as indexingAgents, not to mention causing updated objects to fire additional triggers.) Once all agents in a queue are notified, that queue should be cleared so that notifications are on a per-subtransaction basis. Once both queues are cleared, normal transaction behavior goes forward.

Hm. That's simpler than I thought it was going to be. Shoot, I can even see how to implement it as a runtime patch, that would've been simpler than all the shenanigans ZPatterns goes through to fake out the transaction machinery... and it's a better implementation. Ah well. At the time I wanted to avoid trying to convince Jim to add machinery to transactions "just" for ZPatterns, given that ZPatterns wasn't a particularly established product at the time.

Let me know if you guys put something like this in, though, and I'll definitely look at reworking ZPatterns to use the mechanism. It could potentially cut a lot of code out and improve the robustness at the same time.
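A minimal Python sketch of the two-queue scheme, under a few assumptions of my own: agents are plain callables that receive the transaction, the class name `AgentTransaction` and its `log` attribute are hypothetical, and the loop returns to ruleAgents whenever an indexing agent registers a new rule agent. Only the queue names ruleAgents and indexingAgents come from the description above, and this is not the real ZODB transaction API.

```python
class AgentTransaction:
    """Toy transaction with two pre-commit subscription queues."""

    def __init__(self):
        self.ruleAgents = []
        self.indexingAgents = []
        self.log = []

    def _notify_all(self, queue):
        # Pop as we go, so agents may subscribe to either queue while
        # notification is in progress and the queue ends up cleared.
        while queue:
            agent = queue.pop(0)
            agent(self)

    def commit(self):
        # Rules first, then indexing. Repeat in case an indexing agent
        # registered more agents (an assumption of this sketch).
        while self.ruleAgents or self.indexingAgents:
            self._notify_all(self.ruleAgents)
            self._notify_all(self.indexingAgents)
        # Both queues are empty: normal commit behavior goes forward.
        self.log.append("normal commit")


def index_agent(txn):
    txn.log.append("catalog reindexed")


def rule_agent(txn):
    txn.log.append("trigger fired")
    # A triggered action causes an indexish object to register itself.
    txn.indexingAgents.append(index_agent)


txn = AgentTransaction()
txn.ruleAgents.append(rule_agent)
txn.commit()
```

Because the queues are emptied on every commit, the same object can be reused for the next subtransaction with fresh subscriptions.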