[ZODB-Dev] Re: self.length._p_deactivate() and MVCC
Jim Fulton
jim at zope.com
Fri Apr 30 10:05:38 EDT 2004
Casey Duncan wrote:
> On Fri, 30 Apr 2004 09:21:06 -0400
> Jim Fulton <jim at zope.com> wrote:
>
>
>>Casey Duncan wrote:
>>
>>...
>>
>>
>>>I wrote the comment based on speculative semantics for MVCC. It
>>>looks like it will still work as intended with MVCC as it is now
>>>implemented, so I will remove the comment.
>>
>>OK, I'll bite. What is the point of this code? Is it meant to be
>>an optimization? Length write conflicts can always be resolved.
>
>
> We discussed this about a year ago. It tries to reduce write conflict
> errors when assigning new wids. The length of the Lexicon is used to
> find the first candidate wid. Wids are assigned in ascending order to
> allow the document word lists to be compressed better; I think it
> assumes popular words will tend to get lower wids. The word lists are
> used for unindexing and phrase matching.
>
> In order to pick the next wid, it deactivates the length and then reads
> it again. It increments it until it finds an unused wid in the wid=>word
> btree. The idea is that if another concurrent transaction was indexing
> and adding words at the same time and committed before this point, we
> could read its last wid value and carry on from there, rather than
> picking the same starting wid that it used and getting a write conflict
> (in the btree).
>
> It was effective in eliminating write conflicts in some tests I wrote,
> so I included it. Its practical value is probably lessened by the fact
> that once a large enough corpus of documents is indexed, few words are
> added to the lexicon as new ones are indexed, at least in common usage.
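The wid-picking step described above can be modelled in a few lines of Python. This is a toy sketch, not the ZCTextIndex Lexicon code: FakeLength, new_wid and simulate are hypothetical names, plain dicts stand in for the database and the wid=>word BTree, and it assumes (as the quoted description does) that re-reading the deactivated counter returns a value a concurrent transaction has already committed. It does not model MVCC snapshots or ZODB's real conflict detection.

```python
"""Toy model of the wid-allocation trick described above (not ZODB code)."""


class FakeLength:
    """Hypothetical stand-in for a persistent Length counter.

    It caches the value it loaded; deactivate() drops that cache so the
    next read reloads whatever is currently committed in `db`.
    """

    def __init__(self, db):
        self._db = db
        self._cached = db["length"]  # value loaded when the transaction began

    def deactivate(self):
        # Analogue of self.length._p_deactivate(): make the object a "ghost".
        self._cached = None

    def __call__(self):
        if self._cached is None:
            self._cached = self._db["length"]  # reload latest committed value
        return self._cached

    def change(self, delta):
        self._cached = self() + delta


def new_wid(length, wid_to_word, reread):
    """Pick the next unused wid, optionally re-reading the counter first."""
    if reread:
        length.deactivate()
    length.change(1)
    while length() in wid_to_word:  # skip wids that are already taken
        length.change(1)
    return length()


def simulate(reread):
    """Two overlapping transactions each add one new word."""
    db = {"length": 10, "words": {w: "w%d" % w for w in range(1, 11)}}
    # Both transactions start before either commits, so both see length 10
    # and the same snapshot of the wid => word mapping.
    len_a, words_a = FakeLength(db), dict(db["words"])
    len_b, words_b = FakeLength(db), dict(db["words"])

    wid_a = new_wid(len_a, words_a, reread)
    # Transaction A commits: its counter and its new wid become visible.
    db["length"] = len_a()
    db["words"][wid_a] = "apple"

    wid_b = new_wid(len_b, words_b, reread)
    return wid_a, wid_b
```

Without the re-read, both transactions start from the stale length 10 and insert the same key, so simulate(False) returns (11, 11): the write conflict in the BTree. With the re-read, the second transaction continues from the committed value, so simulate(True) returns (11, 12).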
OK. I didn't realize when this thread started that "wids" were word ids.
I still wonder how effective this is in practice. I wouldn't expect
new words to be frequent in a mature corpus. If there are a lot of
conflicts, this technique won't prevent all of them, but I can see that
it could reduce them.
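The remark that Length write conflicts can always be resolved rests on the counter's state being a single integer: two concurrent increments can be merged by applying both deltas to the value they started from. A minimal sketch of that arithmetic, assuming it mirrors what BTrees.Length does in its conflict resolution (the function name here is hypothetical):

```python
def resolve_counter_conflict(old, committed, mine):
    """Merge two concurrent updates to a counter by applying both deltas.

    old:       value both transactions started from
    committed: value written by the transaction that committed first
    mine:      value this transaction is trying to write
    """
    return committed + (mine - old)
```

For example, if both transactions start at 10 and each adds 1, the merged value is 12. Two transactions inserting the same wid into the wid=>word BTree have no such merge rule, which is why the re-read targets the starting wid rather than the counter itself.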
It's a shame we need to employ such tricks here, but scalability often makes us
do things like this. A fuller comment describing what's going on is in order.
(These are hard to write when you first implement something like this, because,
at that time, you aren't objective.)
Jim
--
Jim Fulton mailto:jim at zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org