[Zope-Checkins] CVS: Zope/lib/python/Products/ZCTextIndex - BaseIndex.py:1.28.58.1
Casey Duncan
casey@zope.com
Tue, 4 Feb 2003 13:01:28 -0500
Update of /cvs-repository/Zope/lib/python/Products/ZCTextIndex
In directory cvs.zope.org:/tmp/cvs-serv29054
Modified Files:
Tag: casey-zctextindex-optimize-branch
BaseIndex.py
Log Message:
Optimize unindex_doc method. The previous implementation exhibited pathological performance as index size increased. This was basically due to a len(btree) call inside an inner loop; computing the length of a btree requires walking the whole tree.
This change has a small tradeoff: it eliminates a pickle size optimization in which btrees are converted to dicts when they shrink to a certain threshold (DICT_CUTOFF). The inverse conversion, from dict to btree, still occurs at index time, however, and there the benefits are clearer.
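The new deletion logic can be sketched outside Zope as follows. This is only an illustration, not the BaseIndex code: plain dicts stand in for the btrees, and the Counter and TinyIndex classes and the add method are hypothetical names made up for the example. The point it shows is that the emptiness test replaces the len() call, so the size of doc2score is never computed while unindexing.

```python
class Counter:
    """Hypothetical stand-in for the index's persistent length counter."""

    def __init__(self, value=0):
        self.value = value

    def change(self, delta):
        self.value += delta


class TinyIndex:
    """Toy model of the word-to-documents map; plain dicts replace btrees."""

    def __init__(self):
        self._wordinfo = {}      # wid -> {docid: score}
        self.length = Counter()  # number of distinct words

    def add(self, wid, docid, score):
        # Hypothetical helper, only here to populate the toy index.
        doc2score = self._wordinfo.get(wid)
        if doc2score is None:
            doc2score = {}
            self.length.change(1)
        doc2score[docid] = score
        self._wordinfo[wid] = doc2score

    def _del_wordinfo(self, wid, docid):
        doc2score = self._wordinfo[wid]
        del doc2score[docid]
        if doc2score:
            # The truth test asks "is it empty?", never "how many entries?".
            # Reassigning matters when the container is persistent, so that
            # the parent mapping sees the change.
            self._wordinfo[wid] = doc2score
        else:
            # Last document for this word: drop the word entirely.
            del self._wordinfo[wid]
            self.length.change(-1)


index = TinyIndex()
index.add(1, 100, 5)
index.add(1, 101, 7)

index._del_wordinfo(1, 100)
remaining_docs = sorted(index._wordinfo[1])  # word 1 still has one document

index._del_wordinfo(1, 101)                  # word 1 is now removed
```

With real btrees the emptiness test is assumed to be cheap, while len() has to visit every bucket, which is why the old numdocs computation hurt on large indexes.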
=== Zope/lib/python/Products/ZCTextIndex/BaseIndex.py 1.28 => 1.28.58.1 ===
--- Zope/lib/python/Products/ZCTextIndex/BaseIndex.py:1.28 Wed Aug 14 18:25:14 2002
+++ Zope/lib/python/Products/ZCTextIndex/BaseIndex.py Tue Feb 4 13:01:26 2003
@@ -286,17 +286,11 @@
     def _del_wordinfo(self, wid, docid):
         doc2score = self._wordinfo[wid]
         del doc2score[docid]
-        numdocs = len(doc2score)
-        if numdocs == 0:
+        if doc2score:
+            self._wordinfo[wid] = doc2score # not redundant: Persistency!
+        else:
             del self._wordinfo[wid]
             self.length.change(-1)
-            return
-        if numdocs == self.DICT_CUTOFF:
-            new = {}
-            for k, v in doc2score.items():
-                new[k] = v
-            doc2score = new
-        self._wordinfo[wid] = doc2score # not redundant: Persistency!
 
 def inverse_doc_frequency(term_count, num_items):
     """Return the inverse doc frequency for a term,