[Zope] patch: "FullTextIndex" fix for ZCatalog weirdness
Michael Halle
halazar@media.mit.edu
Tue, 21 Sep 1999 02:16:14 -0400
The fact that ZCatalog currently does unexpected things with query
words that are "stop words" was a show-stopper for my application.
While discussion on the mailing list included possible big-picture
fixes, I chose a simple quick solution to the problem. I added a new
Index type, "FullTextIndex," to augment the existing "TextIndex" and
"FieldIndex".
With "FullTextIndex", the stop word dictionary is set to {}, so every
word is indexed and a full text catalog is built. Noise words will
inflate the indices and produce spurious matches, but at least the
results are intuitive. "FullTextIndex" makes sense for titles and
keyword fields; it doesn't make sense for long documents.
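To make the trade-off concrete, here is a small standalone sketch of what an empty stop word dictionary changes. It is a simplified model, not the ZCatalog code: build_index, search, and the abbreviated DEFAULT_STOP_WORDS list are all made up for illustration.

```python
# Hypothetical, abbreviated stand-in for a default stop word list.
DEFAULT_STOP_WORDS = {"the": None, "of": None, "and": None, "a": None}


def build_index(docs, stop_word_dict=None):
    """Map each indexed word to the set of document ids containing it."""
    if stop_word_dict is None:
        stop_word_dict = DEFAULT_STOP_WORDS
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            if word in stop_word_dict:
                continue  # stop words never reach the index
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index, word):
    """Return the sorted ids of documents containing the word."""
    return sorted(index.get(word.lower(), set()))


titles = {1: "The Art of War", 2: "War and Peace"}

text_index = build_index(titles)                     # stop words dropped
full_index = build_index(titles, stop_word_dict={})  # nothing dropped
```

In this model a query for "the" finds nothing in text_index but finds document 1 in full_index, and full_index holds twice as many entries for the same two titles. That growth is tolerable for short fields and is the reason long documents are a poor fit.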
Diffs from Zope-2.0.0 follow. The patch also includes the fix to
sort-order posted to this list. Apply with patch -p0 from the top level
Zope directory.
Remember kids, this isn't an official patch!
Michael Halle
mhalle@media.mit.edu
-------------------------------------------------------------------------------
*** lib/python/Products/ZCatalog/ZCatalog.py.~1~ Wed Sep 1 14:28:58 1999
--- lib/python/Products/ZCatalog/ZCatalog.py Tue Sep 21 01:05:46 1999
***************
*** 195,201 ****
self._catalog.addIndex('id', 'FieldIndex')
self._catalog.addColumn('title')
! self._catalog.addIndex('title', 'TextIndex')
self._catalog.addColumn('meta_type')
self._catalog.addIndex('meta_type', 'FieldIndex')
--- 195,201 ----
self._catalog.addIndex('id', 'FieldIndex')
self._catalog.addColumn('title')
! self._catalog.addIndex('title', 'FullTextIndex')
self._catalog.addColumn('meta_type')
self._catalog.addIndex('meta_type', 'FieldIndex')
*** lib/python/Products/ZCatalog/Catalog.py.~1~ Wed Sep 1 11:40:24 1999
--- lib/python/Products/ZCatalog/Catalog.py Tue Sep 21 01:09:35 1999
***************
*** 253,258 ****
--- 253,260 ----
indexes[name] = UnIndex.UnIndex(name)
elif type == 'TextIndex':
indexes[name] = UnTextIndex.UnTextIndex(name)
+ elif type == 'FullTextIndex':
+ indexes[name] = UnTextIndex.UnTextIndex(name, stop_word_dict={})
self.indexes = indexes
***************
*** 408,414 ****
rs=data.items()
append(LazyMap(self.instantiate, rs))
else:
! for k, intset in sort_index.items():
append((k,LazyMap(self.__getitem__, intset)))
elif rs:
if sort_index is None:
--- 410,416 ----
rs=data.items()
append(LazyMap(self.instantiate, rs))
else:
! for k, intset in sort_index._index.items():
append((k,LazyMap(self.__getitem__, intset)))
elif rs:
if sort_index is None:
*** lib/python/Products/ZCatalog/catalogIndexes.dtml.~1~ Thu Aug 26 10:20:43 1999
--- lib/python/Products/ZCatalog/catalogIndexes.dtml Tue Sep 21 00:56:43 1999
***************
*** 29,34 ****
--- 29,35 ----
of Index Type: <select name="type">
<option value="TextIndex">TextIndex</option>
+ <option value="FullTextIndex">FullTextIndex</option>
<option value="FieldIndex">FieldIndex</options>
</select>
<input name="manage_addIndex:method" type=submit value=" Add ">
106c106,107
< def __init__(self, id=None, ignore_ex=None, call_methods=None):
---
> def __init__(self, id=None, ignore_ex=None, call_methods=None,
> stop_word_dict=None):
119a121,122
> 'stop_word_dict' -- A dictionary of stop words.
>
121c124
< if not id==ignore_ex==call_methods==None:
---
> if not stop_word_dict==id==ignore_ex==call_methods==None:
127,128c130,133
< self._syn=stop_word_dict
<
---
> if stop_word_dict is None:
> self._syn=default_stop_word_dict
> else:
> self._syn=stop_word_dict
132d136
<
630,631c634,635
< stop_word_dict={}
< for word in stop_words: stop_word_dict[word]=None
---
> default_stop_word_dict={}
> for word in stop_words: default_stop_word_dict[word]=None
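
The last set of hunks gives the text index constructor an optional stop_word_dict argument, where None means "use the module-level default". Below is a minimal standalone sketch of that pattern. SketchTextIndex and the short word list are hypothetical; only the names stop_word_dict, default_stop_word_dict and _syn come from the patch.

```python
# Module-level default, built the same way as in the patch.
default_stop_word_dict = {}
for word in ("a", "and", "of", "the"):  # hypothetical, abbreviated list
    default_stop_word_dict[word] = None


class SketchTextIndex:
    """Illustrative stand-in for the patched constructor logic only."""

    def __init__(self, id=None, stop_word_dict=None):
        self.id = id
        # Test against None explicitly: an empty dict is falsy, so a plain
        # truthiness check would silently restore the default stop words.
        if stop_word_dict is None:
            self._syn = default_stop_word_dict
        else:
            self._syn = stop_word_dict


text_idx = SketchTextIndex("title")
full_idx = SketchTextIndex("title", stop_word_dict={})
```

The explicit "is None" test is what lets a caller pass {} and really get an index with no stop words.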