patch: "FullTextIndex" fix for ZCatalog weirdness
ZCatalog currently does unexpected things with query words that are
"stop words", and that was a show-stopper for my application. While
discussion on the mailing list included possible big-picture fixes, I
chose a simple, quick solution to the problem.

I added a new Index type, "FullTextIndex", to augment the existing
"TextIndex" and "FieldIndex". With "FullTextIndex", the stop word
dictionary is set to {}, so no words are dropped and a full text
catalog is built. Noise words will inflate the indices and produce
spurious matches, but at least the results are intuitive.
"FullTextIndex" makes sense for titles and keyword fields; it doesn't
make sense for long documents.

Diffs from Zope-2.0.0 follow. The patch also includes the fix to
sort-order posted to this list. Apply with patch -p0 from the top
level Zope directory.

Remember kids, this isn't an official patch!

Michael Halle
mhalle@media.mit.edu

-------------------------------------------------------------------------------
*** lib/python/Products/ZCatalog/ZCatalog.py.~1~	Wed Sep  1 14:28:58 1999
--- lib/python/Products/ZCatalog/ZCatalog.py	Tue Sep 21 01:05:46 1999
***************
*** 195,201 ****
          self._catalog.addIndex('id', 'FieldIndex')
  
          self._catalog.addColumn('title')
!         self._catalog.addIndex('title', 'TextIndex')
  
          self._catalog.addColumn('meta_type')
          self._catalog.addIndex('meta_type', 'FieldIndex')
--- 195,201 ----
          self._catalog.addIndex('id', 'FieldIndex')
  
          self._catalog.addColumn('title')
!         self._catalog.addIndex('title', 'FullTextIndex')
  
          self._catalog.addColumn('meta_type')
          self._catalog.addIndex('meta_type', 'FieldIndex')
*** lib/python/Products/ZCatalog/Catalog.py.~1~	Wed Sep  1 11:40:24 1999
--- lib/python/Products/ZCatalog/Catalog.py	Tue Sep 21 01:09:35 1999
***************
*** 253,258 ****
--- 253,260 ----
                  indexes[name] = UnIndex.UnIndex(name)
              elif type == 'TextIndex':
                  indexes[name] = UnTextIndex.UnTextIndex(name)
+             elif type == 'FullTextIndex':
+                 indexes[name] = UnTextIndex.UnTextIndex(name, stop_word_dict={})
  
          self.indexes = indexes
  
***************
*** 408,414 ****
                      rs=data.items()
                      append(LazyMap(self.instantiate, rs))
                  else:
!                     for k, intset in sort_index.items():
                          append((k,LazyMap(self.__getitem__, intset)))
              elif rs:
                  if sort_index is None:
--- 410,416 ----
                      rs=data.items()
                      append(LazyMap(self.instantiate, rs))
                  else:
!                     for k, intset in sort_index._index.items():
                          append((k,LazyMap(self.__getitem__, intset)))
              elif rs:
                  if sort_index is None:
*** lib/python/Products/ZCatalog/catalogIndexes.dtml.~1~	Thu Aug 26 10:20:43 1999
--- lib/python/Products/ZCatalog/catalogIndexes.dtml	Tue Sep 21 00:56:43 1999
***************
*** 29,34 ****
--- 29,35 ----
  of Index Type:
  <select name="type">
  <option value="TextIndex">TextIndex</option>
+ <option value="FullTextIndex">FullTextIndex</option>
  <option value="FieldIndex">FieldIndex</options>
  </select>
  <input name="manage_addIndex:method" type=submit value=" Add ">
106c106,107
<     def __init__(self, id=None, ignore_ex=None, call_methods=None):
---
>     def __init__(self, id=None, ignore_ex=None, call_methods=None,
>                  stop_word_dict=None):
119a121,122
>           'stop_word_dict' -- A dictionary of stop words.
>
121c124
<         if not id==ignore_ex==call_methods==None:
---
>         if not stop_word_dict==id==ignore_ex==call_methods==None:
127,128c130,133
<             self._syn=stop_word_dict
<
---
>             if stop_word_dict is None:
>                 self._syn=default_stop_word_dict
>             else:
>                 self._syn=stop_word_dict
132d136
<
630,631c634,635
< stop_word_dict={}
< for word in stop_words: stop_word_dict[word]=None
---
> default_stop_word_dict={}
> for word in stop_words: default_stop_word_dict[word]=None
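
To show why an empty stop word dictionary gives more intuitive results, here is a minimal standalone sketch. It is not Zope code and is not part of the patch: DEFAULT_STOP_WORDS, build_index, and search are hypothetical names made up for the illustration, and the whitespace splitter is much cruder than what UnTextIndex does. It only mirrors the idea that None means "use the default stop words" while {} means "index everything".

```python
# Toy illustration only: not Zope code. All names here are hypothetical.
DEFAULT_STOP_WORDS = {"the": None, "of": None, "and": None, "a": None}


def build_index(docs, stop_word_dict=None):
    """Map each lowercased word to the set of document ids containing it.

    None means "use the default stop words"; an empty dict means
    "index every word". The test must be `is None`, because an empty
    dict is falsy and would otherwise be mistaken for "not given".
    """
    if stop_word_dict is None:
        stop_word_dict = DEFAULT_STOP_WORDS
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            if word in stop_word_dict:
                continue  # stop word: never reaches the index
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index, word):
    """Return the sorted ids of documents containing the word."""
    return sorted(index.get(word.lower(), set()))


titles = {1: "The Art of War", 2: "War and Peace", 3: "The The"}

text_index = build_index(titles)                          # default stop words
full_text_index = build_index(titles, stop_word_dict={})  # nothing dropped

print(search(text_index, "the"))       # no hits: "the" was never indexed
print(search(full_text_index, "the"))  # titles 1 and 3 are found
```

In this sketch, a title made up entirely of stop words (document 3) cannot be reached through the default index but is found through the full one, at the cost of an extra index entry for every noise word.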