[Zope-CMF] ZCSearchPatch
Eric Dunn
endunn@rocketmail.com
Wed, 14 May 2003 07:36:29 -0700 (PDT)
ZCatalog issue:
I have code that strips out HTML tags so that the ZCatalog
does not pick up the HTML markup when cataloging.
It works great... almost too well.
Our users are only copy-n-paste managers.
I found that stripping the "&nbsp;" entity (the HTML
non-breaking space) makes the catalog concatenate text, i.e.
1234&nbsp;1234&nbsp;1234&nbsp;1234 becomes '1234123412341234' in the
catalog.
Question: How can I tell the SearchPatch.py file to
leave the &nbsp; entity alone or treat it as a space?
import re
from SearchIndex.UnTextIndex import UnTextIndex
from string import find

# HTML regex to substitute tags and entities
html_re = re.compile(r'<[^\s0-9].*?>|&[a-zA-Z]*?;', re.DOTALL)

class FauxDocument:
    """Proxy document to store munged source text"""
    def __init__(self, name, value):
        setattr(self, name, value)

# Get a reference to the original index_object method
# so we can head patch it
original_index_object = UnTextIndex.index_object

def index_object(self, documentId, obj, threshold=None):
    # sniff the object for our 'id', the 'document source' of the
    # index is this attribute. If it smells callable, call it.
    try:
        source = getattr(obj, self.id)
        if callable(source):
            source = str(source())
        else:
            source = str(source)
    except (AttributeError, TypeError):
        return 0
    if find(source, '<') != -1:
        # Strip HTML tags and comments from source
        source = html_re.sub('', source)
        # Create faux document with stripped source content
        obj = FauxDocument(self.id, source)
    # Call original index method
    return original_index_object(self, documentId, obj, threshold)

# Patch UnTextIndex class
UnTextIndex.index_object = index_object
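
One possible direction, shown only as a sketch: the pattern above matches &nbsp; through its entity branch, and the match is replaced with an empty string, which is what glues the neighbouring words together. Substituting a single space instead, and then collapsing runs of whitespace, keeps the words apart. The names strip_html and whitespace_re are hypothetical and are not part of SearchPatch.py; the snippet uses only the standard re module so it can be tried outside Zope.

```python
import re

# Same pattern as in the patch: HTML tags and named entities.
html_re = re.compile(r'<[^\s0-9].*?>|&[a-zA-Z]*?;', re.DOTALL)

# Hypothetical helper pattern (an assumption, not in SearchPatch.py):
# matches any run of whitespace so it can be collapsed to one space.
whitespace_re = re.compile(r'\s+')


def strip_html(source):
    """Replace each tag or entity with a space, then tidy whitespace."""
    text = html_re.sub(' ', source)
    return whitespace_re.sub(' ', text).strip()


sample = '<p>1234&nbsp;1234&nbsp;1234&nbsp;1234</p>'

# Behaviour of the patch as posted: matches are removed outright.
print(html_re.sub('', sample))

# Behaviour of the sketch: matches become word separators.
print(strip_html(sample))
```

In the patch itself, the equivalent change would be to the html_re.sub('', source) call inside index_object. One trade-off to be aware of: replacing every tag with a space also splits words that contain inline markup, so 'foo<b>bar</b>' would be indexed as 'foo' and 'bar'. If that matters, entities and tags could be handled by two separate patterns, with only the entities turned into spaces.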
=====
Eric N. Dunn
other email: endunn@aol.com