[Zope-CMF] Re: ZCSearchPatch
Eric Dunn
endunn@rocketmail.com
Wed, 14 May 2003 08:44:35 -0700 (PDT)
Worked like a charm!
Thank you :)
--- Casey Duncan <casey@zope.com> wrote:
> Actually this should be very easy to fix, see inline
> comment below:
>
> On Wednesday 14 May 2003 10:36 am, Eric Dunn wrote:
> > ZCatalog issue:
> > Have code to strip out html tags so that the ZCatalog
> > does not pick up the html code when cataloging.
> > Works great... almost too good.
> > Our users are only copy-n-paste managers.
> > I found that stripping the "&nbsp;" (html space tag)
> > makes the catalog concatenate text... i.e.
> > 1234 1234 1234 1234 becomes '1234123412341234' in the
> > catalog.
> >
> > Question: How can I tell the SearchPatch.py file to
> > ignore the space tag or treat it as a space?
> >
> >
> > import re
> > from SearchIndex.UnTextIndex import UnTextIndex
> > from string import find
> >
> > # HTML regex to substitute tags and entities
> > html_re = re.compile(r'<[^\s0-9].*?>|&[a-zA-Z]*?;', re.DOTALL)
> >
> > class FauxDocument:
> >     """Proxy document to store munged source text"""
> >     def __init__(self, name, value):
> >         setattr(self, name, value)
> >
> > # Get a reference to the original index_object method
> > # so we can head patch it
> > original_index_object = UnTextIndex.index_object
> >
> > def index_object(self, documentId, obj, threshold=None):
> >     # sniff the object for our 'id', the 'document source' of the
> >     # index is this attribute. If it smells callable, call it.
> >     try:
> >         source = getattr(obj, self.id)
> >         if callable(source):
> >             source = str(source())
> >         else:
> >             source = str(source)
> >     except (AttributeError, TypeError):
> >         return 0
> >
> >     if find(source, '<') != -1:
> >         # Strip HTML tags and comments from source
> >         source = html_re.sub('', source)
>
> Change the above line to:
>
> source = html_re.sub(' ', source)
>
> (Insert a space between the single quotes)
>
> >         # Create faux document with stripped source content
> >         obj = FauxDocument(self.id, source)
> >
> >     # Call original index method
> >     return original_index_object(self, documentId, obj, threshold)
> >
> > # Patch UnTextIndex class
> > UnTextIndex.index_object = index_object
>
> Hope that helps,
>
> -Casey
>
=====
Eric N. Dunn
other email: endunn@aol.com
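
For illustration, the effect of the one-character change suggested in the quoted reply can be sketched in isolation. This is a minimal standalone sketch, not part of SearchPatch.py: it reuses the html_re pattern from the quoted code, while strip_html and the sample string are hypothetical names made up for the example, and it assumes the entity being stripped is &nbsp;.

```python
import re

# Same pattern as in the quoted patch: matches tags and named entities.
html_re = re.compile(r'<[^\s0-9].*?>|&[a-zA-Z]*?;', re.DOTALL)


def strip_html(source, replacement=' '):
    """Hypothetical helper: replace every tag or entity with `replacement`."""
    return html_re.sub(replacement, source)


# Hypothetical sample text pasted in by a user.
sample = '<p>1234&nbsp;1234&nbsp;1234</p>'

# Substituting the empty string glues neighbouring words together.
joined = strip_html(sample, '')

# Substituting a single space keeps word boundaries for the text splitter.
spaced = strip_html(sample, ' ')

print(repr(joined))
print(spaced.split())
```

Replacing each match with a space can leave runs of extra whitespace, which should be harmless when the text is split into words for indexing.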