[Zope-CMF] Re: ZCSearchPatch

Wed, 14 May 2003 08:44:35 -0700 (PDT)

Worked like a charm!
Thank you :)
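
For anyone finding this in the archive, here is a minimal sketch of why the one-character change works. It reuses the html_re pattern from the quoted patch; the sample string is made up for illustration, and the snippet is written for a current Python interpreter, not the one my Zope runs on.

```python
import re

# Same pattern as in the quoted patch: HTML tags, or named entities like &nbsp;
html_re = re.compile(r'<[^\s0-9].*?>|&[a-zA-Z]*?;', re.DOTALL)

# Made-up sample: numbers separated only by markup, as in pasted content
sample = '1234&nbsp;1234<br>1234&nbsp;1234'

joined = html_re.sub('', sample)   # old behaviour: matches vanish, text runs together
spaced = html_re.sub(' ', sample)  # suggested fix: each match becomes one space

print(joined)  # 1234123412341234
print(spaced)  # 1234 1234 1234 1234
```

Since the text index splits on whitespace, the second form should give it four separate words to catalog.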

--- Casey Duncan <casey@zope.com> wrote:
> Actually this should be very easy to fix, see inline
> comment below:
> 
> On Wednesday 14 May 2003 10:36 am, Eric Dunn wrote:
> > ZCatalog issue:
> > Have code to strip out html tags so that the ZCatalog
> > does not pick up the html code when cataloging.
> > Works great... almost too good.
> > Our users are only copy-n-paste managers.
> > I found that stripping the "&nbsp;" (html space entity)
> > makes the catalog concatenate text, i.e.
> > 1234 1234 1234 1234 becomes '1234123412341234' in the
> > catalog.
> > 
> > Question: How can I tell the SearchPatch.py file to
> > ignore the space entity or treat it as a space?
> > 
> > 
> > import re
> > from SearchIndex.UnTextIndex import UnTextIndex
> > from string import find
> > 
> > # HTML regex to substitute tags and entities
> > html_re = re.compile(r'<[^\s0-9].*?>|&[a-zA-Z]*?;', re.DOTALL)
> > 
> > class FauxDocument:
> >     """Proxy document to store munged source text"""
> >     def __init__(self, name, value):
> >         setattr(self, name, value)
> > 
> > # Get a reference to the original index_object method
> > # so we can head patch it
> > original_index_object = UnTextIndex.index_object
> > 
> > def index_object(self, documentId, obj, threshold=None):
> >     # sniff the object for our 'id', the 'document source' of the
> >     # index is this attribute.  If it smells callable, call it.
> >     try:
> >         source = getattr(obj, self.id)
> >         if callable(source):
> >             source = str(source())
> >         else:
> >             source = str(source)
> >     except (AttributeError, TypeError):
> >         return 0
> >         
> >     if find(source, '<') != -1:
> >         # Strip HTML tags and comments from source
> >         source = html_re.sub('', source)
> 
> Change the above line to:
> 
>          source = html_re.sub(' ', source)
> 
> (Insert a space between the single quotes)
> 
> >         # Create faux document with stripped source content
> >         obj = FauxDocument(self.id, source)
> >         
> >     # Call original index method
> >     return original_index_object(self, documentId, obj, threshold)
> > 
> > # Patch UnTextIndex class
> > UnTextIndex.index_object = index_object
> 
> Hope that helps,
> 
> -Casey
> 
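
The wrap-and-delegate pattern in the quoted patch can also be tried outside Zope. The sketch below is only an illustration under stated assumptions: StandInIndex is a hypothetical stand-in I made up so the example runs without SearchIndex.UnTextIndex, and it simply records whitespace-split words. The wrapper itself follows the quoted code with the space substitution applied.

```python
import re

html_re = re.compile(r'<[^\s0-9].*?>|&[a-zA-Z]*?;', re.DOTALL)

class StandInIndex:
    """Hypothetical stand-in for UnTextIndex (not the real Zope class)."""
    def __init__(self, id):
        self.id = id
        self.words = {}

    def index_object(self, documentId, obj, threshold=None):
        # Record the whitespace-split words found in the indexed attribute
        self.words[documentId] = str(getattr(obj, self.id)).split()
        return 1

class FauxDocument:
    """Proxy document to store munged source text"""
    def __init__(self, name, value):
        setattr(self, name, value)

# Keep a reference to the original method so the wrapper can delegate to it
original_index_object = StandInIndex.index_object

def index_object(self, documentId, obj, threshold=None):
    # Fetch the indexed attribute; call it if it is callable
    try:
        source = getattr(obj, self.id)
        source = str(source() if callable(source) else source)
    except (AttributeError, TypeError):
        return 0
    if '<' in source:
        # Replace tags and entities with a space so neighbouring words stay apart
        source = html_re.sub(' ', source)
        obj = FauxDocument(self.id, source)
    return original_index_object(self, documentId, obj, threshold)

# Patch the stand-in class, as the quoted code does for UnTextIndex
StandInIndex.index_object = index_object

index = StandInIndex('body')
index.index_object(1, FauxDocument('body', '<p>1234&nbsp;1234</p>'))
print(index.words[1])  # ['1234', '1234']
```

An object without the indexed attribute is skipped and the wrapper returns 0, as in the quoted version.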

=====
Eric N. Dunn
other email: endunn@aol.com
