Ignore stopwords/characters in alphabetical results
Hi,

I have been asked to improve the order of search results, so that stop words and certain characters at the beginning of a title are ignored. "Final Report", "The Final Report" and "[Final] Report" all need to appear under the letter 'F'.

We are running Zope 2.7.8-final with python 2.3.5, under FreeBSD6. Until I was hit with this request, the default ZCatalog and ZCTextIndexes have given good results.

I did try to install TextIndexNG3, according to the instructions in the readme but was unable to restart Zope (no message was left in event.log). Would this product make the difference I need?

Thanks,
Ken

__________________________________________________
Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
--On 2 February 2006 00:41:38 -0800 Ken Ara <feedreader@yahoo.com> wrote:
I did try to install TextIndexNG3, according to the instructions in the readme but was unable to restart Zope (no message was left in event.log).
As documented: TXNG 3 does not work with pre-Zope 2.8 installations unless you have a proper Five installation.
-aj
I have been asked to improve the order of search results, so that stop words and certain characters at the beginning of a title are ignored. "Final Report", "The Final Report" and "[Final] Report" all need to appear under the letter 'F'.
We are running Zope 2.7.8-final with python 2.3.5, under FreeBSD6. Until I was hit with this request, the default ZCatalog and ZCTextIndexes have given good results.
ZCTextIndex has a list of stop words that you could probably modify. This should get you pointed in the right direction: http://www.zope.org/Members/dedalu/ZCTextIndex_python hth Jonathan
Ken Ara wrote at 2006-2-2 00:41 -0800:
I have been asked to improve the order of search results, so that stop words and certain characters at the beginning of a title are ignored. "Final Report", "The Final Report" and "[Final] Report" all need to appear under the letter 'F'.
Thus, you only want to change the result order. Unless you want relevancy ranking (which is not the case, judging by your description), ordering has nothing to do with the indexes (at least not the text indexes). Ordering can be done with "sequence.sort" (documented in the Zope Online help system) or with Python's "sort" method. In both cases, you can provide your own comparison function. The comparison function can use the vocabulary to check for stopwords (words not known by the vocabulary are stopwords).
-- Dieter
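A minimal sketch of the sorting approach described above, written for a current Python rather than the 2.3.5 mentioned in the thread. It swaps the comparison function for a key function, which gives the same ordering. The `STOPWORDS` set and the name `sort_key` are hypothetical; in Zope the stopword check would presumably go through the index vocabulary instead of a hard-coded set.

```python
import string

# Hypothetical stopword list; a real setup would consult the vocabulary.
STOPWORDS = {"a", "an", "the"}

def sort_key(title):
    """Return a lowercase key that ignores punctuation and leading stopwords."""
    # Strip surrounding punctuation from each word and lowercase it.
    words = [w.strip(string.punctuation).lower() for w in title.split()]
    # Discard words that consisted of punctuation only.
    words = [w for w in words if w]
    # Drop leading stopwords, but always keep at least one word.
    while len(words) > 1 and words[0] in STOPWORDS:
        words.pop(0)
    return " ".join(words)

titles = ["The Final Report", "Zope Notes", "[Final] Report",
          "An Apple", "Final Report"]
ordered = sorted(titles, key=sort_key)
```

With this key, all three "Final Report" variants share the key "final report" and therefore sort together under 'F'.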
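Thanks Dieter and others for helping me understand this problem. In the end I added the following code to my product:

    # Imports the snippet relies on (Python 2; join and split are
    # assumed to come from the string module)
    import re
    from string import join, split

    def norm_title(self):
        """Returns a normalized copy of the title for sorting purposes"""
        nt = ''
        if hasattr(self, 'title'):
            # Replace a leading article and all non-word characters with spaces
            nt = re.sub('^A |^An |^The |\W', ' ', self.title)
            # Collapse runs of whitespace and trim both ends
            nt = join(split(nt))
        return nt

I then added a norm_title index to my ZCatalog for sorting. I'm a regex newbie so any improvements are welcome!

Thanks,
Ken

--- Dieter Maurer <dieter@handshake.de> wrote: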
Ken Ara wrote at 2006-2-2 00:41 -0800:
I have been asked to improve the order of search results, so that stop words and certain characters at the beginning of a title are ignored. "Final Report", "The Final Report" and "[Final] Report" all need to appear under the letter 'F'.
Thus, you only want to change the result order.
Unless you want relevancy ranking (which is not the case, judging by your description), ordering has nothing to do with the indexes (at least not the text indexes).
Ordering can be done with "sequence.sort" (documented in the Zope Online help system) or with Python's "sort" method. In both cases, you can provide your own comparison function. The comparison function can use the vocabulary to check for stopwords (words not known by the vocabulary are stopwords).
-- Dieter
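In response to the request for regex improvements, here is one possible variation on the norm_title idea, offered as a sketch rather than a drop-in replacement. It is written as a standalone function for a current Python; the name `normalize_title` and the article list are assumptions. It differs from the posted version in two ways: it matches articles case-insensitively, and it cleans punctuation before looking for the article, so a title such as "[The] Final Report" is also handled.

```python
import re

# Leading articles to ignore (assumed list; extend as needed).
_LEADING_ARTICLE = re.compile(r'^(?:a|an|the)\s+', re.IGNORECASE)
# Any run of non-word characters (punctuation, brackets, whitespace).
_NON_WORD = re.compile(r'\W+')

def normalize_title(title):
    """Return a copy of title with punctuation and one leading article removed."""
    # Replace each run of non-word characters with a single space, then trim.
    cleaned = _NON_WORD.sub(' ', title).strip()
    # Drop one leading article; the pattern requires whitespace after it,
    # so a title consisting only of "The" is left alone.
    stripped = _LEADING_ARTICLE.sub('', cleaned, count=1)
    return stripped or cleaned
```

The result keeps the original capitalization, so a case-insensitive sort would still need to lowercase the value, for example by indexing `normalize_title(title).lower()`.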
participants (4)
- Andreas Jung
- Dieter Maurer
- Jonathan
- Ken Ara