[Zope-dev] Modifying Splitter.c to search on '+' & '#', and single letter words
Michel Pelletier
michel@digicool.com
Wed, 25 Jul 2001 22:25:53 -0700
Harry Wilkinson wrote:
>
> I have two problems with getting ZCatalog to search for what I need:
>
> 1) Need to be able to search for words like 'J++' and 'C#'
> - this is relatively simple to do by editing Splitter.c a little
> and recompiling
> 2) Need to be able to search for single-letter words like 'C'
> - this is easy to modify Splitter.c to accommodate, but causes
> errors in GlobbingLexicon.py, even though the vocabulary is standard
>
> So far I have solved problem (1) by changing the contents of Splitter.c,
> but that's a bit messy. Currently I don't know of an alternative
> though.
>
> I have modified Splitter.c so it indexes the extra characters, and
> reduced the minimum word length to 1, which works fine when indexing,
> and I can see all the symbol-inclusive words and single-letter words in
> the vocabulary. Unfortunately, any search on a single-letter word gives
> an IndexError, "String out of range".
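This is because GlobbingLexicon never anticipated single-letter
patterns: for the first character it always reads pattern[i+1], which
does not exist when the pattern is one letter long. This is a bug. Try
this (untested) quick patch: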
Index: GlobbingLexicon.py
===================================================================
RCS file: /cvs-repository/Zope2/lib/python/SearchIndex/GlobbingLexicon.py,v
retrieving revision 1.9
diff -c -r1.9 GlobbingLexicon.py
*** GlobbingLexicon.py 2001/04/02 18:19:45 1.9
--- GlobbingLexicon.py 2001/07/26 05:21:48
***************
*** 221,226 ****
--- 221,229 ----
              if i == 0:
                  digrams.insert(i, (self.eow + pattern[i]) )
+                 if len(pattern) == 1:
+                     digrams.append( (pattern[i] + self.eow) )
+                     break
                  digrams.append((pattern[i] + pattern[i+1]))
              else:
                  try:
> I am stuck on problem (2) and don't know how to avoid the errors arising
> in GlobbingLexicon.py without editing in some kind of hack to get around
> it.
That's exactly what this patch does.
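To see the failure and the fix outside Zope, here is a rough standalone
sketch of the digram loop. It is not the real GlobbingLexicon code: the
function name pattern_digrams, the patched flag, the '$' end-of-word
marker, the '*' and '?' wildcards, and the body of the else branch are
all assumptions made for illustration. Only the "if i == 0" branch is
taken from the diff above.

```python
# Standalone sketch, not the actual Zope module.
EOW = "$"               # assumed stand-in for self.eow
WILDCARDS = ("*", "?")  # assumed wildcard characters


def pattern_digrams(pattern, patched=True):
    """Build the list of two-character digrams for a search pattern."""
    digrams = []
    for i in range(len(pattern)):
        if pattern[i] in WILDCARDS:
            continue  # assumed: wildcards contribute no digram of their own
        if i == 0:
            digrams.insert(i, EOW + pattern[i])
            if patched and len(pattern) == 1:
                # The fix: a one-letter pattern has no pattern[i+1],
                # so close the word with the end-of-word marker and stop.
                digrams.append(pattern[i] + EOW)
                break
            digrams.append(pattern[i] + pattern[i + 1])
        else:
            # Assumed shape of the existing else branch.
            try:
                if pattern[i + 1] not in WILDCARDS:
                    digrams.append(pattern[i] + pattern[i + 1])
            except IndexError:
                digrams.append(pattern[i] + EOW)
    return digrams


print(pattern_digrams("C"))    # ['$C', 'C$']
print(pattern_digrams("C#"))   # ['$C', 'C#', '#$']
```

Calling pattern_digrams("C", patched=False) raises IndexError, which
matches the error reported for single-letter searches.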
> I don't even know why GlobbingLexicon is getting involved in the
> search process since I am not trying to use wildcards and haven't
> elected to use a globbing vocabulary (AFAIK).
You must have somehow; GlobbingLexicon is never the default.
-Michel