[Zope] substring search on zcatalog textindex

Casey Duncan casey.duncan@state.co.us
Mon, 14 Aug 2000 13:18:13 -0600


This is illuminating. I have a question that maybe you (or somebody else)
could answer:

When searching TextIndexes, you can use "and", "or", or "andnot" as operators
in the query. I also see support in the source code for near searches using
"..." in the query string. I have not been able to get near searches to work
(although the first three operators work great for me), and I would like to.

Am I missing something?

-----Original Message-----
From: R. David Murray [mailto:bitz@bitdance.com]
Sent: Monday, August 14, 2000 12:59 PM
To: Chris Withers
Cc: casey.duncan@state.co.us; zope@zope.org
Subject: Re: [Zope] substring search on zcatalog textindex


On Thu, 10 Aug 2000, Chris Withers wrote:
> Casey Duncan wrote:
> > TextIndexes index individual words separately (using a vocabulary
> > object to identify each word in the catalog). All non-alphanumeric
> > characters (such as punctuation) are dropped so that excludes
> > searching for "?" or "*" or any other non-alphanumerics using
> > TextIndexes.
>
> IIRC, you can use another Vocabulary that wouldn't necessarily behave
> like that.
> I wonder if this works yet?
>
> If it does, my initial question still remains ;-)

In fact, to use substring matching on a text index, you have to
set the Vocabulary to support it *when you first create the ZCatalog*.
That seems a little bit broken to me since, unless I missed
something, you have to choose not to add a vocabulary when you first
create the ZCatalog, and then go add one with substring checked
from the management screens afterwards.  In the code the difference
between the checkbox checked and not checked is two different
lexicon implementations, not just the setting of a flag.  So you
can't just change the flag, you have to see to it that the whole
lexicon gets rebuilt.  I haven't checked into how you do that yet
<wry grin>.

As for your question about different vocabularies, punctuation,
and globbing support, all of that happens at the lexicon level.
That is, the lexicon implements the breaking up of strings into
indexable words (calling the splitter) and it also implements the
expansion of wildcards into matches.  It gets to do that *before*
the text index machinery parses the query, so it can decide on the
syntax rules, as far as I can see.  So if you want a Vocabulary
that supports punctuation and globbing, you'll have to write one,
and then *you* get to decide what the syntax is to handle the case
you ask about <grin>.
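
To make the division of labour described above concrete, here is a minimal
sketch of a lexicon that does both jobs: splitting text into indexable words
and expanding wildcards into matching words. It is not Zope's Lexicon or
Vocabulary API. The class name ToyGlobbingLexicon and its methods split, add
and expand are hypothetical, and the glob syntax is simply whatever Python's
fnmatch module accepts, chosen here only for illustration.

```python
import fnmatch
import re


class ToyGlobbingLexicon:
    """Hypothetical stand-in for a vocabulary; NOT Zope's Lexicon API."""

    def __init__(self):
        # Maps each known word to an integer word id.
        self._word_to_id = {}

    def split(self, text):
        # Splitter step: lowercase the text and keep only runs of
        # alphanumerics, so punctuation such as "?" or "*" is dropped.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, text):
        # Indexing step: give each unseen word the next free id and
        # return the ids of all words in the text, in order.
        ids = []
        for word in self.split(text):
            ids.append(self._word_to_id.setdefault(word, len(self._word_to_id)))
        return ids

    def expand(self, pattern):
        # Query step: the lexicon, not the index, decides what "*" and
        # "?" mean. Here they follow fnmatch rules (an assumption).
        pattern = pattern.lower()
        return sorted(w for w in self._word_to_id
                      if fnmatch.fnmatchcase(w, pattern))


lexicon = ToyGlobbingLexicon()
lexicon.add("Searching the ZCatalog: does substring search work?")
matches = lexicon.expand("s*")  # every known word starting with "s"
```

A lexicon that wanted to index punctuation would change split, and one that
wanted a different wildcard syntax would change expand; the text index itself
would only ever see the resulting words.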

--RDM