RE: [Zope] substring search on zcatalog textindex
So the question is, does anyone know of a simple way to get the zcatalog to also find substring matches on a textindex?
-- Peter Armstrong
For Text Indexes you can use wildcards like * and ? in searches. Searching for Foo* would find Foo, FooBar, fool, etc. I'm not sure if this works for Field or Keyword indexes though; I haven't tried it. My thought is that it only works for text indexes, which should help you anyway. Good luck, Casey Duncan
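To illustrate the wildcard behaviour described in this reply, here is a minimal sketch in plain Python. It does not use ZCatalog at all: `expand_wildcard` and `indexed_words` are hypothetical names, and the standard `fnmatch` module stands in for whatever matching the text index really does. It assumes matching is case-insensitive, which is what the Foo*/fool example suggests.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, words):
    """Return the words that match a glob pattern, ignoring case.

    '*' matches any run of characters, '?' matches exactly one.
    This only imitates the behaviour described in the thread; it is
    not ZCatalog code.
    """
    pattern = pattern.lower()
    return sorted(w for w in words if fnmatchcase(w.lower(), pattern))


# Hypothetical set of words already stored by a text index.
indexed_words = ["Foo", "FooBar", "fool", "food", "bar"]

matches = expand_wildcard("Foo*", indexed_words)
single = expand_wildcard("foo?", indexed_words)

print(matches)
print(single)
```

With these sample words, `Foo*` picks up every word starting with "foo", while `foo?` only picks up the four-letter ones.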
Casey Duncan wrote:
So the question is, does anyone know of a simple way to get the zcatalog to also find substring matches on a textindex?
-- Peter Armstrong
For Text Indexes you can use wildcards like * and ? in searches. Searching for Foo* would find Foo, FooBar, fool, etc. I'm not sure if this works for Field or Keyword indexes though; I haven't tried it. My thought is that it only works for text indexes, which should help you anyway.
What do you do if you actually want to search for a '*' or a '?' ? cheers, Chris
TextIndexes index individual words separately (using a vocabulary object to identify each word in the catalog). All non-alphanumeric characters (such as punctuation) are dropped so that excludes searching for "?" or "*" or any other non-alphanumerics using TextIndexes.
Not much of this is formally documented, but you can review the source (in Python and C) for the indexing mechanism in: {YourZopeDir}/lib/python/SearchIndex
-----Original Message-----
From: Chris Withers [mailto:chrisw@nipltd.com]
Sent: Thursday, August 10, 2000 2:37 AM
To: casey.duncan@state.co.us
Cc: pja@clari.net.au; zope@zope.org
Subject: Re: [Zope] substring search on zcatalog textindex
Casey Duncan wrote:
So the question is, does anyone know of a simple way to get the zcatalog to also find substring matches on a textindex?
-- Peter Armstrong
For Text Indexes you can use wildcards like * and ? in searches. Searching for Foo* would find Foo, FooBar, fool, etc. I'm not sure if this works for Field or Keyword indexes though; I haven't tried it. My thought is that it only works for text indexes, which should help you anyway.
What do you do if you actually want to search for a '*' or a '?' ? cheers, Chris
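The point about punctuation being dropped at indexing time can be shown with a small sketch. `split_words` below is a hypothetical stand-in for the real splitter (which lives in the SearchIndex source and may handle more cases); it simply assumes that only runs of ASCII letters and digits survive and that words are lowercased.

```python
import re


def split_words(text):
    """Split text into lowercase alphanumeric words.

    Everything that is not a letter or digit acts as a separator and
    is thrown away, so '*' and '?' never reach the index.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


words = split_words("What about '*' or '?' in FooBar?")
print(words)
```

Because the wildcard characters never become indexed words under this assumption, there is nothing for a literal search on `*` or `?` to match.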
Casey Duncan wrote:
TextIndexes index individual words separately (using a vocabulary object to identify each word in the catalog). All non-alphanumeric characters (such as punctuation) are dropped so that excludes searching for "?" or "*" or any other non-alphanumerics using TextIndexes.
IIRC, you can use another Vocabulary that wouldn't necessarily behave like that. I wonder if this works yet? If it does, my initial question still remains ;-) cheers, Chris
On Thu, 10 Aug 2000, Chris Withers wrote:
Casey Duncan wrote:
TextIndexes index individual words separately (using a vocabulary object to identify each word in the catalog). All non-alphanumeric characters (such as punctuation) are dropped so that excludes searching for "?" or "*" or any other non-alphanumerics using TextIndexes.
IIRC, you can use another Vocabulary that wouldn't necessarily behave like that. I wonder if this works yet?
If it does, my initial question still remains ;-)
In fact, to use substring matching on a text index, you have to set the Vocabulary to support it *when you first create the ZCatalog*. Which seems a little bit broken to me, since unless I missed something you have to choose not to add a vocabulary when you first create the ZCatalog, and then go add one with substring checked from the management screens afterwards.
In the code the difference between the checkbox checked and not checked is two different lexicon implementations, not just the setting of a flag. So you can't just change the flag, you have to see to it that the whole lexicon gets rebuilt. I haven't checked into how you do that yet <wry grin>.
As for your question about different vocabularies, punctuation, and globbing support, all of that happens at the lexicon level. That is, the lexicon implements the breaking up of strings into indexable words (calling the splitter) and it also implements the expansion of wildcards into matches. It gets to do that *before* the text index machinery parses the query, so it can decide on the syntax rules, as far as I can see.
So if you want a Vocabulary that supports punctuation and globbing, you'll have to write one, and then *you* get to decide what the syntax is to handle the case you ask about <grin>.
--RDM
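A rough sketch of why switching lexicons means rebuilding rather than flipping a flag. The classes `PlainLexicon` and `WildcardLexicon` are hypothetical and are not Zope's implementations. The two-letter fragment (digram) table is an assumption about one way a wildcard-capable lexicon could work; the thread only says the two implementations differ.

```python
import re
from fnmatch import fnmatchcase


def split_words(text):
    # Hypothetical splitter: keep lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


class PlainLexicon:
    """Maps each word to an integer id; a term must match exactly."""

    def __init__(self):
        self.word_ids = {}

    def add_text(self, text):
        for word in split_words(text):
            self.word_ids.setdefault(word, len(self.word_ids))

    def expand(self, term):
        term = term.lower()
        return [term] if term in self.word_ids else []


class WildcardLexicon(PlainLexicon):
    """Also records two-letter fragments at insert time (assumed design)."""

    def __init__(self):
        super().__init__()
        self.digrams = {}

    def add_text(self, text):
        for word in split_words(text):
            if word not in self.word_ids:
                self.word_ids[word] = len(self.word_ids)
                for i in range(len(word) - 1):
                    self.digrams.setdefault(word[i:i + 2], set()).add(word)

    def expand(self, term):
        term = term.lower()
        if "*" not in term and "?" not in term:
            return super().expand(term)
        # Narrow the candidates using the literal digrams in the pattern.
        candidates = None
        for i in range(len(term) - 1):
            pair = term[i:i + 2]
            if "*" in pair or "?" in pair:
                continue
            hits = self.digrams.get(pair, set())
            candidates = hits if candidates is None else candidates & hits
        if candidates is None:
            candidates = set(self.word_ids)
        return sorted(w for w in candidates if fnmatchcase(w, term))


plain, wild = PlainLexicon(), WildcardLexicon()
for lexicon in (plain, wild):
    lexicon.add_text("Foo FooBar fool bar")

plain_hits = plain.expand("foo*")
wild_hits = wild.expand("foo*")
print(plain_hits, wild_hits)
```

In this sketch the plain lexicon never stored the fragment table, so turning on wildcard support later would require feeding every word through the other class again. Query expansion also happens inside the lexicon, which is why a custom one could choose its own syntax for literal `*` or `?`.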
This is illuminating. I have a question maybe you (or somebody else) could answer:
Searching TextIndexes you can use "and", "or" or "andnot" as query criteria. I also see support in the source code for near searches using "..." in the query string. I have not been able to get this to work (although the first three work great for me), and would like to. Am I missing something?
-----Original Message-----
From: R. David Murray [mailto:bitz@bitdance.com]
Sent: Monday, August 14, 2000 12:59 PM
To: Chris Withers
Cc: casey.duncan@state.co.us; zope@zope.org
Subject: Re: [Zope] substring search on zcatalog textindex
On Thu, 10 Aug 2000, Chris Withers wrote:
Casey Duncan wrote:
TextIndexes index individual words separately (using a vocabulary object to identify each word in the catalog). All non-alphanumeric characters (such as punctuation) are dropped so that excludes searching for "?" or "*" or any other non-alphanumerics using TextIndexes.
IIRC, you can use another Vocabulary that wouldn't necessarily behave like that. I wonder if this works yet?
If it does, my initial question still remains ;-)
In fact, to use substring matching on a text index, you have to set the Vocabulary to support it *when you first create the ZCatalog*. Which seems a little bit broken to me, since unless I missed something you have to choose not to add a vocabulary when you first create the ZCatalog, and then go add one with substring checked from the management screens afterwards.
In the code the difference between the checkbox checked and not checked is two different lexicon implementations, not just the setting of a flag. So you can't just change the flag, you have to see to it that the whole lexicon gets rebuilt. I haven't checked into how you do that yet <wry grin>.
As for your question about different vocabularies, punctuation, and globbing support, all of that happens at the lexicon level. That is, the lexicon implements the breaking up of strings into indexable words (calling the splitter) and it also implements the expansion of wildcards into matches. It gets to do that *before* the text index machinery parses the query, so it can decide on the syntax rules, as far as I can see.
So if you want a Vocabulary that supports punctuation and globbing, you'll have to write one, and then *you* get to decide what the syntax is to handle the case you ask about <grin>.
--RDM
On Mon, 14 Aug 2000, Casey Duncan wrote:
Searching TextIndexes you can use "and", "or" or "andnot" as query criteria. I also see support in the source code for near searches using "..." in the query string. I have not been able to get this to work (although the first three work great for me), and would like to.
I haven't tried to get near searches to work yet; I haven't had the need. I'm just reading the code <grin>. I think Dieter looked at this in 2.1.x and found some bugs, and I don't know if his fixes got into 2.2; I don't think he's had the opportunity to test 2.2 yet. If you scan the zope.nipltd.com archives for '...' you'll probably find the relevant messages. Or 'zcatalog and near' if '...' isn't a valid search string. --RDM
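For readers unfamiliar with the operators in this exchange, here is a sketch of what "and", "or", "andnot" and a near search mean in terms of a positional word index. It is plain Python, not ZCatalog code. `build_index`, `combine`, `near` and the `max_gap` parameter are all hypothetical; the thread does not say how close two words must be for the "..." operator to match, so the distance used here is an assumption.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map word -> {doc_id: [positions of the word in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word][doc_id].append(pos)
    return index


def docs_with(index, word):
    # .get() avoids creating empty entries in the defaultdict.
    return set(index.get(word.lower(), {}))


def combine(index, left, op, right):
    """Evaluate 'left OP right' as a set operation on document ids."""
    a, b = docs_with(index, left), docs_with(index, right)
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "andnot":
        return a - b
    raise ValueError("unknown operator: %r" % op)


def near(index, left, right, max_gap=3):
    """Documents where the two words occur within max_gap positions."""
    hits = set()
    for doc_id in docs_with(index, left) & docs_with(index, right):
        left_pos = index[left.lower()][doc_id]
        right_pos = index[right.lower()][doc_id]
        if any(abs(p - q) <= max_gap for p in left_pos for q in right_pos):
            hits.add(doc_id)
    return hits


# Hypothetical sample documents.
docs = {
    1: "substring search on a text index",
    2: "the text index drops punctuation before it stores each word in the index",
    3: "keyword index",
}
index = build_index(docs)

both = combine(index, "text", "and", "index")
either = combine(index, "text", "or", "keyword")
only_index = combine(index, "index", "andnot", "text")
close = near(index, "text", "punctuation")
far = near(index, "text", "word")
print(both, either, only_index, close, far)
```

The boolean operators only need the set of documents per word, whereas a near search also needs word positions, which may be part of why it is the more fragile feature.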
participants (3)
- Casey Duncan
- Chris Withers
- R. David Murray