Re: [Zope] Search with partial words on own ZClass
Is that just partial searching, or also wildcard and even regexp searching?
I define 'partial' and 'wildcard' as the same thing, but I'm using my own terminology, so I could be wrong. I define partial as 'finding part or all of a word', which can be accomplished with wildcards: '*part*'.
Yes [snip explanation]
Regular expressions are not feasible in any searching system. Although it may be possible, with the existing lexical analysis that globbing lexicons do, to implement a larger subset of regexp than just * and ?, it is not feasible to implement the entire regexp language.
No, of course not. I was thinking more along the lines of (to stick with your example) fl[e|a|u]c*, but it doesn't really matter.
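For illustration, a minimal sketch of that kind of pattern, using Python's standard fnmatch module rather than anything from Zope's globbing lexicon. The word list is made up. Note that in glob syntax a character class is written [eau], without the | separators.

```python
import fnmatch

# Hypothetical word list standing in for the words a lexicon might hold.
words = ["flac", "fleck", "flecked", "flick", "flock", "fluctuate", "fly"]

# '*' and '?' are the usual glob wildcards; '[eau]' is a character class
# that matches exactly one of 'e', 'a' or 'u'.
pattern = "fl[eau]c*"

# fnmatchcase does a case-sensitive match of the whole word against the pattern.
matches = [w for w in words if fnmatch.fnmatchcase(w, pattern)]
print(matches)
```

This only shows what such a pattern would select; whether a globbing lexicon could answer it efficiently is a separate question.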
And since you keep the locations of the words, is proximity searching also possible?
The location in the document is not kept, just the score. There are TextIndex methods, however, for finding the positions of words in a document; these are used to support the 'Near' operator, which is '...'. This operator exists in TextIndexes now (it always has, since I took over the indexing realm). I tested it a few months ago but couldn't get the concept to work. I suspect it's buggy; the code is a holdover from ZTables.
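As a rough sketch of the general idea only (this is not the TextIndex code, and the function names and the max_distance parameter are invented here), a proximity test can be built from word positions like this:

```python
def positions(words, target):
    """Return every index at which target occurs in the list of words."""
    return [i for i, w in enumerate(words) if w == target]


def near(text, a, b, max_distance=3):
    """True if some occurrence of a lies within max_distance words of b.

    Hypothetical helper: splits on whitespace and ignores punctuation,
    stemming and everything else a real splitter would handle.
    """
    words = text.lower().split()
    pos_a = positions(words, a)
    pos_b = positions(words, b)
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


doc = "the catalog stores a text index for every searchable object"
near_hit = near(doc, "catalog", "text")     # positions 1 and 4, distance 3
near_miss = near(doc, "catalog", "object")  # positions 1 and 9, distance 8
```

The positions are recomputed from the document text on each call, in line with the point above that the index keeps only the score.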
Ok, that's clear
Another question: how do I retrieve a list of unique words from a full-text catalog?
In 2.1, you need to hack the lexicon from Python. In 2.2, you call a Vocabulary object's 'words' method, or you can call the Vocabulary with the pattern '*' to match all words, or with a more restrictive pattern if you only want the unique words that match it, such as '*ing' for all the words that end in 'ing'.
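To make the behaviour concrete, here is a toy stand-in written in plain Python. It is not Zope's Vocabulary class: the class name, the query method and the sample texts are assumptions for illustration, and only the idea of a 'words' listing plus pattern matching comes from the description above.

```python
import fnmatch


class ToyVocabulary:
    """Illustrative stand-in only; not Zope's Vocabulary object."""

    def __init__(self, texts):
        # Keep each distinct (lower-cased) word once.
        self._words = set()
        for text in texts:
            self._words.update(text.lower().split())

    def words(self):
        """All unique words, sorted."""
        return sorted(self._words)

    def query(self, pattern):
        """Unique words matching a glob pattern such as '*' or '*ing'."""
        return sorted(fnmatch.filter(self._words, pattern))


vocab = ToyVocabulary(["Searching and indexing text", "indexing the catalog"])
all_words = vocab.query("*")     # same result as vocab.words()
ing_words = vocab.query("*ing")  # only the words ending in 'ing'
```

The toy version scans every word for each pattern; a real globbing lexicon would presumably use its own lexical analysis instead.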
That's very nice, and opens up (even more) possibilities for an already great product.
Now, I know there is no standard way, but is it possible at all?
In 2.2 it is standard (and documented in the Interfaces Wiki).
Can I use the items, keys, etc. interfaces of the text index (perhaps with some Python hacking)?
TextIndexes do not store the word, they store an integer that the lexicon maps to a word. This is so text indexes can be language independent.
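A minimal sketch of why the index keys alone are not enough, assuming a much simplified layout (the ToyLexicon class and the dictionary-based index are invented for illustration and are not Zope's data structures):

```python
class ToyLexicon:
    """Hypothetical lexicon: assigns each distinct word an integer id."""

    def __init__(self):
        self._word_to_id = {}
        self._id_to_word = []

    def word_id(self, word):
        # Hand out the next free integer the first time a word is seen.
        if word not in self._word_to_id:
            self._word_to_id[word] = len(self._id_to_word)
            self._id_to_word.append(word)
        return self._word_to_id[word]

    def word(self, wid):
        return self._id_to_word[wid]


lexicon = ToyLexicon()
index = {}  # word id -> set of document ids; no words are stored here

docs = {1: "zope catalog search", 2: "catalog index"}
for doc_id, text in docs.items():
    for w in text.split():
        index.setdefault(lexicon.word_id(w), set()).add(doc_id)

index_keys = sorted(index)                            # integers only
index_words = [lexicon.word(k) for k in index_keys]   # needs the lexicon
```

In this simplified picture, iterating over the index's keys yields only integers, and turning them back into words requires going through the lexicon.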
Right.

Rik
participants (1): Rik Hoekstra