Minor typos/changes to ZCatalog.
Today's show opens with a usability bug (that's what it looks like to me, anyway). Sit back and feel free to buy an albatross from the strange man.
I've got a ZCatalog named Catalog, living its normal life at /. When I'm in /Catalog/manage_catalogIndexes the index names are <a href>'s. Strange, since they all link to /Catalog.
In the ZCatalog.py's doc string, it says: "[...] ZCatalog's can index either 'Field' values of object, or 'Text' values [...]". What about Keyword indexes? Am I confusing myself?
And finally, the thing I really came here for: what good is the size? And who calculates it? I have three objects that have the attribute sales_line, although the sales_line index's size is 847. Clues?
Also, could anyone point me to some good documentation (reference material would be best) about how the different Indexes work? I'm having a bit of a struggle with numbers, FieldIndexes and TextIndexes.
Thanks! This is Zope 2.3.1b1 by the way; Linux.
Today's show opens with a usability bug (that's what it looks like to me, anyway). Sit back and feel free to buy an albatross from the strange man.
I've got a ZCatalog named Catalog, living its normal life at /.
When I'm in /Catalog/manage_catalogIndexes the index names are <a href>'s. Strange, since they all link to /Catalog.
Hmmm. I think this may be an aborted feature. It will either be removed or finished.
In the ZCatalog.py's doc string, it says: "[...] ZCatalog's can index either 'Field' values of object, or 'Text' values [...]". What about Keyword indexes? Am I confusing myself?
Probably. Keyword indexes work too.
And finally, the thing I really came here for: what good is the size? And who calculates it? I have three objects that have the attribute sales_line, although the sales_line index's size is 847. Clues?
This is the number of objects indexed by the index. If it's not working, that's a bug.
Also, could anyone point me to some good documentation (reference material would be best) about how the different Indexes work? I'm having a bit of a struggle with numbers, FieldIndexes and TextIndexes.
No. ;-) This is one of the things I'd like to get done soon.
Chris McDonough wrote:
Also, could anyone point me to some good documentation (reference material would be best) about how the different Indexes work? I'm having a bit of a struggle with numbers, FieldIndexes and TextIndexes.
No. ;-) This is one of the things I'd like to get done soon.
On the subject of numbers, I was wondering how to index alphanumeric values like ISBN numbers. They're unique values, so perhaps some other approach than an index is warranted, but the simplest approach seems to be indexing them as well.
Here are some sample ISBNs:
0201433311
087584877X
As you can see by the second example, an ISBN can have letters as well as numbers in it, so it cannot be represented by an integer. Text indexes seem to ignore 'words' that contain numbers, though.
Any suggestions?
Michael Bernstein.
Erik Enge wrote:
On Fri, 23 Feb 2001, Michael R. Bernstein wrote:
On the subject of numbers, I was wondering how to index alphanumeric values like ISBN numbers.
Why can't you use FieldIndexes?
Because I'm actually using a SkinScript to concatenate several attributes (Author, Title, id) into one, so that I can index them all with a single text index. That way, I reduce the indexing overhead, and it's easy to search multiple attributes for a match from a single search box.
So how do I get the text index to index the alphanumeric ISBN values as well?
Thanks,
Michael Bernstein.
The short answer is "you can't, easily".
The splitter breaks text into discrete words. The splitter also removes "stop" words, words under two characters long, numbers, and symbols. It returns a list of words (not de-duplicated) after pruning the text of stop words, symbols, and numbers. The current splitter implementation (as of Zope 2.3.0) is written in C, and it is most effective when used against English text.
The splitter may also remove semantically desirable symbols which are part of words, or it may remove words completely. For example, the splitter will split the word "t-shirt" into "t" and "shirt". It will then drop "t" (because it's less than two characters), leaving "shirt". Another example: the splitter will turn the word "C++" into "C" (after removing symbols). It will then drop "C", removing the word entirely.
If you wish to change this behavior, you need to delve into code to replace the splitter implementation.
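The pruning rules Chris McDonough describes for the splitter can be approximated in a few lines of Python. This is only an illustrative sketch written for a modern Python interpreter, not the C splitter shipped with Zope: the function name `toy_split`, the tiny `STOP_WORDS` set, the `keep_numbers` switch, and the regular expression are all assumptions made for the example.

```python
import re

# Hypothetical stop-word list; the real splitter's list is not given in this thread.
STOP_WORDS = {"the", "and", "of", "a"}


def toy_split(text, keep_numbers=False):
    """Roughly mimic the described pruning (assumed rules, not Zope's code).

    Symbols act as separators, then short words, stop words and
    (unless keep_numbers is true) anything containing a digit are dropped.
    """
    # Every run of non-alphanumeric characters is a separator, so
    # "t-shirt" becomes "t" and "shirt", and "C++" becomes just "c".
    words = [w for w in re.split(r"[^A-Za-z0-9]+", text.lower()) if w]
    result = []
    for word in words:
        if len(word) < 2:
            continue  # words under two characters are dropped
        if word in STOP_WORDS:
            continue  # stop words are dropped
        if not keep_numbers and any(ch.isdigit() for ch in word):
            continue  # numbers, and words containing digits, are ignored
        result.append(word)
    return result  # duplicates are deliberately kept
```

Under these assumptions, `toy_split("t-shirt")` keeps only "shirt" and `toy_split("C++")` returns an empty list, which matches the two examples in the message. An ISBN such as 087584877X survives only when `keep_numbers=True` is passed, which is roughly the behaviour being asked for in this thread.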
On Fri, 23 Feb 2001, Chris McDonough wrote:
The short answer is "you can't, easily".
I'm a bit confused. Will FieldIndexes also behave like TextIndex, in that they remove stop words, digits and so forth? I think I've picked up somewhere that FieldIndexes treat the whole content of the attribute they index as one big chunk of .. erm... characters, and don't remove anything? Correct?
Erik Enge wrote:
Will FieldIndexes also behave like TextIndex, in that they remove stop words, digits and so forth?
No.
I think I've picked up somewhere that FieldIndexes treat the whole content of the attribute they index as one big chunk of .. erm... characters, and don't remove anything? Correct?
Yes. But it's not just characters. A field index indexes an object, and uses the overloaded comparison operators for that object to put it in an appropriate place. So, you can index DateTime objects, tuples, strings, numbers, floats...
-- Steve Alexander Software Engineer Cat-Box limited http://www.cat-box.net
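Steve's point, that a field index stores whole values and orders them with the values' own comparison operators, can be illustrated with a toy index. This is a minimal sketch in modern Python and not the ZCatalog implementation: `ToyFieldIndex` and its method names are hypothetical, and the sketch additionally assumes that indexed values are hashable and that all values in one index are mutually comparable.

```python
import bisect


class ToyFieldIndex:
    """Hypothetical stand-in for a field index: whole values, kept sorted."""

    def __init__(self):
        self._values = []  # distinct values, kept in sorted order
        self._docs = {}    # value -> set of document ids

    def index_object(self, doc_id, value):
        if value not in self._docs:
            # insort relies only on the value's own "<" comparison,
            # so strings, numbers and tuples all work unchanged.
            bisect.insort(self._values, value)
            self._docs[value] = set()
        self._docs[value].add(doc_id)

    def search(self, value):
        """Exact match on the whole value; nothing is split or pruned."""
        return sorted(self._docs.get(value, ()))

    def search_range(self, low, high):
        """Document ids whose value lies between low and high, inclusive."""
        start = bisect.bisect_left(self._values, low)
        stop = bisect.bisect_right(self._values, high)
        hits = set()
        for value in self._values[start:stop]:
            hits |= self._docs[value]
        return sorted(hits)

    def __len__(self):
        # Size in the sense used earlier in the thread:
        # the number of objects indexed.
        return sum(len(ids) for ids in self._docs.values())
```

Because the value is never passed through a splitter, an ISBN stored as the string "087584877X" is matched as a single unit, trailing "X" included.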
[Steve Alexander]
| But it's not just characters. A field index indexes an object, and uses
| the overloaded comparison operators for that object to put it in an
| appropriate place. So, you can index DateTime objects, tuples, strings,
| numbers, floats...
Could a field index successfully handle the example you posted some time ago, storing the relative path of each object?
-Morten
On Fri, 23 Feb 2001, Michael R. Bernstein wrote:
As you can see by the second example, an ISBN can have letters as well as numbers in it, so it cannot be represented by an integer. Text indexes seem to ignore 'words' that contain numbers, though.
Any suggestions?
A field or keyword index would work. Or you can make a very small change to splitter.c to stop it from ignoring numbers.
Which is something I'd like to see as a standard feature of Catalog, actually. I can't think of any instances of using a text index where I did *not* want words with numbers indexed, and have a number of instances where I *do* want words with numbers indexed. Actually, the same applies to pure numbers, as well.
--RDM
"R. David Murray" wrote:
On Fri, 23 Feb 2001, Michael R. Bernstein wrote:
As you can see by the second example, an ISBN can have letters as well as numbers in it, so it cannot be represented by an integer. Text indexes seem to ignore 'words' that contain numbers, though.
Any suggestions?
A field or keyword index would work. Or you can make a very small change to splitter.c to stop it from ignoring numbers.
Which is something I'd like to see as a standard feature of Catalog, actually. I can't think of any instances of using a text index where I did *not* want words with numbers indexed, and have a number of instances where I *do* want words with numbers indexed. Actually, the same applies to pure numbers, as well.
Hmm. This seems like there ought to be a checkbox next to the 'Add Index' form field labeled 'index numbers?'. Or maybe a 'Text and Numbers' index as an additional index type.
What change needs to be made to splitter.c? Do I have to recompile Zope afterwards, or will a restart do it?
Michael Bernstein.
Chris Withers [chrisw@nipltd.com] wrote:
"Michael R. Bernstein" wrote:
Hmm. This seems like there ought to be a checkbox next to the 'Add Index' form field labeled 'index numbers?'. Or maybe a 'Text and Numbers' index as an additional index type.
I like these ideas :-)
One of the ideas that has been tossed around for almost a year (and Jim and I both liked last we discussed it) was "drop-in" indexes. These would be individually managed, and you would be able to control the splitter more (which is what you're referring to in this case). In addition, you might even have more detailed control over the searching behaviours...
SMOC :-)
Chris -- | Christopher Petrilli | petrilli@amber.org
Christopher Petrilli wrote:
SMOC :-)
You're in the South Michigan Orienteering Club? http://www.angelfire.com/mi/SMOC/
You think we should implement indexes using the Shell Multivariable Optimising Controller? http://www.yokogawa.com.sg/Public/Solution/SMOC.asp
Or some Simulation Middleware Object Classes? http://www.ntsc.navy.mil/Programs/Tech/ModSim/SMOC/smoc.htm?men=PGD
Otherwise, I'm stumped :-)
-- Steve Alexander Software Engineer Cat-Box limited
Steve Alexander [steve@cat-box.net] wrote:
Christopher Petrilli wrote:
SMOC :-)
You're in the South Michigan Orienteering Club? http://www.angelfire.com/mi/SMOC/
Oh sorry, old reference... SMOC = Simple Matter of Code :) Chris -- | Christopher Petrilli | petrilli@amber.org
On Sat, 24 Feb 2001 21:44:36 -0500, Christopher Petrilli <petrilli@amber.org> wrote:
Chris Withers [chrisw@nipltd.com] wrote:
"Michael R. Bernstein" wrote:
Hmm. This seems like there ought to be a checkbox next to the 'Add Index' form field labeled 'index numbers?'. Or maybe a 'Text and Numbers' index as an additional index type.
I like these ideas :-)
One of the ideas that has been tossed around for almost a year (and Jim and I both liked last we discussed it) was "drop-in" indexes. These would be individually managed, and you would be able to control the splitter more (which is what you're referring to in this case). In addition, you might even have more detailed control over the searching behaviours...
SMOC :-)
If you are interested in a short-term hack, it is possible to implement your own type of index and add it to an existing catalog, without having to modify any of the ZCatalog product. I've used that to implement a variant of KeywordIndex that uses a get_keyword method (rather than getattr).
Toby Dickenson tdickenson@geminidataloggers.com
Toby Dickenson wrote:
If you are interested in a short-term hack, it is possible to implement your own type of index and add it to an existing catalog, without having to modify any of the ZCatalog product.
Ok, how? Please keep in mind that I'm more of a designer and integrator than a coder. Thanks, Michael Bernstein.
On Mon, 26 Feb 2001 19:00:58 -0800, "Michael R. Bernstein" <webmaven@lvcm.com> wrote:
Toby Dickenson wrote:
If you are interested in a short-term hack, it is possible to implement your own type of index and add it to an existing catalog, without having to modify any of the ZCatalog product.
Ok, how? Please keep in mind that I'm more of a designer and integrator than a coder.
Today it requires some development effort....
ZCatalogs are a zopeish wrapper around a zope-neutral catalog object, which is stored in the _catalog attribute. That leading underscore is a clue that you shouldn't be using it directly; however, you need to in order to create a custom index. Like I said, this is a hack.
The main problem is that catalog (and hence ZCatalog) implements a factory interface where you specify the name of the index type (for example "TextIndex"), and it creates the indexing objects.
I use the function below to:
1. Use a catalog's factory interface to create a KeywordIndex, to allow it a chance to raise an exception if anything is wrong.
2. If nothing goes wrong then I assume it is safe to replace the standard KeywordIndex with my custom subclass of a KeywordIndex.

    def ensure_question_is_indexed(self, question):
        question = unicode(question)
        cat = self.storage.timeseries_catalog
        index = UnTrackingIndex(question)
        if index.id not in cat.indexes():
            # Add and remove a keyword index using
            # the published interface,
            # to allow the catalog a chance to complain.
            cat._catalog.addIndex(index.id, 'KeywordIndex')
            cat._catalog.delIndex(index.id)
            # Use the private interface to do the real work
            cat._catalog.indexes[index.id] = index
            cat._catalog._p_changed = 1

You will need to implement a subclass derived from one of the standard indexes to provide your custom indexing policy, whatever that is.
Toby Dickenson tdickenson@geminidataloggers.com
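The "let the factory complain, then swap in a subclass" pattern from Toby's message can be tried out without Zope using stand-in classes. Everything below is a hypothetical sketch in modern Python: `ToyCatalog`, `ToyKeywordIndex`, `MethodKeywordIndex` and `ensure_custom_index` are invented names, the `get_keyword(name)` signature is an assumption (the thread only says a get_keyword method is used instead of getattr), and none of it is the real ZCatalog API or Toby's actual index class.

```python
class ToyKeywordIndex:
    """Hypothetical base index: reads keywords from an attribute."""

    def __init__(self, id):
        self.id = id
        self._docs = {}  # keyword -> set of document ids

    def _get_keywords(self, obj):
        # Default policy: look the value up with getattr.
        return getattr(obj, self.id, ())

    def index_object(self, doc_id, obj):
        for keyword in self._get_keywords(obj):
            self._docs.setdefault(keyword, set()).add(doc_id)

    def search(self, keyword):
        return sorted(self._docs.get(keyword, ()))


class MethodKeywordIndex(ToyKeywordIndex):
    """Custom policy: ask the object through get_keyword(name) (assumed signature)."""

    def _get_keywords(self, obj):
        return obj.get_keyword(self.id)


class ToyCatalog:
    """Hypothetical catalog whose factory only knows index types by name."""

    _factories = {"KeywordIndex": ToyKeywordIndex}

    def __init__(self):
        self.indexes = {}

    def addIndex(self, name, type_name):
        if name in self.indexes:
            raise ValueError("index %r already exists" % name)
        if type_name not in self._factories:
            raise ValueError("unknown index type %r" % type_name)
        self.indexes[name] = self._factories[type_name](name)

    def delIndex(self, name):
        del self.indexes[name]


def ensure_custom_index(catalog, name):
    """Install a MethodKeywordIndex under name unless one is already there."""
    if name in catalog.indexes:
        return
    # Step 1: go through the factory so the catalog can raise if anything is wrong.
    catalog.addIndex(name, "KeywordIndex")
    catalog.delIndex(name)
    # Step 2: bypass the factory and store the custom subclass directly.
    catalog.indexes[name] = MethodKeywordIndex(name)
```

The custom policy lives entirely in the overridden `_get_keywords` method, so a text-oriented variant would presumably follow the same shape with a different base class. In a real ZODB-backed catalog the change would also have to be flagged for persistence, which is what the `_p_changed = 1` line in the message is for.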
Toby Dickenson wrote:
On Mon, 26 Feb 2001 19:00:58 -0800, "Michael R. Bernstein" <webmaven@lvcm.com> wrote:
Toby Dickenson wrote:
If you are interested in a short-term hack, it is possible to implement your own type of index and add it to an existing catalog, without having to modify any of the ZCatalog product.
Ok, how? Please keep in mind that I'm more of a designer and integrator than a coder.
ZCatalogs are a zopeish wrapper around a zope-neutral catalog object, which is stored in the _catalog attribute. That leading underscore is a clue that you shouldn't be using it directly; however, you need to in order to create a custom index. Like I said, this is a hack.
The main problem is that catalog (and hence ZCatalog) implements a factory interface where you specify the name of the index type (for example "TextIndex"), and it creates the indexing objects.
[snip description and code]
I am assuming that the code you provided goes into a manage_addCustomIndex method that is part of a CustomIndex Python Product.
You will need to implement a subclass derived from one of the standard indexes to provide your custom indexing policy, whatever that is.
Can you provide the code for your custom KeywordIndex, so I have a starting point? I realize yours subclasses a KeywordIndex, and I probably need to subclass a TextIndex, but it would still probably help. I can integrate and hack on other people's code better than I can write my own from scratch.
Thanks,
Michael Bernstein.
On Fri, 23 Feb 2001, Chris McDonough wrote:
Probably. Keyword indexes work too.
Yeah, that was what I was getting at :)
This is the number of objects indexed by the index. If it's not working, that's a bug.
Then it looks like a bug. Lucky us, I won't have time to analyze this for a week or so.
No. ;-) This is one of the things I'd like to get done soon.
Great! When you get around to doing it, maybe you could poke at Amos to update his ZCatalogHowTo also? It's quite out of date.
Maybe a "This How-To covers product such-and-such version x.x" is in order as a standard feature of the How-To?
Probably. Keyword indexes work too.
Yeah, that was what I was getting at :)
Yes.
This is the number of objects indexed by the index. If it's not working, that's a bug.
Then it looks like a bug. Lucky us, I won't have time to analyze this for a week or so.
Sorry. I'd like to help.
No. ;-) This is one of the things I'd like to get done soon.
Great! When you get around to doing it, maybe you could poke at Amos to update his ZCatalogHowTo also? It's quite out of date.
It should perhaps be removed in light of the fact that the Zope Book has a catalog chapter.
Maybe a "This How-To covers product such-and-such version x.x" is in order as a standard feature of the How-To?
This is a good idea. It would be an even better idea to allow folks to add comments to howto pages, so that if the original author did not maintain it well, well-placed comments could prevent folks from taking wild goose chases.
On Fri, 23 Feb 2001, Chris McDonough wrote:
Maybe a "This How-To covers product such-and-such version x.x" is in order as a standard feature of the How-To?
This is a good idea. It would be an even better idea to allow folks to add comments to howto pages, so that if the original author did not maintain it well, well-placed comments could prevent folks from taking wild goose chases.
Yes, maybe like the way the ACS lets you? That would work very nicely. Great suggestion! :)
I think Ethan has this in the pipeline...
participants (9)
- Chris McDonough
- Chris Withers
- Christopher Petrilli
- Erik Enge
- Michael R. Bernstein
- morten@esol.no
- R. David Murray
- Steve Alexander
- Toby Dickenson