Hello folks,

I'm just reporting some general weirdness with ZCatalog.

I'm sure by now almost everyone has tried the CD class and catalog, right? Okay, I just so happen to have a CD titled "For You For Me".

ZCatalog did not like it. Searching finds no entry with that data. I tried "Dare to Dream" and it works on the words "dare" and "dream"...but not "to".

(Yes, in all cases they are strings - textindex)

So I thought...maybe it's the title - some strange restriction...I tried in other fields (e.g. artist) and still...nada.

I have found that ZCatalog does not like the words: for, you, me, to. Why? Are these somehow special words? Searching for these words fails every time.

Also, ZCatalog-aware does not seem to update the catalog properly. A manual "update catalog" is still needed.

Has anyone else encountered these problems? I tried other classes (non CD) and I get the same weirdness...ZCatalog cannot search for those words above. (there might well be more "phantom" words about...)

Could it be that ZCatalog considers words of less than 4 letters..."non words"?

--
-----------------------------------------
Kuraiken - Python fanatic.
-----------------------------------------
Python. Try it. It'll swallow you whole!
-----------------------------------------
Kuraiken wrote:
Hello folks,
I'm just reporting some general weirdness with ZCatalog.
I'm sure by now almost everyone has tried the CD class and catalog, right? Okay, I just so happen to have a CD titled "For You For Me".
ZCatalog did not like it. Searching finds no entry with that data. I tried "Dare to Dream" and it works on words "dare" and "dream"...but not "to".
(Yes, in all cases they are strings - textindex)
So I thought...maybe it's the title - some strange restriction...I tried in other fields (e.g. artist) and still...nada.
I have found that ZCatalog does not like the words: for, you, me, to. Why? Are these somehow special words? Searching for these words fails every time.
The answer is simple; ZCatalog doesn't like your taste in music. Seriously, 'you', 'me' and 'for' are stopwords. Their frequency in the English language is so high that searching for them would return the majority of a document 'corpus'. Given that you are using titles, in which each word is more 'important' than in a full document, this is a mis-feature for you.
Also, ZCatalog-aware does not seem to update the catalog properly. A manual "update catalog" is still needed.
Are you sure 'CatalogAwareness' is the first base class for your ZClass?
Has anyone else encountered these problems? I tried other classes (non CD) and I get the same weirdness...ZCatalog cannot search for those words above. (there might well be more "phantom" words about...)
Could it be that ZCatalog considers words of less than 4 letters..."non words"?
Not necessarily, but it does consider one-letter words as stopwords. This kinda kicks ya when you try to search for anyone with 'C' programming experience.
The solution to all of these problems is for us to implement some kind of vocabulary object which allows you to better control the vocabulary that the catalog uses.
-Michel
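To make the stopword behaviour described above concrete, here is a minimal sketch of a toy text index that drops stopwords and one-letter words at indexing time. It is not ZCatalog's code: the `STOPWORDS` set and the function names `tokenize`, `index_words`, `build_index` and `search` are all made up for illustration, and the real stopword list is certainly longer.

```python
import re

# Hypothetical stopword list for illustration only; NOT ZCatalog's actual list.
STOPWORDS = {"for", "you", "me", "to", "the", "a", "of", "and"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def index_words(text, stopwords=STOPWORDS):
    """Return the words a stopword-filtering index would keep.

    Stopwords and one-letter words are dropped, as described in the thread.
    """
    return [w for w in tokenize(text) if len(w) > 1 and w not in stopwords]


def build_index(titles):
    """Build a toy inverted index: word -> set of positions in `titles`."""
    index = {}
    for doc_id, title in enumerate(titles):
        for word in index_words(title):
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index, word):
    """Return the sorted ids of titles containing `word` (empty if unindexed)."""
    return sorted(index.get(word.lower(), set()))


titles = ["For You For Me", "Dare to Dream"]
index = build_index(titles)

print(search(index, "dream"))            # the second title is found
print(search(index, "to"))               # nothing: "to" was never indexed
print(index_words("For You For Me"))     # every word of this title is dropped
print(index_words("C programming"))      # the one-letter word "C" is dropped
```

Under these assumptions a title made up entirely of stopwords leaves no words in the index at all, which would explain why no search on the title field can ever find it.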
Thanks also to Martijn for pointing out the list file.
I have found that ZCatalog does not like the words: for, you, me, to. Why? Are these somehow special words? Searching for these words fails every time.
The answer is simple; ZCatalog doesn't like your taste in music.
Gee, thanks. :-(
Seriously, 'you', 'me' and 'for' are stopwords. Their frequency in the English language is so high that searching for them would return the majority of a document 'corpus'. Given that you are using titles, in which each word is more 'important' than in a full document, this is a mis-feature for you.
I can understand prepositions being in this list. Is there a way to get the catalog to search for a whole string? E.g. "For You For Me" - the frequency of _that_ combination would be a lot lower, wouldn't it? Also, since these words are in a title, not being able to look for it sucks.

Is there a way to force ZCatalog to "ignore" the stopword list for specific fields? (In this case "title", but there could be others in other applications.) I can also see applications where one would want to look for sentences in which such words occur...where a ballooning result set would be tolerated for vgrepping the specific occurrence of that word (in, say, a legal doc or something) across a number of objects.
Also, ZCatalog-aware does not seem to update the catalog properly. A manual "update catalog" is still needed.
Are you sure 'CatalogAwareness' is the first base class for your ZClass?
I'm not _totally_ new on this list ;-) Yes, it is. It is the first, and in fact the only, base class I specified. So it's a child of ZObject and Catalog_aware.

When I add an object, it seems to be listed in the catalog (when I go to the catalog's "cataloged objects" tab), but searching for it does not work. It says entry not found.

And if I have a few objects with the same data in a field, e.g. several CDs with publisher "BMG Records", and I add a new CD object with the same publisher, a search for objects with "BMG" in the relevant field returns only the ones that were already stored (and indexed by an earlier "update catalog"). The new one is missing every time, unless I do a manual "update catalog" first.

So, now my workaround is to call update catalog before a search is made - which defeats the purpose of the aware objects...

It seems the object is stored but the fields (indexes?) are not. Or maybe only some of them are. The id is, and so is another index I cannot recall. Try it.
Has anyone else encountered these problems? I tried other classes (non CD) and I get the same weirdness...ZCatalog cannot search for those words above. (there might well be more "phantom" words about...)
Could it be that ZCatalog considers words of less than 4 letters..."non words"?
Not necessarily, but it does consider one-letter words as stopwords. This kinda kicks ya when you try to search for anyone with 'C' programming experience.
Exactly, this is why I suggest a "switch" or toggle mechanism to "disregard stopword list".
The solution to all of these problems is for us to implement some kind of vocabulary object which allows you to better control the vocabulary that the catalog uses.
-Michel
Hmmm...isn't a switch easier? (The user would of course be _forewarned_ of the potentially large result set - and this mode would not be the "default".)

--
-----------------------------------------
Kuraiken - Python fanatic.
-----------------------------------------
Python. Try it. It'll swallow you whole!
-----------------------------------------
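The "switch" idea proposed in this message could look roughly like the sketch below: each field gets its own index, and stopword filtering is a per-index flag that is on by default. This is only an illustration of the proposal, not an existing ZCatalog feature; the class `ToyTextIndex`, its `use_stopwords` flag and the `STOPWORDS` set are all hypothetical.

```python
import re

# Hypothetical stopword list for illustration only; NOT ZCatalog's actual list.
STOPWORDS = {"for", "you", "me", "to"}


class ToyTextIndex:
    """A toy per-field text index with an optional stopword filter."""

    def __init__(self, use_stopwords=True):
        # The proposed switch: filtering stays on unless explicitly disabled.
        self.use_stopwords = use_stopwords
        self.postings = {}  # word -> set of document ids

    def _words(self, text):
        """Split text into lowercase words, dropping stopwords if enabled."""
        words = re.findall(r"[a-z0-9]+", text.lower())
        if self.use_stopwords:
            words = [w for w in words if w not in STOPWORDS]
        return words

    def index(self, doc_id, text):
        """Record every kept word of `text` as occurring in `doc_id`."""
        for word in self._words(text):
            self.postings.setdefault(word, set()).add(doc_id)

    def search(self, query):
        """Return ids of documents containing every kept word of the query."""
        words = self._words(query)
        if not words:
            return set()  # the whole query was filtered away
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


# Titles are short, so this field opts out of stopword filtering.
title_index = ToyTextIndex(use_stopwords=False)
# A second index keeps the default behaviour for comparison.
default_index = ToyTextIndex()

for idx in (title_index, default_index):
    idx.index(1, "For You For Me")
    idx.index(2, "Dare to Dream")

print(title_index.search("for you"))    # finds document 1
print(default_index.search("for you"))  # empty: the query is all stopwords
```

The trade-off is the one already noted in the thread: with the filter off, a search for a very common word can return a large share of the catalog, so the flag only makes sense for fields like titles where that is acceptable.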
Michel Pelletier wrote:
Could it be that ZCatalog considers words of less than 4 letters..."non words"?
Not necessarily, but it does consider one-letter words as stopwords. This kinda kicks ya when you try to search for anyone with 'C' programming experience.
The solution to all of these problems is for us to implement some kind of vocabulary object which allows you to better control the vocabulary that the catalog uses.
Hmm, interesting. A controlled vocabulary (complete with synonyms and stemming) would be a very interesting product. Few (if any) automated products exist for this in the web application space.

I've been interested in controlled vocabularies for a while now, ever since I read 'Information Architecture for the World Wide Web' by Louis Rosenfeld and Peter Morville. (http://www.amazon.com/exec/obidos/ASIN/1565922824/)

However, I was disappointed at the time that all such vocabularies had to be created and maintained by hand, even if you were using an existing one (say, from an online glossary). I've never had a client that would spring for creating a vocabulary maintenance tool. ('Why do we need that?')

With a controlled vocabulary for cataloging and retrieving objects, and Topics to arrange objects into arbitrary hierarchies, Zope could credibly claim to have one of the most advanced Web Content Management Systems on the market.

Michael Bernstein.