[Zope] Something odd about ZCatalog...

Kuraiken arashi1@pd.jaring.my
Sun, 12 Sep 1999 13:08:46 +0800


Thanks also to Martijn for pointing out the list file.

> > I have found that ZCatalog does not like the words: for, you, me, to. Why? Are
> > these somehow special words? Searching for these words fails every time.
> 
> The answer is simple; ZCatalog doesn't like your taste in music.

Gee, thanks. :-(

> 
> Seriously, 'you', 'me' and 'for' are stopwords.  Their frequency in
> the English language is so high that searching for them would return the
> majority of a document 'corpus'.  Given that you are using titles, in
> which each word is more 'important' than those of a document, this is a
> mis-feature for you.

I can understand prepositions to be in this list. Is there a way to get the
catalog to search a whole string? eg:

"For You For Me" - the frequency of _that_ combination would be a lot less,
wouldn't it? 
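
To make the effect concrete, here is a toy sketch in Python of what stopword
filtering does to a title. It is only an illustration: the function name and
the stopword set are assumptions taken from the words in this thread, not
ZCatalog's actual code or its real stopword list.

```python
# Toy illustration only; NOT ZCatalog's implementation or its real stopword list.
STOPWORDS = {"for", "you", "me", "to"}  # assumed: just the words named in this thread


def index_terms(text, stopwords=STOPWORDS):
    """Lowercase and split the text, then drop stopwords and one-letter words."""
    words = text.lower().split()
    return [w for w in words if w not in stopwords and len(w) > 1]


print(index_terms("For You For Me"))   # []
print(index_terms("Songs For You"))    # ['songs']
```

With every word filtered out, a title like "For You For Me" leaves nothing to
match against, no matter how rare the full phrase is.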

Also, since these words are in a title, not being able to look for them sucks. Is
there a way to force ZCatalog to "ignore" the stopword list for
specified/specific fields? (in this case "title" but there could be others in
other applications)

I can also see applications where one would want to look for sentences where
such words exist...where ballooning would be tolerated for vgrepping the
specific occurrence of that word (in say a legal doc or something) in a number of
objects.

> 
> > Also, ZCatalog-aware does not seem to update the catalog properly. A manual
> > "update catalog" is still needed.
> 
> Are you sure 'CatalogAwareness' is the first base class for your ZClass?

I'm not _totally_ new on this list ;-)

Yes, it is. (The first and in fact the only one I specified as base.) So it's a
child of ZObject and Catalog_aware.

When I add an object, it seems to be listed in the catalog (when I
go to the catalog's "cataloged objects" tab) but searching for it does not work.
It says entry not found. And, if I have a few objects with the same data in a
field:

eg. Several CDs with publisher "BMG Records"

And add a new CD object with the same publisher above, a search for objects with
"BMG" in the relevant field yields only the ones already stored (and for which
ZCatalog was updated with "update catalog"). The new one would be missing,
every time, unless I did a manual "update catalog" first.

So, now my workaround is to call update catalog before a search is made - which
defeats the purpose of the aware objects...

It seems the object is stored but the fields (indexes?) are not. Or maybe some of
them are not. The id is, and so is another index I cannot recall. Try it.

> 
> > Has anyone else encountered these problems? I tried other classes (non CD) and I
> > get the same weirdness...ZCatalog cannot search for those words above. (there
> > might well be more "phantom" words about...)
> >
> > Could it be that ZCatalog considers words of less than 4 letters..."non words"?
> 
> Not necessarily, but it does consider one letter words as stopwords.
> This kinda kicks ya when you try and search for anyone with 'C' programming
> experience.

Exactly, this is why I suggest a "switch" or toggle mechanism to "disregard
stopword list".

> 
> The solution to all of these problems is for us to implement some kind
> of vocabulary object which allows you to better control the vocabulary
> that the catalog uses.
> 
> -Michel

Hmmm...isn't a switch easier? (the user would of course be _forewarned_ of the
potentially large result set - and this mode would not be "default")
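
Roughly what I have in mind, as a toy sketch in Python. Everything here is
hypothetical (the ToyTextIndex class, the use_stopwords flag, the stopword
set); none of it is an existing ZCatalog API. It only shows a per-index
switch that is on by default and can be turned off for a field like "title".

```python
# Hypothetical sketch of the "switch" idea; none of these names exist in ZCatalog.
STOPWORDS = {"for", "you", "me", "to"}  # assumed: just the words named in this thread


class ToyTextIndex:
    def __init__(self, use_stopwords=True):
        self.use_stopwords = use_stopwords  # the proposed per-index toggle
        self.postings = {}                  # word -> set of object ids

    def _terms(self, text):
        words = text.lower().split()
        if self.use_stopwords:
            words = [w for w in words if w not in STOPWORDS and len(w) > 1]
        return words

    def index(self, obj_id, text):
        for word in self._terms(text):
            self.postings.setdefault(word, set()).add(obj_id)

    def search(self, query):
        """Return ids containing every query term (empty set if no term survives)."""
        terms = self._terms(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = set(self.postings.get(term, set()))
            result = ids if result is None else result & ids
        return result


title_index = ToyTextIndex(use_stopwords=False)  # switch off for titles
body_index = ToyTextIndex(use_stopwords=True)    # default behaviour elsewhere
for idx in (title_index, body_index):
    idx.index("cd1", "For You For Me")
    idx.index("cd2", "Songs For Everyone")

print(title_index.search("for you"))  # {'cd1'}
print(body_index.search("for you"))   # set()
```

The cost is visible even here: with the switch off, a search for "for" alone
returns every title containing it, which is the ballooning one would have to
accept.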

-- 
-----------------------------------------
Kuraiken - Python fanatic.
-----------------------------------------
Python. Try it. It'll swallow you whole!
-----------------------------------------