[Zope-CMF] Re: Searching multilingual CMF sites

danielle.d-avout danielle.d-avout@wanadoo.fr
Sat, 8 Mar 2003 00:37:31 +0100


> OK, I'm well on the way to solving this problem.  Thought I'd share my
> approach for posterity -- future archive-searchers will no doubt thank
> me.  ;-)

Present ones as well, no doubt!



> Turns out this was a CMF/Zope question; Localizer barely enters into it.
> (It's only needed to find out the user's current language at search
> time.)  Here's what I did:
>
>   * in $portal/portal_catalog, create two new vocabularies:
>     vocab_en and vocab_fr
>
>   * then pop over to the "Indexes" tab and create two new indexes:
>     SearchableText_en and SearchableText_fr.  Use the corresponding
>     language-specific vocabulary in each index.
>
>   * I already had a SearchableText() method in LocDublinCore,
>     which all of my content classes inherit from (shamelessly
>     stolen from Rainer Thaden's LocCMFProduct); I extended it to
>     have a language-neutral mode and language-specific modes,
>     then added trivial SearchableText_en() and SearchableText_fr()
>     wrappers.  Here's the code:
>
>       def SearchableText (self, language=None):
>           words = []
>           for pty in self._local_properties.keys():
>               pty_val = self._local_properties[pty]
>               if language is None:        # index all languages
>                   for (lang, val) in pty_val.items():
>                       if lang and val:
>                           words.append(val)
>               else:                       # only index selected language
>                   val = pty_val.get(language)
>                   if val:
>                       words.append(val)
>
>           return " ".join(words)
>
>       def SearchableText_en (self):
>           return self.SearchableText(language="en")
>
>       def SearchableText_fr (self):
>           return self.SearchableText(language="fr")
>
>     This is fairly evil, since it grubs rudely through data structures
>     inherited from LocalPropertyManager (part of Localizer).  I didn't
>     see a clean + efficient way to do this, so I went with rude +
>     efficient.  ;-(
>
>     Also, hard-coding the set of languages into those two wrapper
>     methods is Just Wrong.  I think I can get around that with a clever
>     __getattr__() method, but haven't done that yet.
>
>   * finally, I modified the search method to select the index to
>     search based on the user's current language.  My search form
>     looks (roughly) like this:
>
>       <form name="searchform" action="search"
>             tal:attributes="action string:${portal_url}/search"
>             method="GET">
>         <input id="searchGadget"
>                name="text"
>                type="text"
>                size="15"
>                value="">
>       </form>
>
>     And here's the Python Script that processes this form:
>
>       text = context.REQUEST.get("text")
>       if text:
>           lang = context.Localizer.get_selected_language()
>           key = "SearchableText_%s" % lang
>           query = {key : text}
>           return context.portal_catalog(query)
>       else:
>           return []
>
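
On the hard-coded wrappers mentioned above: here is a minimal sketch of the __getattr__() idea, not tested against Zope or Localizer.  The class name LocDublinCoreSketch and the constructor are stand-ins invented for illustration; only SearchableText() and the {property: {language: value}} shape of _local_properties come from the post.  Whether ZCatalog and Acquisition cooperate with a __getattr__() hook on real content classes is an open assumption.

```python
class LocDublinCoreSketch:
    """Hypothetical stand-in for LocDublinCore; not the real class."""

    _PREFIX = "SearchableText_"

    def __init__(self, local_properties):
        # Assumed shape, as in the post: {property: {language: value}}
        self._local_properties = local_properties

    def SearchableText(self, language=None):
        words = []
        for pty_val in self._local_properties.values():
            if language is None:        # index all languages
                words.extend(val for lang, val in pty_val.items()
                             if lang and val)
            else:                       # only index selected language
                val = pty_val.get(language)
                if val:
                    words.append(val)
        return " ".join(words)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # so real attributes and methods are unaffected.
        if name.startswith(self._PREFIX) and len(name) > len(self._PREFIX):
            language = name[len(self._PREFIX):]
            return lambda: self.SearchableText(language=language)
        raise AttributeError(name)


doc = LocDublinCoreSketch({
    "title": {"en": "Hello", "fr": "Bonjour"},
    "description": {"en": "World", "fr": ""},
})
print(doc.SearchableText_en())
print(doc.SearchableText_fr())
```

With this in place, adding a language would mean adding a vocabulary and an index only, with no new wrapper method.  The catch is that any attribute name starting with "SearchableText_" now resolves, including ones for languages the site does not have.
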
> ...and this works fine!  There are only two problems left:
>
>   * search results are shown in the language that was current when
>     the object was cataloged, presumably because of the way ZCatalog
>     harvests meta-data at catalog-time.  I suspect I can fix this if
>     I can persuade ZCatalog to harvest meta-data in all available
>     languages.
>
>   * searching for words with non-ASCII characters is tricky -- IMHO,
>     searching for "francais" should yield the same as searching for
>     "français", i.e. the index should take care of collapsing accented
>     characters somehow.  But I'm no linguist -- that might just
>     squeak by with accents in French, but whether the same approach
>     would work for Nordic å or German ß, I don't know.  Anyways,
>     this should be up to either the index or the vocabulary -- it's
>     not my problem!
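
For what it's worth, the accent collapsing described above can be sketched in plain Python with the standard unicodedata module.  This is an illustration under assumptions, not something ZCatalog or its vocabularies are known to do: the function name fold_accents is invented, and where such a step would plug into the index is left open.  It would have to run on both the indexed text and the query string to be of any use.

```python
import unicodedata


def fold_accents(text):
    """Drop combining marks so accented letters match their base letters."""
    # NFKD splits e.g. "ç" into "c" followed by a combining cedilla.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep everything that is not a combining mark.
    return "".join(ch for ch in decomposed
                   if not unicodedata.combining(ch))


print(fold_accents("français"))
print(fold_accents("å"))
print(fold_accents("ß"))
```

This handles French accents and also reduces the Nordic å to "a", which may or may not be what Nordic users expect.  German ß has no Unicode decomposition, so it passes through unchanged; folding it to "ss" needs a separate step such as str.casefold() or an explicit mapping.
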
>
> --
> Greg Ward <gward@python.net>                         http://www.gerg.ca/