[Zope-CMF] Re: Searching multilingual CMF sites
danielle.d-avout
danielle.d-avout@wanadoo.fr
Sat, 8 Mar 2003 00:37:31 +0100
> OK, I'm well on the way to solving this problem. Thought I'd share my
> approach for posterity -- future archive-searchers will no doubt thank
> me. ;-)
Present ones as well, no doubt!
> Turns out this was a CMF/Zope question; Localizer barely enters into it.
> (It's only needed to find out the user's current language at search
> time.) Here's what I did:
>
> * in $portal/portal_catalog, create two new vocabularies:
> vocab_en and vocab_fr
>
> * then pop over to the "Indexes" tab and create two new indexes:
> SearchableText_en and SearchableText_fr. Use the corresponding
> language-specific vocabulary in each index.
>
> * I already had a SearchableText() method in LocDublinCore,
> which all of my content classes inherit from (shamelessly
> stolen from Rainer Thaden's LocCMFProduct); I extended it to
> have a language-neutral mode and language-specific modes,
> then added trivial SearchableText_en() and SearchableText_fr()
> wrappers. Here's the code:
>
>     def SearchableText(self, language=None):
>         words = []
>         for pty in self._local_properties.keys():
>             pty_val = self._local_properties[pty]
>             if language is None:     # index all languages
>                 for (lang, val) in pty_val.items():
>                     if lang and val:
>                         words.append(val)
>             else:                    # only index selected language
>                 val = pty_val.get(language)
>                 if val:
>                     words.append(val)
>
>         return " ".join(words)
>
>     def SearchableText_en(self):
>         return self.SearchableText(language="en")
>
>     def SearchableText_fr(self):
>         return self.SearchableText(language="fr")
>
> This is fairly evil, since it grubs rudely through data structures
> inherited from LocalPropertyManager (part of Localizer). I didn't
> see a clean + efficient way to do this, so I went with rude +
> efficient. ;-(
>
> Also, hard-coding the set of languages into those two wrapper
> methods is Just Wrong. I think I can get around that with a clever
> __getattr__() method, but haven't done that yet.
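
The __getattr__() idea mentioned above could look roughly like the sketch
below. It is only an illustration under stated assumptions: LocalizedDoc is a
hypothetical stand-in for a LocDublinCore-style class, and the
_local_properties layout ({property: {language: value}}) is taken from the
code quoted earlier. Inside Zope, __getattr__ also interacts with acquisition
and persistence, which this plain-Python sketch does not attempt to cover.

```python
class LocalizedDoc(object):
    """Hypothetical stand-in for a LocDublinCore-style content class."""

    _PREFIX = "SearchableText_"

    def __init__(self, local_properties):
        # Assumed layout, as in the quoted code: {property: {language: value}}
        self._local_properties = local_properties

    def SearchableText(self, language=None):
        words = []
        for pty_val in self._local_properties.values():
            if language is None:          # index all languages
                for lang, val in pty_val.items():
                    if lang and val:
                        words.append(val)
            else:                         # only index selected language
                val = pty_val.get(language)
                if val:
                    words.append(val)
        return " ".join(words)

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal attribute lookup fails,
        # so real attributes and methods are not affected.
        prefix = self._PREFIX
        if name.startswith(prefix) and len(name) > len(prefix):
            language = name[len(prefix):]
            # Return a callable so that SearchableText_xx() works like the
            # hand-written wrappers.
            return lambda: self.SearchableText(language=language)
        raise AttributeError(name)
```

With this in place, doc.SearchableText_fr() and doc.SearchableText_en() both
resolve without being written out, and so would any other language code.
Note that the sketch does not check the suffix against the set of languages
the site actually supports; an unknown suffix simply yields an empty string.
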
>
> * finally, I modified the search method to select the index to
> search based on the user's current language. My search form
> looks (roughly) like this:
>
>   <form name="searchform" action="search"
>         tal:attributes="action string:${portal_url}/search"
>         method="GET">
>     <input id="searchGadget"
>            name="text"
>            type="text"
>            size="15"
>            value="">
>   </form>
>
> And here's the Python Script that processes this form:
>
>   text = context.REQUEST.get("text")
>   if text:
>       lang = context.Localizer.get_selected_language()
>       key = "SearchableText_%s" % lang
>       query = {key: text}
>       return context.portal_catalog(query)
>   else:
>       return []
>
> ...and this works fine! There are only two problems left:
>
> * search results are shown in the language that was current when
> the object was cataloged, presumably because of the way ZCatalog
> harvests meta-data at catalog-time. I suspect I can fix this if
> I can persuade ZCatalog to harvest meta-data in all available
> languages.
>
> * searching for words with non-ASCII characters is tricky -- IMHO,
> searching for "francais" should yield the same as searching for
> "français", i.e. the index should take care of collapsing accented
> characters somehow. But I'm no linguist -- that might just
> squeak by with accents in French, but whether the same approach
> would work for Nordic å or German ß, I don't know. Anyways,
> this should be up to either the index or the vocabulary -- it's
> not my problem!
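
For what it's worth, the "collapsing accented characters" step can be
sketched with Python's standard unicodedata module: decompose each character
and drop the combining marks. This is a minimal illustration, not something
the index or vocabulary is known to do; fold_accents is a hypothetical helper
name, and it would have to be applied to both the indexed text and the query.

```python
import unicodedata


def fold_accents(text):
    """Return text with combining accents removed (hypothetical helper)."""
    # NFKD splits a character such as U+00E7 into a base letter "c"
    # followed by a combining cedilla.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

This handles French accents, and it also turns Nordic å into a plain "a",
which may or may not be what Nordic users expect. German ß has no
decomposition, so it passes through unchanged; mapping it to "ss" would need
an explicit rule.
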
>
> --
> Greg Ward <gward@python.net> http://www.gerg.ca/