[Zope-CMF] Re: Searching multilingual CMF sites
Greg Ward
gward@python.net
Fri, 7 Mar 2003 17:22:51 -0500
On 07 March 2003, To nuxeo-localizer@nongnu.org said:
> [sorry for posting to two lists, but I'm really not sure if the
> Localizer community or the CMF community is the right place to ask!]
>
> What's the best way to implement searching on a multilingual site? I've
> got a CMF site with bilingual content up-and-running thanks to
> Localizer, and managed to cobble together a fairly functional "search"
> box by stealing some scripts from Plone. But it gets weird when you
> cross language boundaries.
OK, I'm well on the way to solving this problem. Thought I'd share my
approach for posterity -- future archive-searchers will no doubt thank
me. ;-)
Turns out this was a CMF/Zope question; Localizer barely enters into it.
(It's only needed to find out the user's current language at search
time.) Here's what I did:
* in $portal/portal_catalog, create two new vocabularies:
vocab_en and vocab_fr
* then pop over to the "Indexes" tab and create two new indexes:
SearchableText_en and SearchableText_fr. Use the corresponding
language-specific vocabulary in each index.
* I already had a SearchableText() method in LocDublinCore,
which all of my content classes inherit from (shamelessly
stolen from Rainer Thaden's LocCMFProduct); I extended it to
have a language-neutral mode and language-specific modes,
then added trivial SearchableText_en() and SearchableText_fr()
wrappers. Here's the code:
    def SearchableText (self, language=None):
        words = []
        for pty in self._local_properties.keys():
            pty_val = self._local_properties[pty]
            if language is None:        # index all languages
                for (lang, val) in pty_val.items():
                    if lang and val:
                        words.append(val)
            else:                       # only index selected language
                val = pty_val.get(language)
                if val:
                    words.append(val)
        return " ".join(words)

    def SearchableText_en (self):
        return self.SearchableText(language="en")

    def SearchableText_fr (self):
        return self.SearchableText(language="fr")
This is fairly evil, since it grubs rudely through data structures
inherited from LocalPropertyManager (part of Localizer). I didn't
see a clean + efficient way to do this, so I went with rude +
efficient. ;-(
Also, hard-coding the set of languages into those two wrapper
methods is Just Wrong. I think I can get around that with a clever
__getattr__() method, but haven't done that yet.
* finally, I modified the search method to select the index to
search based on the user's current language. My search form
looks (roughly) like this:
    <form name="searchform" action="search"
          tal:attributes="action string:${portal_url}/search" method="GET">
      <input id="searchGadget"
             name="text"
             type="text"
             size="15"
             value="">
    </form>
And here's the Python Script that processes this form:
    text = context.REQUEST.get("text")
    if text:
        lang = context.Localizer.get_selected_language()
        key = "SearchableText_%s" % lang
        query = {key : text}
        return context.portal_catalog(query)
    else:
        return []
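
On the hard-coded SearchableText_en() and SearchableText_fr() wrappers
mentioned earlier: one possible shape for the __getattr__() idea is
sketched here. It is only an illustration in plain Python and has not
been tried inside Zope. The class name LocDublinCoreSketch and the
_prefix attribute are made up for the example, and a real Zope class
would need checking for how __getattr__ interacts with acquisition and
persistence.

```python
class LocDublinCoreSketch:
    """Hypothetical stand-in for LocDublinCore; not the Localizer API."""

    _prefix = "SearchableText_"

    def __init__(self, local_properties):
        # Assumed shape, as in the post: {property: {language: text}}
        self._local_properties = local_properties

    def SearchableText(self, language=None):
        words = []
        for pty_val in self._local_properties.values():
            if language is None:        # index all languages
                for lang, val in pty_val.items():
                    if lang and val:
                        words.append(val)
            else:                       # only index selected language
                val = pty_val.get(language)
                if val:
                    words.append(val)
        return " ".join(words)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup has failed, so
        # ordinary attributes and methods are not affected.
        if name.startswith(self._prefix) and len(name) > len(self._prefix):
            language = name[len(self._prefix):]
            return lambda: self.SearchableText(language=language)
        raise AttributeError(name)


obj = LocDublinCoreSketch({
    "title": {"en": "Hello", "fr": "Bonjour"},
    "description": {"en": "World", "fr": ""},
})
en_text = obj.SearchableText_en()   # no SearchableText_en method is defined
fr_text = obj.SearchableText_fr()
```

With something like this, adding a language would mean adding a
vocabulary and an index but no new wrapper method. The search script
itself would stay as it is, since it only builds the index name from
the selected language.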
...and this works fine! There are only two problems left:
* search results are shown in the language that was current when
the object was cataloged, presumably because of the way ZCatalog
harvests meta-data at catalog-time. I suspect I can fix this if
I can persuade ZCatalog to harvest meta-data in all available
languages.
* searching for words with non-ASCII characters is tricky -- IMHO,
searching for "francais" should yield the same as searching for
"français", i.e. the index should take care of collapsing accented
characters somehow. But I'm no linguist -- that might just
squeak by with accents in French, but whether the same approach
would work for Nordic å or German ß, I don't know. Anyways,
this should be up to either the index or the vocabulary -- it's
not my problem!
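
To make the accent question concrete, here is a rough sketch of one
common folding approach, written in present-day Python with the
standard unicodedata module. It is not the ZCatalog vocabulary or
splitter mechanism, and the function names fold_accents and search_key
are invented for the example. It decomposes each character and drops
the combining marks, which handles "ç" and "å" but leaves "ß" alone
unless case folding is applied first.

```python
import unicodedata


def fold_accents(text):
    """Drop combining marks after decomposing characters (NFKD)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_key(text):
    """Case-fold first (this maps German sharp s to 'ss'), then fold accents."""
    return fold_accents(text.casefold())


same_word = search_key("Français") == search_key("francais")
ring_folded = fold_accents("å")     # the ring is a combining mark after NFKD
eszett_plain = fold_accents("ß")    # has no decomposition, so it is unchanged
eszett_key = search_key("Straße")
```

For this to help, the same folding would have to be applied both when
text is indexed and when the query is processed. Whether folding is
linguistically right is a separate matter: in Swedish, for instance, å
is a letter in its own right rather than an accented a, so the rule
may need to differ per language index.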
--
Greg Ward <gward@python.net> http://www.gerg.ca/