zope, latin-1 and accented words
How can I tell the ZCTextIndex Splitter not to split words such as "aaaèbbb" into "aaa" and "bbb"? I would like to tell Zope that è, à and so on are alphanumeric letters.

In Splitter.c I have:

    class Splitter:
        import re
        rx = re.compile(r"(?L)\w+")

(?L) means "match according to the locale", but I have multilingual Latin-1 content... \w would match only [a-zA-Z]!

TIA

P.S. I've written a small class for the ZCTextIndex pipeline that converts all accented characters to unaccented ones, so I can index "perchè" as "perche". It will only work if I can solve this splitter problem...
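For illustration, here is a minimal sketch of the two pieces described in this message: a splitter that does not depend on the process locale, and an accent-folding pipeline element. The class names Latin1Splitter and AccentFolder and the helper fold_accents are hypothetical, and the assumption that a pipeline element only needs a process(list) method returning a list is taken from the Splitter snippet quoted in this message, not from verified ZCTextIndex documentation. It is written for Python 3, whereas the Zope discussed in this thread ran on Python 2.

```python
import re
import unicodedata


def fold_accents(word):
    """Strip combining marks after canonical decomposition (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class Latin1Splitter:
    """Hypothetical splitter: decodes Latin-1 bytes, then splits on Unicode word characters."""

    # For str patterns in Python 3, \w is Unicode-aware, so no (?L) flag is needed.
    rx = re.compile(r"\w+")

    def process(self, lst):
        result = []
        for s in lst:
            if isinstance(s, bytes):
                # Assumption: the indexed content is Latin-1 encoded.
                s = s.decode("latin-1")
            result += self.rx.findall(s)
        return result


class AccentFolder:
    """Hypothetical pipeline element: maps each accented word to its unaccented form."""

    def process(self, lst):
        return [fold_accents(word) for word in lst]


words = Latin1Splitter().process(["aaaèbbb perchè".encode("latin-1")])
folded = AccentFolder().process(words)
```

Decoding before splitting keeps the word boundaries independent of whatever locale the server happens to run under, which may matter for multilingual content.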
Use TextIndexNG; it is better suited for such purposes.

-aj

On 14 June 2005 16:54:19 +0200, Yuri <yurj@alfa.it> wrote:
> How can I tell the ZCTextIndex Splitter not to split words such as "aaaèbbb" into "aaa" and "bbb"?
> I would like to tell Zope that è, à and so on are alphanumeric letters. In Splitter.c I have:
>
>     class Splitter:
>         import re
>         rx = re.compile(r"(?L)\w+")
>
> (?L) means "match according to the locale", but I have multilingual Latin-1 content... \w would match only [a-zA-Z]!
>
> TIA
>
> P.S. I've written a small class for the ZCTextIndex pipeline that converts all accented characters to unaccented ones, so I can index "perchè" as "perche". It will only work if I can solve this splitter problem...
Yuri wrote at 2005-6-14 16:54 +0200:
> How can I tell the ZCTextIndex Splitter not to split words such as "aaaèbbb" into "aaa" and "bbb"?
It may obey the current "locale". Try to set it (correctly) in your Zope configuration file.

-- Dieter
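To see how the locale changes what (?L)\w matches, a small check along these lines may help. It is only a sketch for Python 3, where the (?L) flag is accepted for bytes patterns only; the helper name split_under_locale is hypothetical, and locale names other than "C" are platform-dependent and must be installed on the machine.

```python
import locale
import re

# Same pattern as the Splitter quoted in this thread, as a bytes pattern for Python 3.
WORD_RX = re.compile(rb"(?L)\w+")


def split_under_locale(data, locale_name):
    """Split bytes into words while LC_CTYPE is temporarily set to locale_name."""
    previous = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return WORD_RX.findall(data)
    finally:
        # Restore whatever was active before the check.
        locale.setlocale(locale.LC_CTYPE, previous)


sample = "aaaèbbb".encode("latin-1")
c_locale_words = split_under_locale(sample, "C")
```

Under the "C" locale the byte for è is not a word character, so the word is split in two, which is the behaviour reported in this thread. Calling the helper with an installed Latin-1 locale (for example a name like "it_IT.ISO8859-1", which is an assumption and varies by platform) would be expected to keep the word whole. In the Zope configuration file the corresponding setting is presumably the locale directive.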
Participants (3): Andreas Jung, Dieter Maurer, Yuri