Re: [Zope] Pre-indexing filter and accented letters
Please stay on the list -- readded...

Yuri wrote at 2005-6-9 13:18 +0200:
>> Please read the ZCatalog chapter of the Zope Book carefully if you do not understand why using a new name can help you with this...
>
> http://www.plope.com/Books/2_7Edition/SearchingZCatalog.stx
> There's no mention of indexes such as "NormalizedSearchableText". There's SearchableText, but it is not related to the topic...

Of course, "NormalizedSearchableText" is *not* mentioned. It is not a predefined index; you should create it yourself. Please reread the chapter. You are looking for the (general) description of how the catalog interacts with the object to determine the values under which it should index the object. Once you have understood that, you will understand my proposal for solving your problem...

> ... I mean, I know that chapter, I know ZCatalog. What I want is to prefilter an existing, named index.

You cannot prefilter an existing index (I told you already!). You must create a new one, define a script with the name of the new index, and do your normalization there. You can trust me (in this regard) ...

If you use a "ZCTextIndex", then you can keep the "SearchableText" name for the (new!) index. In this case, you must enter the name of your normalizing script as "Indexed attributes" in the definition of your "ZCTextIndex".

--
Dieter
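A minimal sketch of the proposal above, written as standalone modern Python rather than as a Zope Script (Python). The Document class and the helpers strip_accents and value_for_index are hypothetical names used only for illustration, and value_for_index is a simplified model of how a catalog index fetches its value (look up the attribute named after the index, call it if it is callable), not the actual ZCatalog code. The normalization here uses Unicode decomposition from the standard library instead of a hand-written table.

```python
import unicodedata


def strip_accents(text):
    """Drop combining accent marks after NFKD decomposition (assumed approach)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class Document:
    """Hypothetical content object; only the two method names matter."""

    def __init__(self, body):
        self.body = body

    def SearchableText(self):
        # The text the existing index sees, accents included.
        return self.body

    def NormalizedSearchableText(self):
        # The method named after the new index does the normalization.
        return strip_accents(self.SearchableText())


def value_for_index(obj, name):
    """Simplified model: fetch the attribute named `name`, call it if callable."""
    value = getattr(obj, name, None)
    if callable(value):
        value = value()
    return value


doc = Document("caffè città")
print(value_for_index(doc, "SearchableText"))
print(value_for_index(doc, "NormalizedSearchableText"))
```

With an index built this way, the search term presumably has to go through the same normalization before the query is run; otherwise a query containing accents would not match the accent-free index entries.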
ZCTextIndex: search and catalog accented words as non-accented.

Step 1

Add this code to Lexicon.py (around line 190). It defines a pipeline element that filters each word:

---
class RemoveAccented:

    def filter_word(self, w):
        """Map the non-ASCII latin-1 characters in w to ASCII."""
        # convert the string w to unicode
        parola = unicode(w, 'latin-1')
        xlate = {
            0xc0:'A', 0xc1:'A', 0xc2:'A', 0xc3:'A', 0xc4:'A', 0xc5:'A',
            0xc6:'Ae', 0xc7:'C',
            0xc8:'E', 0xc9:'E', 0xca:'E', 0xcb:'E',
            0xcc:'I', 0xcd:'I', 0xce:'I', 0xcf:'I',
            0xd0:'Th', 0xd1:'N',
            0xd2:'O', 0xd3:'O', 0xd4:'O', 0xd5:'O', 0xd6:'O', 0xd8:'O',
            0xd9:'U', 0xda:'U', 0xdb:'U', 0xdc:'U',
            0xdd:'Y', 0xde:'th', 0xdf:'ss',
            0xe0:'a', 0xe1:'a', 0xe2:'a', 0xe3:'a', 0xe4:'a', 0xe5:'a',
            0xe6:'ae', 0xe7:'c',
            0xe8:'e', 0xe9:'e', 0xea:'e', 0xeb:'e',
            0xec:'i', 0xed:'i', 0xee:'i', 0xef:'i',
            0xf0:'th', 0xf1:'n',
            0xf2:'o', 0xf3:'o', 0xf4:'o', 0xf5:'o', 0xf6:'o', 0xf8:'o',
            0xf9:'u', 0xfa:'u', 0xfb:'u', 0xfc:'u',
            0xfd:'y', 0xfe:'th', 0xff:'y',
            0xa1:'!', 0xa2:'{cent}', 0xa3:'{pound}', 0xa4:'{currency}',
            0xa5:'{yen}', 0xa6:'|', 0xa7:'{section}', 0xa8:'{umlaut}',
            0xa9:'{C}', 0xaa:'{^a}', 0xab:'<<', 0xac:'{not}',
            0xad:'-', 0xae:'{R}', 0xaf:'_', 0xb0:'{degrees}',
            0xb1:'{+/-}', 0xb2:'{^2}', 0xb3:'{^3}', 0xb4:"'",
            0xb5:'{micro}', 0xb6:'{paragraph}', 0xb7:'*', 0xb8:'{cedilla}',
            0xb9:'{^1}', 0xba:'{^o}', 0xbb:'>>', 0xbc:'{1/4}',
            0xbd:'{1/2}', 0xbe:'{3/4}', 0xbf:'?', 0xd7:'*', 0xf7:'/'
        }
        r = ''
        for i in parola:
            if xlate.has_key(ord(i)):
                r += xlate[ord(i)]
            else:
                # not in the table: keep the character, re-encoded as
                # latin-1 (str(i) would fail on a non-ASCII character)
                r += i.encode('latin-1')
        return r

    def process(self, lst):
        return [self.filter_word(w) for w in lst]

element_factory.registerFactory('Remove Accented',
                                'Remove Accented',
                                RemoveAccented)
---

Step 2

Add locale support for a latin-1 language. I added -L it_IT to the zope start command (in 2.7 you have to enable it in etc/zope.conf).

Then you can search for "aççented" and find "accented" ;-)
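Why the pipeline approach makes the search above work can be sketched in a few lines of modern Python: the lexicon runs every word through each pipeline element's process() method, both when indexing and when parsing a query, so both sides end up with the same accent-free form. This is an illustration under assumptions, not the ZCTextIndex code: XLATE is a small excerpt of the table from Step 1, and LowerCase and run_pipeline are hypothetical stand-ins for a case normalizer and for the lexicon's loop over its elements.

```python
# Small excerpt of the Step 1 table: code point -> ASCII replacement.
XLATE = {0xe0: "a", 0xe7: "c", 0xe8: "e", 0xdf: "ss", 0xc9: "E"}


class RemoveAccented:
    """Pipeline element: replace accented characters word by word."""

    def process(self, words):
        # str.translate accepts a dict of code points, including
        # multi-character replacements such as "ss".
        return [w.translate(XLATE) for w in words]


class LowerCase:
    """Hypothetical stand-in for a case-normalizing pipeline element."""

    def process(self, words):
        return [w.lower() for w in words]


def run_pipeline(pipeline, words):
    """Feed the word list through each element in order."""
    for element in pipeline:
        words = element.process(words)
    return words


pipeline = [RemoveAccented(), LowerCase()]
indexed = run_pipeline(pipeline, ["Aççented", "Straße"])
queried = run_pipeline(pipeline, ["accented"])
print(indexed, queried, queried[0] in indexed)
```

Because the same elements process documents and queries, no separate normalizing script is needed on the query side with this approach.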
Participants (2): Dieter Maurer, Yuri