Pre-indexing filter and accented letters
I would like to index a text property of an object in the ZCatalog. The text is in French, but I have a problem: searches must also match the corresponding unaccented letters. I mean, if I search for "actualite", the index should also return objects whose text contains "actualitè". I cannot convert the index to TextIndexNG; it is currently a TextIndex (at most, I can convert it to ZCTextIndex). One idea would be to convert the text before it gets indexed... Where should I look in the code? Is it possible? Any other suggestions? :) TIA!
Yuri wrote at 2005-6-6 11:56 +0200:
I would like to index a text property of an object in the ZCatalog. The text is in French, but I have a problem: searches must also match the corresponding unaccented letters.
I mean, if I search for "actualite", the index should also return objects whose text contains "actualitè".
Implement a PythonScript, say "NormalizedSearchableText", that normalizes the result of "context.SearchableText()". Ensure it is acquirable by your indexed objects. Index "NormalizedSearchableText" rather than "SearchableText" and use this index for your searches. Ensure that you perform the same normalization on search terms before you use them in a query. By the way, "ManagableIndex" greatly facilitates the inclusion of normalizers. However, it currently does not interface with a "TextIndex" (only a "WordIndex"). <http://www.dieter.handshake.de/pyprojects/zope> -- Dieter
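As an illustration of the normalization step described above, here is a minimal sketch in plain Python. The function names `strip_accents` and `normalize_searchable_text` are hypothetical and do not come from the thread. The sketch assumes that lowercasing plus removal of combining accent marks is the desired normalization. Inside a restricted Script (Python), importing `unicodedata` may not be permitted, so code like this might have to live in an External Method or a product module instead.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed (e.g. 'è' becomes 'e')."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_searchable_text(text):
    """Hypothetical normalizer: strip accents, then lowercase."""
    return strip_accents(text).lower()


if __name__ == "__main__":
    print(normalize_searchable_text("Actualitè"))
```

The same function would be applied both to the text handed to the index and to each search term before querying.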
Dieter Maurer wrote:
Yuri wrote at 2005-6-6 11:56 +0200:
I would like to index a text property of an object in the ZCatalog. The text is in French, but I have a problem: searches must also match the corresponding unaccented letters.
I mean, if I search for "actualite", the index should also return objects whose text contains "actualitè".
Implement a PythonScript, say "NormalizedSearchableText", that normalizes the result of "context.SearchableText()".
Ensure it is acquirable by your indexed objects.
Index "NormalizedSearchableText" rather than "SearchableText" and use this index for your searches.
Ensure that you perform the same normalization on search terms before you use them in a query.
Well, I cannot change the index; it already has its name... It covers a collection of thousands of objects, and the ones I want to pre-filter before indexing are just a small part... Or do you mean I have to do something about SearchableText()? I have to index in such a way that users find the term even if they do not type accented letters, on an existing index that already holds thousands of objects... Can I hook in somewhere in the middle, so that I index them the way I want? :)
By the way, "ManagableIndex" greatly facilitates the inclusion of normalizers. However, it currently does not interface with a "TextIndex" (only a "WordIndex").
I'll take a look, thanks :)
Yuri wrote at 2005-6-7 10:37 +0200:
...
Implement a PythonScript, say "NormalizedSearchableText", that normalizes the result of "context.SearchableText()".
Ensure it is acquirable by your indexed objects.
Index "NormalizedSearchableText" rather than "SearchableText" and use this index for your searches.
Ensure that you perform the same normalization on search terms before you use them in a query.
Well, I cannot change the index; it already has its name... It covers a collection of thousands of objects, and the ones I want to pre-filter before indexing are just a small part...
But your index currently has unnormalized values. Thus, you must rebuild it. When you rebuild it, you can also give it a different name.
Or do you mean I have to do something about SearchableText()?
Yes, replace it with "NormalizedSearchableText".
I have to index in such a way that users find the term even if they do not type accented letters, on an existing index that already holds thousands of objects...
I have understood that... And my advice applies to precisely this situation...
Can I hook in somewhere in the middle, so that I index them the way I want? :)
You can (and must) normalize the search terms. However, the indexed values need to be normalized, too. Almost surely, they are not now. This means rebuilding the index, this time with normalization... -- Dieter
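To make the point above concrete, the following toy sketch shows why normalizing only the search terms is not enough. `ToyTextIndex` is a hypothetical stand-in written for this illustration; it is not the ZCatalog or TextIndex API, and `normalize` is an assumed normalizer (accent stripping plus lowercasing).

```python
import unicodedata


def normalize(text):
    """Assumed normalizer: strip combining accents and lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


class ToyTextIndex:
    """Hypothetical word index; only illustrates the principle."""

    def __init__(self, normalizer=None):
        # Without a normalizer, words are stored exactly as they appear.
        self.normalizer = normalizer or (lambda s: s)
        self.postings = {}

    def index_object(self, doc_id, text):
        for word in self.normalizer(text).split():
            self.postings.setdefault(word, set()).add(doc_id)

    def search(self, term):
        # The query term goes through the same normalizer as the indexed text.
        return sorted(self.postings.get(self.normalizer(term), set()))


# Index built without normalization: it stores "actualitè" as is.
raw = ToyTextIndex()
raw.index_object(1, "une actualitè importante")

# Index rebuilt with normalization: it stores "actualite".
rebuilt = ToyTextIndex(normalizer=normalize)
rebuilt.index_object(1, "une actualitè importante")

if __name__ == "__main__":
    print(raw.search("actualite"))       # unaccented query against raw index
    print(rebuilt.search("actualite"))   # unaccented query against rebuilt index
    print(rebuilt.search("Actualitè"))   # accented query against rebuilt index
```

In this sketch the old index still holds the accented word, so an unaccented query finds nothing there, which is why the existing values have to be reindexed.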
Dieter Maurer wrote:
Yuri wrote at 2005-6-7 10:37 +0200:
...
Implement a PythonScript, say "NormalizedSearchableText", that normalizes the result of "context.SearchableText()".
Ensure it is acquirable by your indexed objects.
Index "NormalizedSearchableText" rather than "SearchableText" and use this index for your searches.
Ensure that you perform the same normalization on search terms before you use them in a query.
Well, I cannot change the index; it already has its name... It covers a collection of thousands of objects, and the ones I want to pre-filter before indexing are just a small part...
But your index currently has unnormalized values. Thus, you must rebuild it.
I don't need it for the other objects I already have. But it would be a nice bonus, so it is not really a problem :)
When you rebuild it, you can also give it a different name.
Why? I usually give it the name of the form input I want to index... I thought I would just index the new objects... but maybe I am missing the picture: what is so important about the names "NormalizedSearchableText" and "SearchableText"?
Or do you mean I have to do something about SearchableText()?
Yes, replace it with "NormalizedSearchableText".
How? :-? Maybe I am missing some overloading or acquisition?
Can I hook in somewhere in the middle, so that I index them the way I want? :)
You can (and must) normalize the search terms. However, the indexed values need to be normalized, too.
Ok
Almost surely, they are not now. This means rebuilding the index, this time with normalization...
And how do I add it? Just by creating the Python script and using acquisition? How does it work? :P
Yuri wrote at 2005-6-8 10:11 +0200:
...
When you rebuild it, you can also give it a different name.
Why? I usually give it the name of the form input I want to index...
Because you want to include a processing step. Please read the ZCatalog chapter of the Zope Book carefully if you do not understand why using a new name can help you with this...
I thought I would just index the new objects... but maybe I am missing the picture: what is so important about the names "NormalizedSearchableText" and "SearchableText"?
Read the chapter mentioned above. Come back if you then have more questions... -- Dieter
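For readers following the naming question, here is a simplified sketch of why a differently named index allows a processing step. It assumes, as a simplification of ZCatalog behaviour, that an index obtains its value by looking up an attribute or method with the index's name on the object and calling it if it is callable. `Document` and `value_to_index` are hypothetical names for this illustration. In the thread, "NormalizedSearchableText" would be an acquired script rather than a method on the class.

```python
import unicodedata


def normalize(text):
    """Assumed normalizer: strip combining accents and lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


class Document:
    """Hypothetical content object exposing both text methods."""

    def __init__(self, text):
        self._text = text

    def SearchableText(self):
        return self._text

    def NormalizedSearchableText(self):
        # The processing step lives behind the new name.
        return normalize(self.SearchableText())


def value_to_index(obj, index_name):
    """Simplified lookup: fetch the attribute named after the index, call it if callable."""
    value = getattr(obj, index_name, None)
    if callable(value):
        value = value()
    return value


doc = Document("Actualitè du jour")

if __name__ == "__main__":
    print(value_to_index(doc, "SearchableText"))
    print(value_to_index(doc, "NormalizedSearchableText"))
```

Under this assumption, an index called "NormalizedSearchableText" receives the normalized text without any change to the index machinery itself.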
participants (2): Dieter Maurer, Yuri