Searching with foreign characters in the ZCatalog
As far as I can tell, using any of the default index types that come with Zope 2 (FieldIndex, TextIndex, whatever) if you do a search for "Jurgen" it will not match "Jürgen". In my opinion, this is correct - "Jurgen" is spelled incorrectly. However, I am having some clients push hard to have it behave differently. From what I've looked at, I haven't seen a way to make FieldIndex ignore or somehow manipulate "special" characters, nor have I seen any products that provide a new index type to specifically deal with this. Does anyone know if there is something out there to help with this?
Eric
--On 10. Oktober 2007 15:42:05 -0400 "Wohnlich, Eric (IMS)" <WohnlichE@imsweb.com> wrote:
As far as I can tell, using any of the default index types that come with Zope 2 (FieldIndex, TextIndex, whatever) if you do a search for "Jurgen" it will not match "Jürgen". In my opinion, this is correct - "Jurgen" is spelled incorrectly. However, I am having some clients push hard to have it behave differently. From what I've looked at, I haven't seen a way to make FieldIndex ignore or somehow manipulate "special" characters, nor have I seen any products that provide a new index type to specifically deal with this. Does anyone know if there is something out there to help with this?
Look at TextIndexNG3 and its normalization support. -aj
Wohnlich, Eric (IMS) wrote at 2007-10-10 15:42 -0400:
As far as I can tell, using any of the default index types that come with Zope 2 (FieldIndex, TextIndex, whatever) if you do a search for "Jurgen" it will not match "Jürgen". In my opinion, this is correct - "Jurgen" is spelled incorrectly.
There are two ways to handle this:

* Normalization: you transform your words into a normal form and index that. You apply the same normalization to search terms. This way, any two terms with the same normal form are equivalent. In your case, the normalization could replace "ü" with "u".

* Expansion: you expand your search terms to search for "similar" words as well. This technique is often used to find words that sound similar, but it can be used for other purposes too. Applied to your case: when someone searches for "Jurgen", you would effectively replace it with "Jurgen or Jürgen".

-- Dieter
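For illustration, here is a minimal sketch of both approaches in plain Python 3, using only the standard library. It is not taken from Zope or TextIndexNG3; the helper names `fold_accents` and `expand_term` and the sample vocabulary are made up for this example. It assumes that stripping combining marks after Unicode decomposition is an acceptable normal form, which covers "ü" but not letters without a decomposition such as "ß" or "ø".

```python
import unicodedata


def fold_accents(text):
    """Return a normal form of text: accents stripped, lower-cased.

    Hypothetical helper, not part of Zope or TextIndexNG3.
    """
    # NFKD decomposition splits "ü" into "u" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base characters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return stripped.lower()


def expand_term(term, vocabulary):
    """Return all vocabulary words that share term's normal form.

    Hypothetical helper: the result could be joined with "or" to build
    an expanded query against an index that stores the original words.
    """
    key = fold_accents(term)
    return sorted(word for word in vocabulary if fold_accents(word) == key)


# Made-up sample of words that an index might contain.
vocabulary = ["Jürgen", "Jurgen", "Jörg", "Eric"]

# Normalization: both spellings map to the same indexed value.
print(fold_accents("Jürgen") == fold_accents("Jurgen"))

# Expansion: a search for "Jurgen" becomes a search for every variant.
print(" or ".join(expand_term("Jurgen", vocabulary)))
```

With normalization, the folded value would have to be what gets indexed (for example through a method on the object that the index reads), and the same function would have to be applied to every search term before querying. With expansion, the index keeps the original spellings and only the query changes.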
participants (3)
- Andreas Jung
- Dieter Maurer
- Wohnlich, Eric (IMS)