Sin Hang Kin wrote:
I am averse to the idea of ZCatalog inserting information into documents for its own purposes. I don't think this is good design, and I doubt it's very portable.
It is not ZCatalog that makes the insertion. That is a job for the pre-processor. The Splitter only has to recognize the non-joiner as a break point. The non-joiner is, however, a character defined by Unicode, so I don't see any portability issue here. Just follow what Unicode says.
I mean portability across other objects that may want to 'use' the document object. If the object gets invisibly transformed, and other objects don't expect this, things will break. Also, unless users specifically want their text to be transformed, they may be surprised or angered that their text was normalized to Unicode. Absolutely there should be a Splitter that understands Unicode, but there should also be a Splitter that does not. This is the idea behind having different vocabulary objects for different languages, because they all have different needs. I'm still a bit lost on what the non-joiner is meant for. I understand that it is used in a document to divide words in languages that do not have a discrete word-division character (like whitespace), and I understand that if a Unicode-aware Splitter encountered one it should split on that character. But I don't understand why the Splitter (or pre-processor) should actively insert the character. If the character was not there to begin with, then inserting it for the purposes of cataloging is a sneaky transformation.
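To make the distinction above concrete, here is a minimal sketch of the two kinds of splitter side by side. It is not the ZCatalog Splitter API: the function names are hypothetical, and it assumes "the non-joiner" means U+200C ZERO WIDTH NON-JOINER. Both functions only read the text; neither inserts anything into the document.

```python
import re

# Assumption: "the non-joiner" in this thread is U+200C ZERO WIDTH NON-JOINER.
ZWNJ = "\u200c"

# Break on runs of whitespace and/or the non-joiner.
_UNICODE_BREAK = re.compile(rf"[\s{ZWNJ}]+")


def plain_split(text):
    """Hypothetical non-Unicode-aware splitter: breaks on whitespace only."""
    return text.split()


def unicode_aware_split(text):
    """Hypothetical Unicode-aware splitter: also breaks on the non-joiner.

    The input string is never modified; only the returned word list differs.
    """
    return [word for word in _UNICODE_BREAK.split(text) if word]


# A document whose author already placed a non-joiner between two words.
doc = "alpha" + ZWNJ + "beta gamma"

plain_words = plain_split(doc)            # non-joiner stays inside the token
aware_words = unicode_aware_split(doc)    # non-joiner acts as a break point
```

With this split of responsibilities, a catalog could pick either splitter per vocabulary, and the stored document stays exactly as the user wrote it.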
Sounds like a NormalizingSplitter of sorts. http://www.zope.org/Members/michel/Projects/Interfaces/Splitter
I am new to this: I do not see any way to enter new information, just a jump text box.
It's simple. Log into Zope.org (with your member account). Go to that page, click on 'Edit this page' and edit the page. Click 'Change'. That's it. Just add your comments. Yes, you can wipe the whole thing and cause havoc if you want, but the Wiki is meant to encourage trust. Try it out and put your comments in there; otherwise they won't get captured, and when these issues are worked on your ideas will not be considered. -Michel