[Zope-dev] Some thoughts on splitter

Sin Hang Kin kentsin@poboxes.com
Sun, 16 Apr 2000 12:12:16 +0800


> This is a good idea, although this might be a bit cumbersome to handle. A
> text filled with these entities will be quite hard to read. For that reason,
> I thought of using spaces. The display engine could strip out single spaces
> and reduce sequences of more spaces by one.

NO! &zwnj; is displayed as zero-width, as nothing at all! It is a single Unicode
character. It does not seem to be used for anything else, so it is safe to
remove it from the file without hurting anyone. Stripping out spaces is not as
easy for the browser or server as you might think.
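As a rough illustration of the idea, here is a minimal sketch in present-day Python (not the Python 1.5 that Zope runs on today). It assumes the zero-width mark is U+200C (zero-width non-joiner); the function names split_words and strip_marks are made up for this example and are not part of Zope or its splitter.

```python
# Assumption: the word-boundary mark is U+200C (zero-width non-joiner).
ZWNJ = "\u200c"


def split_words(marked_text):
    """Split text into index terms at ZWNJ marks and at ordinary whitespace."""
    terms = []
    for chunk in marked_text.split(ZWNJ):
        # str.split() with no argument splits on whitespace and drops empties.
        terms.extend(chunk.split())
    return terms


def strip_marks(marked_text):
    """Remove the marks; nothing visible changes because they have zero width."""
    return marked_text.replace(ZWNJ, "")


# Hypothetical sample: three two-character Chinese words joined by the mark.
marked = "中文" + ZWNJ + "分词" + ZWNJ + "测试"
words = split_words(marked)
plain = strip_marks(marked)
```

Removing the mark is a plain character replacement, whereas a space-based scheme would have to tell real spaces from inserted ones.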

> I think the encoding should be responsibility of the user -- After all, he
> knows what he wants to do. Sites that use more than one Han-Encoding could
> go with unicode, other sites might prefer to use the local encoding, since
> there are much more tools that can be used.

Not anymore. As I have already pointed out, your site may be filled with only
one encoding, but someone may quote your content in a document that uses
another encoding. Also, users might not have a browser that supports the
encoding you chose, so they will not be able to see your content. Even if they
can convert it and read it, the difference in encoding prevents them from
searching it. Building search engines for many different encodings is
certainly not fun. Why limit your clients for no good reason? Just convert
your content to Unicode before publishing. That saves you and your clients a
lot of time, and your content may last longer (than you want?).
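To make the search problem concrete, here is a small sketch in present-day Python, assuming the standard "big5" and "gb2312" codecs. The helper name to_unicode and the sample word are only for illustration.

```python
def to_unicode(raw_bytes, source_encoding):
    """Decode bytes in a known legacy encoding into a Unicode string."""
    return raw_bytes.decode(source_encoding)


# The same two-character word as it would be stored by two different sites.
big5_bytes = "中文".encode("big5")
gb_bytes = "中文".encode("gb2312")

# At the byte level the two copies look unrelated, so a search for one
# cannot find the other.
bytes_match = big5_bytes == gb_bytes

# After conversion to Unicode they are the same string.
from_big5 = to_unicode(big5_bytes, "big5")
from_gb = to_unicode(gb_bytes, "gb2312")
text_match = from_big5 == from_gb

# Publish in a single Unicode encoding, for example UTF-8.
published = from_big5.encode("utf-8")
```

The source encoding still has to be known or declared at conversion time, but only once, when the content is published.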

> If Zope starts normalizing the text, some of the users might be surprised by
> the results.

Normalize for the index; do not alter the text. This is good practice and
improves the search experience. The Unicode Consortium has thought very hard
about normalization, and I think IBM has a sample implementation available
on their developerWorks site.
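A minimal sketch of what normalizing only for the index could look like, written in present-day Python with the standard unicodedata module. The choice of NFKC plus case folding is my assumption, not something the Unicode documents or Zope prescribe, and TinyIndex and index_key are hypothetical names.

```python
import unicodedata


def index_key(term):
    """Return the normalized form used only as the index key (assumed: NFKC + casefold)."""
    return unicodedata.normalize("NFKC", term).casefold()


class TinyIndex:
    """Toy inverted index: keys are normalized, stored documents are untouched."""

    def __init__(self):
        self.postings = {}   # normalized term -> set of document ids
        self.documents = {}  # document id -> original text, never altered

    def add(self, doc_id, text):
        self.documents[doc_id] = text
        for word in text.split():
            self.postings.setdefault(index_key(word), set()).add(doc_id)

    def search(self, query):
        # The query is normalized the same way as the indexed terms.
        return sorted(self.postings.get(index_key(query), set()))


# Hypothetical document written with full-width Latin letters.
original = "\uff3a\uff4f\uff50\uff45 index"
index = TinyIndex()
index.add(1, original)
hits = index.search("zope")
```

The user gets back the document exactly as it was written; only the lookup key is normalized, so nobody should be surprised by changed text.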

> But of course, certainly it should be possible to use Unicode! For this, we
> will have to wait for Python 1.6, though.

It is very near.

> Right, but then the user should declare the language. As I explained, if
> this is inherited like the acquisition, they might get away with just one
> declaration for a whole site.

I cannot speak for everyone, but I think your own surfing experience will
tell you. I would suggest working hard to make things last longer: tolerate
others' mistakes and avoid frequent rewrites.

Rgs,

Kent Sin