[Zope-dev] Problem with XML in Zope 220b1
Brian Lloyd
Brian@digicool.com
Tue, 13 Jun 2000 15:43:16 -0400
> I have a lot of Chinese XML-files stored in Zope. The internal encoding is
> UTF-8. Everything was fine in 214, 216 and 220a1. Now with 220b1, some
> characters are (apparently?) randomly turned into lt;, gt; and the like. Now
> this looks like some unwanted HTML escaping, but the leading '&' is missing
> and the characters are definitely all in the range greater 127 (this is a
> property of UTF8), so there is no direct relationship to the codepoints of
> >, < and co.
>
> Any ideas what could have gone wrong here?
Yes - during the alpha period we got a bug report concerning the fact
that Netscape browsers honor the Windows "extended Latin-1" characters
\213 and \233 (octal; the single angle quotation marks of the Windows
code page, which those browsers treat as < and >). That means that if
you don't filter those as a part of html_quote'ing then some Netscape
versions are open to the same sort of script-kiddie attacks that they
would be if the HTML was not quoted at all :(
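To make the interaction with UTF-8 concrete, here is a minimal sketch of a byte-level quoting function that also filters \213 and \233. It is not the actual Zope html_quote source: the names ENTITIES and quote_bytes and the exact replacement table are assumptions made for illustration. It shows that both byte values are legal continuation bytes in UTF-8, so a multi-byte character can be damaged by the filter.

```python
# Hypothetical stand-in for a byte-level html_quote; the table below is an
# assumption based on the description in this message, not Zope's code.
ENTITIES = (
    (b"&", b"&amp;"),     # must come first so later entities are not re-escaped
    (b"<", b"&lt;"),
    (b">", b"&gt;"),
    (b"\x8b", b"&lt;"),   # octal \213
    (b"\x9b", b"&gt;"),   # octal \233
    (b'"', b"&quot;"),
)


def quote_bytes(data: bytes) -> bytes:
    """Replace each raw byte sequence with its entity, byte by byte."""
    for raw, entity in ENTITIES:
        data = data.replace(raw, entity)
    return data


# U+4E0B encodes to E4 B8 8B in UTF-8; the last byte equals octal \213.
original = "\u4e0b".encode("utf-8")
damaged = quote_bytes(original)

print(original)  # the three UTF-8 bytes of the character
print(damaged)   # the trailing byte has been swapped for an entity

try:
    damaged.decode("utf-8")
except UnicodeDecodeError as exc:
    print("no longer valid UTF-8:", exc.reason)
```

Plain ASCII markup is still quoted as intended, so the damage shows up only in data whose encoding happens to use those two byte values.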
I'm not quite sure what the right answer is here. How are you using
the html_quote format in your application?
Brian Lloyd brian@digicool.com
Software Engineer 540.371.6909
Digital Creations http://www.digicool.com