RE: [Zope-dev] Problem with XML in Zope 2.2.0b1

13 Jun 2000

      ...
I have a lot of Chinese XML files stored in Zope. The internal encoding
is UTF-8. Everything was fine in 2.1.4, 2.1.6 and 2.2.0a1. Now with
2.2.0b1, some characters are, apparently at random, turned into lt;, gt;
and the like. This looks like some unwanted HTML escaping, but the
leading '&' is missing, and the affected bytes are definitely all in the
range above 127 (this is a property of UTF-8), so there is no direct
relationship to the code points of >, < and co.
Any ideas what could have gone wrong here?
Yes - during the alpha period we got a bug report concerning the fact
that Netscape browsers honor the Windows "extended Latin-1" characters
\213 and \233 (octal; 0x8B and 0x9B, the single angle quotes) as if
they were < and >. That means that if you don't filter those as a part
of html_quote'ing, then some Netscape versions are open to the same
sort of script-kiddie attacks that they would be if the HTML was not
quoted at all :( Both byte values can also occur inside UTF-8
multi-byte sequences, which is most likely why your data is affected.
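
To make the interaction concrete, here is a minimal sketch of how a
byte-for-byte filter on \213 and \233 can damage UTF-8 text. It assumes
the quoting works as a plain byte replacement; naive_html_quote and its
replacement table are hypothetical names for illustration, not Zope's
actual html_quote code.

```python
# Hypothetical byte-level quoting filter (an assumption for illustration,
# not the real Zope html_quote implementation).
REPLACEMENTS = (
    (b"&", b"&amp;"),     # must come first to avoid double escaping
    (b"<", b"&lt;"),
    (b">", b"&gt;"),
    (b'"', b"&quot;"),
    (b"\x8b", b"&lt;"),   # octal \213, treated as '<' by some browsers
    (b"\x9b", b"&gt;"),   # octal \233, treated as '>' by some browsers
)


def naive_html_quote(data: bytes) -> bytes:
    """Escape markup characters byte by byte, ignoring the text encoding."""
    for old, new in REPLACEMENTS:
        data = data.replace(old, new)
    return data


# U+4E0B is a common Chinese character; its UTF-8 encoding is E4 B8 8B,
# so the last byte collides with the filtered value 0x8B.
encoded = "\u4e0b".encode("utf-8")
quoted = naive_html_quote(encoded)

# The trailing byte has been replaced, so the sequence is no longer UTF-8.
try:
    quoted.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False
```

Because UTF-8 continuation bytes fall in the range 0x80-0xBF, any
filter that rewrites 0x8B or 0x9B without knowing the encoding can hit
the middle of a multi-byte character in this way.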

I'm not quite sure what the right answer is here. How are you using
the html_quote format in your application?

Brian Lloyd        brian@digicool.com
Software Engineer  540.371.6909              
Digital Creations  http://www.digicool.com