I have a lot of Chinese XML files stored in Zope. The internal encoding is UTF-8. Everything was fine in 214, 216 and 220a1. Now, with 220b1, some characters are (apparently) randomly turned into lt;, gt; and the like. This looks like some unwanted HTML escaping, but the leading '&' is missing, and the bytes of these characters are all definitely greater than 127 (a property of UTF-8), so there is no direct relationship to the code points of >, < and co.
Any ideas what could have gone wrong here?
Yes - during the alpha period we got a bug report concerning the fact that Netscape browsers honor the Windows "extended Latin-1" characters \213 and \233 (which those browsers treat as < and >). That means that if you don't filter those as part of html_quote'ing, then some Netscape versions are open to the same sort of script-kiddie attacks that they would be if the HTML was not quoted at all :(

I'm not quite sure what the right answer is here. How are you using the html_quote format in your application?

Brian Lloyd
brian@digicool.com
Software Engineer
540.371.6909
Digital Creations
http://www.digicool.com
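
To make the interaction with UTF-8 concrete, here is a minimal Python sketch. It assumes the extra filtering is a plain byte-for-byte replacement of \213 (0x8B) and \233 (0x9B); the function names are hypothetical and this is not Zope's actual html_quote code. In UTF-8, 0x8B and 0x9B are legal continuation bytes, so they can turn up inside multibyte Chinese characters:

```python
# Hypothetical stand-in for the extra filtering described above.
# Assumption: bytes 0x8B and 0x9B are replaced wherever they occur.
def quote_high_angle_bytes(data: bytes) -> bytes:
    return data.replace(b"\x8b", b"&lt;").replace(b"\x9b", b"&gt;")


def has_risky_bytes(text: str) -> bool:
    """Report whether the UTF-8 encoding of text contains 0x8B or 0x9B."""
    encoded = text.encode("utf-8")
    return b"\x8b" in encoded or b"\x9b" in encoded


def is_valid_utf8(data: bytes) -> bool:
    """Report whether data still decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for char in ("国", "下", "中"):
        raw = char.encode("utf-8")
        quoted = quote_high_angle_bytes(raw)
        print(char, raw, quoted, is_valid_utf8(quoted))
```

Under that assumption, only characters whose encoding happens to contain one of the two bytes are damaged (here U+56FD and U+4E0B, but not U+4E2D), which would match corruption that looks random. One possible workaround on the application side would be to check stored text with something like has_risky_bytes before passing it through the quoting step.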