Decoding of source for text/xml ZPTs
Hi All,

During complication, the XML parser that processes non-HTML mode ZPTs decodes the string of the source into unicode instructions. In HTML mode, the parser does no decoding and so we get string instructions.

My question as a result is: what character set does the XML parser in non-HTML mode assume, and can it be controlled in any way?

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
           - http://www.simplistix.co.uk
Chris Withers wrote:
> During complication, the XML parser that processes non-HTML mode ZPTs
         ^
         +- "compilation", I'm guessing, but see below ;)
> decodes the string of the source into unicode instructions.
>
> In HTML mode, the parser does no decoding and so we get string instructions.
>
> My question as a result is: what character set does the XML parser in
> non-HTML mode assume, and can it be controlled in any way?
XML is UTF-8 by default (UTF-16 is also allowed if the document starts with a byte-order mark), unless the encoding is specified in the top-level processing-directive-like thingy (the "xml declaration"), e.g.:

  <?xml version="1.0" encoding="iso-8859-1"?>

*or* unless the transmission channel spells out the encoding (the HTTP "Content-type" header, for instance). See Mark Pilgrim's rant[1] on the "insanely compilated" interactions between the Content-type header and the document encoding.

XML files on the filesystem *must* be encoded as UTF-8, or have an explicit encoding in the declaration.

[1] http://diveintomark.org/archives/2004/02/13/xml-media-types

Tres.
-- 
===================================================================
Tres Seaver          +1 202-558-7113          tseaver@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
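The encoding rules described in the reply can be observed with Python 3's standard `xml.parsers.expat` module. This is an illustrative assumption: it is not necessarily the parser Zope's ZPT machinery uses, and Zope of this era ran on Python 2. The sketch parses raw bytes and shows three things. Expat assumes UTF-8 when there is no declaration. It honours an `encoding` given in the XML declaration. And a caller can override both through the `encoding` argument of `ParserCreate`. The helper name `collect_text` is hypothetical.

```python
import xml.parsers.expat
from typing import Optional


def collect_text(data: bytes, encoding: Optional[str] = None) -> str:
    """Parse `data` with expat and return its concatenated character data.

    `collect_text` is a hypothetical helper for illustration. If `encoding`
    is given, it is passed to ParserCreate, which tells expat to use it
    instead of the document's implicit (UTF-8) or declared encoding.
    """
    parser = xml.parsers.expat.ParserCreate(encoding=encoding)
    chunks: list = []
    # Expat delivers character data as already-decoded str objects,
    # possibly split across several callbacks, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # True marks this as the final chunk
    return "".join(chunks)


utf8_doc = "<doc>caf\u00e9</doc>".encode("utf-8")
latin1_declared = b'<?xml version="1.0" encoding="iso-8859-1"?><doc>caf\xe9</doc>'
latin1_bare = b"<doc>caf\xe9</doc>"

print(collect_text(utf8_doc))                     # café (no declaration: UTF-8 assumed)
print(collect_text(latin1_declared))              # café (declared encoding honoured)
print(collect_text(latin1_bare, "iso-8859-1"))    # café (caller-supplied override)

try:
    # Latin-1 bytes with no declaration are read as UTF-8 and rejected.
    collect_text(latin1_bare)
except xml.parsers.expat.ExpatError as exc:
    print("ExpatError:", exc)
```

Passing `encoding` to `ParserCreate` is the expat-level analogue of the "transmission channel" case: an externally supplied encoding that takes precedence over whatever the document itself says. Whether Zope offers an equivalent knob for text/xml ZPTs is not something this sketch establishes.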
participants (2)
- Chris Withers
- Tres Seaver