Hi.

I just read Amos's great article at http://www.xml.com/pub/1999/12/zope/index.html

I downloaded Sample.zexp and XMLDocument-1.0a4.tgz, and started to create some sample pages of my own.

Maybe I'm wrong, because I don't know much XML, but XMLDocument-1.0a4.tgz seems to have problems handling the Swedish characters åäö.

This document:

<?xml version="1.0"?>
<faq>
<entry>
<test>Långt svårt öppningshål</test>
</entry>
</faq>

displays this error:

XML Parsing Error: not well-formed at line 4

when I try to save it in Zope.

Change the text to "Långt svårt oppningshål" and it says:

XML Parsing Error: mismatched tag at line 5

"Långt svårt oppningshal" saves just ok...

/Magnus Heino
FYI. As far as I have had time to test this, it seems to come down to the pyexpat parser. Running the pyexpattest.py script on the document breaks in exactly the same way as reported by Magnus Heino.

Best regards, Johan Carlsson
_______________________________________________ Zope maillist - Zope@zope.org http://lists.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope-dev )
I'm not much of an expert on character sets, but the default character encoding in XML is UTF-8, not ISO 8859 as with HTML. Therefore, ISO 8859 characters entered without an encoding declaration will not be interpreted correctly, and this might account for your problems.

To specify that the document's contents are encoded in ISO 8859-1 (i.e., ISO Latin 1), modify the document heading to say:

<?xml version="1.0" encoding="ISO-8859-1"?>

Cheers,
Alexander Staubo mailto:alex@mop.no http://www.mop.no/~alex/
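A minimal sketch of the behaviour described above, assuming a current Python 3 and its standard `xml.parsers.expat` module rather than the 1999 pyexpat build discussed in this thread. The helper name `parse_text` is made up for illustration. It feeds the same ISO 8859-1 bytes to Expat once without and once with the encoding declaration:

```python
import xml.parsers.expat

BODY = "<faq><entry><test>Långt svårt öppningshål</test></entry></faq>"


def parse_text(data: bytes) -> str:
    """Parse XML bytes with Expat and return the concatenated character data."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Expat may deliver text in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# Same ISO 8859-1 bytes, without and with an encoding declaration.
undeclared = ('<?xml version="1.0"?>' + BODY).encode("iso-8859-1")
declared = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + BODY).encode("iso-8859-1")

try:
    parse_text(undeclared)
    undeclared_ok = True
except xml.parsers.expat.ExpatError:
    # Without a declaration Expat assumes UTF-8, and the byte for "å"
    # is not a valid UTF-8 sequence in this position.
    undeclared_ok = False

declared_text = parse_text(declared)
```

With the declaration in place, `declared_text` holds the Swedish text intact, while the undeclared variant is rejected as not well-formed.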
-----Original Message-----
From: zope-admin@zope.org [mailto:zope-admin@zope.org] On Behalf Of Johan Carlsson
Sent: Monday, December 20, 1999 2:09 PM
To: magnus@vuab.net; amos@digicool.com
Cc: zope@zope.org
Subject: RE: [Zope] Swedish characters and XMLDocument-1.0a4
Well... I add that into the document:

<?xml version="1.0" encoding="ISO-8859-1"?>
<faq>
<entry>
<test>Långt svårt öppningshål</test>
</entry>
</faq>

Then I hit "Change", and the saved version will look like this:

<?xml version="1.0"?>
<faq>
<entry>
<test>LÃ¥ngt och svÃ¥rt öppningshÃ¥l</test>
</entry>
</faq>

The encoding is removed, and I can't say the text is what I want it to be :-P

Time to buy an XML book... or maybe it is a bug? I don't know.

/Magnus Heino
Looks like the XML generation code doesn't like non-UTF encodings, and converts any non-UTF characters back to UTF-8 (which is what the ugly noise you quoted is: UTF-8 bytes displayed as ISO 8859-1). This isn't entirely _incorrect_ -- technically, the document's contents are still the same as what you put in -- but it certainly isn't _right_.

Alexander Staubo mailto:alex@mop.no http://www.mop.no/~alex/
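The "ugly noise" can be reproduced in a few lines. This is only an illustration in current Python 3, not the code XML Document actually runs; the variable names are made up:

```python
original = "Långt svårt öppningshål"

# Encode as UTF-8, then (wrongly) interpret those bytes as ISO 8859-1.
# Each of å, ä, ö becomes two bytes in UTF-8, so each shows up as two characters.
garbled = original.encode("utf-8").decode("iso-8859-1")

# Undoing the misinterpretation recovers the original text, which is why
# the stored content is technically still the same.
recovered = garbled.encode("iso-8859-1").decode("utf-8")
```

Here `garbled` begins with "LÃ¥ngt", the same pattern as in the saved document, and `recovered` equals `original`.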
-----Original Message-----
From: magnus@vuab.net [mailto:magnus@vuab.net]
Sent: Monday, December 20, 1999 2:39 PM
To: Alexander Staubo
Cc: johanc@torped.se; amos@digicool.com; zope@zope.org
Subject: Re: [Zope] Swedish characters and XMLDocument-1.0a4
Here are some findings and suggestions:

A good idea would be to use the ACCEPT-CHARSET attribute of the FORM element to control which charset should be used, e.g. <FORM ACCEPT-CHARSET=%Charsets>

Converting to UTF-8 automatically isn't a good solution because neither Internet Explorer nor Navigator supports ACCEPT-CHARSET="UTF-8". This suggests defaulting to ISO-8859-1 or to the CHARSET used by: <META http-equiv="Content-Type" content="text/html; charset=CHARSET">.

[Patch files for UTF-8 edit files for XML Document included]

The problem with <META charset="UTF-8"> is that Netscape doesn't handle it too well. Internet Explorer handles it just fine. (Magnus's original example document works great with <META charset="UTF-8"> and Internet Explorer; Netscape almost handles it.)

The problems found in Netscape also indicate that XML Documents should default to ISO-8859-1 to work properly, at least for now.

Regards, Johan Carlsson
The problem with Swedish characters (and other non-US-ASCII characters) is that XML Document (and probably the whole XML implementation in Zope) defaults to UTF-8 (which Expat defaults to). UTF-8 is not compatible with ISO-8859-1, which Python uses (and therefore Zope).

Using an encoding declaration in the XML declaration doesn't work in Zope's XML Document, and probably not in any other part of Zope's XML implementation. E.g. <?xml version="1.0" encoding="iso-8859-1"?> fails in Zope's XML Document. With Expat, on the other hand, the encoding works just fine.

XML Document needs to support the encoding attribute to be useful for building anything other than English-speaking websites.

Best regards, Johan Carlsson
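One way a caller could work around a missing declaration is to tell Expat the encoding directly. The sketch below assumes current Python 3, where `xml.parsers.expat.ParserCreate` accepts an encoding that overrides whatever the document says; it is not the patch mentioned in this thread, and `parse_as_latin1` is a hypothetical helper name:

```python
import xml.parsers.expat


def parse_as_latin1(data: bytes) -> str:
    """Parse XML bytes as ISO-8859-1 regardless of the XML declaration."""
    chunks = []
    # The encoding argument overrides the document's implicit (UTF-8)
    # or explicit encoding.
    parser = xml.parsers.expat.ParserCreate("iso-8859-1")
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# Magnus's document without any encoding declaration, as ISO-8859-1 bytes.
doc = (
    '<?xml version="1.0"?>'
    "<faq><entry><test>Långt svårt öppningshål</test></entry></faq>"
).encode("iso-8859-1")

text = parse_as_latin1(doc)
```

This only helps when the input really is ISO-8859-1; forcing the encoding on UTF-8 input would produce exactly the garbled text shown earlier in the thread.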
participants (3): Alexander Staubo, Johan Carlsson, Magnus Heino