From: Fred Drake [mailto:fdrake@gmail.com]
On Fri, 23 Jul 2004 15:15:50 -0400, Passin, Tom <tpassin@mitretek.org> wrote:
It's probably not quite that CDATA sections "aren't meant to protect you from everything." They _are_, at least in xml (where everything means "<" and "&"). Chances are that the browser here is getting an HTML file, or at least thinks it is, and HTML does not really know about CDATA sections.
Hey Tom!
This is definitely for text/html and not any sort of XML, which uses a different parser.
The problem here isn't in the browser; it's a matter of how the HTMLParser module from Python's standard library is treating the "</" pair. As I read it, even in a CDATA marked section, "</" is supposed to be recognized as a "delimiter in context." HTMLParser is doing exactly that, but HTML authors are not accustomed to parsers that are more strict than those found in browsers. Thus, this sort of confusion can arise, and especially so when a more browser-like parser was being used to begin with.
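To make the "delimiter in context" point concrete, here is a minimal Python sketch. It is not HTMLParser's actual implementation, and the names ETAGO, find_etago and break_up_etago are made up for illustration. It assumes the HTML 4 rule that script content ends at the first "</" followed by a letter, and it assumes every such pair sits inside a JavaScript string literal, where "\/" is just an escaped "/".

```python
import re

# Hypothetical names for illustration only; not part of HTMLParser.
# "</" followed by a letter is what a strict parser treats as the
# start of an end tag, even inside script content.
ETAGO = re.compile(r"</(?=[A-Za-z])")


def find_etago(script_text):
    # Offsets at which a strict parser could see an end tag beginning.
    return [m.start() for m in ETAGO.finditer(script_text)]


def break_up_etago(script_text):
    # Rewrite "</" as "<\/" so the pair never appears in the source.
    # Assumption: each match is inside a JavaScript string literal.
    return ETAGO.sub(r"<\\/", script_text)


script = 'document.write("<b>bold</b>");'
offsets = find_etago(script)
safe_script = break_up_etago(script)
```

After the rewrite, find_etago(safe_script) finds nothing, which is the same effect as splitting the string by hand in the JavaScript source.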
Hey, Fred! Don't want to harp on this but the XML Rec does not agree with the notion of "delimiter in context". A CDATA section exists specifically to say "This may look like markup but it isn't". The XML 1.0 Rec says

'Definition: CDATA sections MAY occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":'

And also it goes on to say this -

'Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;"'

I'd say that is pretty definitive, wouldn't you? If there is some folk "knowledge" about CDATA sections built into the parser that thinks otherwise, I'd say the parser is non-conformant about CDATA sections (hmm, I almost wrote that "C-sections"!).

Anyway, as I said I have seen this problem myself, and it occurred entirely within the browser (without any CDATA sections) - no server involved - and that's where I would look first. At least in my cases, it must have been the browser's html parser getting confused about the brackets in the javascript. Breaking up the strings as you suggested worked around the problem. Probably has something to do with assumptions being made about where newlines can go relative to markup tokens, I suppose.

Cheers,
Tom P
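
The XML side of this is easy to check with an XML parser. The following is a small sketch using xml.etree.ElementTree from Python's standard library; the sample document and variable names are invented for illustration. It only shows XML behaviour and says nothing about how an HTML parser treats the same text.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, "<", "&" and even "</b>" are plain character data.
doc = '<script><![CDATA[if (a < b && c) { document.write("</b>"); }]]></script>'
root = ET.fromstring(doc)
cdata_text = root.text

# The same kind of text without the CDATA wrapper is not well-formed XML.
try:
    ET.fromstring('<script>if (a < b) {}</script>')
    unwrapped_parses = True
except ET.ParseError:
    unwrapped_parses = False
```

Here cdata_text comes back as the literal script text, and the unwrapped version is rejected by the parser.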