[Zope-dev] Content Type Meta tag stripping in zope.pagetemplate
Charlie Clark
charlie.clark at clark-consulting.eu
Fri Feb 24 20:57:57 UTC 2012
On 24.02.2012, 09:47, Miano Njoka <mianonjoka at gmail.com> wrote:
> While it is not essential, it is necessary in some cases where the
> finished document will be read from disk or is used by other
> applications, e.g. Deliverance [http://packages.python.org/Deliverance/].
> In fact, W3C's HTML validator throws a warning that one should declare
> the character encoding in the document itself if it is missing.
This is actually what the validator says:
"""
No character encoding information was found within the document, either in
an HTML meta element or an XML declaration. It is often recommended to
declare the character encoding in the document itself, especially if there
is a chance that the document will be read from or saved to disk, CD, etc.
"""
As ZPT produces XHTML, the proper place for any encoding declaration is
the XML declaration, where the encoding defaults to UTF-8; an incorrect
declaration there should show up as a validation error. Like much of the
HTML standard, the meta tags were never really thought through and,
because they are invisible to the user, they are all too often copied
mindlessly from one project to another: I have customers today with
completely invalid and misleading meta tags of which they and the rest of
the world are blissfully unaware. As a result, browsers, the main
consumers of the format, were made fault tolerant; after all, the user
often had no idea what was causing the problem or how to rectify it. I
have seen many examples of the server saying one thing and the meta tag
something else entirely. I think nearly all browsers believe what the
server says over what's in the meta tag.
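The precedence argued for above (what the server sends wins, otherwise the XML declaration, otherwise the UTF-8 default) can be sketched in Python. This is a minimal illustration under those assumptions only: `resolve_charset` and `XML_DECL_RE` are hypothetical names, not part of zope.pagetemplate, and meta tags are deliberately ignored.

```python
import re
from email.message import Message

# Hypothetical pattern: matches the encoding pseudo-attribute of a leading
# XML declaration, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
XML_DECL_RE = re.compile(
    r"""^\s*<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def resolve_charset(content_type_header, document):
    """Pick a single charset for ``document`` (hypothetical helper).

    Order assumed here: the HTTP Content-Type header first, then the
    XML declaration, then the XML default of UTF-8. Meta tags are ignored.
    """
    if content_type_header:
        msg = Message()
        msg["Content-Type"] = content_type_header
        # get_content_charset() returns the charset parameter lower-cased,
        # or None when the header carries no charset.
        charset = msg.get_content_charset()
        if charset:
            return charset
    match = XML_DECL_RE.match(document)
    if match:
        return match.group(1).lower()
    return "utf-8"
```

Returning one answer from one documented place is the point: a template author never has to guess whether a stale meta tag will override the server.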
According to MAMA, which was instrumental in developing HTML 5 based on
what has actually been written, the charset was set in the HTTP headers
over 99% of the time. Unfortunately, it doesn't contain any statistics on
discrepancies between the HTTP header and the meta tag.
http://dev.opera.com/articles/view/mama
While there is apparently a possible security risk when the charset is
not declared, I think the Pythonic principle that there should preferably
be one obvious way to do something should apply when Zope has to decide
the charset of a file, and that way should be well documented. I'd
suggest keeping the stripping but implementing a more rigorous approach
such as you suggest.
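Keeping the stripping without silently losing a conflicting declaration might look something like the following minimal sketch. It assumes a regex-based pass over the rendered markup; `strip_content_type_meta` and its patterns are hypothetical and are not the code zope.pagetemplate actually uses. Only `<meta http-equiv="Content-Type" ...>` tags are removed; other meta tags are left alone.

```python
import re

# Hypothetical patterns for illustration only.
META_RE = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
HTTP_EQUIV_RE = re.compile(
    r"""http-equiv\s*=\s*["']?content-type["']?""", re.IGNORECASE)
CHARSET_RE = re.compile(r"""charset\s*=\s*["']?([A-Za-z0-9._-]+)""",
                        re.IGNORECASE)


def strip_content_type_meta(html):
    """Remove Content-Type meta tags and report the charsets they declared.

    Returns a tuple (cleaned_html, declared_charsets), with the charsets
    lower-cased in document order.
    """
    declared = []

    def _replace(match):
        tag = match.group(0)
        if not HTTP_EQUIV_RE.search(tag):
            return tag  # not a Content-Type meta tag: keep it unchanged
        found = CHARSET_RE.search(tag)
        if found:
            declared.append(found.group(1).lower())
        return ""  # drop the Content-Type meta tag itself

    return META_RE.sub(_replace, html), declared


if __name__ == "__main__":
    page = ('<html><head><meta http-equiv="Content-Type" '
            'content="text/html; charset=windows-1252" />'
            '<meta name="author" content="x" /></head></html>')
    print(strip_content_type_meta(page))
```

The caller can then compare the collected charsets with the one actually used to encode the response and log a warning on mismatch, rather than trusting either source blindly.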
Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226