Two issues for Z2.2: XHTML & malicious tags
Zopistas,
I have two issues that I think should be rectified before the 2.2 release. I am not very up to date on the goings-on on the list, but a quick search gave me little information on the following:
1. The HTML that Zope outputs is not very standards-compliant (XHTML 1.0) at the moment: tags like <img /> are rendered as <IMG>, etc. I would like to contribute to the cleanup work, but I am a relative newcomer to CVS. How can I participate? Do I just check out the relevant files, modify them, and then get somebody who is authorized to review them and commit them back to CVS? I am not a Python guru, but I know enough not to mess things up along the way :)
And just to clarify: I'm not talking about a rewrite of the Zope management console here; that will work fine until the Mozilla version comes along. I merely want to make sure that tags are lowercase and terminated, so that pages produced by Zope stand a chance of passing the W3C validator. I always write compliant code when I have the chance, but with Zope this is impossible, because the tags Zope inserts invalidate the page anyway.
So, who do I bribe to get a shot at CVS? :)
2. Malicious HTML tags: is anything being done here? Filtering them is one of the features Zope 2.2 really shouldn't go without. Most Zope sites have some form of user interaction, and a post that destroys a page with a stray </html> or, even worse, script tags is totally unacceptable IMHO. I'd just like to ask what the status is, as I think this is one of the most overlooked areas where Zope is lacking.
I know Evan Simpson (malicious tags) and Christopher Petrilli (HTML quality of Zope) have talked about this before; any comments?
I'm really looking forward to Zope 2.2; the alpha release looks good so far. You guys rock :)
Regards,
Alexander Limi
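Point 1 boils down to a small rendering rule: lowercase names, quoted attribute values, and terminated empty elements. The Python sketch below shows that rule for a single start tag. The helper name `xhtml_start_tag` and its empty-element set are hypothetical illustrations, not Zope code, and closing tags for non-empty elements (`</p>` and so on) would still have to be written separately.

```python
from html import escape

# Illustrative subset of the HTML 4.01 elements declared EMPTY; XHTML 1.0
# requires these to be terminated, e.g. <br /> instead of <BR>.
EMPTY_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input",
                  "link", "meta", "param"}


def xhtml_start_tag(name, attrs=None):
    """Render one start tag in XHTML 1.0 style (hypothetical helper).

    - element and attribute names are lowercased
    - every attribute value is quoted and HTML-escaped
    - minimized attributes (value None, e.g. CHECKED) become checked="checked"
    - empty elements end with " /> " style termination, with the space that
      the XHTML 1.0 Appendix C compatibility guidelines recommend
    """
    name = name.lower()
    parts = [name]
    for attr, value in (attrs or {}).items():
        attr = attr.lower()
        if value is None:
            value = attr  # XHTML forbids attribute minimization
        parts.append(f'{attr}="{escape(str(value), quote=True)}"')
    closer = " />" if name in EMPTY_ELEMENTS else ">"
    return "<" + " ".join(parts) + closer


print(xhtml_start_tag("IMG", {"SRC": "logo.gif", "ALT": "Zope"}))
# <img src="logo.gif" alt="Zope" />
print(xhtml_start_tag("INPUT", {"TYPE": "checkbox", "CHECKED": None}))
# <input type="checkbox" checked="checked" />
```

Lowercasing and quoting are harmless to HTML 4.0 browsers, so a rule like this could in principle be applied unconditionally; only the `/>` termination is specific to XHTML.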
On Fri, Jun 02, 2000 at 06:50:46PM +0200, Alexander Limi wrote:
1. The HTML that Zope outputs is not very standards-compliant (XHTML 1.0) at the moment. Tags like <img /> are rendered as <IMG> etc. I would like to
Perhaps support for XHTML compliance in Zope should be optional. I wrote a few pages in XHTML some time ago and had no problems with Netscape 4.xx, but some users later informed me of a few problems. In particular, some still widespread browser versions did not grok (and thus showed the user) the processing instruction
<?xml version="1.0" encoding="ISO-8859-1"?>
For generally accessible pages, I'm therefore still using HTML 4.0. This is mentioned on http://www.w3.org/TR/xhtml1/:
"Be aware that processing instructions are rendered on some user agents. However, also note that when the XML declaration is not included in a document, the document can only use the default character encodings UTF-8 or UTF-16."
Anyway, tags like <img /> are accepted as valid HTML 4.0 by the W3C validator, so they would not cause validation "noise" for those still using HTML 4.0.
--
jmce: +351 919838775 ~ http://artenumerica.com/ ~ http://artenumerica.org/
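Following the note quoted from the XHTML 1.0 specification, a site that makes XHTML output optional could decide when the XML declaration is needed with a rule like the one below: omit it for UTF-8 or UTF-16, so older browsers have no processing instruction to display, and emit it for any other encoding. This is a hypothetical Python helper, not a Zope feature, and it deliberately ignores encodings declared elsewhere, such as in an HTTP Content-Type header.

```python
def xml_declaration(encoding, force=False):
    """Return the XML declaration for an XHTML page, or '' if it may be omitted.

    Hypothetical helper. Per the quoted spec note, a document without the
    declaration can only use UTF-8 or UTF-16, so the declaration is emitted
    only for other encodings (e.g. ISO-8859-1) or when explicitly forced.
    """
    normalized = encoding.strip().lower().replace("_", "-")
    if not force and normalized in ("utf-8", "utf8", "utf-16", "utf16"):
        return ""
    return f'<?xml version="1.0" encoding="{encoding}"?>\n'


print(repr(xml_declaration("UTF-8")))       # ''
print(repr(xml_declaration("ISO-8859-1")))  # '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
```

The trade-off the post describes still applies: a Latin-1 page either carries a declaration that some browsers display, or has to be re-encoded as UTF-8.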
----- Original Message ----- From: Alexander Limi <alexander@limi.net>
2. Malicious HTML tags - is anything being done here? Filtering of these is one of the features Zope 2.2 really shouldn't go without. Most Zope sites have user interaction in some way, and the concept of a post containing a stray </html>, or even worse - script-tags, destroying a page is totally unacceptable IMHO. I'd just like to query what the status is on this, as I think it is one of the most overlooked areas that are lacking in Zope.
I know Evan Simpson (malicious tags) and Christopher Petrilli (HTML quality of zope) have been talking about this earlier, any comments?
I've got a rather crude module going which parses an input string for HTML-ish tags. It allows only tags from an explicit list, and ensures that non-empty tags are closed (either by complaining or by adding closing tags). If 'script' is not one of the allowed tags, it also disallows all "On*" attributes and "javascript:*" attribute values in any tag.
Unfortunately, it isn't very efficient (it's based on sgmllib.py) and is rather crude. I had wanted to make it use SAX to do the parsing, so that sgmlop or another high-performance library could be plugged in, but never got there. Also, it has no DTML-level interface; you'd have to wrap it in an External Method to use it from DTML.
I've gone ahead and put it up at http://www.zope.org/Members/4am/SafeHTML to see if anyone can make anything of it.
Cheers,
Evan @ digicool & 4-am
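The filtering approach described above can be sketched with the `html.parser` module from today's Python standard library, standing in for `sgmllib` (which no longer exists in Python 3). This is not the SafeHTML code itself: the names `WhitelistFilter` and `safe_html`, the default tag list, the choice to add missing closing tags rather than complain, and the dropping of `<script>`/`<style>` content are all assumptions for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Elements written as <br /> and never pushed on the open-tag stack.
EMPTY_TAGS = {"br", "hr", "img"}
# Disallowed elements whose text content is dropped too, not just the tags.
DROP_CONTENT = {"script", "style"}


class WhitelistFilter(HTMLParser):
    """Rebuild markup, keeping only whitelisted tags (hypothetical sketch)."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = {t.lower() for t in allowed}
        self.out = []         # pieces of the cleaned document
        self.open_tags = []   # stack of allowed, not-yet-closed elements
        self.skipping = None  # dropped <script>/<style> we are currently inside

    def _clean_attrs(self, attrs):
        kept = []
        for name, value in attrs:  # HTMLParser lowercases attribute names
            value = value or ""
            if "script" not in self.allowed:
                # Crude check that ignores whitespace/control characters.
                compact = "".join(ch for ch in value if ch > " ").lower()
                if name.startswith("on") or compact.startswith("javascript:"):
                    continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(kept)

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed:
            if tag in DROP_CONTENT:
                self.skipping = tag
            return
        attr_text = self._clean_attrs(attrs)
        if tag in EMPTY_TAGS:
            self.out.append(f"<{tag}{attr_text} />")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag == self.skipping:
            self.skipping = None
            return
        if tag not in self.open_tags:
            return  # stray or disallowed end tag, e.g. </html>
        # Close everything opened after `tag`, then `tag` itself.
        while self.open_tags:
            last = self.open_tags.pop()
            self.out.append(f"</{last}>")
            if last == tag:
                break

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))


def safe_html(text, allowed=("a", "b", "i", "em", "strong", "p", "br")):
    """Return `text` with disallowed tags removed and open tags closed."""
    parser = WhitelistFilter(allowed)
    parser.feed(text)
    parser.close()
    closers = "".join(f"</{t}>" for t in reversed(parser.open_tags))
    return "".join(parser.out) + closers


print(safe_html('<b>Hi <a href="javascript:evil()">there</b></html>'))
# <b>Hi <a>there</a></b>
```

Because `safe_html` takes a string and returns a string, it is the kind of function that could be exposed to DTML through an External Method, as suggested above; the Zope-side wiring is not shown here.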
Evan Simpson:
I've got a rather crude module going which parses an input string for HTML-ish tags. It allows only tags from an explicit list, and ensures that non-empty tags are closed (either by complaining or adding closing tags). If 'script' is not one of the allowed tags, it also disallows all "On*" attributes and "javascript:*" attribute values in any tag.
Unfortunately, it isn't very efficient (based on sgmllib.py) and is rather crude. I had wanted to make it use SAX to do the parsing, so that sgmlop or another high-performance library could be plugged in, but never got there. Also, it has no DTML-level interface; you'd have to wrap it in an External Method to use it from DTML.
I've gone ahead and put it up at http://www.zope.org/Members/4am/SafeHTML to see if anyone can make anything of it.
This looks a lot like the code I have lying around, only yours is more comprehensive and user-friendly :) Anyway, I assume you are familiar with SAX for Python?
http://www.stud.ifi.uio.no/~lmariusg/download/python/xml/saxlib.html
It supports sgmlop, as you mentioned. Your code will do beautifully for our project; we are not dependent on fast code in that specific part. Thanks a lot.
Now, can somebody tell me how to help Zope with spitting out XHTML 1.0-compliant tags? :]
--
Alexander Limi
alexander@limi.net
participants (3)
- Alexander Limi
- Evan Simpson
- J M Cerqueira Esteves