Patch to check all pages with html-tidy
Hi!
Maybe someone is interested in this patch:
If you have html-tidy[1] installed, you can apply this patch to lib/python/ZPublisher/HTTPResponse.py to scan every HTML page with html-tidy.
Warnings from html-tidy will be displayed in the debug logs.
[1]: http://tidy.sourceforge.net/
You apply this patch like this:
cd zope/lib/python/ZPublisher
cat html-tidy-patch.txt | patch
thomas
--
Thomas Guettler <guettli@thomas-guettler.de>
http://www.thomas-guettler.de
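The patch writes the response body to a temporary file and runs `tidy -q -errors` on it through popen2. In modern Python the same call can feed the page to tidy on stdin, which avoids the temporary file. This is a minimal sketch, assuming Python 3.7+ (for `subprocess.run(capture_output=...)`) and an HTML Tidy binary named `tidy` on PATH; `run_tidy` is a hypothetical helper name, not part of Zope.

```python
import subprocess


def run_tidy(html, command=("tidy", "-q", "-errors")):
    """Pipe `html` into tidy and return its diagnostic lines.

    Assumes `tidy` is on PATH. `-q` suppresses nonessential output and
    `-errors` makes tidy report only errors and warnings.
    """
    proc = subprocess.run(
        list(command),
        input=html,           # tidy reads the document from stdin when no file is given
        capture_output=True,  # collect stdout and stderr separately
        text=True,            # exchange str, not bytes
    )
    # tidy exits with 1 on warnings and 2 on errors, so a non-zero status is
    # expected here and deliberately not treated as a failure.
    # tidy writes its warnings and errors to stderr.
    return proc.stderr.splitlines()
```

The `command` parameter exists only so the tidy invocation can be swapped out, for example to try a different binary or options.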
I'm not sure I'd want to do this on every request, but maybe we could add something like this to ZChecker, which finds bugs and issues in ZPT, DTML, etc., including running ZPT through HTML Tidy?
http://www.zope.org/Members/andym/ZChecker
Thomas Guettler wrote:
Hi!
Maybe someone is interested in this patch:
If you have html-tidy[1] installed, you can apply this patch to lib/python/ZPublisher/HTTPResponse.py to scan every html page with html tidy.
Warnings from html-tidy will be displayed in the debug logs.
[1]: http://tidy.sourceforge.net/
You apply this patch like this:
cd zope/lib/python/ZPublisher
cat html-tidy-patch.txt | patch
thomas
------------------------------------------------------------------------
--- HTTPResponse.py.orig Wed Apr 9 08:37:36 2003
+++ HTTPResponse.py Wed Apr 9 08:26:52 2003
@@ -176,6 +176,46 @@
         self.stdout = stdout
         self.stderr = stderr
 
+    def html_tidy(self):
+        """
+        Small hack to call html-tidy for every html
+        page which is served by zope.
+        Call it from lib/python/ZPublisher/HTTPResponse.setBody()
+        after self.body is set
+
+        if content_type == 'text/html':
+            self.html_tidy()
+        """
+        import tempfile
+        import popen2
+        ignore=[
+            'Warning: <table> lacks "summary" attribute',
+            "Can't open",
+            "Warning: <nobr> is not approved by W3C",
+            "Warning: inserting missing 'title' element"]
+        htmlfile=tempfile.mktemp()
+        fd=open(htmlfile, "wt")
+        fd.write(self.body)
+        fd.close()
+        stdout, stdin = popen2.popen4("tidy -q -errors %s" % htmlfile)
+        out=stdout.readlines()
+        os.unlink(htmlfile)
+        for line in out:
+            line=line.strip()
+            cont=0
+            for ign in ignore:
+                if line.find(ign)!=-1:
+                    cont=1
+                    break
+            if cont:
+                continue
+            base="unknown base"
+            if hasattr(self, "base"):
+                base=self.base
+            print "HTML-Tidy: %s %s" % (
+                base, line)
+
+
     def retry(self):
         """Return a response object to be used in a retry attempt
         """
@@ -329,6 +369,8 @@
             body = '>'.join(body.split('\233'))
 
         self.setHeader('content-length', len(self.body))
+        if content_type == 'text/html':
+            self.html_tidy()
         self.insertBase()
         if self.use_HTTP_content_compression and \
             not self.headers.get('content-encoding',None):
-- Andy McKay
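The filtering and logging half of html_tidy() can be pulled out into a plain function that is easy to test on its own. This is a minimal Python 3 sketch, not the patch itself: `report_lines` is a hypothetical name, it takes the base URL as an argument instead of reading `self.base`, and, unlike the patch, it also drops blank lines from tidy's output.

```python
# Substrings of tidy messages to suppress (copied from the patch's ignore list).
IGNORE = (
    'Warning: <table> lacks "summary" attribute',
    "Can't open",
    "Warning: <nobr> is not approved by W3C",
    "Warning: inserting missing 'title' element",
)


def report_lines(tidy_output, base=None, ignore=IGNORE):
    """Turn raw tidy output into 'HTML-Tidy: <base> <message>' lines,
    skipping blank lines and any message containing an ignored substring."""
    if base is None:
        base = "unknown base"  # same fallback text the patch uses
    report = []
    for line in tidy_output:
        line = line.strip()
        if not line or any(ign in line for ign in ignore):
            continue
        report.append("HTML-Tidy: %s %s" % (base, line))
    return report
```

Returning a list instead of printing leaves the caller free to print the lines, log them, or collect them across requests.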
On Wed, Apr 09, 2003 at 11:33:15AM +0100, Andy McKay wrote:
I'm not sure I'd want to do this on every request, but maybe we could add something like this to ZChecker, which finds bugs and issues in ZPT, DTML, etc., including running ZPT through HTML Tidy?
This patch is, of course, only useful when developing with Zope. Since it hooks into HTTPResponse.py, it can only see the rendered HTML, not the ZPT or DTML source.
I found no way to access the requested URL from the HTTPResponse. If that were possible, you could check only the pages whose URL matches a regular expression.
thomas
BTW, please trim the unimportant parts of the email when you reply.
--
Thomas Guettler <guettli@thomas-guettler.de>
http://www.thomas-guettler.de
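If the requested URL were available, the "check only matching pages" idea comes down to a small predicate. This is a minimal sketch under stated assumptions: `should_tidy` and the `CHECK_URLS` pattern are hypothetical, `url` and `content_type` are plain strings supplied by the caller, and how to obtain the URL inside Zope is left open.

```python
import re

# Hypothetical pattern: check pages under /dev/ or any URL ending in .html.
CHECK_URLS = re.compile(r"^/dev/|\.html$")


def should_tidy(url, content_type, pattern=CHECK_URLS):
    """Return True if the response for `url` should be run through tidy."""
    # Only HTML is worth checking; ignore parameters such as '; charset=utf-8'.
    if content_type.split(";")[0].strip().lower() != "text/html":
        return False
    return pattern.search(url) is not None
```

Stripping the parameters also avoids a pitfall of an exact `content_type == 'text/html'` comparison, which never matches a header such as `text/html; charset=utf-8`.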
Thomas Guettler wrote:
On Wed, Apr 09, 2003 at 11:33:15AM +0100, Andy McKay wrote:
I'm not sure I'd want to do this on every request, but maybe we could add something like this to ZChecker, which finds bugs and issues in ZPT, DTML, etc., including running ZPT through HTML Tidy?
This patch is, of course, only useful when developing with Zope. Since it hooks into HTTPResponse.py, it can only see the rendered HTML, not the ZPT or DTML source.
I found no way to access the requested URL from the HTTPResponse. If that were possible, you could check only the pages whose URL matches a regular expression.
I'm not quite sure this is right, but since I'm looking at CookieCrumbler at the moment, I thought I'd mention it. CookieCrumbler modifies the response by substituting the response's unauthorized method; you alter the response by patching in an html_tidy() method. Maybe you could look at what CookieCrumbler does and implement your check as a product instead of a patch. That should be quite easy, and it has the benefit of being a product _and_ of giving you access to any data available inside normal Zope (including the requested URL). In the simplest case, one could just drop an instance of the imaginary ZHtmlTidy product into a folder to get everything below that folder checked.
cheers,
oliver
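The CookieCrumbler-style approach replaces a method on one response object at request time instead of editing HTTPResponse.py. The sketch below shows that per-instance substitution pattern in plain Python 3. It is an illustration only: `FakeResponse` is a hypothetical stand-in (not Zope's HTTPResponse), `install_check` is a hypothetical helper, and where a real product would hook in is left open.

```python
class FakeResponse:
    """Hypothetical stand-in for a response object, not Zope's HTTPResponse."""

    def __init__(self):
        self.headers = {}
        self.body = ""

    def setBody(self, body, content_type="text/html"):
        self.body = body
        self.headers["content-type"] = content_type


def install_check(response, url, check):
    """Wrap setBody on this one response so that check(url, body) runs
    after the body is set, for HTML responses only."""
    original = response.setBody  # bound method of this instance

    def setBody(body, *args, **kwargs):
        result = original(body, *args, **kwargs)
        if response.headers.get("content-type", "").startswith("text/html"):
            check(url, response.body)
        return result

    # Assigning on the instance shadows the class method for this response
    # only; other responses and the class itself are left untouched.
    response.setBody = setBody
    return response
```

Because the wrapper closes over the URL passed in when it is installed, the check has the URL that the patched HTTPResponse.setBody() lacks.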
participants (3)
- Andy McKay
- Oliver Bleutgen
- Thomas Guettler