[ZPT] OT (and probably a bit long ;-) HTML Filtering
Guido van Rossum
guido@digicool.com
Wed, 16 May 2001 10:32:47 -0500
> > When parsing the following HTML:
> >
> > 'Roses <b>are</B> red,<br/>violets <i>are</i> blue'
> >
> > ...with the following class:
> >
> > class HTML2SafeHTML(sgmllib.SGMLParser):
[proof of broken parser skipped]
>
> Anyway, Ethan pointed out that you guys have probably got quite good at this
> sort of thing while developing ZPT...
>
> So, how should I be approaching this problem?
What *we* did was to rewrite the HTML parser from the ground up.  You
can download TAL from
http://www.zope.org/Members/4am/ZPT/TAL-1.2.1.tar.gz/view
and look at HTMLParser.py.
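
To give a rough idea of the approach, here is a minimal sketch of a
whitelist filter built on an event-driven HTML parser.  It assumes the
html.parser module from the Python 3 standard library, which is not the
TAL HTMLParser.py mentioned above and may differ from it in detail.  The
names SafeHTMLFilter, filter_html, ALLOWED_TAGS and VOID_TAGS are made up
for this illustration, and the whitelist itself is only an assumption
about which tags count as "safe".

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, chosen only for this illustration.
ALLOWED_TAGS = {"b", "i", "br"}
# Allowed tags that never take a closing tag.
VOID_TAGS = {"br"}


class SafeHTMLFilter(HTMLParser):
    """Re-emit whitelisted tags (without attributes) and escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # The parser passes the tag name lowercased; all attributes
        # are dropped, and tags outside the whitelist are discarded.
        if tag in ALLOWED_TAGS:
            self.parts.append("<%s>" % tag)

    def handle_endtag(self, tag):
        # Void tags such as <br> are written once, by handle_starttag.
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Text arrives with character references already decoded,
        # so escape it again before writing it out.
        self.parts.append(escape(data, quote=False))


def filter_html(text):
    parser = SafeHTMLFilter()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)
```

Calling filter_html() on the 'Roses <b>are</B> red,...' string quoted
above should keep the bold and italic markup, normalize the tag case, and
drop anything that is not on the whitelist.  Note that this sketch keeps
the text inside discarded elements; a real filter would probably want to
suppress the contents of things like <script> as well.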
You could also submit a bug report to Python's bug tracker so we can
fix sgmllib in the next release:
http://sourceforge.net/bugs/?group_id=5470
--Guido van Rossum (home page: http://www.python.org/~guido/)