On Tue, May 08, 2001 at 12:48:10AM +0100, Chris Withers wrote:
> - HTML parsing now done using the Strip-o-Gram library.

A few minutes ago I ran a few tests with the html2safehtml function in
stripogram.py and found that it is possible to force the inclusion of
arbitrary tags in the output text.

    html2safehtml ('Roses <b>are</B> red,<br>violets <i>are</i> blue',
                   valid_tags=['b', 'i', 'br'])

returns

    'Roses <b>are</b> red,<br>violets <i>are</i> blue'

as expected, but

    html2safehtml ('Roses <b>are</B> red,<br/>violets <i>are</i> blue',
                   valid_tags=['b','i','br'])

returns

    'Roses <b>are</b> red,<br>>violets <i>are<i> blue'

Notice that '<br/>' (valid in XHTML) becomes '<br>>', and that the
closing '</i>' at the end comes out as... '<i>'.

But it gets more interesting: the result of

    html2safehtml ('Roses <b>are</B> red,<br/QUACK>violets <i>are</i> blue',
                   valid_tags=['b','i','br'])

is

    'Roses <b>are</b> red,<br>QUACK>violets <i>are<i> blue'

inspiring one to write

    html2safehtml ('Roses <b>are</B> red,<br/<QUACK>violets <i>are</i> blue',
                   valid_tags=['b','i','br'])

getting

    'Roses <b>are</b> red,<br><QUACK>violets <i>are<i> blue'

or even

    html2safehtml ('Roses <b>are</B> red,<br/<blink>QUACK<//blink> violets '
                   '<i>are</i> blue',
                   valid_tags=['b','i','br'])

successfully smuggling a <blink>...</blink> into the result:

    'Roses <b>are</b> red,<br><blink>QUACK</blink> violets <i>are</i> blue'

(Notice that the closing '</i>' is now OK again, and that I had to use
'<//blink>' in order to get '</blink>'.)

Maybe a problem with sgmllib? I have no time for further tests now...

-- 
jmce: +351 919838775 ~ http://jmce.artenumerica.org/
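
The outputs quoted above can be checked mechanically. What follows is only a minimal sketch in Python 3, not part of stripogram: the helper name disallowed_tags and its regular expression are hypothetical, introduced here for illustration. It does not call html2safehtml; it only scans already-produced output strings (copied from this message) for tag names outside valid_tags.

```python
import re

# Matches the name of any opening or closing tag, e.g. "<b>", "</i>", "<br/>".
# This is a deliberately naive pattern for checking output, not an HTML parser.
TAG_RE = re.compile(r"<\s*/?\s*([A-Za-z][A-Za-z0-9]*)")


def disallowed_tags(html, valid_tags):
    """Return the sorted tag names found in `html` that are not in `valid_tags`.

    Hypothetical helper for illustration; comparison is case-insensitive.
    """
    allowed = {tag.lower() for tag in valid_tags}
    found = {match.group(1).lower() for match in TAG_RE.finditer(html)}
    return sorted(found - allowed)


valid = ['b', 'i', 'br']

# Output strings copied from the message above.
ok_output = 'Roses <b>are</b> red,<br>violets <i>are</i> blue'
quack_output = 'Roses <b>are</b> red,<br><QUACK>violets <i>are<i> blue'
blink_output = ('Roses <b>are</b> red,<br><blink>QUACK</blink> violets '
                '<i>are</i> blue')

print(disallowed_tags(ok_output, valid))     # []
print(disallowed_tags(quack_output, valid))  # ['quack']
print(disallowed_tags(blink_output, valid))  # ['blink']
```

A check of this kind could sit in a regression test around the sanitizer, so that any input which leaks a tag outside the whitelist is flagged regardless of how the parser was confused.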