[Zope] Strip all HTML
ken at practical.org
Thu Aug 7 15:22:26 EDT 2003
Chris Withers wrote:
>Are there any other tags where the content should be removed?
AFAICT, the HTML elements which need to be removed together with their content are: style, script, noscript and noframes. At least those are the most common non-proprietary ones.
My strategy was to transform each opening tag into '<!--' and each closing tag into '-->', and then strip everything matching '<!--.*?-->', but there must be a cleverer way.
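For anyone who wants to try it, the strategy described above might look roughly like this. It is only a sketch in present-day Python using the standard `re` module; the names `DROP_WITH_CONTENT` and `strip_html` are made up for illustration and are not taken from CatalogSupport.py or any Zope product.

```python
import re

# Elements whose content should disappear together with the tags
# (the four listed earlier in this message).
DROP_WITH_CONTENT = ("style", "script", "noscript", "noframes")

_names = "|".join(DROP_WITH_CONTENT)
_open_tag = re.compile(r"<(?:%s)\b[^>]*>" % _names, re.IGNORECASE)
_close_tag = re.compile(r"</(?:%s)\s*>" % _names, re.IGNORECASE)
_comment = re.compile(r"<!--.*?-->", re.DOTALL)  # DOTALL: comments may span lines
_any_tag = re.compile(r"<[^>]+>")


def strip_html(html):
    """Hypothetical helper: remove markup plus style/script/etc. content."""
    text = _open_tag.sub("<!--", html)   # opening tag becomes a comment start
    text = _close_tag.sub("-->", text)   # closing tag becomes a comment end
    text = _comment.sub("", text)        # drop real and synthetic comments
    return _any_tag.sub("", text)        # finally drop the remaining tags


print(strip_html('<p>Hi</p><script type="text/javascript">var x = "<b>";</script>'
                 '<style>p {color: red}</style> there'))
# prints: Hi there
```

One weakness I can see: a script body that is itself wrapped in '<!-- ... -->' (the old trick for hiding scripts from ancient browsers) ends up as a nested comment, and the non-greedy match leaves a stray '-->' behind.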
I would love to have a fix for Dieter's CatalogSupport.py, since that module was intended for my first use case: to prevent indexing of irrelevant markup; it is already used by the DocumentLibrary product.
My other use case, the display of a text-only version of a web page, also requires removal of all markup and markup-related content.
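A parser-based variant for the text-only case could be sketched as follows. This assumes a current Python 3 and its standard `html.parser` module, not the Python shipped with Zope at the time; the class name `TextOnly` and the function `text_only` are invented for the example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Hypothetical helper: collect text, skipping markup-only elements."""

    SKIP = {"style", "script", "noscript", "noframes"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def text_only(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()   # flush any text still buffered by the parser
    return parser.text()


print(text_only('<p>Hi</p><script><!-- alert("<b>") //--></script>'
                '<noscript><p>Enable JS</p></noscript> there &amp; back'))
# prints: Hi there & back
```

Comments are dropped because the default `handle_comment` does nothing, and character references such as `&amp;` are decoded by the parser itself.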
Is there a reason for any of the HTML conversion modules *not* to incorporate this addition? I am just surprised that no one has reported it as a problem. Thanks to those who are contributing to this thread!
Ken