[Zope-CMF] Re: Document body extraction even with plain text: Bug or policy?

Sun Aug 8 08:55:54 EDT 2004

Hi!

Tres Seaver wrote:
> Jens Vagelpohl wrote:
> 
>> http://zope.org/Collectors/CMF/214
>>
>> Is it intended that *any* document, even one that explicitly had its
>> format set to something other than "html", gets checked for HTML
>> content and has its body extracted? I can't remember if there was some
>> specific policy decided on or if it's just an oversight that can be 
>> remedied as described in the issue.
> 
> 
> As long as the "document_view" method HTML-quotes the text (if / when
> rendered inside the CMS), it should be OK to leave it in place.

At first glance I also thought this could be a security issue. But
while bodyfinder removes *some* nasty tags, it does not really produce
safer or cleaner HTML.
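A rough sketch of format-gated body extraction, assuming the remedy proposed in issue #214 amounts to skipping extraction unless text_format is 'html'. BODYFINDER and extract_body() are hypothetical names; the regex only approximates what a body finder does and is not CMF's actual code. The last call illustrates the point above: a <script> inside <body> survives extraction untouched.

```python
import re

# Hypothetical pattern approximating a "bodyfinder": capture everything
# between <body ...> and the last </body>, case-insensitively, across lines.
BODYFINDER = re.compile(r"<body[^>]*>(?P<bodycontent>.*)</body>",
                        re.DOTALL | re.IGNORECASE)


def extract_body(text, text_format):
    """Return the content to store for a document.

    Assumption: extraction only runs when the author chose 'html';
    'plain' and 'structured-text' input is kept verbatim.
    """
    if text_format != "html":
        return text
    match = BODYFINDER.search(text)
    if match is None:
        # No <body> element: keep the whole input.
        return text
    return match.group("bodycontent")


page = "<html><head><title>T</title></head><body><p>Hi</p></body></html>"
print(extract_body(page, "html"))    # <p>Hi</p>
print(extract_body(page, "plain"))   # input returned unchanged

# Extraction is not sanitizing: body-level tags pass straight through.
risky = "<body><script>alert(1)</script></body>"
print(extract_body(risky, "html"))   # <script>alert(1)</script>
```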

CookedBody() returns HTML-quoted text as long as text_format is 'plain'.
setFormat() currently fails to update cooked_text, so even if text_format
is changed to 'html' using the metadata edit form, the HTML is still
quoted. Calling CookedBody() with a new stx_level re-cooks the body, so
in that case text added in 'plain' mode is rendered unquoted after
switching to 'structured-text' mode.
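To make the inconsistency concrete, here is a toy model in plain Python. ToyDocument, _cook(), cooked_body(), set_format_buggy() and set_format() are hypothetical stand-ins, not the CMFDefault Document API, and structured text is not actually rendered, only passed through unquoted. set_format() sketches one possible fix: re-cooking whenever the format changes.

```python
import html


class ToyDocument:
    """Hypothetical stand-in for a CMF Document's cooking behaviour.

    Only the quoting decision is modelled: 'plain' text is HTML-quoted,
    any other format (e.g. 'html', 'structured-text') passes through
    unquoted. No real structured-text rendering happens here.
    """

    def __init__(self, text, text_format="plain"):
        self.text = text
        self.text_format = text_format
        self.stx_level = 1
        self.cooked_text = self._cook()

    def _cook(self):
        if self.text_format == "plain":
            return html.escape(self.text)
        return self.text

    def cooked_body(self, stx_level=None):
        # A new stx_level forces re-cooking from the *current* format.
        if stx_level is not None and stx_level != self.stx_level:
            self.stx_level = stx_level
            self.cooked_text = self._cook()
        return self.cooked_text

    def set_format_buggy(self, fmt):
        # Mirrors the reported behaviour: the format changes, the cache doesn't.
        self.text_format = fmt

    def set_format(self, fmt):
        # One possible fix: re-cook whenever the format changes.
        self.text_format = fmt
        self.cooked_text = self._cook()


doc = ToyDocument("<b>bold</b>")
doc.set_format_buggy("html")
print(doc.cooked_body())             # &lt;b&gt;bold&lt;/b&gt;  (stale, still quoted)

doc = ToyDocument("<b>bold</b>")
doc.set_format_buggy("structured-text")
print(doc.cooked_body(stx_level=2))  # <b>bold</b>  (re-cooked, now unquoted)
```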

Conclusions:

- I'm fine with resolving issue #214 as proposed.

- setFormat() seems to be broken because it fails to trigger re-cooking,
leaving the document in an inconsistent state.

- The skin has to make sure users don't add nasty tags if they are 
allowed to use 'structured-text' or 'html' mode. It might be necessary 
to update the validateHTML.py script or scrubHTML to remove header tags.

- FTP / WebDAV PUT does not validate the HTML at all, so only trusted
users should be allowed to upload this way.
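One possible shape for stripping header tags, sketched with the standard library's html.parser rather than CMF's own scrubHTML. The tag sets DROP_WITH_CONTENT and DROP_VOID and the helper names are hypothetical choices; a real policy would live in the skin's validation scripts. This does nothing for the FTP / WebDAV PUT path, and an unclosed <head> causes everything after it to be dropped.

```python
from html.parser import HTMLParser

# Hypothetical policy: containers dropped together with their content,
# and void (end-tag-less) header elements dropped on sight.
DROP_WITH_CONTENT = {"head", "title", "style"}
DROP_VOID = {"meta", "link", "base"}


class HeaderTagStripper(HTMLParser):
    def __init__(self):
        # Keep entity/char references as written so "&lt;" is not unescaped.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped container

    def handle_starttag(self, tag, attrs):
        if tag in DROP_VOID:
            return
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/>: emit verbatim unless dropped.
        if tag in DROP_VOID or tag in DROP_WITH_CONTENT or self.skip_depth:
            return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def strip_header_tags(text):
    parser = HeaderTagStripper()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


print(strip_header_tags(
    '<head><title>X</title><meta name="a"></head><p>Hi &amp; bye</p>'))
# <p>Hi &amp; bye</p>
```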

Cheers,
	Yuppie