[Zope-CMF] Re: Document body extraction even with plain text: Bug or policy?
yuppie
y.2004_ at wcm-solutions.de
Sun Aug 8 08:55:54 EDT 2004
Hi!
Tres Seaver wrote:
> Jens Vagelpohl wrote:
>
>> http://zope.org/Collectors/CMF/214
>>
>> Is the fact that *any* document, even one that explicitly had its
>> format set to something other than "html", gets checked for HTML
>> content and its body extracted? I can't remember if there was some
>> specific policy decided on or if it's just an oversight that can be
>> remedied as described in the issue.
>
>
> As long as the "document_view" method HTML-quotes the text (if / when
> rendered inside the CMS), it should be OK to leave it in place.
At first glance I also thought this could be a security issue. But
while bodyfinder removes *some* nasty tags, it does not really produce
safer / cleaner HTML code.
CookedBody() returns HTML-quoted text as long as text_format is still
'plain'. setFormat() currently fails to update cooked_text, so even if
text_format is changed to 'html' using the metadata edit form, the HTML
is still quoted. Calling CookedBody() with a new stx_level re-cooks the
body, so in that case text added in 'plain' mode is rendered unquoted
after switching to 'structured-text' mode.
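To make the stale-cache behaviour concrete, here is a minimal sketch in modern Python. It is a toy model, not the CMFDefault Document code: the class ToyDocument, its _cook() helper and the two setFormat variants are hypothetical names invented for illustration, and the quoting rule ('plain' is quoted, everything else passes through) is a simplifying assumption.

```python
from html import escape


class ToyDocument:
    """Toy stand-in for a CMF Document (hypothetical, not the real class)."""

    def __init__(self, text, text_format='plain'):
        self.text = text
        self.text_format = text_format
        self.cooked_text = self._cook()

    def _cook(self):
        # Assumption: 'plain' text is HTML-quoted, any other format
        # is passed through unchanged.
        if self.text_format == 'plain':
            return escape(self.text)
        return self.text

    def setFormat_stale(self, text_format):
        # Mirrors the behaviour described above: the format changes,
        # but the cached cooked_text is left untouched.
        self.text_format = text_format

    def setFormat_recooking(self, text_format):
        # One possible remedy: re-cook whenever the format changes.
        if text_format != self.text_format:
            self.text_format = text_format
            self.cooked_text = self._cook()

    def CookedBody(self):
        # Returns the cached rendering without re-cooking.
        return self.cooked_text


stale = ToyDocument('<b>hi</b>')
stale.setFormat_stale('html')
stale_body = stale.CookedBody()      # still quoted, although format is 'html'

fresh = ToyDocument('<b>hi</b>')
fresh.setFormat_recooking('html')
fresh_body = fresh.CookedBody()      # re-cooked, so no longer quoted
```

In the stale variant text_format and cooked_text disagree, which is the inconsistent state mentioned in the conclusions.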
Conclusions:
- I'm fine with resolving issue #214 as proposed.
- setFormat() seems to be broken: it fails to trigger re-cooking,
leaving the document in an inconsistent state.
- The skin has to make sure users don't add nasty tags if they are
allowed to use 'structured-text' or 'html' mode. It might be necessary
to update the validateHTML.py script or scrubHTML to remove header tags.
- FTP / WebDAV PUT does not validate the HTML at all, so only trusted
users should be allowed to upload this way.
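As a rough illustration of the kind of check the last two points call for, the sketch below lists disallowed tags found in a piece of HTML. It is not the validateHTML.py or scrubHTML code: the NASTY_TAGS deny-list, the NastyTagFinder class and find_nasty_tags() are hypothetical, and the list of tags is only an assumption about what "nasty" and "header" tags might cover.

```python
from html.parser import HTMLParser

# Hypothetical deny-list; the real skin scripts may use a different set.
NASTY_TAGS = frozenset([
    'script', 'iframe', 'object', 'embed',   # active content
    'head', 'title', 'meta', 'link',         # document header tags
])


class NastyTagFinder(HTMLParser):
    """Collects the names of disallowed tags in the order they open."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in NASTY_TAGS:
            self.found.append(tag)


def find_nasty_tags(html_text):
    """Return a list of disallowed tag names present in html_text."""
    finder = NastyTagFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found


clean = find_nasty_tags('<p>Hello <em>world</em></p>')
dirty = find_nasty_tags('<SCRIPT>alert(1)</SCRIPT><p>hi</p>')
```

A check like this only helps if every write path calls it. Under that assumption the same function would have to run for the edit form and for PUT uploads alike.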
Cheers,
Yuppie