[Zope] Re: Epoz and Tidy

Cyrille Bonnet cyrille at 3months.com
Thu May 5 17:11:29 EDT 2005


I agree with you, Duncan: the tidy-up cannot be much more aggressive by
default, and Kupu probably does the best possible job there.

That said, a separate "Clean this up" button is a good idea, I think.
Have you started on it? I would be happy to help if you do develop that
feature.
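
To make the idea a bit more concrete, here is a rough server-side sketch
in Python of one thing a more aggressive pass might do: drop the mso
classnames you mention. This is not Kupu's code. The "starts with mso"
rule and the function name are only my assumptions about what a
blacklist could look like, and the regex only handles double-quoted
class attributes.

```python
import re

# Assumed blacklist rule (hypothetical): any class name starting with
# "mso", case-insensitively, e.g. MsoNormal or MsoBodyText.
MSO_CLASS = re.compile(r"^mso", re.IGNORECASE)

# Only matches double-quoted class attributes; a real cleanup would
# want a proper HTML parser rather than a regex.
CLASS_ATTR = re.compile(r'\sclass="([^"]*)"', re.IGNORECASE)


def strip_mso_classes(html):
    """Remove blacklisted class names; drop the attribute if none remain."""
    def fix(match):
        kept = [c for c in match.group(1).split() if not MSO_CLASS.match(c)]
        if not kept:
            return ""
        return ' class="%s"' % " ".join(kept)
    return CLASS_ATTR.sub(fix, html)
```

Class names that do not match the rule are kept, so markup entered
legitimately by the user should survive the pass.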

Another option for users who need to convert a lot of Word documents
is, of course, WebDAV + PortalTransforms.

Cheers

Cyrille

Duncan Booth wrote:
> Cyrille Bonnet wrote:
> 
> 
>>Daniel Dekany wrote:
>>
>>>BTW, has anybody found a solution for fixing HTML copy-pasted from
>>>Microsoft Word (mostly 2000/XP)? A lot of users have MS Word, and the
>>>HTML pasted from it is a CSS-killing mess. I tried mxTidy, but it
>>>didn't improve the HTML substantially. So how do you guys do it? I
>>>have looked for solutions for Epoz, but didn't find any. I'm not
>>>tied to Epoz, though, if there is already a solution for Kupu (is
>>>Kupu already recommended over Epoz anyway?). Presumably the solution
>>>would be an Epoz post-tidy Python script, but I didn't find any for
>>>Word tidying. (However, the ideal would be for the HTML to be tidied
>>>right on the client when the user pastes it in, so that users would
>>>really get what they see, i.e. the HTML wouldn't change on save.
>>>That effect is really evil.)
>>>
>>>
>>
>>As Shane pointed out, there is a tidy up in Kupu. However, in my 
>>experience, it is not a very good tidy up (if I remember correctly, a 
>>lot of tags are still there after the tidy up).
>>
> 
> Unfortunately there is a fine line between tidying up the cruft pasted from 
> Word, and not stripping out things which might actually have been entered 
> legitimately. I think Kupu does this pretty well (but then I'm a bit 
> biased), but without any way to detect that the user is pasting from Word I 
> don't see how much more could be stripped.
> 
> So far as I know, the only things which don't really get stripped from
> the pasted Word text are the mso classnames. These can be manually
> blacklisted, but I never got round to producing a definitive blacklist.
> 
> One of my thoughts is to provide a separate 'clean this up' button which 
> would apply a more aggressive tidy-up than the one when saving. Also, I 
> agree that only applying the tidy on save is bad, but there isn't a cross-
> browser way to detect a paste, and applying the cleanup on a large 
> document every time you cut/paste one word wouldn't be nice either.
> 
> Suggestions for improvements are most welcome.
> 
> P.S. It isn't just pasting bad HTML which is a problem: some Microsoft 
> applications supply RTF on the clipboard but not HTML and it turns out that 
> if you paste RTF into IE it generates seriously invalid HTML with a totally 
> weird and corrupted DOM. That is another area where I think the cleanup 
> code finally does a passable job but not yet a perfect one.
> 


