How do I control tidy options when using Epoz and uTidyLib? I'd like it to output xhtml, but it is currently outputting uppercase tag names etc. Robert (Jamie) Munro
Robert (Jamie) Munro wrote:
How do I control tidy options when using Epoz and uTidyLib? I'd like it to output xhtml, but it is currently outputting uppercase tag names etc.
If uTidyLib or mxTidy (recommended) is installed correctly, Epoz should output XHTML. Please check that you installed uTidyLib with the correct Python (the same one that runs your Zope server). -mj
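One way to run the check suggested above is to ask the interpreter itself whether it can locate the tidy bindings. The following is a minimal sketch, not part of Epoz. It assumes the import names are `tidy` for uTidyLib and `mx.Tidy` for mxTidy, and the helper name `module_visible` is made up for illustration.

```python
import importlib.util
import sys


def module_visible(name):
    """Return True if the running interpreter can locate the module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. "mx" for "mx.Tidy") raises instead
        # of returning None, so treat that as "not visible" too.
        return False


if __name__ == "__main__":
    # Run this with the same Python binary that starts Zope.
    print("interpreter:", sys.executable)
    # Assumed import names: "tidy" (uTidyLib) and "mx.Tidy" (mxTidy).
    for name in ("tidy", "mx.Tidy"):
        print(name, "visible:", module_visible(name))
```

If the script reports a different interpreter path than the one Zope starts with, or reports the module as not visible, the library was probably installed for another Python.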
Friday, April 29, 2005, 7:12:30 PM, Maik Jablonski wrote:
Robert (Jamie) Munro wrote:
How do I control tidy options when using Epoz and uTidyLib? I'd like it to output xhtml, but it is currently outputting uppercase tag names etc.
If uTidyLib or mxTidy (recommended) is installed correctly, Epoz should output XHTML. Please check if you've installed uTidyLib with the correct python (same as running your Zope-Server).
BTW, has anybody found a solution for fixing HTML copy-pasted from Microsoft Word (mostly 2000/XP)? A lot of users have MS Word, and the HTML pasted from it is a CSS-killing mess. I tried mxTidy, but it didn't substantially improve the HTML. So how do you guys do it? I have looked for solutions for Epoz but didn't find any. But I'm not tied to Epoz, if there is already a solution for Kupu (is Kupu already recommended over Epoz anyway?). Presumably the solution would be an Epoz post-tidy Python script, but I didn't find any for Word tidying. (However, the ideal would be for the HTML to be tidied right on the client when it is pasted in, so the user would really get what they see, i.e. the HTML wouldn't be changed when they save it. That effect is really evil.)
-- Best regards, Daniel Dekany
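A post-tidy script of the kind asked about could start by removing constructs that only Word produces. This is a rough sketch and not an existing Epoz hook. The function name `strip_word_markup` and the three patterns are illustrative assumptions, and the list is certainly not exhaustive.

```python
import re

# Patterns for markup that Word 2000/XP typically adds; illustrative only.
# Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
_CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE
)
# Namespace declarations such as <?xml:namespace prefix = o />
_XML_NAMESPACE_PI = re.compile(r"<\?xml[^>]*>", re.IGNORECASE)
# Office-namespaced tags such as <o:p>, <v:shape>, <w:WordDocument>
_OFFICE_TAG = re.compile(r"</?[ovw]:[^>]*>", re.IGNORECASE)


def strip_word_markup(html):
    """Remove Word-only constructs while leaving ordinary HTML untouched."""
    html = _CONDITIONAL_COMMENT.sub("", html)
    html = _XML_NAMESPACE_PI.sub("", html)
    # Only the tags are dropped; any text between them is kept.
    html = _OFFICE_TAG.sub("", html)
    return html
```

Because these patterns match only Word-specific syntax, the script should not damage markup an author entered deliberately, but it also leaves inline styles and class names in place.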
Kupu takes care of all the M$ crap code from my experience.
Shane
On 4/29/05, Daniel Dekany <ddekany@freemail.hu> wrote:
BTW, anybody has found a solution for fixing HTML copy-pasted from Microsoft Word (mostly 2000/XP)?
As Shane pointed out, Kupu has a tidy-up. However, in my experience, it is not a very good one (if I remember correctly, a lot of tags are still there afterwards). AFAIK, Kupu is integrated in Plone 2.1.
Cyrille Bonnet wrote:
As Shane pointed out, there is a tidy up in Kupu. However, in my experience, it is not a very good tidy up (if I remember correctly, a lot of tags are still there after the tidy up).
Unfortunately there is a fine line between tidying up the cruft pasted from Word, and not stripping out things which might actually have been entered legitimately. I think Kupu does this pretty well (but then I'm a bit biased), but without any way to detect that the user is pasting from Word I don't see how much more could be stripped.
So far as I know the only thing which doesn't really get stripped from the pasted Word text are the mso classnames. These can be manually blacklisted, but I never got round to producing a definitive blacklist.
One of my thoughts is to provide a separate 'clean this up' button which would apply a more aggressive tidy-up than the one when saving. Also, I agree that only applying the tidy on save is bad, but there isn't a cross-browser way to detect a paste, and applying the cleanup on a large document every time you cut/paste one word wouldn't be nice either.
Suggestions for improvements are most welcome.
P.S. It isn't just pasting bad HTML which is a problem: some Microsoft applications supply RTF on the clipboard but not HTML and it turns out that if you paste RTF into IE it generates seriously invalid HTML with a totally weird and corrupted DOM. That is another area where I think the cleanup code finally does a passable job but not yet a perfect one.
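The classname blacklist mentioned above could look roughly like the following. This is a sketch in Python rather than Kupu's actual cleanup code. It assumes a prefix rule (any class name starting with "Mso") in place of a definitive list, and the names `strip_mso_classes` and `_filter_classes` are hypothetical.

```python
import re

# Assumed blacklist rule: any class name that starts with "Mso".
_BLACKLIST = re.compile(r"^mso", re.IGNORECASE)

# Matches class="...", class='...' and unquoted class=... (as IE emits).
_CLASS_ATTR = re.compile(
    r'''\sclass\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))''', re.IGNORECASE
)


def _filter_classes(match):
    """Rebuild one class attribute without the blacklisted names."""
    value = match.group(1) or match.group(2) or match.group(3) or ""
    kept = [name for name in value.split() if not _BLACKLIST.match(name)]
    if not kept:
        return ""  # drop the whole attribute when nothing legitimate remains
    return ' class="%s"' % " ".join(kept)


def strip_mso_classes(html):
    """Remove Word class names; class names entered on purpose survive."""
    return _CLASS_ATTR.sub(_filter_classes, html)
```

A regular expression cannot tell an attribute from ordinary text that happens to contain " class=...", so a real implementation would more likely walk the DOM, as an in-browser editor can.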
Thursday, May 5, 2005, 9:10:23 AM, Duncan Booth wrote:
[snip] One of my thoughts is to provide a separate 'clean this up' button which would apply a more aggressive tidy-up than the one when saving. Also, I agree that only applying the tidy on save is bad, but there isn't a cross-browser way to detect a paste, and applying the cleanup on a large document every time you cut/paste one word wouldn't be nice either. [snip]
Has anybody considered using client-side technologies other than JavaScript, such as a Java applet or Flash, for the editor? Maybe they can capture paste events and such (I don't know...), and they also have far fewer cross-browser problems (for example, they work with Opera and Safari). Yes, they can't render HTML the same way the browser does, but after all, if we are talking about a place where users enter pure content (I mean, structure), then maybe that is not such a big problem. I mean, the user sees clearly that he has made a paragraph here and a level 1 heading there, even if he doesn't see exactly how it will look in the visual design. At least (s)he will concentrate on content rather than on visual design. Anyway, I think that HTML is not the ideal scheme for entering content.
-- Best regards, Daniel Dekany
I agree with you, Duncan: the tidy-up cannot be much more aggressive by default, and Kupu probably does the best possible job there. Now, the "Clean this up" button is a good idea, I think. Did you get started on this? I am happy to help if you do develop that feature. Also, another option for users who need to convert a lot of Word documents is, of course, WebDAV + PortalTransform.
Cheers
Cyrille
_______________________________________________ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Thursday, May 5, 2005, 4:41:15 AM, Cyrille Bonnet wrote:
As Shane pointed out, there is a tidy up in Kupu. However, in my experience, it is not a very good tidy up (if I remember correctly, a lot of tags are still there after the tidy up).
Maybe the key to the tidy problem is that these server-side tidiers don't dare to be too Draconian, because then they might damage the content. Now if the tidying happened right when the user pastes in the HTML from the clipboard, then he would immediately see the result, so the tidier could be much braver in killing the frippery. Also, content that was entered earlier couldn't be damaged by the tidier (since only the just-inserted content is tidied). Anyway, is there any hope that text entered with these browser-embedded WYSIWYG editors (like Kupu) will be really pure content? I mean, the input shouldn't even be HTML (as far as the user sees it, at least), but something like DocBook.
-- Best regards, Daniel Dekany
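The paste-time idea can be illustrated independently of any editor: clean only the incoming fragment, then splice it into the existing text. The sketch below is in Python for brevity, although a real editor would do this in the browser. The names `paste_with_cleanup` and `strip_all_attributes` are hypothetical, and the aggressive cleaner is just one possible choice.

```python
import re

# Naive pattern: an opening tag with attributes. It does not handle
# attribute values that contain ">".
_TAG_WITH_ATTRS = re.compile(r"<(\w+)\s[^>]*>")


def strip_all_attributes(html):
    """Aggressive cleaner: keep tag names, drop every attribute."""
    return _TAG_WITH_ATTRS.sub(r"<\1>", html)


def paste_with_cleanup(document, position, pasted, cleaner=strip_all_attributes):
    """Insert `pasted` at `position`, cleaning only the new fragment."""
    cleaned = cleaner(pasted)
    # Content already in the document is copied through unchanged.
    return document[:position] + cleaned + document[position:]


if __name__ == "__main__":
    existing = '<p style="color:red">old</p>'
    from_word = '<p class=MsoNormal style="margin:0">new</p>'
    print(paste_with_cleanup(existing, len(existing), from_word))
```

Because the cleaner never sees the earlier content, it can afford to be far more aggressive than a save-time tidy that runs over the whole document.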
participants (6)
- Cyrille Bonnet
- Daniel Dekany
- Duncan Booth
- Maik Jablonski
- Robert (Jamie) Munro
- Shane Graber