-----Original Message-----
From: zope-admin@zope.org [mailto:zope-admin@zope.org] On Behalf Of Johan Carlsson [EasyPublisher]
Sent: Monday, December 16, 2002 8:12 AM
To: Bryan Capitano
Cc: zope@zope.org
Subject: RE: [Zope] MSWordDocument and Logictran's R2NET
At 07:55 2002-12-16 -0800, Bryan Capitano said:
I have not tried the MSWordDocument product before. That's interesting, thanks for sharing it. I am familiar with a commercial product from Logictran called 'R2NET'. With this software you can easily convert Word (RTF) files to HTML or XHTML or XML. I use the product extensively at the Linux command line. It is easy to use, very powerful and robust. It gives you lots of control over how documents are converted through a translation file which you can customize if you want more custom output. I think it would be easy to plug into Zope.

Bryan
How does Logictran's R2NET compare to wvWare (which is used by MSWordDocument on Unix)? It seems like they are quite similar.
Regards, Johan Carlsson
Johan,

I evaluated wvWare a couple of months ago for a web-to-print project (sharing documents between a website and a printed book publication). wvWare wasn't nearly as feature-rich or robust as R2NET. For example:

1. I was not able to use wvWare to convert DOC/RTF into XML using my own DTD. (I can with R2NET.)

2. wvWare did not recognize some of the more complex RTF control codes for font "styles", tables, or anything much more complicated than plain text. It does recognize fonts, font sizes, and italics/bold/etc. But in Word you can define actual styles that you can re-use or apply to sections of a document. wvWare doesn't capture style information.

3. In the publishing world, documents often have hidden codes embedded in the document. In particular, I was concerned about RTF codes \xe, \txe, and \tc. In the document these look like: {xe "this looks like an index code."} or see-also entries like this: {xe "trees" \t "See also Shrubs"}. You might also want to use some hidden table-of-contents codes embedded in your document like this: {tc "Chapter 1, Trees and Shrubs" \l 1}. R2NET will extract this information from RTF documents and put it in your XML if you tell it HOW by using the translation files. wvWare can't do this, at least not to my knowledge.

For these reasons, I think wvWare is a good "basic" converter. It's a good first step, and useful for basic doc-->html needs. But if you need more power and extensibility, and if you want to dump Word documents into your own pre-defined XML DTD, then R2NET is worth the $69.

You could also write your own Perl RTF parser by making use of RTF::Tokenizer. I have done this too. It is a more difficult road, but gives you absolute flexibility. There may be a similar RTF tokenizer for Python?

Best regards,
Bryan

Bryan R. Capitano
President, CAPITANO WEb CONSULTING
Tel: 541-344-0747
Email: Bryan@capitanoweb.com
URL: http://www.capitanoweb.com
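
On the "write your own parser" route, here is a minimal Python sketch of pulling out the hidden index and table-of-contents codes. It assumes the codes appear in the text exactly in the field form quoted above ({xe "..."}, {xe "..." \t "..."}, {tc "..." \l N}); real RTF has nested groups and escapes that this does not handle. It is not how R2NET or RTF::Tokenizer work, and the function names and the XML element names (index, toc) are made up for illustration.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Matches codes written in the form quoted in the message, e.g.
#   {xe "trees" \t "See also Shrubs"}   or   {tc "Chapter 1" \l 1}
# Assumption: no nested braces or escaped quotes inside the entry text.
FIELD_RE = re.compile(
    r'\{\s*(?P<kind>xe|tc)\s+"(?P<text>[^"]*)"'
    r'(?:\s*\\t\s+"(?P<see>[^"]*)")?'      # optional see-also text
    r'(?:\s*\\l\s+(?P<level>\d+))?'        # optional TOC level
    r'\s*\}',
    re.IGNORECASE,
)


def extract_hidden_codes(text):
    """Return one dict per index (xe) or table-of-contents (tc) entry."""
    entries = []
    for m in FIELD_RE.finditer(text):
        entry = {"kind": m.group("kind").lower(), "text": m.group("text")}
        if m.group("see") is not None:
            entry["see"] = m.group("see")
        if m.group("level") is not None:
            entry["level"] = int(m.group("level"))
        entries.append(entry)
    return entries


def to_xml(entries):
    """Render entries as XML lines; element names are hypothetical."""
    lines = []
    for e in entries:
        if e["kind"] == "xe":
            attrs = f' see={quoteattr(e["see"])}' if "see" in e else ""
            lines.append(f'<index{attrs}>{escape(e["text"])}</index>')
        else:
            level = e.get("level", 1)  # default level 1 is an assumption
            lines.append(f'<toc level="{level}">{escape(e["text"])}</toc>')
    return "\n".join(lines)
```

Calling to_xml(extract_hidden_codes(document_text)) gives one XML line per hidden code, which you could then map onto whatever your own DTD expects.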