RFC: Flaws in Structured text
I would like to outline a few problems with Structured text, based on
observations of real-world usage. Comments are appreciated.

As much as I like the format, the current syntax is either too loose or
the transformation logic too dumb, so text does not come out quite as
expected. There are workarounds for all of these problems, but authors
and editors aren't aware of them and keep making the same mistakes.

Here are a few gotchas:

- Lines starting with number-dot (e.g., "1.") are considered to be
  bullets.

  Problem: In other languages, and sometimes in English, this is a valid
  non-bullet introduction. In Scandinavian languages (Norwegian is the
  language of my country), for instance, "1." is an oft-used way of
  spelling "1st" or "first" ("2." for "2nd" or "second", and so forth).
  My experience is that this occurs frequently.

  Solution: Avoid transforming such paragraphs into bullets when they
  occur adjacent to valid non-bullet sentences, and only apply the
  transformation when the number is higher than that of the preceding
  paragraph. Validate the number itself -- today you can write

    43423. Something

    24324. Something else

  and it's still transformed into

    <ol><li><p>Something</p><li><p>Something else</p></ol>

- Lines starting with dash-space (e.g., "- The") are also interpreted as
  bullets.

  Problem: As with number bullets, this clashes with non-English
  conventions. At least in Scandinavian languages, quotes are often
  given using a preceding dash. Here's an actual example, translated to
  English for clarity:

    - Pooh, said Rabbit kindly, you haven't any brain.

    - I know, said Pooh humbly.

  Solution: Make this feature optional.

- Em dash sequence ("--") usage clashes with definition lists.

  Problem: People, including me, frequently use an ASCII form ("--") of
  the em dash (U+2014), in the absence of this character in the 7-bit
  ASCII character set. (Actually, what *I* want is an en dash, which ISO
  provides.) This transforms into an HTML <dl></dl> list, even when the
  dash occurs late in the paragraph.

  Solution: Only transform a paragraph into a definition list if the
  dash sequence follows the first, dot-terminated sentence, like so:

    This is a definition term. -- and this is the definition itself.

  Not sure whether this is ideal. It would likely break old stx
  documents. Perhaps a "force literal" control character can be
  introduced, like the "\" token used in Python, Perl, C, etc.

  Note to self: There ought to be a way to transform such poor-man's
  dashes into en/em dashes, and quotation marks into "smart" quotation
  marks.

- Link transformation into HTML anchor tags is poor, or at least too
  rigidly parsed.

  Problem: Some examples of links that do not work:

    The document (found "here":http://www.zope.org).

    <dtml-var "'This is a \x022link\x022:http://www.zope.org/'" fmt="structured-text">,

  I'd love to cite other annoying cases I've come across, but I don't
  remember them. ;-)

  Solution: Tolerate parentheses, and accept URLs that are terminated by
  end-of-line. Provide better syntax for specifying URLs, such as this:

    A product called Zope (http://www.zope.org) can be found on the
    _Zope site_ (http://www.zope.org).

  which would be transformed into:

    A product called <a href="http://www.zope.org">Zope</a> can be found
    on the <a href="http://www.zope.org">Zope site</a>.

- Structured text code should not wrap transformed text in paragraph
  tags (<p></p>), or should at least make this wrapping optional.

  Problem: I often display stx in places where a new paragraph creates
  unnecessary vertical padding that violates page design -- for this
  purpose I have been forced to write a simple External Method that
  removes the offending paragraph tags. IMHO, this ought to be
  unnecessary.

  Solution: Provide the necessary option.

- Structured text code not available to DTML.

  Problem: The code is only available to External Methods and products.
  DTML can only get at stx through dtml-var's fmt attribute.

  Solution: It would be swell to have an _.stx() or StructuredText()
  construct.

-- 
Alexander Staubo             http://alex.mop.no/
"`This must be Thursday,' said Arthur to himself, sinking low over his
beer, `I never could get the hang of Thursdays.'"
--Douglas Adams, _The Hitchhiker's Guide to the Galaxy_
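The numbered-bullet proposal in the message above can be sketched roughly as follows. This is only an illustration in modern Python, not code from the StructuredText module. The function name numbered_list_flags is hypothetical, and the rule it applies is an assumption that is somewhat stricter than the proposal: a paragraph counts as an ordered-list item only if it belongs to a run of at least two consecutive paragraphs numbered 1, 2, 3, and so on.

```python
import re

# Hypothetical helper for illustration; not part of Zope's StructuredText module.
_NUMBERED = re.compile(r"(\d+)\.\s+\S")


def numbered_list_flags(paragraphs):
    """Return one boolean per paragraph: True if it should become an <ol> item.

    Assumption (stricter than the proposal in the message): a list is a run of
    at least two consecutive paragraphs numbered 1, 2, 3, ...
    """
    flags = [False] * len(paragraphs)
    start = 0
    while start < len(paragraphs):
        end = start
        expected = 1  # the number the next paragraph in the run must carry
        while end < len(paragraphs):
            match = _NUMBERED.match(paragraphs[end].lstrip())
            if match is None or int(match.group(1)) != expected:
                break
            end += 1
            expected += 1
        if end - start >= 2:
            # A validated run: mark every paragraph in it as a list item.
            for index in range(start, end):
                flags[index] = True
            start = end
        else:
            # A lone "1." paragraph, or no number at all: leave it as prose.
            start += 1
    return flags
```

Under this rule the "43423. Something" example stays plain text, and so does a single sentence that merely begins with "1." between ordinary paragraphs.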
on Monday, March 13, 2000 Alexander Staubo wrote :

AS> I would like to outline a few problems, based on
AS> observations of real-world usage, with Structured text.
AS> Comments are appreciated.

AS> As much as I like the format, the current syntax is either
AS> too loose, or the transformation logic too dumb, resulting
AS> in text that does not come out quite as expected. There are
AS> workarounds for all such problems, but authors and editors
AS> aren't aware of them, and continually make the same mistakes
AS> over and over again.

...[snip]

I agree with your points, but what is more important to me is that the
rendered HTML is syntactically correct.

* this is an item
* another item
* a third item

currently renders as:

<ul>
<li><p>this is an item</p>
<li><p>another item</p>
<li><p>a third item</p>
</ul>

which is not very slick. I'd prefer a forward-compatible list-rendering
format, complying with HTML 4.0 and XHTML, looking like this:

<ul>
<li>this is an item</li>
<li>another item</li>
<li>a third item</li>
</ul>

Anybody who has tried to make Netscape 4.x understand CSS without
properly closing tags will understand my point.

-- 
Geir Bækholt Hansen
web-developer/designer
geirh@funcom.com
http://www.funcom.com
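The list output asked for above could be produced along these lines. This is a minimal sketch in modern Python, not the actual StructuredText renderer. The function render_bullets and its wrap_paragraphs flag are hypothetical names introduced only for this example.

```python
from html import escape


def render_bullets(items, wrap_paragraphs=False):
    """Render bullet items as a <ul> in which every <li> is explicitly closed.

    Hypothetical sketch: wrap_paragraphs is an assumed option that adds the
    <p> wrapper for those who want it.
    """
    lines = ["<ul>"]
    for item in items:
        text = escape(item)  # keep stray <, > and & from breaking the markup
        if wrap_paragraphs:
            text = "<p>" + text + "</p>"
        lines.append("<li>" + text + "</li>")
    lines.append("</ul>")
    return "\n".join(lines)
```

Calling render_bullets(["this is an item", "another item", "a third item"]) gives the closed form shown in the message; passing wrap_paragraphs=True keeps the paragraphs but still closes each <li>.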
I agree with these points about Structured Text...

I would like to be able to write simple tables with Structured Text.
Something like:

| FirstName | LastName   | Age |
| Olivier   | Deckmyn    | 25  |
| Guido     | Van Rossum | 32  |
| Linus     | Torvald    | 29  |

That would build a simple table:

<table>
<tr>
<td>FirstName</td><td>LastName</td><td>Age</td>
</tr>
<tr>
<td>Olivier </td><td>Deckmyn</td><td>25</td>
</tr>
<tr>
<td>Guido</td><td>Van Rossum</td><td>32</td>
</tr>
<tr>
<td>Linus </td><td>Torvald </td><td>29</td>
</tr>
</table>

Customization could be done with CSS...

Thanks!

----- Original message -----
From: Geir B Hansen <geirh@funcom.com>
To: Alexander Staubo <alex@mop.no>
Cc: Zope Mailing List (E-mail) <zope@zope.org>
Sent: Monday, 13 March 2000 10:38
Subject: Re: [Zope] RFC: Flaws in Structured text

[snip]
_______________________________________________ Zope maillist - Zope@zope.org http://lists.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope-dev )
Olivier Deckmyn wrote:
> | FirstName | LastName   | Age |
> | Olivier   | Deckmyn    | 25  |
> | Guido     | Van Rossum | 32  |
> | Linus     | Torvald    | 29  |

Latest CVS version does this. You can find the original patch on
zope.org somewhere.

-- 
Itamar S.T.  itamars@ibm.net
Oops... I would like to correct my RFC with:

<table>
<tr>
<th>FirstName</th><th>LastName</th><th>Age</th>
</tr>
<tr>
<td>Olivier </td><td>Deckmyn</td><td>25</td>
</tr>
<tr>
<td>Guido</td><td>Van Rossum</td><td>32</td>
</tr>
<tr>
<td>Linus </td><td>Torvald </td><td>29</td>
</tr>
</table>

I changed

<td>FirstName</td><td>LastName</td><td>Age</td>

to:

<th>FirstName</th><th>LastName</th><th>Age</th>

----- Original message -----
From: Olivier Deckmyn <odeckmyn.list@teaser.fr>
To: Zope Mailing List <zope@zope.org>
Sent: Monday, 13 March 2000 11:38
Subject: Re: [Zope] RFC: Flaws in Structured text

[snip]
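The pipe-delimited table syntax suggested in this thread could be turned into HTML roughly like this. It is a sketch in modern Python under stated assumptions, and it is not the table patch mentioned elsewhere in the thread. The function render_table and its header flag are hypothetical: when header is true, the first row is emitted as <th> cells, otherwise every row uses <td>.

```python
from html import escape


def render_table(stx, header=True):
    """Convert '| a | b |' rows into an HTML table.

    Hypothetical sketch: lines that do not start and end with '|' are ignored,
    and the 'header' flag is an assumed option, not an existing stx feature.
    """
    rows = []
    for line in stx.strip().splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a table row
        # Drop the outer pipes, split on the inner ones, trim the padding.
        rows.append([cell.strip() for cell in line[1:-1].split("|")])

    out = ["<table>"]
    for index, cells in enumerate(rows):
        tag = "th" if header and index == 0 else "td"
        body = "".join("<%s>%s</%s>" % (tag, escape(cell), tag) for cell in cells)
        out.append("<tr>" + body + "</tr>")
    out.append("</table>")
    return "\n".join(out)
```

Trimming each cell means the ASCII column padding never reaches the HTML, so the source can be aligned for readability without affecting the output.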
On Mon, 13 Mar 2000, Olivier Deckmyn wrote:
> Oops...
>
> I would like to correct my RFC with:
>
> <table>
> <tr>
> <th>FirstName</th><th>LastName</th><th>Age</th>
> </tr>
This will imply some parameter to define whether you want headers or
not, as not everybody uses them. Unless we can come up with some clever
(but visually pleasing in ASCII) format.

Alexander also mentioned he does not like the <p> elements inside <li>
elements, but IMO they are required in cases where there is more than
one paragraph in an <li> element. Unless we default single paragraphs in
<li> elements to non-<p>, but then we break consistency.

I suppose a complete rewrite of stx is in order, using a more modular
interface. For example, I was thinking of one class per structure
element, with the appropriate input method to receive the stx text and
a number of output methods to produce a variety of formats: HTML, PDF,
RTF, etc. This is the approach I followed in the Table patch, but it is
also slower. The original stx was designed with performance in mind.

One suggestion I have (and it would be relatively easy to implement) is
to cache the rendered document in a volatile attribute attached to the
DTMLmethod/Document that contains it, together with a simple checksum of
the input text. Then we would be free from the performance constraint,
at least to some extent.

Pavlos
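The "one class per structure element" idea might look something like the following. This is a loose sketch in modern Python, not the design of the Table patch. The class names Paragraph and BulletList, the to_html/to_text methods, and the parse function are all hypothetical, and only two element types and two output formats are shown.

```python
from html import escape


class Paragraph:
    """One structure element: receives stx text, offers several outputs."""

    def __init__(self, stx):
        self.text = " ".join(stx.split())  # collapse line breaks and padding

    def to_html(self):
        return "<p>%s</p>" % escape(self.text)

    def to_text(self):
        return self.text


class BulletList:
    """A block of lines that each start with '* '."""

    def __init__(self, stx):
        self.items = [
            line.strip()[2:].strip()
            for line in stx.splitlines()
            if line.strip().startswith("* ")
        ]

    def to_html(self):
        rows = ["<li>%s</li>" % escape(item) for item in self.items]
        return "\n".join(["<ul>"] + rows + ["</ul>"])

    def to_text(self):
        return "\n".join("* " + item for item in self.items)


def parse(stx):
    """Split on blank lines and pick an element class for each block."""
    elements = []
    for block in stx.strip().split("\n\n"):
        cls = BulletList if block.lstrip().startswith("* ") else Paragraph
        elements.append(cls(block))
    return elements
```

Adding an output format then means adding one method per class, and adding a structure means adding one class, without touching the others.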
Pavlos Christoforou wrote:
> I suppose a complete rewrite of stx is in order using a more modular
> interface. For example I was thinking of one class per structure
> element with the appropriate input method to receive the stx text, and
> a number of output methods to produce a variety of formats,
> html,pdf,rtf etc. This is the approach I followed in the Table patch
> but it is also slower. The original stx was designed with performance
> in mind.

Would using re (so it can take advantage of multithreading) make up for
a more complicated rendering engine?

> One suggestion I have (and it would be relatively easy to implement)
> is to cache the rendered document in a volatile attribute attached to
> the DTMLmethod/Document that contains it, together with a simple
> checksum of the input text. Then we would be free from the performance
> constraint at least to some extent.

This wouldn't work in all cases - consider <dtml-var ZopeTime>, where
the output changes every time it's called. If an object contains only
structured text, then you can store the rendered text too and update it
every time the input text changes - the PTK has an object that does
this.

-- 
Itamar S.T.  itamars@ibm.net
On Mon, 13 Mar 2000, Itamar Shtull-Trauring wrote:
> Would using re (so it can take advantage of multithreading) make up
> for a more complicated rendering engine?

Hmm, I hear the new re module of Python 1.6 is fast, so a migration to
re might be a good idea.

> This wouldn't work in all cases - consider <dtml-var ZopeTime>, where
> the output changes every time it's called. If an object contains only
> structured text, then you can store the rendered text too and update
> it every time the input text changes - the PTK has an object that does
> this.

If a block of structured text contains dynamic content then you are
right, caching won't work, but in many cases structured text documents
contain only static text. It does no harm to cache unless you have many
stx docs and little memory. I haven't checked the PTK yet, but in Zpdf
Document the rendered output is cached in a volatile attribute together
with a fast checksum on the input text. It works quite well, and ZODB
takes care of garbage collection during object deactivation.

Pavlos
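The caching scheme described above can be sketched as follows. This is an illustration in modern Python, not the Zpdf Document code. The class CachedDocument is hypothetical, zlib.adler32 is an assumed choice of "fast checksum", and the attributes only borrow the _v_ naming that ZODB uses for volatile attributes; on a plain class like this one they are ordinary attributes.

```python
import zlib


class CachedDocument:
    """Hypothetical sketch: re-render only when the input text has changed."""

    def __init__(self, source, render):
        self.source = source
        self._render = render      # any callable that maps stx text to output
        self._v_checksum = None    # checksum of the text that was last rendered
        self._v_rendered = None    # cached output for that text

    def rendered(self):
        checksum = zlib.adler32(self.source.encode("utf-8"))
        if checksum != self._v_checksum:
            # The input changed (or nothing is cached yet): render again.
            self._v_rendered = self._render(self.source)
            self._v_checksum = checksum
        return self._v_rendered
```

As noted in the thread, this only helps for static text; a document whose output depends on something other than its source, such as the current time, would be served stale from the cache.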
Olivier Deckmyn wrote:
> I agree with these points about Structured Text...
>
> I would like to be able to write simple tables with Structured Text.
> Something like :

Pavlos has a patch which adds tables to structured text. I don't have
the URL, but you can find it somewhere on the Zope site.

-Michel
Thanks, Michel... here is the URL:

http://www.zope.org/Members/gaaros/StructuredText

----- Original message -----
From: Michel Pelletier <michel@digicool.com>
To: Olivier Deckmyn <odeckmyn.list@teaser.fr>
Cc: Zope Mailing List <zope@zope.org>
Sent: Monday, 13 March 2000 20:09
Subject: Re: [Zope] RFC: Flaws in Structured text

[snip]
> From: Geir B Hansen [mailto:geirh@funcom.com]
> Sent: Monday, March 13, 2000 10:39 AM
> To: Alexander Staubo
> Cc: Zope Mailing List (E-mail)
> Subject: Re: [Zope] RFC: Flaws in Structured text
>
> [snip]
>
> I agree with your points, but what is more important for me, is that
> the rendered HTML is syntactically correct
> [snip]
> which is not very slick.. I'd prefer a forward-compatible
> list-rendering format, complying with HTML4.0 and XHTML, looking like
> this :
>
> <ul>
> <li>this is an item</li>
> <li>another item</li>
> <li>a third item</li>
> </ul>

Shouldn't that rather be

<ul>
<li><p>this is an item</p></li>
<li><p>another item</p></li>
<li><p>a third item</p></li>
</ul>

After all, we *are* talking about paragraphs here. I happen to like --
very much -- the additional visual padding that paragraphs give me.

> --
> Geir Bækholt Hansen
> web-developer/designer
> geirh@funcom.com
> http://www.funcom.com
Alexander Staubo mailto:alex@mop.no http://www.mop.no/~alex/
on Monday, March 13, 2000 Alexander Staubo wrote :

[snip]

> <ul>
> <li>this is an item</li>
> <li>another item</li>
> <li>a third item</li>
> </ul>

AS> Shouldn't that rather be

AS> <ul>
AS> <li><p>this is an item</p></li>
AS> <li><p>another item</p></li>
AS> <li><p>a third item</p></li>
AS> </ul>

AS> After all, we *are* talking about paragraphs here. I happen to like --
AS> very much -- the additional visual padding that paragraphs gives me.

Not necessarily. The <p>s imply a paragraph of text, which is not
necessarily what is contained in the <li>. I agree that in most cases
the visual result is better with the <p>, but this should really be
taken care of with CSS. As far as possible, presentation and content
should be separated.

The <p>s would be OK with me if we just got the <li>s to close
correctly, but I don't think they should be there if the reason is
purely aesthetic.

-- 
Geir Bækholt Hansen
web-developer/designer
geirh@funcom.com
http://www.funcom.com
> From: Geir B Hansen [mailto:geirh@funcom.com]
> Sent: Monday, March 13, 2000 11:57 AM
> To: Alexander Staubo
> Cc: Zope@Zope. Org
> Subject: Re[2]: [Zope] RFC: Flaws in Structured text
>
> [snip]
>
> the <p>s would be ok with me, if we just got the <li>s to close
> correctly, but i don't think they should be there if the reason is
> purely aesthetic..

IMHO closing the <li>s is all we need -- adding padding with CSS works
with the latest browsers, but we still have non-CSS browsers. As much as
I prefer pure, CSS-enriched HTML, it doesn't work all that well in "real
life". Not yet, anyway.

> --
> Geir Bækholt Hansen
> web-developer/designer
> geirh@funcom.com
> http://www.funcom.com

Is Funcom using Zope for any real-world web work?

-- 
Alexander Staubo             http://alex.mop.no/
Alexander Staubo wrote:
> I would like to outline a few problems, based on observations of
> real-world usage, with Structured text. Comments are appreciated.

One of my own - words in single quotes 'like this' should NOT be turned
into code examples. Most people do not write code examples in their
texts.

> - Structured text code not available to DTML.
>
> Solution: It would be swell to have an _.stx() or StructuredText()
> construct.

Or make the whole special_formats dictionary, which is available in
PythonMethods, available in _ as well.

In general, structured text needs a complete rewrite to make it
extendable and customizable. Using the current structure you end up
commenting stuff out and rewriting sections of the code, which is not a
very good way of working. But I doubt DC have the time for that.

-- 
Itamar S.T.  itamars@ibm.net
participants (6)
- Alexander Staubo
- Geir B Hansen
- Itamar Shtull-Trauring
- Michel Pelletier
- Olivier Deckmyn
- Pavlos Christoforou