[Zope] RFC: Flaws in Structured text
Alexander Staubo
alex@mop.no
Mon, 13 Mar 2000 06:13:39 +0100
I would like to outline a few problems with Structured
text, based on observations of real-world usage.
Comments are appreciated.
As much as I like the format, either the current syntax is
too loose or the transformation logic is too dumb, resulting
in text that does not come out quite as expected. There are
workarounds for all such problems, but authors and editors
aren't aware of them, and keep making the same mistakes
over and over again.
Here are a few gotchas:
- Lines starting with a number followed by a dot (e.g., "1.")
are treated as bullets.
Problem: In other languages, and sometimes in English,
this is a valid way to start an ordinary, non-list
paragraph. In Scandinavian languages (Norwegian is the
language of my country), for instance, "1." is an oft-used
way of writing "1st" or "first" ("2." for "2nd" or
"second", and so forth). In my experience this occurs
frequently.
Solution: Avoid transforming such paragraphs into bullets
when they are adjacent to valid non-bullet sentences,
and only apply the transformation when the number is
higher than that of the preceding paragraph. Also validate
the number itself; today you can write
43423. Something
24324. Something else
and it's still transformed into
<ol><li><p>Something</p><li><p>Something else</p></ol>
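A rough Python sketch of the proposed numbering check, assuming the text has already been split into a list of paragraph strings. The function name `find_ordered_lists` is hypothetical, and the rules are my own stricter reading of the proposal (a list must start at 1, each item must be exactly one higher than the last, and a list needs at least two items so a lone "1." paragraph stays ordinary text); this is not the actual StructuredText code.

```python
import re

# A paragraph that starts with "<digits>." followed by whitespace.
NUMBERED = re.compile(r"^(\d+)\.\s+(.*)$", re.DOTALL)


def find_ordered_lists(paragraphs):
    """Return (start, end) index pairs of paragraph runs to render as <ol>.

    Assumed rules (a sketch, not the StructuredText implementation):
    - a run must start at 1, and each following number must be exactly
      one higher than the previous one;
    - a run needs at least two items, so a lone "1." paragraph between
      ordinary sentences is left as plain text.
    """
    runs = []
    start = None   # index of the first paragraph in the current run
    expected = 1   # number the next item must carry to extend the run
    for i, para in enumerate(paragraphs):
        match = NUMBERED.match(para)
        number = int(match.group(1)) if match else None
        if start is not None and number == expected:
            expected += 1
            continue
        # Any current run ends just before paragraph i.
        if start is not None and i - start >= 2:
            runs.append((start, i))
        if number == 1:
            start, expected = i, 2
        else:
            start, expected = None, 1
    if start is not None and len(paragraphs) - start >= 2:
        runs.append((start, len(paragraphs)))
    return runs
```

With these rules the "43423. / 24324." example above is left alone, while "1. / 2." still becomes a list.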
- Lines starting with dash-space (e.g., "- The") are also
interpreted as bullets.
Problem: As with number bullets, this clashes with
non-English conventions. At least in Scandinavian
languages, lines of dialogue are often introduced with a
leading dash. Here's an actual example, translated to
English for clarity:
- Pooh, said Rabbit kindly, you haven't any brain. - I know, said
Pooh humbly.
Solution: Make this feature optional.
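A minimal sketch of what "optional" could mean, assuming a per-document flag. Both `classify_paragraph` and its `dash_bullets` parameter are hypothetical names for illustration, not an existing Structured text option.

```python
import re

# "- " at the start of a paragraph: the dash-space bullet marker.
DASH_BULLET = re.compile(r"^-\s+(.*)$", re.DOTALL)


def classify_paragraph(text, dash_bullets=True):
    """Return ("bullet", body) or ("text", text) for one paragraph.

    `dash_bullets` is a hypothetical option: when False, a leading
    "- " is left alone, so dialogue written in the Scandinavian style
    survives as ordinary text.
    """
    if dash_bullets:
        match = DASH_BULLET.match(text)
        if match:
            return ("bullet", match.group(1))
    return ("text", text)
```

Defaulting the flag to True would keep old documents rendering as before, while authors writing dialogue could switch it off.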
- Em dash sequence ("--") usage clashes with definition
lists.
Problem: People, including me, frequently use an ASCII
form ("--") of the em dash (U+2014), since 7-bit ASCII
has no such character. (Actually, what *I* want is an en
dash, which ISO provides.) Such a paragraph is transformed
into an HTML <dl></dl> list, even when the dash occurs
late in the paragraph.
Solution: Only transform a paragraph into a definition
list if the dash sequence directly follows the first,
period-terminated sentence, like so:
This is a definition term. -- and this is the definition
itself.
I'm not sure whether this is ideal; it would likely break
old stx documents. Perhaps a "force literal" escape
character could be introduced, like the backslash ("\")
used in Python, Perl, C, etc.
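The first-sentence rule and the escape could be sketched in Python roughly like this. The regular expression, the function names, and the "\--" escape spelling are all assumptions for illustration. One known limitation: a term containing an abbreviation such as "e.g." has an extra period and will not qualify.

```python
import re

# The term is the first sentence: no periods inside, ending in ".".
# It must be followed by whitespace, "--", whitespace, and the body.
# An escaped "\--" never matches, because "\" is not "-".
DEFINITION = re.compile(r"^([^.]+\.)\s+--\s+(.*)$", re.DOTALL)


def split_definition(paragraph):
    """Return (term, definition) if the paragraph qualifies, else None."""
    match = DEFINITION.match(paragraph)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def unescape_dashes(text):
    """Turn the hypothetical escape "\\--" back into a literal "--"."""
    return text.replace("\\--", "--")
```

A paragraph that uses "--" as a dash later in the text, or after its second sentence, is then left as an ordinary paragraph.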
Note to self: There ought to be a way to transform such
poor-man's dashes into en/em dashes, and quotation marks
to "smart" quotation marks.
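One possible shape for such a transformation, sketched in Python under the assumption that it runs on plain text after the structural parsing (so a definition-list "--" has already been consumed). The mapping of " -- " to an en dash and "---" to an em dash, and the opening/closing quote heuristic, are assumptions.

```python
import re


def smarten(text):
    """Replace poor-man's punctuation with HTML entities (sketch).

    Assumptions: "---" becomes an em dash (&mdash;), a "--" surrounded
    by whitespace becomes an en dash (&ndash;), and a straight double
    quote opens (&ldquo;) at the start of the text or after whitespace
    or an opening bracket, and closes (&rdquo;) everywhere else.
    """
    text = text.replace("---", "&mdash;")
    text = re.sub(r"(?<=\s)--(?=\s)", "&ndash;", text)
    text = re.sub(r'(?:^|(?<=[\s(\[]))"', "&ldquo;", text)
    text = text.replace('"', "&rdquo;")
    return text
```

For example, `smarten('He said "hi" -- twice.')` yields `He said &ldquo;hi&rdquo; &ndash; twice.`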
- Link transformation into HTML anchor tags is poor, or at
least too rigidly parsed.
Problem: Some examples of links that do not work:
The document (found "here":http://www.zope.org).
<dtml-var "'This is a \x022link\x022:http://www.zope.org/'"
fmt="structured-text">,
I'd love to cite other annoying cases I've come
across, but I don't remember them. ;-)
Solution: Tolerate parentheses, and accept URLs that
are terminated by end-of-line. Provide better syntax for
specifying URLs, such as this:
A product called Zope (http://www.zope.org) can be found
on the _Zope site_ (http://www.zope.org).
which would be transformed into:
A product called <a href="http://www.zope.org">Zope</a>
can be found on the <a href="http://www.zope.org">Zope
site</a>.
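A rough Python sketch of both ideas: the existing "text":URL form made tolerant of a closing parenthesis or trailing punctuation, plus the proposed "Word (URL)" and "_phrase_ (URL)" forms. The patterns and names are assumptions for illustration; the link text is not HTML-escaped here, which a real implementation would need to do.

```python
import re

# Existing syntax, "text":URL. The URL stops before trailing
# punctuation and a closing parenthesis, so "(see "x":http://a.org)."
# keeps its ")" and "." outside the link.
QUOTED_LINK = re.compile(
    r'"([^"]+)":(https?://[^\s()"]+?)(?=[.,;:!?]*(?:[\s)]|$))')
# Proposed syntax: a URL in parentheses after the link text.
URL = r"\((https?://[^\s()]+)\)"
# "_several words_ (URL)": the underscored phrase becomes the link text.
PHRASE_LINK = re.compile(r"_([^_]+)_\s+" + URL)
# "Word (URL)": the single preceding word becomes the link text.
WORD_LINK = re.compile(r"(\w+)\s+" + URL)


def make_anchor(match):
    text, url = match.group(1), match.group(2)
    return '<a href="%s">%s</a>' % (url, text)


def convert_links(text):
    text = QUOTED_LINK.sub(make_anchor, text)
    # Phrases before single words, so "_Zope site_" stays one link.
    text = PHRASE_LINK.sub(make_anchor, text)
    return WORD_LINK.sub(make_anchor, text)
```

Run on the one-line version of the example above, this produces the anchors shown in the expected output.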
- Structured text code should not wrap transformed text in
paragraph tags (<p></p>), or should at least make this
wrapping optional.
Problem: I often display stx in places where a new
paragraph creates unnecessary vertical padding that
violates page design -- for this purpose I have been
forced to write a simple External Method that removes the
offending paragraph tags. IMHO, this ought to be
unnecessary.
Solution: Provide an option to turn the wrapping off.
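My External Method isn't reproduced here; the following is a hypothetical stand-in in Python showing the kind of post-processing involved. It assumes the rendered stx arrives as one HTML string, and it only removes a single wrapping paragraph, leaving multi-paragraph output alone.

```python
import re

# A single outer <p ...> ... </p> pair, possibly surrounded by
# whitespace. "<pre>" does not match, since "<p" must be followed
# by whitespace or ">".
OUTER_P = re.compile(r"^\s*<p(?:\s[^>]*)?>(.*?)</p>\s*$",
                     re.DOTALL | re.IGNORECASE)


def strip_outer_paragraph(html):
    """Remove one wrapping <p></p> pair from rendered stx output.

    Output with several paragraphs is returned unchanged, since
    removing those tags would merge the paragraphs.
    """
    match = OUTER_P.match(html)
    if match is None or re.search(r"<p[\s>]", match.group(1), re.IGNORECASE):
        return html
    return match.group(1)
```

A built-in option would make this step, and the extra method call, unnecessary.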
- The Structured text code is not available to DTML.
Problem: The code is only available to External Methods
and products. DTML can only get at stx through dtml-var's
fmt attribute.
Solution: It would be swell to have an _.stx() or
StructuredText() construct.
--
Alexander Staubo http://alex.mop.no/
"`This must be Thursday,' said Arthur to himself, sinking low over
his beer, `I never could get the hang of Thursdays.'"
--Douglas Adams, _The Hitchhiker's Guide to the Galaxy_