[Zope-Checkins] CVS: Zope/lib/python/third_party/docutils/docs/dev/rst - alternatives.txt:1.1.2.1 problems.txt:1.1.2.1

Andreas Jung andreas at andreas-jung.com
Fri Oct 29 14:24:46 EDT 2004


Update of /cvs-repository/Zope/lib/python/third_party/docutils/docs/dev/rst
In directory cvs.zope.org:/tmp/cvs-serv11767/docutils/docs/dev/rst

Added Files:
      Tag: ajung-docutils-cleanup-branch
	alternatives.txt problems.txt 
Log Message:
moved


=== Added File Zope/lib/python/third_party/docutils/docs/dev/rst/alternatives.txt === (2329/2729 lines abridged)
==================================================
 A Record of reStructuredText Syntax Alternatives
==================================================
:Author: David Goodger
:Contact: goodger at users.sourceforge.net
:Revision: $Revision: 1.1.2.1 $
:Date: $Date: 2004/10/29 18:24:45 $
:Copyright: This document has been placed in the public domain.

The following are ideas, alternatives, and justifications that were
considered for reStructuredText syntax, which did not originate with
Setext_ or StructuredText_.  For an analysis of constructs which *did*
originate with StructuredText or Setext, please see `Problems With
StructuredText`_.  See the `reStructuredText Markup Specification`_
for full details of the established syntax.

.. _Setext: http://docutils.sourceforge.net/mirror/setext.html
.. _StructuredText:
   http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage
.. _Problems with StructuredText: problems.html
.. _reStructuredText Markup Specification:
   ../../ref/rst/restructuredtext.html


.. contents::

-------------
 Implemented
-------------

Field Lists
===========

Prior to the syntax for field lists being finalized, several
alternatives were proposed.

1. Unadorned RFC822_ everywhere::

       Author: Me
       Version: 1

   Advantages: clean, precedent (RFC822-compliant).  Disadvantage:
   ambiguous (these paragraphs are a prime example).

   Conclusion: rejected.

2. Special case: use unadorned RFC822_ for the very first or very last
   text block of a document::

       """
       Author: Me
       Version: 1

       The rest of the document...
       """

   Advantages: clean, precedent (RFC822-compliant).  Disadvantages:
   special case, flat (unnested) field lists only, still ambiguous::

       """
       Usage: cmdname [options] arg1 arg2 ...

       We obviously *don't* want the like above to be interpreted as a
       field list item.  Or do we?
       """

   Conclusion: rejected for the general case, accepted for specific
   contexts (PEPs, email).

3. Use a directive::

       .. fields::

          Author: Me
          Version: 1

   Advantages: explicit and unambiguous, RFC822-compliant.
   Disadvantage: cumbersome.

   Conclusion: rejected for the general case (but such a directive
   could certainly be written).

4. Use Javadoc-style::

       @Author: Me
       @Version: 1
       @param a: integer

   Advantages: unambiguous, precedent, flexible.  Disadvantages:
   non-intuitive, ugly, not RFC822-compliant.

   Conclusion: rejected.

5. Use leading colons::

       :Author: Me
       :Version: 1

   Advantages: unambiguous, obvious (*almost* RFC822-compliant),
   flexible, perhaps even elegant.  Disadvantages: no precedent, not
   quite RFC822-compliant.

   Conclusion: accepted!

6. Use double colons::

       Author:: Me
       Version:: 1

   Advantages: unambiguous, obvious? (*almost* RFC822-compliant),
   flexible, similar to syntax already used for literal blocks and
   directives.  Disadvantages: no precedent, not quite
   RFC822-compliant, similar to syntax already used for literal blocks
   and directives.

   Conclusion: rejected because of the syntax similarity & conflicts.

Why is RFC822 compliance important?  It's a universal Internet
standard, and super obvious.  Also, I'd like to support the PEP format
(ulterior motive: get PEPs to use reStructuredText as their standard).
But it *would* be easy to get used to an alternative (easy even to
convert PEPs; probably harder to convert python-deviants ;-).

Unfortunately, without well-defined context (such as in email headers:
RFC822 only applies before any blank lines), the RFC822 format is
ambiguous.  It is very common in ordinary text.  To implement field
lists unambiguously, we need explicit syntax.

The following question was posed in a footnote:

   Should "bibliographic field lists" be defined at the parser level,
   or at the DPS transformation level?  In other words, are they
   reStructuredText-specific, or would they also be applicable to
   another (many/every other?) syntax?

The answer is that bibliographic fields are a
reStructuredText-specific markup convention.  Other syntaxes may
implement the bibliographic elements explicitly.  For example, there
would be no need for such a transformation for an XML-based markup
syntax.

.. _RFC822: http://www.rfc-editor.org/rfc/rfc822.txt


Interpreted Text "Roles"
========================

The original purpose of interpreted text was as a mechanism for
descriptive markup, to describe the nature or role of a word or
phrase.  For example, in XML we could say "<function>len</function>"
to mark up "len" as a function.  It is envisaged that within Python
docstrings (inline documentation in Python module source files, the
primary market for reStructuredText) the role of a piece of
interpreted text can be inferred implicitly from the context of the
docstring within the program source.  For other applications, however,
the role may have to be indicated explicitly.

Interpreted text is enclosed in single backquotes (`).

1. Initially, it was proposed that an explicit role could be indicated
   as a word or phrase within the enclosing backquotes:

   - As a prefix, separated by a colon and whitespace::

         `role: interpreted text`

   - As a suffix, separated by whitespace and a colon::

         `interpreted text :role`

   There are problems with the initial approach:

   - There could be ambiguity with interpreted text containing colons.
     For example, an index entry of "Mission: Impossible" would
     require a backslash-escaped colon.

   - The explicit role is descriptive markup, not content, and will
     not be visible in the processed output.  Putting it inside the
     backquotes doesn't feel right; the *role* isn't being quoted.

2. Tony Ibbs suggested that the role be placed outside the
   backquotes::

       role:`prefix` or `suffix`:role

   This removes the embedded-colons ambiguity, but limits the role
   identifier to be a single word (whitespace would be illegal).
   Since roles are not meant to be visible after processing, the lack
   of whitespace support is not important.

   The suggested syntax remains ambiguous with respect to ratios and
   some writing styles.  For example, suppose there is a "signal"
   identifier, and we write::

       ...calculate the `signal`:noise ratio.

   "noise" looks like a role.

3. As an improvement on #2, we can bracket the role with colons::


[-=- -=- -=- 2329 lines omitted -=- -=- -=-]

    Second and subsequent lines need not be indented at all.

        - A bullet list inside
    the block quote.

          Second paragraph of the
    bullet list inside the block quote.

Although feasible, this form of lazy indentation has problems.  The
document structure and hierarchy is not obvious from the indentation,
making the source plaintext difficult to read.  This will also make
keeping track of the indentation while writing difficult and
error-prone.  However, these problems may be acceptable for Wikis and
email mode, where we may be able to rely on less complex structure
(few nested lists, for example).


Multiple Roles in Interpreted Text
==================================

In reStructuredText, inline markup cannot be nested (yet; `see
above`__).  This also applies to interpreted text.  In order to
simultaneously combine multiple roles for a single piece of text, a
syntax extension would be necessary.  Ideas:

1. Initial idea::

       `interpreted text`:role1,role2:

2. Suggested by Jason Diamond::

       `interpreted text`:role1:role2:

If a document is so complex as to require nested inline markup,
perhaps another markup system should be considered.  By design,
reStructuredText does not have the flexibility of XML.

__ `Nested Inline Markup`_


Parameterized Interpreted Text
==============================

In some cases it may be expedient to pass parameters to interpreted
text, analogous to function calls.  Ideas:

1. Parameterize the interpreted text role itself (suggested by Jason
   Diamond)::

       `interpreted text`:role1(foo=bar):

   Positional parameters could also be supported::

       `CSS`:acronym(Cascading Style Sheets): is used for HTML, and
       `CSS`:acronym(Content Scrambling System): is used for DVDs.

   Technical problem: current interpreted text syntax does not
   recognize roles containing whitespace.  Design problem: this smells
   like programming language syntax, but reStructuredText is not a
   programming language.

2. Put the parameters inside the interpreted text::

       `CSS (Cascading Style Sheets)`:acronym: is used for HTML, and
       `CSS (Content Scrambling System)`:acronym: is used for DVDs.

   Although this could be defined on an individual basis (per role),
   we ought to have a standard.  Hyperlinks with embedded URIs already
   use angle brackets; perhaps they could be used here too::

       `CSS <Cascading Style Sheets>`:acronym: is used for HTML, and
       `CSS <Content Scrambling System>`:acronym: is used for DVDs.

   Do angle brackets connote URLs too much for this to be acceptable?
   How about the "tag" connotation -- does it save them or doom them?

Does this push inline markup too far?  Readability becomes a serious
issue.  Substitutions may provide a better alternative (at the expense
of verbosity and duplication) by pulling the details out of the text
flow::

    |CSS| is used for HTML, and |CSS-DVD| is used for DVDs.

    .. |CSS| acronym:: Cascading Style Sheets
    .. |CSS-DVD| acronym:: Content Scrambling System
       :text: CSS

----------------------------------------------------------------------

This whole idea may be going beyond the scope of reStructuredText.
Documents requiring this functionality may be better off using XML or
another markup system.

This argument comes up regularly when pushing the envelope of
reStructuredText syntax.  I think it's a useful argument in that it
provides a check on creeping featurism.  In many cases, the resulting
verbosity produces such unreadable plaintext that there's a natural
desire *not* to use it unless absolutely necessary.  It's a matter of
finding the right balance.


Syntax for Interpreted Text Role Bindings
=========================================

The following syntax (idea from Jeffrey C. Jacobs) could be used to
associate directives with roles::

    .. :rewrite: class:: rewrite

    `She wore ribbons in her hair and it lay with streaks of
    grey`:rewrite:

The syntax is similar to that of substitution declarations, and the
directive/role association may resolve implementation issues.  The
semantics, ramifications, and implementation details would need to be
worked out.

The example above would implement the "rewrite" role as adding a
``class="rewrite"`` attribute to the interpreted text ("inline"
element).  The stylesheet would then pick up on the "class" attribute
to do the actual formatting.

The advantage of the new syntax would be flexibility.  Uses other than
"class" may present themselves.  The disadvantage is complexity:
having to implement new syntax for a relatively specialized operation,
and having new semantics in existing directives ("class::" would do
something different).

The `"role" directive`__ has been implemented.

__ ../../ref/rst/directives.html#role


Character Processing
====================

Several people have suggested adding some form of character processing
to reStructuredText:

* Some sort of automated replacement of ASCII sequences:

  - ``--`` to em-dash (or ``--`` to en-dash, and ``---`` to em-dash).
  - Convert quotes to curly quote entities.  (Essentially impossible
    for HTML?  Unnecessary for TeX.)
  - Various forms of ``:-)`` to smiley icons.
  - ``"\ "`` to &nbsp;.  Problem with line-wrapping though: it could
    end up escaping the newline.
  - Escaped newlines to <BR>.
  - Escaped period or quote or dash as a disappearing catalyst to
    allow character-level inline markup?

* XML-style character entities, such as "&copy;" for the copyright
  symbol.

Docutils has no need of a character entity subsystem.  Supporting
Unicode and text encodings, character entities should be directly
represented in the text: a copyright symbol should be represented by
the copyright symbol character.  If this is not possible in an
authoring environment, a pre-processing stage can be added, or a table
of substitution definitions can be devised.

A "unicode" directive has been implemented to allow direct
specification of esoteric characters.  In combination with the
substitution construct, "include" files defining common sets of
character entities can be defined and used.  `A set of character
entity set definition files have been defined`__ (`tarball`__).
There's also `a description and instructions for use`__.

__ http://docutils.sf.net/tmp/charents/
__ http://docutils.sf.net/tmp/charents.tgz
__ http://docutils.sf.net/tmp/charents/README.html

To allow for `character-level inline markup`_, a limited form of
character processing has been added to the spec and parser: escaped
whitespace characters are removed from the processed document.  Any
further character processing will be of this functional type, rather
than of the character-encoding type.

.. _character-level inline markup:
   ../../ref/rst/restructuredtext.html#character-level-inline-markup

* Directive idea::

      .. text-replace:: "pattern" "replacement"

  - Support Unicode "U+XXXX" codes.
  - Support regexps, perhaps with alternative "regexp-replace"
    directive.
  - Flags for regexps; ":flags:" option, or individuals.
  - Specifically, should the default be case-sensistive or
    -insensitive?


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:


=== Added File Zope/lib/python/third_party/docutils/docs/dev/rst/problems.txt ===
==============================
 Problems With StructuredText
==============================
:Author: David Goodger
:Contact: goodger at users.sourceforge.net
:Revision: $Revision: 1.1.2.1 $
:Date: $Date: 2004/10/29 18:24:45 $
:Copyright: This document has been placed in the public domain.

There are several problems, unresolved issues, and areas of
controversy within StructuredText_ (Classic and Next Generation).  In
order to resolve all these issues, this analysis brings all of the
issues out into the open, enumerates all the alternatives, and
proposes solutions to be incorporated into the reStructuredText_
specification.


.. contents::


Formal Specification
====================

The description in the original StructuredText.py has been criticized
for being vague.  For practical purposes, "the code *is* the spec."
Tony Ibbs has been working on deducing a `detailed description`_ from
the documentation and code of StructuredTextNG_.  Edward Loper's
STMinus_ is another attempt to formalize a spec.

For this kind of a project, the specification should always precede
the code.  Otherwise, the markup is a moving target which can never be
adopted as a standard.  Of course, a specification may be revised
during lifetime of the code, but without a spec there is no visible
control and thus no confidence.


Understanding and Extending the Code
====================================

The original StructuredText_ is a dense mass of sparsely commented
code and inscrutable regular expressions.  It was not designed to be
extended and is very difficult to understand.  StructuredTextNG_ has
been designed to allow input (syntax) and output extensions, but its
documentation (both internal [comments & docstrings], and external) is
inadequate for the complexity of the code itself.

For reStructuredText to become truly useful, perhaps even part of
Python's standard library, it must have clear, understandable
documentation and implementation code.  For the implementation of
reStructuredText to be taken seriously, it must be a sterling example
of the potential of docstrings; the implementation must practice what
the specification preaches.


Section Structure via Indentation
=================================

Setext_ required that body text be indented by 2 spaces.  The original
StructuredText_ and StructuredTextNG_ require that section structure
be indicated through indentation, as "inspired by Python".  For
certain structures with a very limited, local extent (such as lists,
block quotes, and literal blocks), indentation naturally indicates
structure or hierarchy.  For sections (which may have a very large
extent), structure via indentation is unnecessary, unnatural and
ambiguous.  Rather, the syntax of the section title *itself* should
indicate that it is a section title.

The original StructuredText states that "A single-line paragraph whose
immediately succeeding paragraphs are lower level is treated as a
header." Requiring indentation in this way is:

- Unnecessary.  The vast majority of docstrings and standalone
  documents will have no more than one level of section structure.
  Requiring indentation for such docstrings is unnecessary and
  irritating.

- Unnatural.  Most published works use title style (type size, face,
  weight, and position) and/or section/subsection numbering rather
  than indentation to indicate hierarchy.  This is a tradition with a
  very long history.

- Ambiguous.  A StructuredText header is indistinguishable from a
  one-line paragraph followed by a block quote (precluding the use of
  block quotes).  Enumerated section titles are ambiguous (is it a
  header? is it a list item?).  Some additional adornment must be
  required to confirm the line's role as a title, both to a parser and
  to the human reader of the source text.

Python's use of significant whitespace is a wonderful (if not
original) innovation, however requiring indentation in ordinary
written text is hypergeneralization.

reStructuredText_ indicates section structure through title adornment
style (as exemplified by this document).  This is far more natural.
In fact, it is already in widespread use in plain text documents,
including in Python's standard distribution (such as the toplevel
README_ file).


Character Escaping Mechanism
============================

No matter what characters are chosen for markup, some day someone will
want to write documentation *about* that markup or using markup
characters in a non-markup context.  Therefore, any complete markup
language must have an escaping or encoding mechanism.  For a
lightweight markup system, encoding mechanisms like SGML/XML's '&ast;'
are out.  So an escaping mechanism is in.  However, with carefully
chosen markup, it should be necessary to use the escaping mechanism
only infrequently.

reStructuredText_ needs an escaping mechanism: a way to treat
markup-significant characters as the characters themselves.  Currently
there is no such mechanism (although ZWiki uses '!').  What are the
candidates?

1. ``!`` (http://dev.zope.org/Members/jim/StructuredTextWiki/NGEscaping)
2. ``\``
3. ``~``
4. doubling of characters

The best choice for this is the backslash (``\``).  It's "the single
most popular escaping character in the world!", therefore familiar and
unsurprising.  Since characters only need to be escaped under special
circumstances, which are typically those explaining technical
programming issues, the use of the backslash is natural and
understandable.  Python docstrings can be raw (prefixed with an 'r',
as in 'r""'), which would obviate the need for gratuitous doubling-up
of backslashes.

(On 2001-03-29 on the Doc-SIG mailing list, GvR endorsed backslash
escapes, saying, "'nuff said.  Backslash it is." Although neither
legally binding nor irrevocable nor any kind of guarantee of anything,
it is a good sign.)

The rule would be: An unescaped backslash followed by any markup
character escapes the character.  The escaped character represents the
character itself, and is prevented from playing a role in any markup
interpretation.  The backslash is removed from the output.  A literal
backslash is represented by an "escaped backslash," two backslashes in
a row.

A carefully constructed set of recognition rules for inline markup
will obviate the need for backslash-escapes in almost all cases; see
`Delimitation of Inline Markup`_ below.

When an expression (requiring backslashes and other characters used
for markup) becomes too complicated and therefore unreadable, a
literal block may be used instead.  Inside literal blocks, no markup
is recognized, therefore backslashes (for the purpose of escaping
markup) become unnecessary.

We could allow backslashes preceding non-markup characters to remain
in the output.  This would make describing regular expressions and
other uses of backslashes easier.  However, this would complicate the
markup rules and would be confusing.


Blank Lines in Lists
====================

Oft-requested in Doc-SIG (the earliest reference is dated 1996-08-13)
is the ability to write lists without requiring blank lines between
items.  In docstrings, space is at a premium.  Authors want to convey
their API or usage information in as compact a form as possible.
StructuredText_ requires blank lines between all body elements,
including list items, even when boundaries are obvious from the markup
itself.

In reStructuredText, blank lines are optional between list items.
However, in order to eliminate ambiguity, a blank line is required
before the first list item and after the last.  Nested lists also
require blank lines before the list start and after the list end.


Bullet List Markup
==================

StructuredText_ includes 'o' as a bullet character.  This is dangerous
and counter to the language-independent nature of the markup.  There
are many languages in which 'o' is a word.  For example, in Spanish::

    Llamame a la casa
    o al trabajo.

    (Call me at home or at work.)

And in Japanese (when romanized)::

    Senshuu no doyoubi ni tegami
    o kakimashita.

    ([I] wrote a letter on Saturday last week.)

If a paragraph containing an 'o' word wraps such that the 'o' is the
first text on a line, or if a paragraph begins with such a word, it
could be misinterpreted as a bullet list.

In reStructuredText_, 'o' is not used as a bullet character.  '-',
'*', and '+' are the possible bullet characters.


Enumerated List Markup
======================

StructuredText enumerated lists are allowed to begin with numbers and
letters followed by a period or right-parenthesis, then whitespace.
This has surprising consequences for writing styles.  For example,
this is recognized as an enumerated list item by StructuredText::

    Mr. Creosote.

People will write enumerated lists in all different ways.  It is folly
to try to come up with the "perfect" format for an enumerated list,
and limit the docstring parser's recognition to that one format only.

Rather, the parser should recognize a variety of enumerator styles.
It is also recommended that the enumerator of the first list item be
ordinal-1 ('1', 'A', 'a', 'I', or 'i'), as output formats may not be
able to begin a list at an arbitrary enumeration.

An initial idea was to require two or more consistent enumerated list
items in a row.  This idea proved impractical and was dropped.  In
practice, the presence of a proper enumerator is enough to reliably
recognize an enumerated list item; any ambiguities are reported by the
parser.  Here's the original idea for posterity:

    The parser should recognize a variety of enumerator styles, mark
    each block as a potential enumerated list item (PELI), and
    interpret the enumerators of adjacent PELIs to decide whether they
    make up a consistent enumerated list.

    If a PELI is labeled with a "1.", and is immediately followed by a
    PELI labeled with a "2.", we've got an enumerated list.  Or "(A)"
    followed by "(B)".  Or "i)" followed by "ii)", etc.  The chances
    of accidentally recognizing two adjacent and consistently labeled
    PELIs, are acceptably small.

    For an enumerated list to be recognized, the following must be
    true:

    - the list must consist of multiple adjacent list items (2 or
      more)
    - the enumerators must all have the same format
    - the enumerators must be sequential


Definition List Markup
======================

StructuredText uses ' -- ' (whitespace, two hyphens, whitespace) on
the first line of a paragraph to indicate a definition list item.  The
' -- ' serves to separate the term (on the left) from the definition
(on the right).

Many people use ' -- ' as an em-dash in their text, conflicting with
the StructuredText usage.  Although the Chicago Manual of Style says
that spaces should not be used around an em-dash, Peter Funk pointed
out that this is standard usage in German (according to the Duden, the
official German reference), and possibly in other languages as well.
The widespread use of ' -- ' precludes its use for definition lists;
it would violate the "unsurprising" criterion.

A simpler, and at least equally visually distinctive construct
(proposed by Guido van Rossum, who incidentally is a frequent user of
' -- ') would do just as well::

    term 1
        Definition.

    term 2
        Definition 2, paragraph 1.

        Definition 2, paragraph 2.

A reStructuredText definition list item consists of a term and a
definition.  A term is a simple one-line paragraph.  A definition is a
block indented relative to the term, and may contain multiple
paragraphs and other body elements.  No blank line precedes a
definition (this distinguishes definition lists from block quotes).


Literal Blocks
==============

The StructuredText_ specification has literal blocks indicated by
'example', 'examples', or '::' ending the preceding paragraph.  STNG
only recognizes '::'; 'example'/'examples' are not implemented.  This
is good; it fixes an unnecessary language dependency.  The problem is
what to do with the sometimes- unwanted '::'.

In reStructuredText_ '::' at the end of a paragraph indicates that
subsequent *indented* blocks are treated as literal text.  No further
markup interpretation is done within literal blocks (not even
backslash-escapes).  If the '::' is preceded by whitespace, '::' is
omitted from the output; if '::' was the sole content of a paragraph,
the entire paragraph is removed (no 'empty' paragraph remains).  If
'::' is preceded by a non-whitespace character, '::' is replaced by
':' (i.e., the extra colon is removed).

Thus, a section could begin with a literal block as follows::

    Section Title
    -------------

    ::

        print "this is example literal"


Tables
======

The table markup scheme in classic StructuredText was horrible.  Its
omission from StructuredTextNG is welcome, and its markup will not be
repeated here.  However, tables themselves are useful in
documentation.  Alternatives:

1. This format is the most natural and obvious.  It was independently
   invented (no great feat of creation!), and later found to be the
   format supported by the `Emacs table mode`_::

       +------------+------------+------------+--------------+
       |  Header 1  |  Header 2  |  Header 3  |  Header 4    |
       +============+============+============+==============+
       |  Column 1  |  Column 2  | Column 3 & 4 span (Row 1) |
       +------------+------------+------------+--------------+
       |    Column 1 & 2 span    |  Column 3  | - Column 4   |
       +------------+------------+------------+ - Row 2 & 3  |
       |      1     |      2     |      3     | - span       |
       +------------+------------+------------+--------------+

   Tables are described with a visual outline made up of the
   characters '-', '=', '|', and '+':

   - The hyphen ('-') is used for horizontal lines (row separators).
   - The equals sign ('=') is optionally used as a header separator
     (as of version 1.5.24, this is not supported by the Emacs table
     mode).
   - The vertical bar ('|') is used for for vertical lines (column
     separators).
   - The plus sign ('+') is used for intersections of horizontal and
     vertical lines.

   Row and column spans are possible simply by omitting the column or
   row separators, respectively.  The header row separator must be
   complete; in other words, a header cell may not span into the table
   body.  Each cell contains body elements, and may have multiple
   paragraphs, lists, etc.  Initial spaces for a left margin are
   allowed; the first line of text in a cell determines its left
   margin.

2. Below is a simpler table structure.  It may be better suited to
   manual input than alternative #1, but there is no Emacs editing
   mode available.  One disadvantage is that it resembles section
   titles; a one-column table would look exactly like section &
   subsection titles. ::

       ============ ============ ============ ==============
         Header 1     Header 2     Header 3     Header 4
       ============ ============ ============ ==============
         Column 1     Column 2    Column 3 & 4 span (Row 1)
       ------------ ------------ ---------------------------
           Column 1 & 2 span       Column 3    - Column 4
       ------------------------- ------------  - Row 2 & 3
             1            2            3       - span
       ============ ============ ============ ==============

   The table begins with a top border of equals signs with a space at
   each column boundary (regardless of spans).  Each row is
   underlined.  Internal row separators are underlines of '-', with
   spaces at column boundaries.  The last of the optional head rows is
   underlined with '=', again with spaces at column boundaries.
   Column spans have no spaces in their underline.  Row spans simply
   lack an underline at the row boundary.  The bottom boundary of the
   table consists of '=' underlines.  A blank line is required
   following a table.

3. A minimalist alternative is as follows::

       ====  =====  ========  ========  =======  ====  =====  =====
       Old State    Input     Action             New State    Notes
       -----------  --------  -----------------  -----------
       ids   types  new type  sys.msg.  dupname  ids   types
       ====  =====  ========  ========  =======  ====  =====  =====
       --    --     explicit  --        --       new   True
       --    --     implicit  --        --       new   False
       None  False  explicit  --        --       new   True
       old   False  explicit  implicit  old      new   True
       None  True   explicit  explicit  new      None  True
       old   True   explicit  explicit  new,old  None  True   [1]
       None  False  implicit  implicit  new      None  False
       old   False  implicit  implicit  new,old  None  False
       None  True   implicit  implicit  new      None  True
       old   True   implicit  implicit  new      old   True
       ====  =====  ========  ========  =======  ====  =====  =====

   The table begins with a top border of equals signs with one or more
   spaces at each column boundary (regardless of spans).  There must
   be at least two columns in the table (to differentiate it from
   section headers).  Each line starts a new row.  The rightmost
   column is unbounded; text may continue past the edge of the table.
   Each row/line must contain spaces at column boundaries, except for
   explicit column spans.  Underlines of '-' can be used to indicate
   column spans, but should be used sparingly if at all.  Lines
   containing column span underlines may not contain any other text.
   The last of the optional head rows is underlined with '=', again
   with spaces at column boundaries.  The bottom boundary of the table
   consists of '=' underlines.  A blank line is required following a
   table.

   This table sums up the features.  Using all the features in such a
   small space is not pretty though::

       ========  ========  ========
                 Header 2 & 3 Span
                 ------------------
       Header 1  Header 2  Header 3
       ========  ========  ========
       Each      line is   a new row.
       Each row  consists  of one line only.
       Row       spans     are not possible.
       The last  column    may spill over to the right.
       Column spans are possible with an underline joining columns.
       ----------------------------
       The span  is        limited to the row above the underline.
       ========  ========  ========

4. As a variation of alternative 3, bullet list syntax in the first
   column could be used to indicate row starts.  Multi-line rows are
   possible, but row spans are not.  For example::

       ===== =====
       col 1 col 2
       ===== =====
       - 1   Second column of row 1.
       - 2   Second column of row 2.
             Second line of paragraph.
       - 3   Second column of row 3.

             Second paragraph of row 3,
             column 2
       ===== =====

   Column spans would be indicated on the line after the last line of
   the row.  To indicate a real bullet list within a first-column
   cell, simply nest the bullets.

5. In a further variation, we could simply assume that whitespace in
   the first column implies a multi-line row; the text in other
   columns is continuation text.  For example::

       ===== =====
       col 1 col 2
       ===== =====
       1     Second column of row 1.
       2     Second column of row 2.
             Second line of paragraph.
       3     Second column of row 3.

             Second paragraph of row 3,
             column 2
       ===== =====

   Limitations of this approach:

   - Cells in the first column are limited to one line of text.

   - Cells in the first column *must* contain some text; blank cells
     would lead to a misinterpretation.  An empty comment ("..") is
     sufficient.

6. Combining alternative 3 and 4, a bullet list in the first column
   could mean multi-line rows, and no bullet list means single-line
   rows only.

Alternatives 1 and 5 has been adopted by reStructuredText.


Delimitation of Inline Markup
=============================

StructuredText specifies that inline markup must begin with
whitespace, precluding such constructs as parenthesized or quoted
emphatic text::

    "**What?**" she cried.  (*exit stage left*)

The `reStructuredText markup specification`_ allows for such
constructs and disambiguates inline markup through a set of
recognition rules.  These recognition rules define the context of
markup start-strings and end-strings, allowing markup characters to be
used in most non-markup contexts without a problem (or a backslash).
So we can say, "Use asterisks (*) around words or phrases to
*emphasisze* them." The '(*)' will not be recognized as markup.  This
reduces the need for markup escaping to the point where an escape
character is *almost* (but not quite!) unnecessary.


Underlining
===========

StructuredText uses '_text_' to indicate underlining.  To quote David
Ascher in his 2000-01-21 Doc-SIG mailing list post, "Docstring
grammar: a very revised proposal":

    The tagging of underlined text with _'s is suboptimal.  Underlines
    shouldn't be used from a typographic perspective (underlines were
    designed to be used in manuscripts to communicate to the
    typesetter that the text should be italicized -- no well-typeset
    book ever uses underlines), and conflict with double-underscored
    Python variable names (__init__ and the like), which would get
    truncated and underlined when that effect is not desired.  Note
    that while *complete* markup would prevent that truncation
    ('__init__'), I think of docstring markups much like I think of
    type annotations -- they should be optional and above all do no
    harm.  In this case the underline markup does harm.

Underlining is not part of the reStructuredText specification.


Inline Literals
===============

StructuredText's markup for inline literals (text left as-is,
verbatim, usually in a monospaced font; as in HTML <TT>) is single
quotes ('literals').  The problem with single quotes is that they are
too often used for other purposes:

- Apostrophes: "Don't blame me, 'cause it ain't mine, it's Chris'.";

- Quoting text:

      First Bruce: "Well Bruce, I heard the prime minister use it.
      'S'hot enough to boil a monkey's bum in 'ere your Majesty,' he
      said, and she smiled quietly to herself."

  In the UK, single quotes are used for dialogue in published works.

- String literals: s = ''

Alternatives::

    'text'    \'text\'    ''text''    "text"    \"text\"    ""text""
    #text#     @text@      `text`     ^text^    ``text''    ``text``

The examples below contain inline literals, quoted text, and
apostrophes.  Each example should evaluate to the following HTML::

    Some <TT>code</TT>, with a 'quote', "double", ain't it grand?
    Does <TT>a[b] = 'c' + "d" + `2^3`</TT> work?

    0. Some code, with a quote, double, ain't it grand?
       Does a[b] = 'c' + "d" + `2^3` work?
    1. Some 'code', with a \'quote\', "double", ain\'t it grand?
       Does 'a[b] = \'c\' + "d" + `2^3`' work?
    2. Some \'code\', with a 'quote', "double", ain't it grand?
       Does \'a[b] = 'c' + "d" + `2^3`\' work?
    3. Some ''code'', with a 'quote', "double", ain't it grand?
       Does ''a[b] = 'c' + "d" + `2^3`'' work?
    4. Some "code", with a 'quote', \"double\", ain't it grand?
       Does "a[b] = 'c' + "d" + `2^3`" work?
    5. Some \"code\", with a 'quote', "double", ain't it grand?
       Does \"a[b] = 'c' + "d" + `2^3`\" work?
    6. Some ""code"", with a 'quote', "double", ain't it grand?
       Does ""a[b] = 'c' + "d" + `2^3`"" work?
    7. Some #code#, with a 'quote', "double", ain't it grand?
       Does #a[b] = 'c' + "d" + `2^3`# work?
    8. Some @code@, with a 'quote', "double", ain't it grand?
       Does @a[b] = 'c' + "d" + `2^3`@ work?
    9. Some `code`, with a 'quote', "double", ain't it grand?
       Does `a[b] = 'c' + "d" + \`2^3\`` work?
    10. Some ^code^, with a 'quote', "double", ain't it grand?
        Does ^a[b] = 'c' + "d" + `2\^3`^ work?
    11. Some ``code'', with a 'quote', "double", ain't it grand?
        Does ``a[b] = 'c' + "d" + `2^3`'' work?
    12. Some ``code``, with a 'quote', "double", ain't it grand?
        Does ``a[b] = 'c' + "d" + `2^3\``` work?

Backquotes (#9 & #12) are the best choice.  They are unobtrusive and
relatviely rarely used (more rarely than ' or ", anyhow).  Backquotes
have the connotation of 'quotes', which other options (like carets,
#10) don't.

Analogously with ``*emph*`` & ``**strong**``, double-backquotes (#12)
could be used for inline literals.  If single-backquotes are used for
'interpreted text' (context-sensitive domain-specific descriptive
markup) such as function name hyperlinks in Python docstrings, then
double-backquotes could be used for absolute-literals, wherein no
processing whatsoever takes place.  An advantage of double-backquotes
would be that backslash-escaping would no longer be necessary for
embedded single-backquotes; however, embedded double-backquotes (in an
end-string context) would be illegal.  See `Backquotes in
Phrase-Links`__ in `Record of reStructuredText Syntax Alternatives`__.

__ alternatives.html#backquotes-in-phrase-links
__ alternatives.html

Alternative choices are carets (#10) and TeX-style quotes (#11).  For
examples of TeX-style quoting, see
http://www.zope.org/Members/jim/StructuredTextWiki/CustomizingTheDocumentProcessor.

Some existing uses of backquotes:

1. As a synonym for repr() in Python.
2. For command-interpolation in shell scripts.
3. Used as open-quotes in TeX code (and carried over into plaintext
   by TeXies).

The inline markup start-string and end-string recognition rules
defined by the `reStructuredText markup specification`_ would allow
all of these cases inside inline literals, with very few exceptions.
As a fallback, literal blocks could handle all cases.

Outside of inline literals, the above uses of backquotes would require
backslash-escaping.  However, these are all prime examples of text
that should be marked up with inline literals.

If either backquotes or straight single-quotes are used as markup,
TeX-quotes are too troublesome to support, so no special-casing of
TeX-quotes should be done (at least at first).  If TeX-quotes have to
be used outside of literals, a single backslash-escaped would suffice:
\``TeX quote''.  Ugly, true, but very infrequently used.

Using literal blocks is a fallback option which removes the need for
backslash-escaping::

    like this::

        Here, we can do ``absolutely'' anything `'`'\|/|\ we like!

No mechanism for inline literals is perfect, just as no escaping
mechanism is perfect.  No matter what we use, complicated inline
expressions involving the inline literal quote and/or the backslash
will end up looking ugly.  We can only choose the least often ugly
option.

reStructuredText will use double backquotes for inline literals, and
single backqoutes for interpreted text.


Hyperlinks
==========

There are three forms of hyperlink currently in StructuredText_:

1. (Absolute & relative URIs.)  Text enclosed by double quotes
   followed by a colon, a URI, and concluded by punctuation plus white
   space, or just white space, is treated as a hyperlink::

       "Python":http://www.python.org/

2. (Absolute URIs only.)  Text enclosed by double quotes followed by a
   comma, one or more spaces, an absolute URI and concluded by
   punctuation plus white space, or just white space, is treated as a
   hyperlink::

       "mail me", mailto:me at mail.com

3. (Endnotes.)  Text enclosed by brackets link to an endnote at the
   end of the document: at the beginning of the line, two dots, a
   space, and the same text in brackets, followed by the end note
   itself::

       Please refer to the fine manual [GVR2001].

       .. [GVR2001] Python Documentation, Release 2.1, van Rossum,
          Drake, et al., http://www.python.org/doc/

The problem with forms 1 and 2 is that they are neither intuitive nor
unobtrusive (they break design goals 5 & 2).  They overload
double-quotes, which are too often used in ordinary text (potentially
breaking design goal 4).  The brackets in form 3 are also too common
in ordinary text (such as [nested] asides and Python lists like [12]).

Alternatives:

1. Have no special markup for hyperlinks.

2. A. Interpret and mark up hyperlinks as any contiguous text
      containing '://' or ':...@' (absolute URI) or '@' (email
      address) after an alphanumeric word.  To de-emphasize the URI,
      simply enclose it in parentheses:

          Python (http://www.python.org/)

   B. Leave special hyperlink markup as a domain-specific extension.
      Hyperlinks in ordinary reStructuredText documents would be
      required to be standalone (i.e. the URI text inline in the
      document text).  Processed hyperlinks (where the URI text is
      hidden behind the link) are important enough to warrant syntax.

3. The original Setext_ introduced a mechanism of indirect hyperlinks.
   A source link word ('hot word') in the text was given a trailing
   underscore::

       Here is some text with a hyperlink_ built in.

   The hyperlink itself appeared at the end of the document on a line
   by itself, beginning with two dots, a space, the link word with a
   leading underscore, whitespace, and the URI itself::

       .. _hyperlink http://www.123.xyz

   Setext used ``underscores_instead_of_spaces_`` for phrase links.

With some modification, alternative 3 best satisfies the design goals.
It has the advantage of being readable and relatively unobtrusive.
Since each source link must match up to a target, the odd variable
ending in an underscore can be spared being marked up (although it
should generate a "no such link target" warning).  The only
disadvantage is that phrase-links aren't possible without some
obtrusive syntax.

We could achieve phrase-links if we enclose the link text:

1. in double quotes::

       "like this"_

2. in brackets::

       [like this]_

3. or in backquotes::

       `like this`_

Each gives us somewhat obtrusive markup, but that is unavoidable.  The
bracketed syntax (#2) is reminiscent of links on many web pages
(intuitive), although it is somewhat obtrusive.  Alternative #3 is
much less obtrusive, and is consistent with interpreted text: the
trailing underscore indicates the interpretation of the phrase, as a
hyperlink.  #3 also disambiguates hyperlinks from footnote references.
Alternative #3 wins.

The same trailing underscore markup can also be used for footnote and
citation references, removing the problem with ordinary bracketed text
and Python lists::

    Please refer to the fine manual [GVR2000]_.

    .. [GVR2000] Python Documentation, van Rossum, Drake, et al.,
       http://www.python.org/doc/

The two-dots-and-a-space syntax was generalized by Setext for
comments, which are removed from the (visible) processed output.
reStructuredText uses this syntax for comments, footnotes, and link
target, collectively termed "explicit markup".  For link targets, in
order to eliminate ambiguity with comments and footnotes,
reStructuredText specifies that a colon always follow the link target
word/phrase.  The colon denotes 'maps to'.  There is no reason to
restrict target links to the end of the document; they could just as
easily be interspersed.

Internal hyperlinks (links from one point to another within a single
document) can be expressed by a source link as before, and a target
link with a colon but no URI.  In effect, these targets 'map to' the
element immediately following.

As an added bonus, we now have a perfect candidate for
reStructuredText directives, a simple extension mechanism: explicit
markup containing a single word followed by two colons and whitespace.
The interpretation of subsequent data on the directive line or
following is directive-dependent.

To summarize::

    .. This is a comment.

    .. The line below is an example of a directive.
    .. version:: 1

    This is a footnote [1]_.

    This internal hyperlink will take us to the footnotes_ area below.

    Here is a one-word_ external hyperlink.

    Here is `a hyperlink phrase`_.

    .. _footnotes:
    .. [1] Footnote text goes here.

    .. external hyperlink target mappings:
    .. _one-word: http://www.123.xyz
    .. _a hyperlink phrase: http://www.123.xyz

The presence or absence of a colon after the target link
differentiates an indirect hyperlink from a footnote, respectively.  A
footnote requires brackets.  Backquotes around a target link word or
phrase are required if the phrase contains a colon, optional
otherwise.

Below are examples using no markup, the two StructuredText hypertext
styles, and the reStructuredText hypertext style.  Each example
contains an indirect link, a direct link, a footnote/endnote, and
bracketed text.  In HTML, each example should evaluate to::

    <P>A <A HREF="http://spam.org">URI</A>, see <A HREF="#eggs2000">
    [eggs2000]</A> (in Bacon [Publisher]).  Also see
    <A HREF="http://eggs.org">http://eggs.org</A>.</P>

    <P><A NAME="eggs2000">[eggs2000]</A> "Spam, Spam, Spam, Eggs,
    Bacon, and Spam"</P>

1. No markup::

       A URI http://spam.org, see eggs2000 (in Bacon [Publisher]).
       Also see http://eggs.org.

       eggs2000 "Spam, Spam, Spam, Eggs, Bacon, and Spam"

2. StructuredText absolute/relative URI syntax
   ("text":http://www.url.org)::

       A "URI":http://spam.org, see [eggs2000] (in Bacon [Publisher]).
       Also see "http://eggs.org":http://eggs.org.

       .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"

   Note that StructuredText does not recognize standalone URIs,
   forcing doubling up as shown in the second line of the example
   above.

3. StructuredText absolute-only URI syntax
   ("text", mailto:you at your.com)::

       A "URI", http://spam.org, see [eggs2000] (in Bacon
       [Publisher]).  Also see "http://eggs.org", http://eggs.org.

       .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"

4. reStructuredText syntax::

    4. A URI_, see [eggs2000]_ (in Bacon [Publisher]).
       Also see http://eggs.org.

       .. _URI: http:/spam.org
       .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"

The bracketed text '[Publisher]' may be problematic with
StructuredText (syntax 2 & 3).

reStructuredText's syntax (#4) is definitely the most readable.  The
text is separated from the link URI and the footnote, resulting in
cleanly readable text.

.. _StructuredText:
   http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage
.. _Setext: http://docutils.sourceforge.net/mirror/setext.html
.. _reStructuredText: http://docutils.sourceforge.net/rst.html
.. _detailed description:
   http://homepage.ntlworld.com/tibsnjoan/docutils/STNG-format.html
.. _STMinus: http://www.cis.upenn.edu/~edloper/pydoc/stminus.html
.. _StructuredTextNG:
   http://dev.zope.org/Members/jim/StructuredTextWiki/StructuredTextNG
.. _README: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/
   python/python/dist/src/README
.. _Emacs table mode: http://table.sourceforge.net/
.. _reStructuredText Markup Specification:
   ../../ref/rst/restructuredtext.html


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:



More information about the Zope-Checkins mailing list