[Zope-Checkins] CVS: Zope/lib/python/third_party/docutils/docs/peps
- pep-0256.txt:1.1.2.1 pep-0257.txt:1.1.2.1
pep-0258.txt:1.1.2.1 pep-0287.txt:1.1.2.1
Andreas Jung
andreas at andreas-jung.com
Fri Oct 29 14:24:48 EDT 2004
Update of /cvs-repository/Zope/lib/python/third_party/docutils/docs/peps
In directory cvs.zope.org:/tmp/cvs-serv11767/docutils/docs/peps
Added Files:
Tag: ajung-docutils-cleanup-branch
pep-0256.txt pep-0257.txt pep-0258.txt pep-0287.txt
Log Message:
moved
=== Added File Zope/lib/python/third_party/docutils/docs/peps/pep-0256.txt ===
PEP: 256
Title: Docstring Processing System Framework
Version: $Revision: 1.1.2.1 $
Last-Modified: $Date: 2004/10/29 18:24:46 $
Author: David Goodger <goodger at users.sourceforge.net>
Discussions-To: <doc-sig at python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 01-Jun-2001
Post-History: 13-Jun-2001
Abstract
========
Python lends itself to inline documentation. With its built-in
docstring syntax, a limited form of `Literate Programming`_ is easy to
do in Python. However, there are no satisfactory standard tools for
extracting and processing Python docstrings. The lack of a standard
toolset is a significant gap in Python's infrastructure; this PEP aims
to fill the gap.
The issues surrounding docstring processing have been contentious and
difficult to resolve. This PEP proposes a generic Docstring
Processing System (DPS) framework, which separates out the components
(program and conceptual), enabling the resolution of individual issues
either through consensus (one solution) or through divergence (many).
It promotes standard interfaces which will allow a variety of plug-in
components (input context readers, markup parsers, and output format
writers) to be used.
The concepts of a DPS framework are presented independently of
implementation details.
Road Map to the Docstring PEPs
==============================
There are many aspects to docstring processing. The "Docstring PEPs"
have broken up the issues in order to deal with each of them in
isolation, or as close as possible. The individual aspects and
associated PEPs are as follows:
* Docstring syntax. PEP 287, "reStructuredText Docstring Format"
[#PEP-287]_, proposes a syntax for Python docstrings, PEPs, and
other uses.
* Docstring semantics consist of at least two aspects:
- Conventions: the high-level structure of docstrings. Dealt with
in PEP 257, "Docstring Conventions" [#PEP-257]_.
- Methodology: rules for the informational content of docstrings.
Not addressed.
* Processing mechanisms. This PEP (PEP 256) outlines the high-level
issues and specification of an abstract docstring processing system
(DPS). PEP 258, "Docutils Design Specification" [#PEP-258]_, is an
overview of the design and implementation of one DPS under
development.
* Output styles: developers want the documentation generated from
their source code to look good, and there are many different ideas
about what that means. PEP 258 touches on "Stylist Transforms".
This aspect of docstring processing has yet to be fully explored.
By separating out the issues, we can form consensus more easily
(smaller fights ;-), and accept divergence more readily.
Rationale
=========
There are standard inline documentation systems for some other
languages. For example, Perl has POD_ ("Plain Old Documentation") and
Java has Javadoc_, but neither of these meshes with the Pythonic way.
POD syntax is very explicit, but takes after Perl in terms of
readability. Javadoc is HTML-centric; except for "``@field``" tags,
raw HTML is used for markup. There are also general tools such as
Autoduck_ and Web_ (Tangle & Weave), useful for multiple languages.
There have been many attempts to write auto-documentation systems
for Python (not an exhaustive list):
- Marc-Andre Lemburg's doc.py_
- Daniel Larsson's pythondoc_ & gendoc_
- Doug Hellmann's HappyDoc_
- Laurence Tratt's Crystal (no longer available on the web)
- Ka-Ping Yee's pydoc_ (pydoc.py is now part of the Python standard
library; see below)
- Tony Ibbs' docutils_ (Tony has donated this name to the `Docutils
project`_)
- Edward Loper's STminus_ formalization and related efforts
These systems, each with different goals, have had varying degrees of
success. A problem with many of the above systems was over-ambition
combined with inflexibility. They provided a self-contained set of
components: a docstring extraction system, a markup parser, an
internal processing system and one or more output format writers with
a fixed style. Inevitably, one or more aspects of each system had
serious shortcomings, and they were not easily extended or modified,
preventing them from being adopted as standard tools.
It has become clear (to this author, at least) that the "all or
nothing" approach cannot succeed, since no monolithic self-contained
system could possibly be agreed upon by all interested parties. A
modular component approach designed for extension, where components
may be multiply implemented, may be the only chance for success.
Standard inter-component APIs will make the DPS components
comprehensible without requiring detailed knowledge of the whole,
lowering the barrier for contributions, and ultimately resulting in a
rich and varied system.
Each of the components of a docstring processing system should be
developed independently. A "best of breed" system should be chosen,
either merged from existing systems, and/or developed anew. This
system should be included in Python's standard library.
PyDoc & Other Existing Systems
------------------------------
PyDoc became part of the Python standard library as of release 2.1.
It extracts and displays docstrings from within the Python interactive
interpreter, from the shell command line, and from a GUI window into a
web browser (HTML). Although a very useful tool, PyDoc has several
deficiencies, including:
- In the case of the GUI/HTML, except for some heuristic hyperlinking
of identifier names, no formatting of the docstrings is done. They
are presented within ``<p><small><tt>`` tags to avoid unwanted line
wrapping. Unfortunately, the result is not attractive.
- PyDoc extracts docstrings and structural information (class
identifiers, method signatures, etc.) from imported module objects.
There are security issues involved with importing untrusted code.
Also, information from the source is lost when importing, such as
comments, "additional docstrings" (string literals in non-docstring
contexts; see PEP 258 [#PEP-258]_), and the order of definitions.
The functionality proposed in this PEP could be added to or used by
PyDoc when serving HTML pages. The proposed docstring processing
system's functionality is much more than PyDoc needs in its current
form. Either an independent tool will be developed (which PyDoc may
or may not use), or PyDoc could be expanded to encompass this
functionality and *become* the docstring processing system (or one
such system). That decision is beyond the scope of this PEP.
Similarly, the authors of other existing docstring processing systems
may or may not choose compatibility with this framework.
However, if this framework is accepted and adopted as the Python
standard, compatibility will become an important consideration in
these systems' future.
Specification
=============
The docstring processing system framework is broken up as follows:
1. Docstring conventions. Documents issues such as:
- What should be documented where.
- First line is a one-line synopsis.
PEP 257 [#PEP-257]_ documents some of these issues.
2. Docstring processing system design specification. Documents
issues such as:
- High-level spec: what a DPS does.
- Command-line interface for executable script.
- System Python API.
- Docstring extraction rules.
- Readers, which encapsulate the input context.
- Parsers.
- Document tree: the intermediate internal data structure. The
output of the Parser and Reader, and the input to the Writer all
share the same data structure.
- Transforms, which modify the document tree.
- Writers for output formats.
- Distributors, which handle output management (one file, many
files, or objects in memory).
These issues are applicable to any docstring processing system
implementation. PEP 258 [#PEP-258]_ documents these issues.
3. Docstring processing system implementation.
4. Input markup specifications: docstring syntax. PEP 287 [#PEP-287]_
proposes a standard syntax.
5. Input parser implementations.
6. Input context readers ("modes": Python source code, PEP, standalone
text file, email, etc.) and implementations.
7. Stylists: certain input context readers may have associated
stylists which allow for a variety of output document styles.
8. Output formats (HTML, XML, TeX, DocBook, info, etc.) and writer
implementations.
Components 1, 2/3/5, and 4 are the subject of individual companion
PEPs. If there is another implementation of the framework or
syntax/parser, additional PEPs may be required. Multiple
implementations of each of components 6 and 7 will be required; the
PEP mechanism may be overkill for these components.
Project Web Site
================
A SourceForge project has been set up for this work at
http://docutils.sourceforge.net/.
References and Footnotes
========================
.. [#PEP-287] PEP 287, reStructuredText Docstring Format, Goodger
(http://www.python.org/peps/pep-0287.html)
.. [#PEP-257] PEP 257, Docstring Conventions, Goodger, Van Rossum
(http://www.python.org/peps/pep-0257.html)
.. [#PEP-258] PEP 258, Docutils Design Specification, Goodger
(http://www.python.org/peps/pep-0258.html)
.. _Literate Programming: http://www.literateprogramming.com/
.. _POD: http://www.perldoc.com/perl5.6/pod/perlpod.html
.. _Javadoc: http://java.sun.com/j2se/javadoc/
.. _Autoduck:
http://www.helpmaster.com/hlp-developmentaids-autoduck.htm
.. _Web: http://www-cs-faculty.stanford.edu/~knuth/cweb.html
.. _doc.py:
http://www.egenix.com/files/python/SoftwareDescriptions.html#doc.py
.. _pythondoc:
.. _gendoc: http://starship.python.net/crew/danilo/pythondoc/
.. _HappyDoc: http://happydoc.sourceforge.net/
.. _pydoc: http://www.python.org/doc/current/lib/module-pydoc.html
.. _docutils: http://www.tibsnjoan.co.uk/docutils.html
.. _Docutils project: http://docutils.sourceforge.net/
.. _STMinus: http://www.cis.upenn.edu/~edloper/pydoc/
.. _Python Doc-SIG: http://www.python.org/sigs/doc-sig/
Copyright
=========
This document has been placed in the public domain.
Acknowledgements
================
This document borrows ideas from the archives of the `Python
Doc-SIG`_. Thanks to all members past & present.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
=== Added File Zope/lib/python/third_party/docutils/docs/peps/pep-0257.txt ===
PEP: 257
Title: Docstring Conventions
Version: $Revision: 1.1.2.1 $
Last-Modified: $Date: 2004/10/29 18:24:46 $
Author: David Goodger <goodger at users.sourceforge.net>,
Guido van Rossum <guido at python.org>
Discussions-To: doc-sig at python.org
Status: Active
Type: Informational
Content-Type: text/x-rst
Created: 29-May-2001
Post-History: 13-Jun-2001
Abstract
========
This PEP documents the semantics and conventions associated with
Python docstrings.
Rationale
=========
The aim of this PEP is to standardize the high-level structure of
docstrings: what they should contain, and how to say it (without
touching on any markup syntax within docstrings). The PEP contains
conventions, not laws or syntax.
"A universal convention supplies all of maintainability, clarity,
consistency, and a foundation for good programming habits too.
What it doesn't do is insist that you follow it against your will.
That's Python!"
-- Tim Peters on comp.lang.python, 2001-06-16
If you violate these conventions, the worst you'll get is some dirty
looks. But some software (such as the Docutils_ docstring processing
system [1]_ [2]_) will be aware of the conventions, so following them
will get you the best results.
Specification
=============
What is a Docstring?
--------------------
A docstring is a string literal that occurs as the first statement in
a module, function, class, or method definition. Such a docstring
becomes the ``__doc__`` special attribute of that object.
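As a minimal illustration (the names ``greet`` and ``Greeter`` are hypothetical and exist only for this sketch), the first string literal in each definition can be read back through ``__doc__``:

```python
def greet(name):
    """Return a greeting for the given name."""
    return "Hello, %s!" % name


class Greeter:

    """Produce greetings with a fixed salutation."""

    def __init__(self, salutation="Hello"):
        """Store the salutation used for later greetings."""
        self.salutation = salutation


# Each leading string literal is available as the object's __doc__ attribute.
print(greet.__doc__)
print(Greeter.__doc__)
print(Greeter.__init__.__doc__)
```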
All modules should normally have docstrings, and all functions and
classes exported by a module should also have docstrings. Public
methods (including the ``__init__`` constructor) should also have
docstrings. A package may be documented in the module docstring of
the ``__init__.py`` file in the package directory.
String literals occurring elsewhere in Python code may also act as
documentation. They are not recognized by the Python bytecode
compiler and are not accessible as runtime object attributes (i.e. not
assigned to ``__doc__``), but two types of extra docstrings may be
extracted by software tools:
1. String literals occurring immediately after a simple assignment at
the top level of a module, class, or ``__init__`` method are called
"attribute docstrings".
2. String literals occurring immediately after another docstring are
called "additional docstrings".
Please see PEP 258, "Docutils Design Specification" [2]_, for a
detailed description of attribute and additional docstrings.
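The following is only a rough sketch of how a tool might find attribute docstrings at the top level of a module. It assumes the standard library's ``ast`` module as the extraction mechanism; the helper ``attribute_docstrings`` and the sample ``SOURCE`` are hypothetical, and the authoritative extraction rules are those in PEP 258, not this code:

```python
import ast

# Sample module source: "timeout" is followed by an attribute docstring,
# "retries" is not.
SOURCE = '''
timeout = 30
"""Seconds to wait before giving up."""

retries = 3
'''


def attribute_docstrings(source):
    """Map simple top-level assignment names to the string literal that follows."""
    body = ast.parse(source).body
    found = {}
    # Look at each statement together with the statement right after it.
    for stmt, following in zip(body, body[1:]):
        is_simple_assign = (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        )
        is_string_expr = (
            isinstance(following, ast.Expr)
            and isinstance(following.value, ast.Constant)
            and isinstance(following.value.value, str)
        )
        if is_simple_assign and is_string_expr:
            found[stmt.targets[0].id] = following.value.value
    return found


print(attribute_docstrings(SOURCE))
```

Because these strings are never assigned to ``__doc__``, a tool has to work from the source text (as here) rather than from imported objects.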
XXX Mention docstrings of 2.2 properties.
For consistency, always use ``"""triple double quotes"""`` around
docstrings. Use ``r"""raw triple double quotes"""`` if you use any
backslashes in your docstrings. For Unicode docstrings, use
``u"""Unicode triple-quoted strings"""``.
There are two forms of docstrings: one-liners and multi-line
docstrings.
One-line Docstrings
--------------------
One-liners are for really obvious cases. They should really fit on
one line. For example::

    def kos_root():
        """Return the pathname of the KOS root directory."""
        global _kos_root
        if _kos_root: return _kos_root
        ...

Notes:
- Triple quotes are used even though the string fits on one line.
This makes it easy to later expand it.
- The closing quotes are on the same line as the opening quotes. This
looks better for one-liners.
- There's no blank line either before or after the docstring.
- The docstring is a phrase ending in a period. It prescribes the
function or method's effect as a command ("Do this", "Return that"),
not as a description; e.g. don't write "Returns the pathname ...".
- The one-line docstring should NOT be a "signature" reiterating the
function/method parameters (which can be obtained by introspection).
Don't do::

    def function(a, b):
        """function(a, b) -> list"""

This type of docstring is only appropriate for C functions (such as
built-ins), where introspection is not possible. However, the
nature of the *return value* cannot be determined by introspection,
so it should be mentioned. The preferred form for such a docstring
would be something like::

    def function(a, b):
        """Do X and return a list."""

(Of course "Do X" should be replaced by a useful description!)
Multi-line Docstrings
----------------------
Multi-line docstrings consist of a summary line just like a one-line
docstring, followed by a blank line, followed by a more elaborate
description. The summary line may be used by automatic indexing
tools; it is important that it fits on one line and is separated from
the rest of the docstring by a blank line. The summary line may be on
the same line as the opening quotes or on the next line. The entire
docstring is indented the same as the quotes at its first line (see
example below).
Insert a blank line before and after all docstrings (one-line or
multi-line) that document a class -- generally speaking, the class's
methods are separated from each other by a single blank line, and the
docstring needs to be offset from the first method by a blank line;
for symmetry, put a blank line between the class header and the
docstring. Docstrings documenting functions or methods generally
don't have this requirement, unless the function or method's body is
written as a number of blank-line separated sections -- in this case,
treat the docstring as another section, and precede it with a blank
line.
The docstring of a script (a stand-alone program) should be usable as
its "usage" message, printed when the script is invoked with incorrect
or missing arguments (or perhaps with a "-h" option, for "help").
Such a docstring should document the script's function and command
line syntax, environment variables, and files. Usage messages can be
fairly elaborate (several screens full) and should be sufficient for a
new user to use the command properly, as well as a complete quick
reference to all options and arguments for the sophisticated user.
The docstring for a module should generally list the classes,
exceptions and functions (and any other objects) that are exported by
the module, with a one-line summary of each. (These summaries
generally give less detail than the summary line in the object's
docstring.) The docstring for a package (i.e., the docstring of the
package's ``__init__.py`` module) should also list the modules and
subpackages exported by the package.
The docstring for a function or method should summarize its behavior
and document its arguments, return value(s), side effects, exceptions
raised, and restrictions on when it can be called (all if applicable).
Optional arguments should be indicated. It should be documented
whether keyword arguments are part of the interface.
The docstring for a class should summarize its behavior and list the
public methods and instance variables. If the class is intended to be
subclassed, and has an additional interface for subclasses, this
interface should be listed separately (in the docstring). The class
constructor should be documented in the docstring for its ``__init__``
method. Individual methods should be documented by their own
docstring.
If a class subclasses another class and its behavior is mostly
inherited from that class, its docstring should mention this and
summarize the differences. Use the verb "override" to indicate that a
subclass method replaces a superclass method and does not call the
superclass method; use the verb "extend" to indicate that a subclass
method calls the superclass method (in addition to its own behavior).
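A small sketch of the "override" versus "extend" wording, using hypothetical classes ``Base`` and ``UniqueSorted`` invented for this illustration:

```python
class Base:

    """Collect items and report them."""

    def __init__(self):
        """Create an empty collection."""
        self.items = []

    def add(self, item):
        """Append item to the collection."""
        self.items.append(item)

    def report(self):
        """Return a comma-separated listing of the items."""
        return ", ".join(str(item) for item in self.items)


class UniqueSorted(Base):

    """Collect items like Base, but skip duplicates and report them sorted.

    Behavior is mostly inherited from Base.  Differences:

    add -- extends Base.add; duplicates are skipped
    report -- overrides Base.report; items are sorted, one per line

    """

    def add(self, item):
        """Extend Base.add: append item only if it is not already present."""
        if item not in self.items:
            Base.add(self, item)  # the superclass method is still called

    def report(self):
        """Override Base.report: return the sorted items, one per line."""
        # The superclass method is not called here.
        return "\n".join(str(item) for item in sorted(self.items))


collection = UniqueSorted()
for value in (2, 1, 2):
    collection.add(value)
print(collection.report())
```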
*Do not* use the Emacs convention of mentioning the arguments of
functions or methods in upper case in running text. Python is case
sensitive and the argument names can be used for keyword arguments, so
the docstring should document the correct argument names. It is best
to list each argument on a separate line. For example::

    def complex(real=0.0, imag=0.0):
        """Form a complex number.

        Keyword arguments:
        real -- the real part (default 0.0)
        imag -- the imaginary part (default 0.0)

        """
        if imag == 0.0 and real == 0.0: return complex_zero
        ...

The BDFL [3]_ recommends inserting a blank line between the last
paragraph in a multi-line docstring and its closing quotes, placing
the closing quotes on a line by themselves. This way, Emacs'
``fill-paragraph`` command can be used on it.
Handling Docstring Indentation
------------------------------
Docstring processing tools will strip a uniform amount of indentation
from the second and further lines of the docstring, equal to the
minimum indentation of all non-blank lines after the first line. Any
indentation in the first line of the docstring (i.e., up to the first
newline) is insignificant and removed. Relative indentation of later
lines in the docstring is retained. Blank lines should be removed
from the beginning and end of the docstring.
Since code is much more precise than words, here is an implementation
of the algorithm::

    import sys

    def trim(docstring):
        if not docstring:
            return ''
        # Convert tabs to spaces (following the normal Python rules)
        # and split into a list of lines:
        lines = docstring.expandtabs().splitlines()
        # Determine minimum indentation (first line doesn't count):
        indent = sys.maxint
        for line in lines[1:]:
            stripped = line.lstrip()
            if stripped:
                indent = min(indent, len(line) - len(stripped))
        # Remove indentation (first line is special):
        trimmed = [lines[0].strip()]
        if indent < sys.maxint:
            for line in lines[1:]:
                trimmed.append(line[indent:].rstrip())
        # Strip off trailing and leading blank lines:
        while trimmed and not trimmed[-1]:
            trimmed.pop()
        while trimmed and not trimmed[0]:
            trimmed.pop(0)
        # Return a single string:
        return '\n'.join(trimmed)

The docstring in this example contains two newline characters and is
therefore 3 lines long. The first and last lines are blank::

    def foo():
        """
        This is the second line of the docstring.
        """

To illustrate::

    >>> print repr(foo.__doc__)
    '\n    This is the second line of the docstring.\n    '
    >>> foo.__doc__.splitlines()
    ['', '    This is the second line of the docstring.', '    ']
    >>> trim(foo.__doc__)
    'This is the second line of the docstring.'

Once trimmed, these docstrings are equivalent::

    def foo():
        """A multi-line
        docstring.
        """

    def bar():
        """
        A multi-line
        docstring.
        """

References and Footnotes
========================
.. [1] PEP 256, Docstring Processing System Framework, Goodger
(http://www.python.org/peps/pep-0256.html)
.. [2] PEP 258, Docutils Design Specification, Goodger
(http://www.python.org/peps/pep-0258.html)
.. [3] Guido van Rossum, Python's creator and Benevolent Dictator For
Life.
.. _Docutils: http://docutils.sourceforge.net/
.. _Python Style Guide:
http://www.python.org/doc/essays/styleguide.html
.. _Doc-SIG: http://www.python.org/sigs/doc-sig/
Copyright
=========
This document has been placed in the public domain.
Acknowledgements
================
The "Specification" text comes mostly verbatim from the `Python Style
Guide`_ essay by Guido van Rossum.
This document borrows ideas from the archives of the Python Doc-SIG_.
Thanks to all members past and present.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
sentence-end-double-space: t
End:
=== Added File Zope/lib/python/third_party/docutils/docs/peps/pep-0258.txt ===
PEP: 258
Title: Docutils Design Specification
Version: $Revision: 1.1.2.1 $
Last-Modified: $Date: 2004/10/29 18:24:46 $
Author: David Goodger <goodger at users.sourceforge.net>
Discussions-To: <doc-sig at python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Requires: 256, 257
Created: 31-May-2001
Post-History: 13-Jun-2001
==========
Abstract
==========
This PEP documents design issues and implementation details for
Docutils, a Python Docstring Processing System (DPS). The rationale
and high-level concepts of a DPS are documented in PEP 256, "Docstring
Processing System Framework" [#PEP-256]_. Also see PEP 256 for a
"Road Map to the Docstring PEPs".
Docutils is being designed modularly so that any of its components can
be replaced easily. In addition, Docutils is not limited to the
processing of Python docstrings; it processes standalone documents as
well, in several contexts.
No changes to the core Python language are required by this PEP. Its
deliverables consist of a package for the standard library and its
documentation.
===============
Specification
===============
Docutils Project Model
======================
Project components and data flow::

                     +---------------------------+
                     |        Docutils:          |
                     | docutils.core.Publisher,  |
                     | docutils.core.publish_*() |
                     +---------------------------+
                      /            |            \
                     /             |             \
            1,3,5   /        6     |              \ 7
           +--------+       +-------------+       +--------+
           | READER | ----> | TRANSFORMER | ====> | WRITER |
           +--------+       +-------------+       +--------+
            /     \\                                  |
           /       \\                                 |
     2    /      4  \\                             8  |
    +-------+   +--------+                        +--------+
    | INPUT |   | PARSER |                        | OUTPUT |
    +-------+   +--------+                        +--------+

The numbers above each component indicate the path a document's data
takes. Double-width lines between Reader & Parser and between
Transformer & Writer indicate that data sent along these paths should
be standard (pure & unextended) Docutils doc trees. Single-width
lines signify that internal tree extensions or completely unrelated
representations are possible, but they must be supported at both ends.
Publisher
---------
The ``docutils.core`` module contains a "Publisher" facade class and
several convenience functions: "publish_cmdline()" (for command-line
front ends), "publish_file()" (for programmatic use with file-like
I/O), and "publish_string()" (for programmatic use with string I/O).
The Publisher class encapsulates the high-level logic of a Docutils
system. The Publisher class has overall responsibility for
processing, controlled by the ``Publisher.publish()`` method:
1. Set up internal settings (may include config files & command-line
options) and I/O objects.
2. Call the Reader object to read data from the source Input object
and parse the data with the Parser object. A document object is
returned.
3. Set up and apply transforms via the Transformer object attached to
the document.
4. Call the Writer object which translates the document to the final
output format and writes the formatted data to the destination
Output object. Depending on the Output object, the output may be
returned from the Writer, and then from the ``publish()`` method.
Calling the "publish" function (or instantiating a "Publisher" object)
with component names will result in default behavior. For custom
behavior (customizing component settings), create custom component
objects first, and pass *them* to the Publisher or ``publish_*``
convenience functions.
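The division of labor in steps 2 to 4 can be sketched with toy components. Everything in this block (``ToyReader``, ``ToyParser``, ``ToyWriter``, ``ToyPublisher``, and the dictionary used as a "document") is hypothetical and far simpler than the real ``docutils.core.Publisher``; it only illustrates how a facade might pass a document from one component to the next:

```python
class ToyParser:
    """Treat the first block as a title and the remaining blocks as paragraphs."""

    def parse(self, text, document):
        blocks = [block.strip() for block in text.split("\n\n") if block.strip()]
        if blocks:
            document["title"] = blocks[0]
            document["paragraphs"] = blocks[1:]


class ToyReader:
    """Get the input text and pass it to the parser with a fresh document."""

    def __init__(self, parser):
        self.parser = parser

    def read(self, source):
        document = {"title": None, "paragraphs": []}  # fresh document root
        self.parser.parse(source, document)
        return document


class ToyWriter:
    """Translate the document into a plain-text rendering."""

    def write(self, document):
        lines = [document["title"].upper()] if document["title"] else []
        lines.extend(document["paragraphs"])
        return "\n".join(lines)


class ToyPublisher:
    """Facade that wires the components together."""

    def __init__(self, reader, writer, transforms=()):
        self.reader = reader
        self.writer = writer
        self.transforms = list(transforms)

    def publish(self, source):
        document = self.reader.read(source)   # step 2: read and parse
        for transform in self.transforms:     # step 3: apply transforms
            transform(document)
        return self.writer.write(document)    # step 4: translate and write


publisher = ToyPublisher(ToyReader(ToyParser()), ToyWriter())
output = publisher.publish("A title\n\nFirst paragraph.\n\nSecond paragraph.")
print(output)
```

Custom behavior in this sketch comes from constructing the component objects first and handing them to the facade, which mirrors the approach described above for the real Publisher.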
Readers
-------
Readers understand the input context (where the data is coming from),
send the whole input or discrete "chunks" to the parser, and provide
the context to bind the chunks together back into a cohesive whole.
Each reader is a module or package exporting a "Reader" class with a
"read" method. The base "Reader" class can be found in the
``docutils/readers/__init__.py`` module.
Most Readers will have to be told what parser to use. So far (see the
list of examples below), only the Python Source Reader ("PySource";
still incomplete) will be able to determine the parser on its own.
Responsibilities:
* Get input text from the source I/O.
* Pass the input text to the parser, along with a fresh `document
tree`_ root.
Examples:
* Standalone (Raw/Plain): Just read a text file and process it.
The reader needs to be told which parser to use.
The "Standalone Reader" has been implemented in module
``docutils.readers.standalone``.
* Python Source: See `Python Source Reader`_ below. This Reader is
currently in development in the Docutils sandbox.
* Email: RFC-822 headers, quoted excerpts, signatures, MIME parts.
* PEP: RFC-822 headers, "PEP xxxx" and "RFC xxxx" conversion to URIs.
The "PEP Reader" has been implemented in module
``docutils.readers.pep``; see PEP 287 and PEP 12.
* Wiki: Global reference lookups of "wiki links" incorporated into
transforms. (CamelCase only or unrestricted?) Lazy
indentation?
* Web Page: As standalone, but recognize meta fields as meta tags.
Support for templates of some sort? (After ``<body>``, before
``</body>``?)
* FAQ: Structured "question & answer(s)" constructs.
* Compound document: Merge chapters into a book. Master manifest
file?
Parsers
-------
Parsers analyze their input and produce a Docutils `document tree`_.
They don't know or care anything about the source or destination of
the data.
Each input parser is a module or package exporting a "Parser" class
with a "parse" method. The base "Parser" class can be found in the
``docutils/parsers/__init__.py`` module.
Responsibilities: Given raw input text and a doctree root node,
populate the doctree by parsing the input text.
Example: The only parser implemented so far is for the
reStructuredText markup. It is implemented in the
``docutils/parsers/rst/`` package.
The development and integration of other parsers is possible and
encouraged.
.. _transforms:
Transformer
-----------
The Transformer class, in ``docutils/transforms/__init__.py``, stores
transforms and applies them to documents. A transformer object is
attached to every new document tree. The Publisher_ calls
``Transformer.apply_transforms()`` to apply all stored transforms to
the document tree. Transforms change the document tree from one form
to another, add to the tree, or prune it. Transforms resolve
references and footnote numbers, process interpreted text, and do
other context-sensitive processing.
Some transforms are specific to components (Readers, Parsers, Writers,
Input, Output). Standard component-specific transforms are specified
in the ``default_transforms`` attribute of component classes. After
the Reader has finished processing, the Publisher_ calls
``Transformer.populate_from_components()`` with a list of components
and all default transforms are stored.
Each transform is a class in a module in the ``docutils/transforms/``
package, a subclass of ``docutils.transforms.Transform``.  Transform
classes each have a ``default_priority`` attribute which is used by
the Transformer to apply transforms in order (low to high). The
default priority can be overridden when adding transforms to the
Transformer object.
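Priority ordering can be sketched as follows. This is a toy model, not the Docutils implementation: the classes below are hypothetical stand-ins, a plain list plays the role of the document tree, and the ``add_transform`` method name and the priority values are assumptions made for the example:

```python
class Transform:
    """Base class: subclasses set default_priority and implement apply()."""

    default_priority = None

    def __init__(self, document):
        self.document = document

    def apply(self):
        raise NotImplementedError


class StripWhitespace(Transform):
    default_priority = 100

    def apply(self):
        # Modify the document in place.
        self.document[:] = [item.strip() for item in self.document]


class DropEmpty(Transform):
    default_priority = 200

    def apply(self):
        self.document[:] = [item for item in self.document if item]


class Transformer:
    """Store transform classes and apply them in priority order (low to high)."""

    def __init__(self, document):
        self.document = document
        self.transforms = []  # entries: (priority, insertion order, class)

    def add_transform(self, transform_class, priority=None):
        # The default priority may be overridden when adding a transform.
        if priority is None:
            priority = transform_class.default_priority
        self.transforms.append((priority, len(self.transforms), transform_class))

    def apply_transforms(self):
        applied = []
        ordered = sorted(self.transforms, key=lambda entry: entry[:2])
        for _priority, _order, transform_class in ordered:
            transform_class(self.document).apply()
            applied.append(transform_class.__name__)
        return applied


document = ["  alpha ", "   ", "beta"]
transformer = Transformer(document)
transformer.add_transform(DropEmpty)        # default priority 200
transformer.add_transform(StripWhitespace)  # default priority 100
order = transformer.apply_transforms()
print(order, document)
```

Although ``DropEmpty`` was added first, its higher priority number means it runs after ``StripWhitespace``, so the whitespace-only entry is emptied and then removed.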
Transformer responsibilities:
* Apply transforms to the document tree, in priority order.
* Store a mapping of component type name ('reader', 'writer', etc.) to
component objects. These are used by certain transforms (such as
"components.Filter") to determine suitability.
Transform responsibilities:
* Modify a doctree in-place, either purely transforming one structure
into another, or adding new structures based on the doctree and/or
external data.
Examples of transforms (in the ``docutils/transforms/`` package):
* frontmatter.DocInfo: Conversion of document metadata (bibliographic
information).
* references.AnonymousHyperlinks: Resolution of anonymous references
to corresponding targets.
* parts.Contents: Generates a table of contents for a document.
* document.Merger: Combining multiple populated doctrees into one.
(Not yet implemented or fully understood.)
* document.Splitter: Splits a document into a tree-structure of
subdocuments, perhaps by section. It will have to transform
references appropriately. (Neither implemented nor remotely
understood.)
* components.Filter: Includes or excludes elements which depend on a
specific Docutils component.
Writers
-------
Writers produce the final output (HTML, XML, TeX, etc.). Writers
translate the internal `document tree`_ structure into the final data
format, possibly running Writer-specific transforms_ first.
By the time the document gets to the Writer, it should be in final
form. The Writer's job is simply (and only) to translate from the
Docutils doctree structure to the target format. Some small
transforms may be required, but they should be local and
format-specific.
Each writer is a module or package exporting a "Writer" class with a
"write" method. The base "Writer" class can be found in the
``docutils/writers/__init__.py`` module.
Responsibilities:
* Translate doctree(s) into specific output formats.
- Transform references into format-native forms.
* Write the translated output to the destination I/O.
Examples:
* XML: Various forms, such as:
- Docutils XML (an expression of the internal document tree,
implemented as ``docutils.writers.docutils_xml``).
- DocBook (being implemented in the Docutils sandbox).
* HTML (XHTML implemented as ``docutils.writers.html4css1``).
* PDF (a ReportLab interface is being developed in the Docutils
sandbox).
* TeX (a LaTeX Writer is being implemented in the sandbox).
* Docutils-native pseudo-XML (implemented as
``docutils.writers.pseudoxml``, used for testing).
* Plain text
* reStructuredText?
Input/Output
------------
I/O classes provide a uniform API for low-level input and output.
Subclasses will exist for a variety of input/output mechanisms.
However, they can be considered an implementation detail. Most
applications should be satisfied using one of the convenience
functions associated with the Publisher_.
I/O classes are currently in the preliminary stages; there's a lot of
work yet to be done. Issues:
* How to represent multi-file input (files & directories) in the API?
* How to represent multi-file output? Perhaps "Writer" variants, one
for each output distribution type? Or Output objects with
associated transforms?
Responsibilities:
* Read data from the input source (Input objects) or write data to the
output destination (Output objects).
Examples of input sources:
* A single file on disk or a stream (implemented as
``docutils.io.FileInput``).
* Multiple files on disk (``MultiFileInput``?).
* Python source files: modules and packages.
* Python strings, as received from a client application
(implemented as ``docutils.io.StringInput``).
Examples of output destinations:
* A single file on disk or a stream (implemented as
``docutils.io.FileOutput``).
* A tree of directories and files on disk.
* A Python string, returned to a client application (implemented as
``docutils.io.StringOutput``).
* No output; useful for programmatic applications where only a portion
of the normal output is to be used (implemented as
``docutils.io.NullOutput``).
* A single tree-shaped data structure in memory.
* Some other set of data structures in memory.
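The idea of a uniform low-level I/O API can be illustrated with a small sketch. The class names below echo the ``docutils.io`` classes listed above, but the code is an independent toy: the ``read``/``write`` method names and the ``process`` helper are assumptions made for illustration, not the library's actual interface.

```python
class StringInput:
    """Toy input source: wraps a Python string."""

    def __init__(self, source):
        self.source = source

    def read(self):
        return self.source


class StringOutput:
    """Toy output destination: keeps the data and returns it to the caller."""

    def __init__(self):
        self.destination = None

    def write(self, data):
        self.destination = data
        return data


class NullOutput:
    """Toy output destination that discards everything."""

    def write(self, data):
        return None


def process(source, destination):
    # Stand-in for the Reader/Parser/Writer chain: upper-case the text.
    return destination.write(source.read().upper())


kept = process(StringInput("some text"), StringOutput())
dropped = process(StringInput("some text"), NullOutput())
```

Because ``process`` touches only ``read`` and ``write``, swapping one destination for another needs no change to the processing code.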
Docutils Package Structure
==========================
* Package "docutils".
- Module "__init__.py" contains: class "Component", a base class for
Docutils components; class "SettingsSpec", a base class for
specifying runtime settings (used by docutils.frontend); and class
"TransformSpec", a base class for specifying transforms.
- Module "docutils.core" contains facade class "Publisher" and
convenience functions. See `Publisher`_ above.
- Module "docutils.frontend" provides runtime settings support, for
programmatic use and front-end tools (including configuration file
support, and command-line argument and option processing).
- Module "docutils.io" provides a uniform API for low-level input
and output. See `Input/Output`_ above.
- Module "docutils.nodes" contains the Docutils document tree
element class library plus tree-traversal Visitor pattern base
classes. See `Document Tree`_ below.
- Module "docutils.statemachine" contains a finite state machine
specialized for regular-expression-based text filters and parsers.
The reStructuredText parser implementation is based on this
module.
- Module "docutils.urischemes" contains a mapping of known URI
schemes ("http", "ftp", "mailto", etc.).
- Module "docutils.utils" contains utility functions and classes,
including a logger class ("Reporter"; see `Error Handling`_
below).
- Package "docutils.parsers": markup parsers_.
- Function "get_parser_class(parser_name)" returns a parser class
by name. Class "Parser" is the base class of specific parsers.
(``docutils/parsers/__init__.py``)
- Package "docutils.parsers.rst": the reStructuredText parser.
- Alternate markup parsers may be added.
See `Parsers`_ above.
- Package "docutils.readers": context-aware input readers.
- Function "get_reader_class(reader_name)" returns a reader class
by name or alias. Class "Reader" is the base class of specific
readers. (``docutils/readers/__init__.py``)
- Module "docutils.readers.standalone" reads independent document
files.
- Module "docutils.readers.pep" reads PEPs (Python Enhancement
Proposals).
- Readers to be added for: Python source code (structure &
docstrings), email, FAQ, and perhaps Wiki and others.
See `Readers`_ above.
- Package "docutils.writers": output format writers.
- Function "get_writer_class(writer_name)" returns a writer class
by name. Class "Writer" is the base class of specific writers.
(``docutils/writers/__init__.py``)
- Module "docutils.writers.html4css1" is a simple HyperText Markup
Language document tree writer for HTML 4.01 and CSS1.
- Module "docutils.writers.docutils_xml" writes the internal
document tree in XML form.
- Module "docutils.writers.pseudoxml" is a simple internal
document tree writer; it writes indented pseudo-XML.
- Writers to be added: HTML 3.2 or 4.01-loose, XML (various forms,
such as DocBook), PDF, TeX, plaintext, reStructuredText, and
perhaps others.
See `Writers`_ above.
- Package "docutils.transforms": tree transform classes.
- Class "Transformer" stores transforms and applies them to
document trees. (``docutils/transforms/__init__.py``)
- Class "Transform" is the base class of specific transforms.
(``docutils/transforms/__init__.py``)
- Each module contains related transform classes.
See `Transforms`_ above.
- Package "docutils.languages": Language modules contain
language-dependent strings and mappings. They are named for their
language identifier (as defined in `Choice of Docstring Format`_
below), converting dashes to underscores.
- Function "get_language(language_code)" returns the matching
language module. (``docutils/languages/__init__.py``)
- Modules: en.py (English), de.py (German), fr.py (French), it.py
(Italian), sk.py (Slovak), sv.py (Swedish).
- Other languages to be added.
* Third-party modules: "extras" directory. These modules are
installed only if they're not already present in the Python
installation.
- ``extras/optparse.py`` and ``extras/textwrap.py`` provide
option parsing and command-line help; from Greg Ward's
http://optik.sf.net/ project, included for convenience.
- ``extras/roman.py`` contains Roman numeral conversion routines.
Front-End Tools
===============
The ``tools/`` directory contains several front ends for common
Docutils processing. See `Docutils Front-End Tools`_ for details.
.. _Docutils Front-End Tools:
http://docutils.sourceforge.net/docs/user/tools.html
Document Tree
=============
A single intermediate data structure is used internally by Docutils,
in the interfaces between components; it is defined in the
``docutils.nodes`` module. It is not required that this data
structure be used *internally* by any of the components, just
*between* components as outlined in the diagram in the `Docutils
Project Model`_ above.
Custom node types are allowed, provided that either (a) a transform
converts them to standard Docutils nodes before they reach the Writer
proper, or (b) the custom node is explicitly supported by certain
Writers, and is wrapped in a filtered "pending" node. An example of
condition (a) is the `Python Source Reader`_ (see below), where a
"stylist" transform converts custom nodes. The HTML ``<meta>`` tag is
an example of condition (b); it is supported by the HTML Writer but
not by others. The reStructuredText "meta" directive creates a
"pending" node, which contains knowledge that the embedded "meta" node
can only be handled by HTML-compatible writers. The "pending" node is
resolved by the ``docutils.transforms.components.Filter`` transform,
which checks that the calling writer supports HTML; if it doesn't, the
"pending" node (and enclosed "meta" node) is removed from the
document.
The document tree data structure is similar to a DOM tree, but with
specific node names (classes) instead of DOM's generic nodes. The
schema is documented in an XML DTD (eXtensible Markup Language
Document Type Definition), which comes in two parts:
* the Docutils Generic DTD, docutils.dtd_, and
* the OASIS Exchange Table Model, soextbl.dtd_.
The DTD defines a rich set of elements, suitable for many input and
output formats. The DTD retains all information necessary to
reconstruct the original input text, or a reasonable facsimile
thereof.
See `The Docutils Document Tree`_ for details (incomplete).
Error Handling
==============
When the parser encounters an error in markup, it inserts a system
message (DTD element "system_message"). There are five levels of
system messages:
* Level-0, "DEBUG": an internal reporting issue. There is no effect
on the processing. Level-0 system messages are handled separately
from the others.
* Level-1, "INFO": a minor issue that can be ignored. There is little
or no effect on the processing. Typically level-1 system messages
are not reported.
* Level-2, "WARNING": an issue that should be addressed. If ignored,
there may be minor problems with the output. Typically level-2
system messages are reported but do not halt processing.
* Level-3, "ERROR": a major issue that should be addressed. If
ignored, the output will contain unpredictable errors. Typically
level-3 system messages are reported but do not halt processing.
* Level-4, "SEVERE": a critical error that must be addressed.
Typically level-4 system messages are turned into exceptions which
halt processing. If ignored, the output will contain severe errors.
Although the initial message levels were devised independently, they
have a strong correspondence to `VMS error condition severity
levels`_; the names in quotes for levels 1 through 4 were borrowed
from VMS. Error handling has since been influenced by the `log4j
project`_.
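The level scheme can be sketched as a small reporter with two thresholds. ``ToyReporter`` and its ``report_level`` and ``halt_level`` parameters are hypothetical; the defaults merely encode the "typically" behaviour listed above (report from level 2, halt at level 4). The separate handling of level-0 messages is not modelled.

```python
class SystemMessage(Exception):
    """Raised when a message reaches the halt level."""


class ToyReporter:
    LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "SEVERE"]

    def __init__(self, report_level=2, halt_level=4):
        self.report_level = report_level
        self.halt_level = halt_level
        self.reported = []

    def system_message(self, level, text):
        label = "%s/%d: %s" % (self.LEVELS[level], level, text)
        # Check the halt threshold first, so that halting never depends
        # on the report threshold.
        if level >= self.halt_level:
            raise SystemMessage(label)
        if level >= self.report_level:
            self.reported.append(label)
        return label


reporter = ToyReporter()
reporter.system_message(1, "minor issue")
reporter.system_message(2, "should be addressed")
reporter.system_message(3, "major issue")
try:
    reporter.system_message(4, "critical")
    halted = False
except SystemMessage:
    halted = True
```

Keeping both thresholds as settings lets a strict build halt on warnings while an exploratory run reports everything and halts on nothing.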
Python Source Reader
====================
The Python Source Reader ("PySource") is the Docutils component that
reads Python source files, extracts docstrings in context, then
parses, links, and assembles the docstrings into a cohesive whole. It
is a major and non-trivial component, currently under experimental
development in the Docutils sandbox. High-level design issues are
presented here.
Processing Model
----------------
This model will evolve over time, incorporating experience and
discoveries.
1. The PySource Reader uses an Input class to read in Python packages
and modules, into a tree of strings.
2. The Python modules are parsed, converting the tree of strings into
a tree of abstract syntax trees with docstring nodes.
3. The abstract syntax trees are converted into an internal
representation of the packages/modules. Docstrings are extracted,
as well as code structure details. See `AST Mining`_ below.
Namespaces are constructed for lookup in step 6.
4. One at a time, the docstrings are parsed, producing standard
Docutils doctrees.
5. PySource assembles all the individual docstrings' doctrees into a
Python-specific custom Docutils tree paralleling the
package/module/class structure; this is a custom Reader-specific
internal representation (see the `Docutils Python Source DTD`_).
Namespaces must be merged: Python identifiers, hyperlink targets.
6. Cross-references from docstrings (interpreted text) to Python
identifiers are resolved according to the Python namespace lookup
rules. See `Identifier Cross-References`_ below.
7. A "Stylist" transform is applied to the custom doctree (by the
Transformer_), custom nodes are rendered using standard nodes as
primitives, and a standard document tree is emitted. See `Stylist
Transforms`_ below.
8. Other transforms are applied to the standard doctree by the
Transformer_.
9. The standard doctree is sent to a Writer, which translates the
document into a concrete format (HTML, PDF, etc.).
10. The Writer uses an Output class to write the resulting data to its
destination (disk file, directories and files, etc.).
AST Mining
----------
Abstract Syntax Tree mining code will be written (or adapted) that
scans a parsed Python module, and returns an ordered tree containing
the names, docstrings (including attribute and additional docstrings;
see below), and additional info (in parentheses below) of all of the
following objects:
* packages
* modules
* module attributes (+ initial values)
* classes (+ inheritance)
* class attributes (+ initial values)
* instance attributes (+ initial values)
* methods (+ parameters & defaults)
* functions (+ parameters & defaults)
(Extract comments too? For example, comments at the start of a module
would be a good place for bibliographic field lists.)
In order to evaluate interpreted text cross-references, namespaces for
each of the above will also be required.
See the python-dev/docstring-develop thread "AST mining", started on
2001-08-14.
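A minimal sketch of such mining code follows. It assumes the standard ``ast`` module of later Python versions rather than the tools available when this text was written, and the ``mine`` function and the shape of the dictionaries it returns are hypothetical. Attributes, initial values, and default values are left out to keep it short.

```python
import ast

SOURCE = '''
"""Module docstring."""

class Keeper:
    """Class docstring."""

    def store(self, data, flag=True):
        """Method docstring."""

def helper(x, y=2):
    """Function docstring."""
'''


def mine(node, kind="module", name="<module>"):
    """Return a nested dict describing `node` and its definitions, in order."""
    entry = {"kind": kind, "name": name,
             "doc": ast.get_docstring(node), "children": []}
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        entry["params"] = [arg.arg for arg in node.args.args]
    elif isinstance(node, ast.ClassDef):
        # Only simple base-class names are recorded in this sketch.
        entry["bases"] = [base.id for base in node.bases
                          if isinstance(base, ast.Name)]
    for child in node.body:
        if isinstance(child, ast.ClassDef):
            entry["children"].append(mine(child, "class", child.name))
        elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
            child_kind = "method" if kind == "class" else "function"
            entry["children"].append(mine(child, child_kind, child.name))
    return entry


tree = mine(ast.parse(SOURCE))
```

The module is parsed, never imported, so source order is preserved and no code from the examined module is executed.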
Docstring Extraction Rules
--------------------------
1. What to examine:
a) If the "``__all__``" variable is present in the module being
documented, only identifiers listed in "``__all__``" are
examined for docstrings.
b) In the absence of "``__all__``", all identifiers are examined,
except those whose names are private (names begin with "_" but
don't begin and end with "__").
c) 1a and 1b can be overridden by runtime settings.
2. Where:
Docstrings are string literal expressions, and are recognized in
the following places within Python modules:
a) At the beginning of a module, function definition, class
definition, or method definition, after any comments. This is
the standard for Python ``__doc__`` attributes.
b) Immediately following a simple assignment at the top level of a
module, class definition, or ``__init__`` method definition,
after any comments. See `Attribute Docstrings`_ below.
c) Additional string literals found immediately after the
docstrings in (a) and (b) will be recognized, extracted, and
concatenated. See `Additional Docstrings`_ below.
d) @@@ 2.2-style "properties" with attribute docstrings? Wait for
syntax?
3. How:
Whenever possible, Python modules should be parsed by Docutils, not
imported. There are several reasons:
- Importing untrusted code is inherently insecure.
- Information from the source is lost when using introspection to
examine an imported module, such as comments and the order of
definitions.
- Docstrings are to be recognized in places where the byte-code
compiler ignores string literal expressions (2b and 2c above),
meaning importing the module will lose these docstrings.
Of course, standard Python parsing tools such as the "parser"
library module should be used.
When the Python source code for a module is not available
(i.e. only the ``.pyc`` file exists) or for C extension modules, to
access docstrings the module can only be imported, and any
limitations must be lived with.
Since attribute docstrings and additional docstrings are ignored by
the Python byte-code compiler, no namespace pollution or runtime bloat
will result from their use. They are not assigned to ``__doc__`` or
to any other attribute. The initial parsing of a module may take a
slight performance hit.
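Rules 1a and 1b above reduce to a small filter over identifier names. The sketch below is one possible reading; the function names are hypothetical, and the runtime-setting override of rule 1c is not modelled.

```python
def is_private(name):
    """True for names that begin with "_" but are not __dunder__ names."""
    is_dunder = name.startswith("__") and name.endswith("__")
    return name.startswith("_") and not is_dunder


def examined_names(names, all_names=None):
    """Apply rules 1a and 1b to an ordered list of identifiers."""
    if all_names is not None:
        # Rule 1a: "__all__" is present, so it alone decides.
        return [name for name in names if name in all_names]
    # Rule 1b: no "__all__"; skip private names.
    return [name for name in names if not is_private(name)]


names = ["public", "_private", "__init__", "__mangled"]
without_all = examined_names(names)
with_all = examined_names(names, ["_private"])
```

Note that when ``__all__`` is present it can expose a name that rule 1b would otherwise hide, as ``with_all`` shows.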
Attribute Docstrings
''''''''''''''''''''
(This is a simplified version of PEP 224 [#PEP-224]_.)
A string literal immediately following an assignment statement is
interpreted by the docstring extraction machinery as the docstring of
the target of the assignment statement, under the following
conditions:
1. The assignment must be in one of the following contexts:
a) At the top level of a module (i.e., not nested inside a compound
statement such as a loop or conditional): a module attribute.
b) At the top level of a class definition: a class attribute.
c) At the top level of the "``__init__``" method definition of a
class: an instance attribute. Instance attributes assigned in
other methods are assumed to be implementation details. (@@@
``__new__`` methods?)
d) A function attribute assignment at the top level of a module or
class definition.
Since each of the above contexts is at the top level (i.e., in the
outermost suite of a definition), it may be necessary to place
dummy assignments for attributes assigned conditionally or in a
loop.
2. The assignment must be to a single target, not to a list or a tuple
of targets.
3. The form of the target:
a) For contexts 1a and 1b above, the target must be a simple
identifier (not a dotted identifier, a subscripted expression,
or a sliced expression).
b) For context 1c above, the target must be of the form
"``self.attrib``", where "``self``" matches the "``__init__``"
method's first parameter (the instance parameter) and "attrib"
is a simple identifier as in 3a.
c) For context 1d above, the target must be of the form
"``name.attrib``", where "``name``" matches an already-defined
function or method name and "attrib" is a simple identifier as
in 3a.
Blank lines may be used after attribute docstrings to emphasize the
connection between the assignment and the docstring.
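The conditions above can be approximated in a few lines on top of the standard ``ast`` module. This is a sketch under stated assumptions: ``string_value`` and ``attribute_docstrings`` are hypothetical helpers, the ``ast`` module is a later addition to Python, and function attributes (context 1d) are not handled.

```python
import ast

SOURCE = '''
g = 'module attribute (module-global variable)'
"""This is g's docstring."""

class AClass:

    c = 'class attribute'
    """This is AClass.c's docstring."""

    def __init__(self):
        """Method __init__'s docstring."""
        self.i = 'instance attribute'
        """This is self.i's docstring."""
'''


def string_value(stmt):
    """Return the text of a bare string-literal statement, else None."""
    if (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant)
            and isinstance(stmt.value.value, str)):
        return stmt.value.value
    return None


def attribute_docstrings(body, instance_name=None):
    """Return {attribute name: docstring} for one suite.

    With instance_name=None only simple identifiers qualify (contexts 1a
    and 1b); otherwise only "<instance_name>.attrib" targets do (1c).
    """
    found = {}
    for stmt, following in zip(body, body[1:]):
        doc = string_value(following)
        if doc is None or not isinstance(stmt, ast.Assign):
            continue
        if len(stmt.targets) != 1:      # condition 2: single target
            continue
        target = stmt.targets[0]
        if instance_name is None:
            if isinstance(target, ast.Name):
                found[target.id] = doc
        elif (isinstance(target, ast.Attribute)
              and isinstance(target.value, ast.Name)
              and target.value.id == instance_name):
            found[target.attr] = doc
    return found


module = ast.parse(SOURCE)
module_docs = attribute_docstrings(module.body)
klass = next(n for n in module.body if isinstance(n, ast.ClassDef))
class_docs = attribute_docstrings(klass.body)
init = next(n for n in klass.body if isinstance(n, ast.FunctionDef))
instance_docs = attribute_docstrings(init.body, init.args.args[0].arg)
```

Only the outermost suite of each definition is scanned, which is why conditionally assigned attributes would need the dummy assignments mentioned above.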
Examples::

    g = 'module attribute (module-global variable)'
    """This is g's docstring."""

    class AClass:

        c = 'class attribute'
        """This is AClass.c's docstring."""

        def __init__(self):
            """Method __init__'s docstring."""
            self.i = 'instance attribute'
            """This is self.i's docstring."""

    def f(x):
        """Function f's docstring."""
        return x**2

    f.a = 1
    """Function attribute f.a's docstring."""
Additional Docstrings
'''''''''''''''''''''
(This idea was adapted from PEP 216 [#PEP-216]_.)
Many programmers would like to make extensive use of docstrings for
API documentation. However, docstrings do take up space in the
running program, so some programmers are reluctant to "bloat up" their
code. Also, not all API documentation is applicable to interactive
environments, where ``__doc__`` would be displayed.
Docutils' docstring extraction tools will concatenate all string
literal expressions which appear at the beginning of a definition or
after a simple assignment. Only the first strings in definitions will
be available as ``__doc__``, and can be used for brief usage text
suitable for interactive sessions; subsequent string literals and all
attribute docstrings are ignored by the Python byte-code compiler and
may contain more extensive API information.
Example::

    def function(arg):
        """This is __doc__, function's docstring."""
        """
        This is an additional docstring, ignored by the byte-code
        compiler, but extracted by Docutils.
        """
        pass
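The difference between what the compiler keeps and what an extraction tool can see is easy to demonstrate. The sketch below uses the standard ``ast`` module (an assumption; it postdates this text), and ``leading_strings`` is a hypothetical helper. How the collected parts are joined is left open here.

```python
import ast

SOURCE = '''
def function(arg):
    """This is __doc__, function's docstring."""
    """
    This is an additional docstring, ignored by the byte-code
    compiler, but extracted by Docutils.
    """
    pass
'''


def leading_strings(body):
    """Collect the string literals that open a suite, in order."""
    strings = []
    for stmt in body:
        if (isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Constant)
                and isinstance(stmt.value.value, str)):
            strings.append(stmt.value.value.strip())
        else:
            break
    return strings


func = ast.parse(SOURCE).body[0]
parts = leading_strings(func.body)

# At run time only the first literal survives as __doc__.
namespace = {}
exec(SOURCE, namespace)
runtime_doc = namespace["function"].__doc__
```

Here ``parts`` holds both literals, while ``runtime_doc`` equals only the first of them.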
.. topic:: Issue: ``from __future__ import``
This would break "``from __future__ import``" statements introduced
in Python 2.1 for multiple module docstrings (main docstring plus
additional docstring(s)). The Python Reference Manual specifies:
A future statement must appear near the top of the module. The
only lines that can appear before a future statement are:
* the module docstring (if any),
* comments,
* blank lines, and
* other future statements.
Resolution?
1. Should we search for docstrings after a ``__future__``
statement? Very ugly.
2. Redefine ``__future__`` statements to allow multiple preceding
string literals?
3. Or should we not even worry about this? There probably
shouldn't be ``__future__`` statements in production code, after
all. Perhaps modules with ``__future__`` statements will simply
have to put up with the single-docstring limitation.
Choice of Docstring Format
--------------------------
Rather than force everyone to use a single docstring format, multiple
input formats are allowed by the processing system. A special
variable, ``__docformat__``, may appear at the top level of a module
before any function or class definitions. Over time or through
decree, a standard format or set of formats should emerge.
A module's ``__docformat__`` variable only applies to the objects
defined in the module's file. In particular, the ``__docformat__``
variable in a package's ``__init__.py`` file does not apply to objects
defined in subpackages and submodules.
The ``__docformat__`` variable is a string containing the name of the
format being used, a case-insensitive string matching the input
parser's module or package name (i.e., the same name as required to
"import" the module or package), or a registered alias. If no
``__docformat__`` is specified, the default format is "plaintext" for
now; this may be changed to the standard format if one is ever
established.
The ``__docformat__`` string may contain an optional second field,
separated from the format name (first field) by a single space: a
case-insensitive language identifier as defined in RFC 1766. A
typical language identifier consists of a 2-letter language code from
`ISO 639`_ (3-letter codes used only if no 2-letter code exists; RFC
1766 is currently being revised to allow 3-letter codes). If no
language identifier is specified, the default is "en" for English.
The language identifier is passed to the parser and can be used for
language-dependent markup features.
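Reading a ``__docformat__`` value as described above might look like the following sketch. ``parse_docformat`` is a hypothetical helper; lower-casing both fields is just one way to obtain case-insensitive matching, and alias lookup is not modelled. It splits on any whitespace, which is slightly more lenient than the single space specified.

```python
def parse_docformat(value=None):
    """Split a __docformat__ string into (format name, language identifier)."""
    if not value or not value.strip():
        # No __docformat__ given: the defaults named in the text.
        return ("plaintext", "en")
    fields = value.split()
    name = fields[0].lower()
    language = fields[1].lower() if len(fields) > 1 else "en"
    return (name, language)


default = parse_docformat()
format_only = parse_docformat("reStructuredText")
with_language = parse_docformat("restructuredtext DE")
```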
Identifier Cross-References
---------------------------
In Python docstrings, interpreted text is used to classify and mark up
program identifiers, such as the names of variables, functions,
classes, and modules. If the identifier alone is given, its role is
inferred implicitly according to the Python namespace lookup rules.
For functions and methods (even when dynamically assigned),
parentheses ('()') may be included::

    This function uses `another()` to do its work.
For class, instance and module attributes, dotted identifiers are used
when necessary. For example (using reStructuredText markup)::

    class Keeper(Storer):

        """
        Extend `Storer`. Class attribute `instances` keeps track
        of the number of `Keeper` objects instantiated.
        """

        instances = 0
        """How many `Keeper` objects are there?"""

        def __init__(self):
            """
            Extend `Storer.__init__()` to keep track of instances.
            Keep count in `Keeper.instances`, data in `self.data`.
            """
            Storer.__init__(self)
            Keeper.instances += 1

            self.data = []
            """Store data in a list, most recent last."""

        def store_data(self, data):
            """
            Extend `Storer.store_data()`; append new `data` to a
            list (in `self.data`).
            """
            self.data.append(data)
Each identifier quoted with backquotes ("`") will become a
reference to the definition of that identifier.
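As a very rough sketch of the first step, the identifiers can be pulled out of a docstring and their leading component looked up in a chain of namespaces, innermost first. Everything here is hypothetical: the regular expression, the function names, and the hand-built namespace list stand in for the real Python lookup rules and for namespaces that would come from the mined source.

```python
import re

# A backquoted identifier, optionally dotted, optionally followed by "()".
INTERPRETED = re.compile(r"`([A-Za-z_][\w.]*)(\(\))?`")


def find_identifiers(docstring):
    """Return the identifiers quoted with backquotes, minus any "()"."""
    return [match.group(1) for match in INTERPRETED.finditer(docstring)]


def resolve(identifier, namespaces):
    """Look up the first component in each namespace, innermost first."""
    head = identifier.split(".")[0]
    for label, names in namespaces:
        if head in names:
            return label
    return None


doc = ("Extend `Storer.store_data()`; append new `data` to a "
       "list (in `self.data`).")
namespaces = [
    ("parameter of Keeper.store_data", {"self", "data"}),
    ("module global", {"Keeper", "Storer"}),
]
found = find_identifiers(doc)
targets = [resolve(name, namespaces) for name in found]
```

An identifier that resolves to ``None`` would be a natural candidate for a system message rather than a silent failure.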
Stylist Transforms
------------------
Stylist transforms are specialized transforms specific to the PySource
Reader. The PySource Reader doesn't have to make any decisions as to
style; it just produces a logically constructed document tree, parsed
and linked, including custom node types. Stylist transforms
understand the custom nodes created by the Reader and convert them
into standard Docutils nodes.
Multiple Stylist transforms may be implemented and one can be chosen
at runtime (through a "--style" or "--stylist" command-line option).
Each Stylist transform implements a different layout or style; thus
the name. They decouple the context-understanding part of the Reader
from the layout-generating part of processing, resulting in a more
flexible and robust system. This also serves to "separate style from
content", the SGML/XML ideal.
By keeping the piece of code that does the styling small and modular,
it becomes much easier for people to roll their own styles. The
"barrier to entry" is too high with existing tools; extracting the
stylist code will lower the barrier considerably.
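To show how small such a piece of styling code could be, here is a toy sketch. The tuple-based nodes, the two stylist functions, and the ``STYLISTS`` registry are all hypothetical; real stylists would be Docutils transforms operating on the document tree.

```python
def plain_stylist(node):
    """Render a custom ("class_section", name, doc) node as a section."""
    kind, name, doc = node
    return ("section", [("title", ["class " + name]), ("paragraph", [doc])])


def terse_stylist(node):
    """Render the same custom node as a single paragraph."""
    kind, name, doc = node
    return ("paragraph", [name + ": " + doc])


# The style is chosen by name at runtime, e.g. from a command-line option.
STYLISTS = {"plain": plain_stylist, "terse": terse_stylist}


def apply_stylist(custom_nodes, style="plain"):
    stylist = STYLISTS[style]
    return [stylist(node) for node in custom_nodes]


custom = [("class_section", "Keeper", "Extend Storer.")]
plain = apply_stylist(custom, "plain")
terse = apply_stylist(custom, "terse")
```

The Reader output (``custom``) is identical in both runs; only the chosen stylist differs, which is the decoupling the text argues for.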
==========================
References and Footnotes
==========================
.. [#PEP-256] PEP 256, Docstring Processing System Framework, Goodger
(http://www.python.org/peps/pep-0256.html)
.. [#PEP-224] PEP 224, Attribute Docstrings, Lemburg
(http://www.python.org/peps/pep-0224.html)
.. [#PEP-216] PEP 216, Docstring Format, Zadka
(http://www.python.org/peps/pep-0216.html)
.. _docutils.dtd:
http://docutils.sourceforge.net/docs/ref/docutils.dtd
.. _soextbl.dtd:
http://docutils.sourceforge.net/docs/ref/soextblx.dtd
.. _The Docutils Document Tree:
http://docutils.sourceforge.net/docs/ref/doctree.html
.. _VMS error condition severity levels:
http://www.openvms.compaq.com:8000/73final/5841/841pro_027.html
#error_cond_severity
.. _log4j project: http://logging.apache.org/log4j/docs/index.html
.. _Docutils Python Source DTD:
http://docutils.sourceforge.net/docs/dev/pysource.dtd
.. _ISO 639: http://lcweb.loc.gov/standards/iso639-2/englangn.html
.. _Python Doc-SIG: http://www.python.org/sigs/doc-sig/
==================
Project Web Site
==================
A SourceForge project has been set up for this work at
http://docutils.sourceforge.net/.
===========
Copyright
===========
This document has been placed in the public domain.
==================
Acknowledgements
==================
This document borrows ideas from the archives of the `Python
Doc-SIG`_. Thanks to all members past & present.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
=== Added File Zope/lib/python/third_party/docutils/docs/peps/pep-0287.txt ===
PEP: 287
Title: reStructuredText Docstring Format
Version: $Revision: 1.1.2.1 $
Last-Modified: $Date: 2004/10/29 18:24:46 $
Author: David Goodger <goodger at users.sourceforge.net>
Discussions-To: <doc-sig at python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 25-Mar-2002
Post-History: 02-Apr-2002
Replaces: 216
Abstract
========
When plaintext hasn't been expressive enough for inline documentation,
Python programmers have sought out a format for docstrings. This PEP
proposes that the `reStructuredText markup`_ be adopted as a standard
markup format for structured plaintext documentation in Python
docstrings, and for PEPs and ancillary documents as well.
reStructuredText is a rich and extensible yet easy-to-read,
what-you-see-is-what-you-get plaintext markup syntax.
Only the low-level syntax of docstrings is addressed here. This PEP
is not concerned with docstring semantics or processing at all (see
PEP 256 for a "Road Map to the Docstring PEPs"). Nor is it an attempt
to deprecate pure plaintext docstrings, which are always going to be
legitimate. The reStructuredText markup is an alternative for those
who want more expressive docstrings.
Benefits
========
Programmers are by nature a lazy breed. We reuse code with functions,
classes, modules, and subsystems. Through its docstring syntax,
Python allows us to document our code from within. The "holy grail"
of the Python Documentation Special Interest Group (Doc-SIG_) has been
a markup syntax and toolset to allow auto-documentation, where the
docstrings of Python systems can be extracted in context and processed
into useful, high-quality documentation for multiple purposes.
Document markup languages have three groups of customers: the authors
who write the documents, the software systems that process the data,
and the readers, who are the final consumers and the most important
group. Most markups are designed for the authors and software
systems; readers are only meant to see the processed form, either on
paper or via browser software. ReStructuredText is different: it is
intended to be easily readable in source form, without prior knowledge
of the markup. ReStructuredText is entirely readable in plaintext
format, and many of the markup forms match common usage (e.g.,
``*emphasis*``), so it reads quite naturally. Yet it is rich enough
to produce complex documents, and extensible so that there are few
limits. Of course, to write reStructuredText documents some prior
knowledge is required.
The markup offers functionality and expressivity, while maintaining
easy readability in the source text. The processed form (HTML etc.)
makes it all accessible to readers: inline live hyperlinks; live links
to and from footnotes; automatic tables of contents (with live
links!); tables; images for diagrams etc.; pleasant, readable styled
text.
The reStructuredText parser is available now, part of the Docutils_
project. Standalone reStructuredText documents and PEPs can be
converted to HTML; other output format writers are being worked on and
will become available over time. Work is progressing on a Python
source "Reader" which will implement auto-documentation from
docstrings. Authors of existing auto-documentation tools are
encouraged to integrate the reStructuredText parser into their
projects, or better yet, to join forces to produce a world-class
toolset for the Python standard library.
Tools will become available in the near future, which will allow
programmers to generate HTML for online help, XML for multiple
purposes, and eventually PDF, DocBook, and LaTeX for printed
documentation, essentially "for free" from the existing docstrings.
The adoption of a standard will, at the very least, benefit docstring
processing tools by preventing further "reinventing the wheel".
Eventually PyDoc, the one existing standard auto-documentation tool,
could have reStructuredText support added. In the interim it will
have no problem with reStructuredText markup, since it treats all
docstrings as preformatted plaintext.
Goals
=====
These are the generally accepted goals for a docstring format, as
discussed in the Doc-SIG:
1. It must be readable in source form by the casual observer.
2. It must be easy to type with any standard text editor.
3. It must not need to contain information which can be deduced from
parsing the module.
4. It must contain sufficient information (structure) so it can be
converted to any reasonable markup format.
5. It must be possible to write a module's entire documentation in
docstrings, without feeling hampered by the markup language.
reStructuredText meets and exceeds all of these goals, and sets its
own goals as well, even more stringent. See `Docstring-Significant
Features`_ below.
The goals of this PEP are as follows:
1. To establish reStructuredText as a standard structured plaintext
format for docstrings (inline documentation of Python modules and
packages), PEPs, README-type files and other standalone documents.
"Accepted" status will be sought through Python community consensus
and eventual BDFL pronouncement.
Please note that reStructuredText is being proposed as *a*
standard, not *the only* standard. Its use will be entirely
optional. Those who don't want to use it need not.
2. To solicit and address any related concerns raised by the Python
community.
3. To encourage community support. As long as multiple competing
markups are out there, the development community remains fractured.
Once a standard exists, people will start to use it, and momentum
will inevitably gather.
4. To consolidate efforts from related auto-documentation projects.
It is hoped that interested developers will join forces and work on
a joint/merged/common implementation.
Once reStructuredText is a Python standard, effort can be focused on
tools instead of arguing for a standard. Python needs a standard set
of documentation tools.
With regard to PEPs, one or both of the following strategies may be
applied:
a) Keep the existing PEP section structure constructs (one-line
section headers, indented body text). Subsections can either be
forbidden, or supported with reStructuredText-style underlined
headers in the indented body text.
b) Replace the PEP section structure constructs with the
reStructuredText syntax. Section headers will require underlines,
subsections will be supported out of the box, and body text need
not be indented (except for block quotes).
Strategy (b) is recommended, and its implementation is complete.
Support for RFC 2822 headers has been added to the reStructuredText
parser for PEPs (unambiguous given a specific context: the first
contiguous block of the document). It may be desired to concretely
specify what over/underline styles are allowed for PEP section
headers, for uniformity.
Rationale
=========
The lack of a standard syntax for docstrings has hampered the
development of standard tools for extracting and converting docstrings
into documentation in standard formats (e.g., HTML, DocBook, TeX).
There have been a number of proposed markup formats and variations,
and many tools tied to these proposals, but without a standard
docstring format they have failed to gain a strong following and/or
floundered half-finished.
Throughout the existence of the Doc-SIG, consensus on a single
standard docstring format has never been reached. A lightweight,
implicit markup has been sought, for the following reasons (among
others):
1. Docstrings written within Python code are available from within the
interactive interpreter, and can be "print"ed. Thus the use of
plaintext for easy readability.
2. Programmers want to add structure to their docstrings, without
sacrificing raw docstring readability. Unadorned plaintext cannot
be transformed ("up-translated") into useful structured formats.
3. Explicit markup (like XML or TeX) is widely considered unreadable
by the uninitiated.
4. Implicit markup is aesthetically compatible with the clean and
minimalist Python syntax.
Many alternative markups for docstrings have been proposed on the
Doc-SIG over the years; a representative sample is listed below. Each
is briefly analyzed in terms of the goals stated above. Please note
that this is *not* intended to be an exhaustive list of all existing
markup systems; there are many other markups (Texinfo, Doxygen, TIM,
YODL, AFT, ...) which are not mentioned.
- XML_, SGML_, DocBook_, HTML_, XHTML_
XML and SGML are explicit, well-formed meta-languages suitable for
all kinds of documentation. XML is a variant of SGML. They are
best used behind the scenes, because to untrained eyes they are
verbose, difficult to type, and too cluttered to read comfortably as
source. DocBook, HTML, and XHTML are all applications of SGML
and/or XML, and all share the same basic syntax and the same
shortcomings.
- TeX_
TeX is similar to XML/SGML in that it's explicit, but not very easy
to write, and not easy for the uninitiated to read.
- `Perl POD`_
Most Perl modules are documented in a format called POD (Plain Old
Documentation). This is an easy-to-type, very low level format with
strong integration with the Perl parser. Many tools exist to turn
POD documentation into other formats: info, HTML and man pages,
among others. However, the POD syntax takes after Perl itself in
terms of readability.
- JavaDoc_
Special comments before Java classes and functions serve to document
the code. A program to extract these, and turn them into HTML
documentation is called javadoc, and is part of the standard Java
distribution. However, JavaDoc has a very intimate relationship
with HTML, using HTML tags for most markup. Thus it shares the
readability problems of HTML.
- Setext_, StructuredText_
Early on, variants of Setext (Structure Enhanced Text), including
Zope Corp's StructuredText, were proposed for Python docstring
formatting. Hereafter these variants will collectively be called
"STexts". STexts have the advantage of being easy to read without
special knowledge, and relatively easy to write.
Although used by some (including in most existing Python
auto-documentation tools), until now STexts have failed to become
standard because:
- STexts have been incomplete. Lacking "essential" constructs that
people want to use in their docstrings, STexts are rendered less
than ideal. Note that these "essential" constructs are not
universal; everyone has their own requirements.
- STexts have been sometimes surprising. Bits of text are
unexpectedly interpreted as being marked up, leading to user
frustration.
- SText implementations have been buggy.
- Most STexts have had no formal specification except for the
implementation itself. A buggy implementation meant a buggy spec,
and vice-versa.
- There has been no mechanism to get around the SText markup rules
when a markup character is used in a non-markup context. In other
words, no way to escape markup.
Proponents of implicit STexts have vigorously opposed proposals for
explicit markup (XML, HTML, TeX, POD, etc.), and the debates have
continued off and on since 1996 or earlier.
reStructuredText is a complete revision and reinterpretation of the
SText idea, addressing all of the problems listed above.
Specification
=============
The specification and user documentation for reStructuredText are
quite extensive. Rather than repeating or summarizing them all
here, links to the originals are provided.
Please first take a look at `A ReStructuredText Primer`_, a short and
gentle introduction. The `Quick reStructuredText`_ user reference
quickly summarizes all of the markup constructs. For complete and
extensive details, please refer to the following documents:
- `An Introduction to reStructuredText`_
- `reStructuredText Markup Specification`_
- `reStructuredText Directives`_
In addition, `Problems With StructuredText`_ explains many markup
decisions made with regard to StructuredText, and `A Record of
reStructuredText Syntax Alternatives`_ records markup decisions made
independently.
Docstring-Significant Features
==============================
- A markup escaping mechanism.
Backslashes (``\``) are used to escape markup characters when needed
for non-markup purposes. However, the inline markup recognition
rules have been constructed in order to minimize the need for
backslash-escapes. For example, although asterisks are used for
*emphasis*, in non-markup contexts such as "*" or "(*)" or "x * y",
the asterisks are not interpreted as markup and are left unchanged.
For many non-markup uses of backslashes (e.g., describing regular
expressions), inline literals or literal blocks are applicable; see
the next item.
- Markup to include Python source code and Python interactive
sessions: inline literals, literal blocks, and doctest blocks.
Inline literals use ``double-backquotes`` to indicate program I/O or
code snippets. No markup interpretation (including backslash-escape
[``\``] interpretation) is done within inline literals.
Literal blocks (block-level literal text, such as code excerpts or
ASCII graphics) are indented, and indicated with a double-colon
("::") at the end of the preceding paragraph (right here -->)::
    if literal_block:
        text = 'is left as-is'
        spaces_and_linebreaks = 'are preserved'
        markup_processing = None
Doctest blocks begin with ">>> " and end with a blank line. Neither
indentation nor literal block double-colons are required. For
example::
    Here's a doctest block:

    >>> print('Python-specific usage examples; begun with ">>>"')
    Python-specific usage examples; begun with ">>>"
    >>> print('(cut and pasted from interactive sessions)')
    (cut and pasted from interactive sessions)
- Markup that isolates a Python identifier: interpreted text.
Text enclosed in single backquotes is recognized as "interpreted
text", whose interpretation is application-dependent. In the
context of a Python docstring, the default interpretation of
interpreted text is as Python identifiers. The text will be marked
up with a hyperlink connected to the documentation for the
identifier given. Lookup rules are the same as in Python itself:
LGB namespace lookups (local, global, builtin). The "role" of the
interpreted text (identifying a class, module, function, etc.) is
determined implicitly from the namespace lookup. For example::
    class Keeper(Storer):

        """
        Keep data fresher longer.

        Extend `Storer`. Class attribute `instances` keeps track
        of the number of `Keeper` objects instantiated.
        """

        instances = 0
        """How many `Keeper` objects are there?"""

        def __init__(self):
            """
            Extend `Storer.__init__()` to keep track of
            instances. Keep count in `Keeper.instances` and data
            in `self.data`.
            """
            Storer.__init__(self)
            # Increment the class attribute; `self.instances += 1`
            # would only create a per-instance copy.
            Keeper.instances += 1

            self.data = []
            """Store data in a list, most recent last."""

        def storedata(self, data):
            """
            Extend `Storer.storedata()`; append new `data` to a
            list (in `self.data`).
            """
            # Append rather than rebind, so earlier data is kept.
            self.data.append(data)
Each piece of interpreted text is looked up according to the local
namespace of the block containing its docstring.
- Markup that isolates a Python identifier and specifies its type:
interpreted text with roles.
Although the Python source context reader is designed not to require
explicit roles, they may be used. To classify identifiers
explicitly, the role is given along with the identifier in either
prefix or suffix form::
Use :method:`Keeper.storedata` to store the object's data in
`Keeper.data`:instance_attribute:.
The syntax chosen for roles is verbose, but necessarily so (if
anyone has a better alternative, please post it to the Doc-SIG_).
The intention of the markup is that there should be little need to
use explicit roles; their use is to be kept to an absolute minimum.
- Markup for "tagged lists" or "label lists": field lists.
Field lists represent a mapping from field name to field body.
These are mostly used for extension syntax, such as "bibliographic
field lists" (representing document metadata such as author, date,
and version) and extension attributes for directives (see below).
They may be used to implement methodologies (docstring semantics),
such as identifying parameters, exceptions raised, etc.; such usage
is beyond the scope of this PEP.
A modified RFC 2822 syntax is used, with a colon *before* as well as
*after* the field name. Field bodies are more versatile as well;
they may contain multiple body elements (even nested field lists).
For example::
    :Date: 2002-03-22
    :Version: 1
    :Authors:
        - Me
        - Myself
        - I
Standard RFC 2822 header syntax cannot be used for this construct
because it is ambiguous. A word followed by a colon at the
beginning of a line is common in written text.
- Markup extensibility: directives and substitutions.
Directives are used as an extension mechanism for reStructuredText,
a way of adding support for new block-level constructs without
adding new syntax. Directives for images, admonitions (note,
caution, etc.), and tables of contents generation (among others)
have been implemented. For example, here's how to place an image::
.. image:: mylogo.png
Substitution definitions allow the power and flexibility of
block-level directives to be shared by inline text. For example::
The |biohazard| symbol must be used on containers used to
dispose of medical waste.
.. |biohazard| image:: biohazard.png
- Section structure markup.
Section headers in reStructuredText use adornment via underlines
(and possibly overlines) rather than indentation. For example::
This is a Section Title
=======================
This is a Subsection Title
--------------------------
This paragraph is in the subsection.
This is Another Section Title
=============================
This paragraph is in the second section.
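
The namespace lookup described for interpreted text above can be
sketched in plain Python. This is only an illustration under stated
assumptions, not the Docutils implementation: the names
``INTERPRETED``, ``infer_role``, and ``resolve`` are hypothetical, the
regular expression is far simpler than the real inline markup
recognition rules, and the role names are invented for the example.

```python
import builtins
import inspect
import re

# Single-backquoted identifiers such as `Keeper` or `Storer.storedata()`.
# Double-backquoted inline literals (``like this``) are skipped.
INTERPRETED = re.compile(r"(?<!`)`([A-Za-z_][\w.]*)(?:\(\))?`(?!`)")


def infer_role(obj):
    """Classify a looked-up object; the role names are illustrative."""
    if inspect.ismodule(obj):
        return "module"
    if inspect.isclass(obj):
        return "class"
    if inspect.isroutine(obj):
        return "function"
    return "attribute"


def resolve(name, local_ns, global_ns):
    """Return (scope, role) for a dotted name, or None if lookup fails."""
    first, *rest = name.split(".")
    scopes = (("local", local_ns), ("global", global_ns),
              ("builtin", vars(builtins)))
    # Try local, then global, then builtin, stopping at the first hit.
    for scope, namespace in scopes:
        if first in namespace:
            obj = namespace[first]
            break
    else:
        return None
    # Follow any remaining dotted parts as attribute accesses.
    for part in rest:
        if not hasattr(obj, part):
            return None
        obj = getattr(obj, part)
    return scope, infer_role(obj)


class Storer:
    def storedata(self, data):
        pass


class Keeper(Storer):
    """Extend `Storer`. `instances` counts `Keeper` objects; see also `len`."""
    instances = 0


# Stand-in for the module's global namespace (an assumption of this sketch).
module_ns = {"Storer": Storer, "Keeper": Keeper}
for name in INTERPRETED.findall(Keeper.__doc__):
    print(name, resolve(name, vars(Keeper), module_ns))
```

Here the class body serves as the local namespace of the docstring, so
``instances`` is found locally, ``Storer`` and ``Keeper`` are found in
the module namespace, and ``len`` falls through to the builtins. No
explicit role is needed in any of these cases, which is the point of
the implicit lookup.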
Questions & Answers
===================
1. Is reStructuredText rich enough?
Yes, it is for most people. If it lacks some construct that is
required for a specific application, it can be added via the
directive mechanism. If a useful and common construct has been
overlooked and a suitably readable syntax can be found, it can be
added to the specification and parser.
2. Is reStructuredText *too* rich?
For specific applications or individuals, perhaps. In general, no.
Since the very beginning, whenever a docstring markup syntax has
been proposed on the Doc-SIG_, someone has complained about the
lack of support for some construct or other. The reply was often
something like, "These are docstrings we're talking about, and
docstrings shouldn't have complex markup." The problem is that a
construct that seems superfluous to one person may be absolutely
essential to another.
reStructuredText takes the opposite approach: it provides a rich
set of implicit markup constructs (plus a generic extension
mechanism for explicit markup), allowing for all kinds of
documents. If the set of constructs is too rich for a particular
application, the unused constructs can either be removed from the
parser (via application-specific overrides) or simply omitted by
convention.
3. Why not use indentation for section structure, like StructuredText
does? Isn't it more "Pythonic"?
Guido van Rossum wrote the following in a 2001-06-13 Doc-SIG post:
    I still think that using indentation to indicate sectioning is
    wrong. If you look at how real books and other print
    publications are laid out, you'll notice that indentation is
    used frequently, but mostly at the intra-section level.
    Indentation can be used to offset lists, tables, quotations,
    examples, and the like. (The argument that docstrings are
    different because they are input for a text formatter is wrong:
    the whole point is that they are also readable without
    processing.)

    I reject the argument that using indentation is Pythonic: text
    is not code, and different traditions and conventions hold.
    People have been presenting text for readability for over 30
    centuries. Let's not innovate needlessly.
See `Section Structure via Indentation`__ in `Problems With
StructuredText`_ for further elaboration.
__ http://docutils.sourceforge.net/docs/dev/rst/problems.html#section-structure-via-indentation
4. Why use reStructuredText for PEPs? What's wrong with the existing
standard?
The existing standard for PEPs is very limited in terms of general
expressibility, and referencing is especially lacking for such a
reference-rich document type. PEPs are currently converted into
HTML, but the results (mostly monospaced text) are less than
attractive, and most of the value-added potential of HTML
(especially inline hyperlinks) is untapped.
Making reStructuredText a standard markup for PEPs will enable much
richer expression, including support for section structure, inline
markup, graphics, and tables. In several PEPs there are ASCII
graphics diagrams, which are all that plaintext documents can
support. Since PEPs are made available in HTML form, the ability
to include proper diagrams would be immediately useful.
Current PEP practices allow for reference markers in the form "[1]"
in the text, and the footnotes/references themselves are listed in
a section toward the end of the document. There is currently no
hyperlinking between the reference marker and the
footnote/reference itself (it would be possible to add this to
pep2html.py, but the "markup" as it stands is ambiguous and
mistakes would be inevitable). A PEP with many references (such as
this one ;-) requires a lot of flipping back and forth. When
revising a PEP, often new references are added or unused references
deleted. It is painful to renumber the references, since it has to
be done in two places and can have a cascading effect (insert a
single new reference 1, and every other reference has to be
renumbered; always adding new references to the end is suboptimal).
It is easy for references to go out of sync.
PEPs use references for two purposes: simple URL references and
footnotes. reStructuredText differentiates between the two. A PEP
might contain references like this::
    Abstract

        This PEP proposes adding frungible doodads [1] to the core.
        It extends PEP 9876 [2] via the BCA [3] mechanism.

    ...

    References and Footnotes

        [1] http://www.example.org/

        [2] PEP 9876, Let's Hope We Never Get Here
            http://www.python.org/peps/pep-9876.html

        [3] "Bogus Complexity Addition"
Reference 1 is a simple URL reference. Reference 2 is a footnote
containing text and a URL. Reference 3 is a footnote containing
text only. Rewritten using reStructuredText, this PEP could look
like this::
    Abstract
    ========

    This PEP proposes adding `frungible doodads`_ to the core. It
    extends PEP 9876 [#pep9876]_ via the BCA [#]_ mechanism.

    ...

    References & Footnotes
    ======================

    .. _frungible doodads: http://www.example.org/

    .. [#pep9876] PEP 9876, Let's Hope We Never Get Here

    .. [#] "Bogus Complexity Addition"
URLs and footnotes can be defined close to their references if
desired, making them easier to read in the source text, and making
the PEPs easier to revise. The "References and Footnotes" section
can be auto-generated with a document tree transform. Footnotes
from throughout the PEP would be gathered and displayed under a
standard header. If URL references should likewise be written out
explicitly (in citation form), another tree transform could be
used.
URL references can be named ("frungible doodads"), and can be
referenced from multiple places in the document without additional
definitions. When converted to HTML, references will be replaced
with inline hyperlinks (HTML <a> tags). The two footnotes are
automatically numbered, so they will always stay in sync. The
first footnote also contains an internal reference name, "pep9876",
so it's easier to see the connection between reference and footnote
in the source text. Named footnotes can be referenced multiple
times, maintaining consistent numbering.
The "#pep9876" footnote could also be written in the form of a
citation::
It extends PEP 9876 [PEP9876]_ ...
.. [PEP9876] PEP 9876, Let's Hope We Never Get Here
Footnotes are numbered, whereas citations use text for their
references.
5. Wouldn't it be better to keep the docstring and PEP proposals
separate?
The PEP markup proposal may be removed if it is deemed that there
is no need for PEP markup, or it could be made into a separate PEP.
If accepted, PEP 1, PEP Purpose and Guidelines [#PEP-1]_, and PEP
9, Sample PEP Template [#PEP-9]_ will be updated.
It seems natural to adopt a single consistent markup standard for
all uses of structured plaintext in Python, and to propose it all
in one place.
6. The existing pep2html.py script converts the existing PEP format to
HTML. How will the new-format PEPs be converted to HTML?
A new version of pep2html.py with integrated reStructuredText
parsing has been completed. The Docutils project supports PEPs
with a "PEP Reader" component, including all functionality
currently in pep2html.py (auto-recognition of PEP & RFC references,
email masking, etc.).
7. Who's going to convert the existing PEPs to reStructuredText?
PEP authors or volunteers may convert existing PEPs if they like,
but there is no requirement to do so. The reStructuredText-based
PEPs will coexist with the old PEP standard. The pep2html.py
mentioned in answer 6 processes both old and new standards.
8. Why use reStructuredText for README and other ancillary files?
The reasoning given for PEPs in answer 4 above also applies to
README and other ancillary files. By adopting a standard markup,
these files can be converted to attractive cross-referenced HTML
and put up on python.org. Developers of other projects can also
take advantage of this facility for their own documentation.
9. Won't the superficial similarity to existing markup conventions
cause problems, and result in people writing invalid markup (and
not noticing, because the plaintext looks natural)? How forgiving
is reStructuredText of "not quite right" markup?
There will be some mis-steps, as there would be when moving from
one programming language to another. As with any language,
proficiency grows with experience. Luckily, reStructuredText is a
very little language indeed.
As with any syntax, there is the possibility of syntax errors. It
is expected that a user will run the processing system over their
input and check the output for correctness.
In a strict sense, the reStructuredText parser is very unforgiving
(as it should be; "In the face of ambiguity, refuse the temptation
to guess" [#Zen]_ applies to parsing markup as well as computer
languages). Here's design goal 3 from `An Introduction to
reStructuredText`_:
Unambiguous. The rules for markup must not be open for
interpretation. For any given input, there should be one and
only one possible output (including error output).
While unforgiving, at the same time the parser does try to be
helpful by producing useful diagnostic output ("system messages").
The parser reports problems, indicating their level of severity
(from least to most: debug, info, warning, error, severe). The
user or the client software can decide on reporting thresholds;
they can ignore low-level problems or cause high-level problems to
bring processing to an immediate halt. Problems are reported
during the parse as well as included in the output, often with
two-way links between the source of the problem and the system
message explaining it.
10. Will the docstrings in the Python standard library modules be
converted to reStructuredText?
No. Python's library reference documentation is maintained
separately from the source. Docstrings in the Python standard
library should not try to duplicate the library reference
documentation. The current policy for docstrings in the Python
standard library is that they should be no more than concise
hints, simple and markup-free (although many *do* contain ad-hoc
implicit markup).
11. I want to write all my strings in Unicode. Will anything
break?
The parser fully supports Unicode. Docutils supports arbitrary
input and output encodings.
12. Why does the community need a new structured text design?
The existing structured text designs are deficient, for the
reasons given in "Rationale" above. reStructuredText aims to be a
complete markup syntax, within the limitations of the "readable
plaintext" medium.
13. What is wrong with existing documentation methodologies?
What existing methodologies? For Python docstrings, there is
**no** official standard markup format, let alone a documentation
methodology akin to JavaDoc. The question of methodology is at a
much higher level than syntax (which this PEP addresses). It is
potentially much more controversial and difficult to resolve, and
is intentionally left out of this discussion.
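
The severity levels and reporting thresholds described in answer 9
can be sketched as follows. This is a minimal illustration, not the
Docutils reporter: the class ``Reporter``, the exception
``HaltProcessing``, the parameter names ``report_level`` and
``halt_level`` as used here, their defaults, and the message texts are
all assumptions made for the example. Only the five level names and
their order come from the text above.

```python
# Severity names as listed in the text, from least to most severe.
LEVELS = ("debug", "info", "warning", "error", "severe")


class HaltProcessing(Exception):
    """Hypothetical exception raised when the halt threshold is reached."""


class Reporter:
    """Hypothetical reporter that filters and escalates system messages."""

    def __init__(self, report_level="warning", halt_level="severe"):
        self.report_rank = LEVELS.index(report_level)
        self.halt_rank = LEVELS.index(halt_level)
        self.messages = []

    def report(self, level, text, line=None):
        rank = LEVELS.index(level)
        # Record the message if it meets the reporting threshold.
        if rank >= self.report_rank:
            self.messages.append((level, line, text))
        # Stop processing at once if it meets the halt threshold.
        if rank >= self.halt_rank:
            raise HaltProcessing(
                "%s (line %s): %s" % (level.upper(), line, text))


reporter = Reporter(report_level="warning", halt_level="error")
reporter.report("info", "low-level note", line=2)        # ignored
reporter.report("warning", "suspicious markup", line=5)  # recorded
try:
    reporter.report("error", "unusable markup", line=9)  # recorded, halts
except HaltProcessing as exc:
    print("halted:", exc)
print(reporter.messages)
```

Keeping the two thresholds separate lets client software ignore
low-level problems while still stopping on high-level ones, as the
answer describes.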
References & Footnotes
======================
.. [#PEP-1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
   (http://www.python.org/peps/pep-0001.html)

.. [#PEP-9] PEP 9, Sample PEP Template, Warsaw
   (http://www.python.org/peps/pep-0009.html)

.. [#Zen] From `The Zen of Python (by Tim Peters)`__ (or just
   "``import this``" in Python)

__ http://www.python.org/doc/Humor.html#zen

.. [#PEP-216] PEP 216, Docstring Format, Zadka
   (http://www.python.org/peps/pep-0216.html)
.. _reStructuredText markup: http://docutils.sourceforge.net/rst.html
.. _Doc-SIG: http://www.python.org/sigs/doc-sig/
.. _XML: http://www.w3.org/XML/
.. _SGML: http://www.oasis-open.org/cover/general.html
.. _DocBook: http://docbook.org/tdg/en/html/docbook.html
.. _HTML: http://www.w3.org/MarkUp/
.. _XHTML: http://www.w3.org/MarkUp/#xhtml1
.. _TeX: http://www.tug.org/interest.html
.. _Perl POD: http://www.perldoc.com/perl5.6/pod/perlpod.html
.. _JavaDoc: http://java.sun.com/j2se/javadoc/
.. _Setext: http://docutils.sourceforge.net/mirror/setext.html
.. _StructuredText:
    http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage
.. _A ReStructuredText Primer:
    http://docutils.sourceforge.net/docs/user/rst/quickstart.html
.. _Quick reStructuredText:
    http://docutils.sourceforge.net/docs/user/rst/quickref.html
.. _An Introduction to reStructuredText:
    http://docutils.sourceforge.net/docs/ref/rst/introduction.html
.. _reStructuredText Markup Specification:
    http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html
.. _reStructuredText Directives:
    http://docutils.sourceforge.net/docs/ref/rst/directives.html
.. _Problems with StructuredText:
    http://docutils.sourceforge.net/docs/dev/rst/problems.html
.. _A Record of reStructuredText Syntax Alternatives:
    http://docutils.sourceforge.net/docs/dev/rst/alternatives.html
.. _Docutils: http://docutils.sourceforge.net/
Copyright
=========
This document has been placed in the public domain.
Acknowledgements
================
Some text is borrowed from PEP 216, Docstring Format [#PEP-216]_, by
Moshe Zadka.
Special thanks to all members past & present of the Python Doc-SIG_.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: