[Zope3-dev] Pre-proposal: IDocument and friends
Lalo Martins
lalo@hackandroll.org
Thu, 20 Dec 2001 10:32:06 -0200
Pre-proposal: IDocument and IDocumentJargon components
Abstract
This is a proposal to address the requirements of a flexible,
generic Document component for Zope3, raised in the "Axe DTML
Document" thread in the Zope3-Dev mailing list.
All proposed names are, of course, subject to discussion.
Problem
A lot of data in a Zope site is too complex for an opaque blob
(such as the current "File" object) and too simple to demand
its own component class.
Of the "kinds" of data (using the term broadly, to mean
something that might or might not become a class) that have
enough structure to be a class, many share common structure
and features - a tree-like structure and the need for at least
one textual, human-editable representation, plus automatic
rendering to the web. This suggests these classes should
either use some common utilities or subclass a single base
class.
Requirements
1. Documents are primarily text, with a tree-like structure.
2. The default presentation should render the document
automatically to whatever format the user wants. This
should take into account the Zope3 idea of "default
presentation", which already depends on medium (HTTP, for
instance) and accepted formats (in the case of HTTP, the
"Accept:" header). In the canonical case, this will mean
filtering through a standard page template somewhere and
producing XHTML.
3. At least one human-editable, lossless format should be
available, for WebDAV and ZMI textareas.
4. Renderers for new formats (for 2 and 3) and the
respective parsers (for 3) should be transparently
pluggable, leveraging the Component Framework.
5. There should be a "basic" structure (see "Flexibility of
the tree" below).
6. It should be easy to tag (mark up) specialized content in
the text - for example, mark up all references to people
or email addresses.
7. Documents that don't want to follow the "basic"
structure at all should be allowed not to (see "Flexibility
of the tree" below).
Flexibility of the tree
Solving this problem with one single interface would lead to
either excessive or insufficient rigidity. A significant portion
of the documents this proposal addresses conform more or
less to a general tree structure, with multiple levels of
containment (which may or may not be book/chapter/section/subsection)
and some basic meta-layout (such as emphasis, bulleted
lists and "foot"-notes).
(For simplicity, we'll assume that this basic structure is
more or less the one supported by the current version of
StructuredText, as of this writing in December 2001. This
might or might not be the case.)
The need for additional tagging, as defined in
requirement 6, can be addressed by adding markup to the
basic structure.
However, some documents don't share this basic structure;
-FIXME- no examples come to mind. So it is necessary to
allow a different basic structure to be used.
Solution
IDocument
Define one component interface, IDocument. Different
implementations could exist, and they could conceivably
implement the interface using different internal document
formats. But the difference between these should be only a
matter of performance, never functionality.
This interface doesn't need to do a lot; converting from one
interface (IDocument) to some presentation is already
handled by the Component Architecture. IDocument just needs
to provide a way back, for requirement 4.
So, the IDocument interface only specifies one method:
'updateFrom(object)', which uses the component framework to
find out what kind of content is in 'object', then updates
its content based on it if a converter is available,
otherwise raises an exception.
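A minimal sketch of how 'updateFrom' might behave, in Python. A plain dictionary stands in for the Component Architecture's lookup, and the names 'Content', 'register_converter', 'ConverterNotFound' and the blank-line paragraph parser are all hypothetical; a real implementation would ask the component framework for a converter rather than consult a module-level dict.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class ConverterNotFound(Exception):
    """Raised when no parser is registered for the object's content type."""


@dataclass
class Content:
    content_type: str  # hypothetical way of telling what kind of content this is
    data: str


# Stand-in for the component framework: content type -> parser producing
# the document's internal tree (here just a list of paragraphs).
CONVERTERS: Dict[str, Callable[[str], List[str]]] = {}


def register_converter(content_type: str, parser: Callable[[str], List[str]]) -> None:
    CONVERTERS[content_type] = parser


class Document:
    def __init__(self) -> None:
        self.tree: List[str] = []

    def updateFrom(self, obj: Content) -> None:
        """Replace this document's content with the parsed form of obj."""
        parser = CONVERTERS.get(obj.content_type)
        if parser is None:
            # No converter: leave the document untouched and complain.
            raise ConverterNotFound(obj.content_type)
        self.tree = parser(obj.data)


def split_paragraphs(text: str) -> List[str]:
    """Toy parser: paragraphs are separated by blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


register_converter("text/plain", split_paragraphs)

doc = Document()
doc.updateFrom(Content("text/plain", "First paragraph.\n\nSecond one."))
print(doc.tree)  # ['First paragraph.', 'Second one.']
```

Raising before assigning means a failed update leaves the existing content intact, which seems like the safer default for WebDAV saves.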
ZDoc
This proposal introduces the hypothetical format "ZDoc",
used to describe an IDocument. An implementation of this
proposal might use ZDoc as its internal representation or
even as an exchange format, but it need not. For now, this
format is only an abstract tool for communicating among
ourselves, to give an idea of what can be in an IDocument.
Let's imagine ZDoc as an empty XML schema, on top of which
we'll use XML namespaces to introduce semantics.
In a hypothetical implementation of IDocument, the document
content would be converted into the corresponding ZDoc and
stored in the ZODB in this format.
Each ZDoc element has one default namespace, specified in
the usual XML namespace notation
(http://www.w3.org/TR/1999/REC-xml-names-19990114/ for info).
Let's also imagine a standard namespace for ZDoc -
ZStructuredDoc. This namespace defines a set of tags and
attributes that matches requirement 5 (this would probably
be more or less the feature set of StructuredText as of this
writing). So, with the correct xmlns attribute in the
top-level <Document> element, the whole document gets
formatted by ZStructuredDoc.
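To illustrate the default-namespace idea with an off-the-shelf parser: Python's xml.etree.ElementTree expands every tag to '{namespace}localname', so a single xmlns attribute on <Document> puts every unprefixed descendant in the ZStructuredDoc namespace. The namespace URI and the element names (section, title, para, emphasis) are made up for this sketch; ZStructuredDoc's actual vocabulary is not defined yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the proposal does not assign one to ZStructuredDoc.
ZSD = "http://www.zope.org/xmlns/ZStructuredDoc"

SOURCE = """<Document xmlns="http://www.zope.org/xmlns/ZStructuredDoc">
  <section>
    <title>Introduction</title>
    <para>Some <emphasis>structured</emphasis> text.</para>
  </section>
</Document>"""


def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag, or None if it has none."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


root = ET.fromstring(SOURCE)
# Every element, including <Document> itself, inherits the default namespace.
namespaces = {namespace_of(elem.tag) for elem in root.iter()}
print(namespaces == {ZSD})  # True
```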
IDocumentJargon
Now we need to address requirements 6 and 7.
For this, you'd write a component that implements
IDocumentJargon. This component will provide additional
markup and/or alternative structure.
This interface defines two methods: 'makeElement(source)'
and 'processElement(element)'.
The method 'makeElement', given an element from the tree
(-FIXME-: DOM object or XML string?), returns an
IDocumentElement object.
The Document framework will include a utility to register
IDocumentJargon components. When an IDocument instance is
being parsed, the last path component of the namespace name
is used to look up the correct jargon. For example::
<Document
xmlns:py="http://www.zope.org/Members/lalo/xmlns/PyDoc">
...blah blah blah <py:module>time</py:module> blah...
</Document>
During parsing, the jargon registry would look for a jargon
named 'PyDoc'. This component would be used to turn the
py:module tag into an IDocumentElement object.
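The lookup by "last path component of the namespace name" could be sketched like this, again with ElementTree. 'JargonRegistry', 'ModuleRef' and 'PyDocJargon' are hypothetical names, and whether 'makeElement' receives a DOM node or a string is still open (see the FIXME above); this sketch simply passes the ElementTree element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def jargon_name(namespace_uri):
    """'http://.../xmlns/PyDoc' -> 'PyDoc' (last path component of the URI)."""
    return urlparse(namespace_uri).path.rstrip("/").rsplit("/", 1)[-1]


class JargonRegistry:
    """Hypothetical registry of IDocumentJargon components, keyed by name."""

    def __init__(self):
        self._jargons = {}

    def register(self, name, jargon):
        self._jargons[name] = jargon

    def lookup(self, namespace_uri):
        return self._jargons[jargon_name(namespace_uri)]


class ModuleRef:
    """Hypothetical IDocumentElement produced for <py:module>."""

    def __init__(self, name):
        self.name = name


class PyDocJargon:
    """Hypothetical jargon for the PyDoc namespace."""

    def makeElement(self, source):
        local_name = source.tag.split("}", 1)[1]
        if local_name == "module":
            return ModuleRef(source.text)
        raise ValueError("PyDoc has no element %r" % local_name)

    def processElement(self, element):
        return element  # PyDoc adds nothing to foreign elements in this sketch


registry = JargonRegistry()
registry.register("PyDoc", PyDocJargon())

document = ET.fromstring(
    '<Document xmlns:py="http://www.zope.org/Members/lalo/xmlns/PyDoc">'
    "...blah blah blah <py:module>time</py:module> blah..."
    "</Document>"
)

built = []
for elem in document.iter():
    if elem.tag.startswith("{"):  # namespaced element: hand it to its jargon
        uri = elem.tag[1:].split("}", 1)[0]
        built.append(registry.lookup(uri).makeElement(elem))

print([ref.name for ref in built])  # ['time']
```

Keying on the last path component keeps jargon names short, at the cost of two namespaces ending in the same word colliding in the registry.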
Element attributes with a namespace different from that of
the element itself are processed by the 'processElement'
method of the jargon for the attribute's namespace.
Order of processElement calls
More specific namespaces (those declared in inner
elements) are looked up first. For those declared in the
same element, the order of the xmlns declarations applies.
Example::
<Document xmlns:py="http://www.zope.org/Members/lalo/xmlns/PyDoc"
xmlns:zope="http://www.zope.org/xmlns/ZopeAPIDoc">
...blah blah blah
<py:class name="DateTime"
xmlns:iso="http://www.zope.org/Members/lalo/xmlns/IsoStuff">
<py:method url="http://www.python.org/doc"
iso:std="8601"
zope:name="ZopeTime">ISO</py:method>
</py:class>
blah...
</Document>
The element py:method would first be built by calling
PyDoc.makeElement(); then this element would be fed to
IsoStuff.processElement(), and finally
ZopeAPIDoc.processElement().
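The ordering rule can be written down as a small function. This sketch abstracts the parser away: the caller supplies the namespace declarations of each enclosing element (outermost first, each in xmlns declaration order) and the prefixes used by the element's attributes. The function name and data layout are assumptions for illustration, not part of the proposal.

```python
def process_order(scopes, attribute_prefixes, element_prefix):
    """Return namespace URIs in the order their processElement() should run.

    scopes: for each ancestor and the element itself, outermost first, a list
        of (prefix, uri) pairs in the order the xmlns attributes appear.
    attribute_prefixes: prefixes used by the element's attributes.
    element_prefix: the element's own prefix, handled by makeElement() instead.
    """
    wanted = set(attribute_prefixes) - {element_prefix}
    seen = set()
    order = []
    for scope in reversed(scopes):  # inner (more specific) declarations first
        for prefix, uri in scope:  # within one element: declaration order
            if prefix in wanted and prefix not in seen:
                seen.add(prefix)  # an inner redeclaration shadows outer ones
                order.append(uri)
    return order


PYDOC = "http://www.zope.org/Members/lalo/xmlns/PyDoc"
ZOPE = "http://www.zope.org/xmlns/ZopeAPIDoc"
ISO = "http://www.zope.org/Members/lalo/xmlns/IsoStuff"

# Declarations on <Document>, then on <py:class>; <py:method> declares none.
scopes = [[("py", PYDOC), ("zope", ZOPE)], [("iso", ISO)], []]
print(process_order(scopes, ["iso", "zope"], "py"))
# ['http://www.zope.org/Members/lalo/xmlns/IsoStuff', 'http://www.zope.org/xmlns/ZopeAPIDoc']
```

Applied to the example above, this yields IsoStuff before ZopeAPIDoc, matching the order described.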
IDocumentElement
Basic API not yet defined. This interface can be DOMish or
Zopeish (e.g. 'objectValues'), or have its own API
(e.g. 'elementValues()'), or any combination of these.
Besides tree navigation, searching and modification, this
interface has two methods to specialize rendering.
method 'present()' --
used for generating lossy presentations. Should return an
IDocumentElement that corresponds to a reasonable
representation of this element in ZStructuredDoc.
For example: '<faa:person>Lalo</faa:person>' could become
'<extlink url="http://www.laranja.org/">Lalo</extlink>',
perhaps by looking up a person-to-URL database somewhere.
method 'render(interface)' --
used when, for a given target, there is something better
than the "reasonable" lossy presentation.
For example: '<mm:sound>fascinating</mm:sound>', when
rendering to HTML, could generate an '<object>' tag to
embed an audio player.
This method should raise a standard exception when it is
not applicable - let's call it 'NothingAppropriate'. The
rendering adapter will catch this exception and handle it
by calling 'present()', then rendering the resulting
ZStructuredDoc.
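A sketch of the rendering adapter's fallback logic. The element classes, the 'html' target string (standing in for the target interface), and the tiny ZStructuredDoc renderer are all hypothetical; only the try-render, fall-back-to-present() flow comes from the proposal. The exception is called NothingAppropriate here.

```python
from dataclasses import dataclass
from html import escape


class NothingAppropriate(Exception):
    """Raised by render() when an element has no special rendering for a target."""


@dataclass
class ExtLink:
    """Stand-in for a ZStructuredDoc <extlink> element."""
    url: str
    text: str


def render_structured(element, target):
    """Render a ZStructuredDoc element; only <extlink> -> HTML is sketched."""
    if target == "html" and isinstance(element, ExtLink):
        return '<a href="%s">%s</a>' % (escape(element.url), escape(element.text))
    raise ValueError("cannot render %r as %s" % (element, target))


@dataclass
class Person:
    """Hypothetical <faa:person>: no special rendering, only present()."""
    name: str
    url: str  # e.g. found in a person-to-URL database

    def present(self):
        return ExtLink(self.url, self.name)

    def render(self, target):
        raise NothingAppropriate(target)


@dataclass
class Sound:
    """Hypothetical <mm:sound>: embeds a player when rendering to HTML."""
    src: str

    def present(self):
        return ExtLink(self.src, self.src)

    def render(self, target):
        if target == "html":
            return '<object data="%s"></object>' % escape(self.src)
        raise NothingAppropriate(target)


def render_element(element, target):
    """Prefer the element's own rendering; otherwise render its present() form."""
    try:
        return element.render(target)
    except NothingAppropriate:
        return render_structured(element.present(), target)


print(render_element(Person("Lalo", "http://www.laranja.org/"), "html"))
# <a href="http://www.laranja.org/">Lalo</a>
print(render_element(Sound("fascinating.wav"), "html"))
# <object data="fascinating.wav"></object>
```

Every jargon element then only has to implement present(); render() is an optional optimization for targets it knows well.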
Risks
These points have been raised by Paul Everitt in the thread:
1. If the class that the pickle is an instance of is overly
rich, then you might find yourself always writing
converters for every Zope upgrade. Also, the data may be
less usable outside of Zope.
2. I'm someone that uses CMF Documents quite aggressively.
I'm constantly trying to find some damn tool on some
operating system that can replace a TEXTAREA w/o
disappointing me. Thus, in some ways I have expectations
of my Zope3 folder appearing to be a fileserver, albeit one
that can give me a bunch of extras when I look at it
through a web browser. This is an important usage. It
covers the majority of content currently being authored by
the majority of average users.
3. In some ways, the smarts of CMF Document leads to
unexpected behavior for (2). For instance, say I'm using
WebDrive/DavFS to edit a text document. I save it. It's
immediately out-of-date, because the CMF sticks Dublin Core
headers into it. If you pick some neutral DOM
representation, you're by definition changing the original
in a perhaps lossy way. Users might not expect that and
give up when things don't work as expected.
4. OTOH, most of the value an organization gets is in turning
raw data into repeatable, standardized content with rich
services. Pulling this off without alienating users (see
(3)) is the trick.
Other risks:
5. A component not flexible enough, or not specialized enough,
or too hard to use (UI), could alienate users.
6. If it's too hard to write and register parsers (converters
from some format to, say, ZDoc), they won't be written, and
the framework will therefore be less useful.
--
I don't know, I feel something is missing. I sense the presence
of holes, but I can't pinpoint them. So, here it is, for peer
review ;-)
[]s,
|alo