[Zope-dev] Multiformatted Interface (was Re:
Rik Hoekstra
hoekstra@fsw.leidenuniv.nl
Wed, 13 Oct 1999 21:56:49 +0100
> (I moved this thread to zope-dev)
Whoops. I accidentally sent my reply to the Zope list. Here it is
again.
>
> Toby Dickenson wrote:
> >
> > On Mon, 11 Oct 1999 15:06:50 -0700, you wrote:
> >
> > >I'm creating a brand new ZClass (called "PDFClass"). It's my first ZClass.
> > >I've just added the Common Instance Property Sheet (called "PDFProperties").
> > >Now I'm trying to add properties to that property sheet. I added one
> > >"string" property OK, but now when I add the "date" property "pub_date", I
> > >get this error:
> > >
> > > Invalid Date-Time String
> >
> > Are you planning to extract properties from PDF files? That's a task
> > on my to-do list too.
>
> Let's start a discussion on this before any code gets written; I've been
> thinking a lot about document formats lately.
>
> I'm working on a model and elaboration of what I call MFI, Multi-Format
> Interface, which will be a component of the Portal Toolkit and possibly
> a future core feature of Zope. Basically, this consolidates all of the
> document types (DTML, XML, PDF, what have you) into a subclassable
> interface that allows you to define pluggable format types.
That is a very good idea. Should it be able to guess what the
document format is or will you have to indicate that by hand?
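
To make the guessing question concrete, here is a rough sketch of how it could work: look at the first bytes of the content, then fall back to the file name. The function name guess_format, the signature table and the format labels are made up for illustration; nothing here is part of any MFI design.

```python
import mimetypes

# Hypothetical signature table: leading bytes -> format label.
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"<?xml", "xml"),
]

def guess_format(data, filename=None, default="text"):
    """Guess a document format from its content first, then its file name."""
    head = data[:1024].lstrip()
    for magic, label in MAGIC:
        if head.startswith(magic):
            return label
    lowered = head.lower()
    if lowered.startswith(b"<!doctype html") or lowered.startswith(b"<html"):
        return "html"
    if filename:
        # Fall back to the extension via the standard mimetypes table.
        mime, _ = mimetypes.guess_type(filename)
        if mime == "application/pdf":
            return "pdf"
        if mime == "text/html":
            return "html"
        if mime in ("text/xml", "application/xml"):
            return "xml"
    # Nothing matched: the caller would have to indicate the format by hand.
    return default
```

A guess like this can only be a default; a document owner would still need a way to override it by hand.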
> This way,
> instead of making a whole new type of object (PDFDocument, whatever)
> you make a new plug-in format for MFI that all Documents can then select
> as their format type. This is much more flexible, extensible, and
> 'philosophically' correct than the current method. As an example, an
> HTML formatter could be made (that extracts meta-information from a
> document and perhaps builds a DOM tree if it's parsable well enough), a
> PDF formatter (can DOM be put on PDF?)
I doubt it. Structured documents and page-oriented markup
seem to be rather different philosophies of representing a text
document. Turning PDF documents into HTML is no easy business
either. But then, some sectioning of even a PDF document (and even of
Word documents - sometimes :-) _is_ possible. Calling that a DOM is
stretching the DOM concept a bit too much, I think.
> a structured text (stx)
> formatter (we are elaborating a DOM interface for Stx). The
> possibilities are endless, and it means that all of these document types
> in the add list can be reduced to one selection.
>
> This is a pretty light description, but there are many other benefits
> I'm formalizing into a document right now. What are your thoughts?
>
What all documents do/could/should have (natively or added) are
document properties (preferably conforming to the Dublin Core, for
standardization). These should be extracted using DOM or COM (for
Word documents) or via a PDF parser for PDF documents or whatever and
added to the propertysheet. And/or propertysheets could be filled in by
hand through the Zope management interface.
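
As a minimal sketch of what I mean, assuming a registry of format plug-ins that each know how to extract Dublin Core properties from their own kind of document: the names FORMATS, register, extract_properties and fill_property_sheet are hypothetical and only illustrate the idea, they are not Zope or MFI APIs. Only an HTML and a plain-text plug-in are shown; a PDF or Word plug-in would slot in the same way.

```python
from html.parser import HTMLParser

# The 15 Dublin Core elements; a property sheet would use these as keys.
DUBLIN_CORE = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

FORMATS = {}  # hypothetical registry: format name -> plug-in instance

def register(name):
    """Class decorator that adds a format plug-in to the registry."""
    def decorator(cls):
        FORMATS[name] = cls()
        return cls
    return decorator

class _MetaCollector(HTMLParser):
    """Collects <title> and <meta name="DC.xxx" content="..."> values."""
    def __init__(self):
        super().__init__()
        self.found = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name.startswith("dc."):
                name = name[3:]
            if name in DUBLIN_CORE and attrs.get("content"):
                self.found[name] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # An explicit DC.title meta tag takes precedence over <title>.
        if self._in_title and data.strip():
            self.found.setdefault("title", data.strip())

@register("html")
class HTMLFormat:
    def extract_properties(self, text):
        collector = _MetaCollector()
        collector.feed(text)
        return collector.found

@register("text")
class PlainTextFormat:
    def extract_properties(self, text):
        # Assumption for illustration: the first non-blank line is the title.
        for line in text.splitlines():
            if line.strip():
                return {"title": line.strip()}
        return {}

def fill_property_sheet(format_name, text, by_hand=None):
    """Extract what the format allows, then let hand-entered values win."""
    props = FORMATS[format_name].extract_properties(text)
    props.update(by_hand or {})
    return props
```

Values entered by hand override extracted ones here; that ordering is a design choice of the sketch, and the opposite could be argued just as well.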
I take it that your proposal also leads to inclusion of documents in
catalogs for fielded and fulltext searches? Yes, please?
However, won't this be conceptually difficult beyond full text
searches? The levels of access to the documents are so diverse.
Compare the structured DOM access to XML documents to the (basically)
mere word-level access to PDF (and even HTML) documents.
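
One way to picture the difference in access levels, as a toy sketch only: every document contributes words to a full text index, while fielded entries exist only for documents whose format could supply them. TinyCatalog is a hypothetical stand-in written for this example and does not reflect the real catalog's API.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")

class TinyCatalog:
    """Hypothetical toy catalog with a full text and a fielded index."""
    def __init__(self):
        self.fulltext = defaultdict(set)  # word -> document ids
        self.fields = defaultdict(set)    # (field, value) -> document ids

    def index(self, doc_id, words_text, fields=None):
        # Word-level access: available for any format that yields text.
        for word in WORD.findall(words_text.lower()):
            self.fulltext[word].add(doc_id)
        # Fielded access: only for formats that could extract properties.
        for name, value in (fields or {}).items():
            self.fields[(name, value.lower())].add(doc_id)

    def search_text(self, word):
        return set(self.fulltext.get(word.lower(), ()))

    def search_field(self, name, value):
        return set(self.fields.get((name, value.lower()), ()))
```

A PDF indexed with text only would show up in full text searches but never in fielded ones, which is exactly the unevenness I am worried about.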
Oh well, just my own preoccupations, I guess. A very good idea,
Michel.
Rik