Date sent: Wed, 13 Oct 1999 14:59:42 -0400
From: Michel Pelletier <michel@digicool.com>
Organization: Digital Creations, Inc.
To: Toby Dickenson <htrd90@zepler.org>
Copies to: Loren Stafford <lstafford@icompression.com>, zope-dev@zope.org
Subject: [Zope-dev] Re: [Zope] Can't add "date" property to new ZClass
(I moved this thread to zope-dev)
Toby Dickenson wrote:
On Mon, 11 Oct 1999 15:06:50 -0700, you wrote:
I'm creating a brand new ZClass (called "PDFClass"). It's my first ZClass. I've just added the Common Instance Property Sheet (called "PDFProperties"). Now I'm trying to add properties to that property sheet. I added one "string" property OK, but now when I add the "date" property "pub_date", I get this error:
Invalid Date-Time String
Are you planning to extract properties from PDF files? That's a task on my to-do list too.
Let's start a discussion on this before any code gets written; I've been thinking a lot about document formats lately.
I'm working on a model and elaboration of what I call MFI, Multi-Format Interface, which will be a component of the Portal Toolkit and possibly a future core feature of Zope. Basically, this consolidates all of the document types (DTML, XML, PDF, what have you) into a subclassable interface that allows you to define pluggable format types.
That is a very good idea. Should it be able to guess what the document format is or will you have to indicate that by hand?
This way, instead of making a whole new type of object (PDFDocument, whatever), you make a new plug-in format for MFI that all Documents can then select as their format type. This is much more flexible, extendable, and 'philosophically' correct than the current method. As an example, an HTML formatter could be made (that extracts meta-information from a document and perhaps builds a DOM tree if it's parsable well enough), a PDF formatter (can DOM be put on PDF?)
I doubt it. Structured documents and page-oriented markup seem to be rather different philosophies of representing a text document. Turning PDF documents into HTML is no easy business either. But then, some sectioning of even a PDF document (and even of Word documents - sometimes :-) _is_ possible. Calling that a DOM is stretching the DOM concept a bit too much, I think.
a structured text (stx) formatter (we are elaborating a DOM interface for Stx). The possibilities are endless, and it means that all of these document types in the add list can be reduced to one selection.
This is a pretty light description, but there are many other benefits I'm formalizing into a document right now. What are your thoughts?
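To make the plug-in idea above a little more concrete, here is a minimal sketch in plain Python. It is not MFI and not any Zope API: the names `Format`, `HTMLFormat`, `PDFFormat`, `Document`, `register_format` and `guess_format` are all hypothetical, invented for illustration. It shows one generic document class that selects a registered format, either named by hand or guessed from the content.

```python
import re


class Format:
    """Hypothetical base class for a pluggable format; also the raw fallback."""
    name = "raw"

    def sniff(self, data):
        # Return True if the bytes look like this format.
        return False

    def properties(self, data):
        # Return whatever meta-information the format can extract.
        return {}


class HTMLFormat(Format):
    name = "html"

    def sniff(self, data):
        head = data[:1024].lower()
        return b"<html" in head or b"<!doctype html" in head

    def properties(self, data):
        # Deliberately naive: a real formatter would use a proper parser.
        m = re.search(rb"<title[^>]*>(.*?)</title>", data, re.I | re.S)
        if m is None:
            return {}
        return {"title": m.group(1).decode("utf-8", "replace").strip()}


class PDFFormat(Format):
    name = "pdf"

    def sniff(self, data):
        # PDF files begin with the marker "%PDF-".
        return data.startswith(b"%PDF-")


RAW = Format()
_REGISTRY = {RAW.name: RAW}  # assumption: one module-level registry


def register_format(fmt):
    _REGISTRY[fmt.name] = fmt


def guess_format(data):
    # Ask each registered format in turn; fall back to raw.
    for fmt in _REGISTRY.values():
        if fmt.sniff(data):
            return fmt
    return RAW


class Document:
    """One document class; the format is a selectable plug-in."""

    def __init__(self, data, format_name=None):
        self.data = data
        if format_name is None:
            self.format = guess_format(data)
        else:
            self.format = _REGISTRY[format_name]

    def properties(self):
        return self.format.properties(self.data)


register_format(HTMLFormat())
register_format(PDFFormat())
```

With this shape, adding a new document type means registering one more `Format` subclass rather than adding a new object class to the add list. Guessing and indicating the format by hand can coexist: the guess is only used when no format name is given.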
What all documents do/could/should have (natively or added) are document properties (preferably conforming to the Dublin Core, for standardization). These should be extracted using DOM, or COM (for Word documents), or a PDF parser for PDF documents, or whatever, and added to the propertysheet. And/or propertysheets could be filled in by hand through the Zope Management interface.
I take it that your proposal also leads to inclusion of documents in catalogs for fielded and fulltext searches? Yes, please? However, won't this be conceptually difficult beyond full text searches? The level of access to the documents is so diverse. Compare the structured DOM access to XML documents with the (basically) mere word-level access to PDF (and even HTML) documents.
Oh well, just my own preoccupations I guess. A very good idea, Michel.
Rik
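As a rough illustration of the property-extraction idea above, the sketch below maps metadata pulled out of a document onto Dublin Core element names and then lets hand-entered values override the extracted ones. Everything here is an assumption for illustration: `build_property_sheet` and `PDF_INFO_TO_DC` are hypothetical names, the "property sheet" is just a dict rather than a Zope propertysheet, and the particular mapping from PDF Info keys to Dublin Core elements is one plausible choice, not a standard.

```python
# The fifteen Dublin Core elements.
DUBLIN_CORE = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Illustrative mapping from PDF Info dictionary keys to Dublin Core
# elements (an assumption; other mappings are equally defensible).
PDF_INFO_TO_DC = {
    "Title": "title",
    "Author": "creator",
    "Keywords": "subject",
    "CreationDate": "date",
}


def build_property_sheet(extracted, mapping, manual=None):
    """Merge extracted metadata and hand-entered values into one dict."""
    sheet = {}
    for key, value in extracted.items():
        element = mapping.get(key)
        # Skip keys with no mapping and skip empty values.
        if element in DUBLIN_CORE and value:
            sheet[element] = value
    for element, value in (manual or {}).items():
        if element not in DUBLIN_CORE:
            raise KeyError("not a Dublin Core element: %r" % element)
        sheet[element] = value  # hand-entered values win
    return sheet
```

Because every format ends up with the same element names, a catalog could index these fields uniformly for fielded searches, whatever level of access the underlying format allows for its body text.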