Re: [Zope-dev] Multiformatted Interface (was Re: [Zope] Can't add "date" property to new ZClass)
Date sent: Wed, 13 Oct 1999 14:59:42 -0400
From: Michel Pelletier <michel@digicool.com>
Organization: Digital Creations, Inc.
To: Toby Dickenson <htrd90@zepler.org>
Copies to: Loren Stafford <lstafford@icompression.com>, zope-dev@zope.org
Subject: [Zope-dev] Re: [Zope] Can't add "date" property to new ZClass
(I moved this thread to zope-dev)
Toby Dickenson wrote:
On Mon, 11 Oct 1999 15:06:50 -0700, you wrote:
I'm creating a brand new ZClass (called "PDFClass"). It's my first ZClass. I've just added the Common Instance Property Sheet (called "PDFProperties"). Now I'm trying to add properties to that property sheet. I added one "string" property OK, but now when I add the "date" property "pub_date", I get this error:
Invalid Date-Time String
Are you planning to extract properties from PDF files? That's a task on my to-do list too.
Let's start a discussion on this before any code gets written; I've been thinking a lot about document formats lately.
I'm working on a model and elaboration of what I call MFI, the Multi-Format Interface, which will be a component of the Portal Toolkit and possibly a future core feature of Zope. Basically, this consolidates all of the document types (DTML, XML, PDF, what-have-you) into a subclassable interface that allows you to define pluggable format types.
That is a very good idea. Should it be able to guess what the document format is or will you have to indicate that by hand?
This way, instead of making a whole new type of object (PDFDocument, whatever), you make a new plug-in format for MFI that all Documents can then select as their format type. This is much more flexible, extensible, and 'philosophically' correct than the current method. As an example, an HTML formatter could be made (one that extracts meta-information from a document and perhaps builds a DOM tree if it's parsable well enough), a PDF formatter (can DOM be put on PDF?)
I doubt it. The structured documents and the page oriented markup seem to be rather different philosophies of representing a text document. Turning PDF documents into html is no easy business either. But then, some sectioning of even a pdf document (and even of Word documents - sometimes :-) _is_ possible. Calling that a DOM is stretching the DOM concept a bit too much I think.
a structured text (stx) formatter (we are elaborating a DOM interface for Stx). The possibilities are endless, and it means that all of these document types in the add list can be reduced to one selection.
This is a pretty light description, but there are many other benefits I'm formalizing into a document right now. What are your thoughts?
What all documents do/could/should have (natively or added) are document properties (preferably conforming to the Dublin Core, for standardization). These should be extracted using DOM or COM (for Word documents) or via a PDF parser for PDF documents or whatever and added to the propertysheet. Or/and propertysheets could be filled by hand through the Zope Management interface.
I take it that your proposal also leads to inclusion of documents in catalogs for fielded and fulltext searches? Yes, please?
However, won't this be conceptually difficult beyond full text searches? The levels of access to the documents are so diverse. Compare the structured DOM access to XML documents to the (basically) mere word level access to pdf (and even html) documents.
Oh well, just my own preoccupations I guess. A very good idea, Michel.
Rik
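A minimal sketch of the pluggable-format idea in this exchange, written in Python 3 (Zope itself is written in Python). Every name here (Formatter, HTMLFormatter, Document, set_format) is hypothetical and not part of Zope or the Portal Toolkit. The HTML plug-in only illustrates one possible property extraction: pulling Dublin Core-style meta tags such as name="DC.Title" into a property dictionary with the standard library's html.parser.

```python
from html.parser import HTMLParser
from typing import Optional


class Formatter:
    """Hypothetical plug-in base class; on its own it acts as a null formatter."""

    name = "null"

    def extract_properties(self, source: str) -> dict:
        # A null formatter knows nothing about the format, so it finds no properties.
        return {}


class _MetaCollector(HTMLParser):
    """Collects <meta name=... content=...> pairs while parsing HTML."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            fields = dict(attrs)
            name, content = fields.get("name"), fields.get("content")
            if name and content is not None:
                self.meta[name] = content


class HTMLFormatter(Formatter):
    """Hypothetical HTML plug-in that reads Dublin Core-style meta tags."""

    name = "html"

    def extract_properties(self, source: str) -> dict:
        collector = _MetaCollector()
        collector.feed(source)
        collector.close()
        # Keep only keys like "DC.Title", stored under a lowercase name ("title").
        return {
            key[3:].lower(): value
            for key, value in collector.meta.items()
            if key.lower().startswith("dc.")
        }


class Document:
    """Hypothetical document whose format is a swappable plug-in."""

    def __init__(self, source: str, formatter: Optional[Formatter] = None):
        self.source = source
        self.set_format(formatter or Formatter())

    def set_format(self, formatter: Formatter) -> None:
        self.formatter = formatter
        # Refill the property sheet using the newly selected format.
        self.properties = formatter.extract_properties(self.source)


doc = Document('<head><meta name="DC.Title" content="Annual Report"></head>')
print(doc.properties)  # -> {}  (null format: no properties yet)
doc.set_format(HTMLFormatter())
print(doc.properties)  # -> {'title': 'Annual Report'}
```

In this sketch, changing a document's format means assigning a different plug-in object, so a single Document class serves every format, and the base Formatter covers formats for which nobody has written a parser yet.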
Rik Hoekstra wrote:
I'm working on a model and elaboration of what I call MFI, the Multi-Format Interface, which will be a component of the Portal Toolkit and possibly a future core feature of Zope. Basically, this consolidates all of the document types (DTML, XML, PDF, what-have-you) into a subclassable interface that allows you to define pluggable format types.
That is a very good idea. Should it be able to guess what the document format is or will you have to indicate that by hand?
It can be indicated on the management screen or specified programmatically through DTML, and I suppose it would be nifty to have it sniff an uploaded content type and select a formatter. For example, if you FTP upload a file called whatever.xml, Zope would magically turn it into an XML document instead of a DTML document.
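A sketch of the upload sniffing described here, assuming the content type is guessed from the file extension with Python's standard mimetypes module (the message does not say how sniffing would work). FORMATS_BY_MIME, DEFAULT_FORMAT and choose_format are hypothetical names; an explicit choice, such as one made on the management screen or through DTML, always overrides the guess.

```python
import mimetypes
from typing import Optional

# Hypothetical registry: MIME type -> name of a registered format plug-in.
# Both XML MIME types are listed because the platform's MIME tables may
# report either one for a ".xml" file.
FORMATS_BY_MIME = {
    "text/xml": "xml",
    "application/xml": "xml",
    "text/html": "html",
    "application/pdf": "pdf",
}

# What an upload becomes when nothing matches (today's behavior: DTML).
DEFAULT_FORMAT = "dtml"


def choose_format(filename: str, explicit: Optional[str] = None) -> str:
    """Pick a format for an uploaded file.

    An explicit choice always wins; otherwise guess the content type
    from the file name and look it up in the registry.
    """
    if explicit:
        return explicit
    mime_type, _encoding = mimetypes.guess_type(filename)
    return FORMATS_BY_MIME.get(mime_type, DEFAULT_FORMAT)


print(choose_format("whatever.xml"))                   # -> xml
print(choose_format("report.pdf"))                     # -> pdf
print(choose_format("whatever.xml", explicit="dtml"))  # -> dtml
```

Falling back to DTML keeps unknown uploads behaving as they do now; sniffing the first bytes of the file instead of the extension would be an alternative design, not shown here.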
I doubt it. The structured documents and the page oriented markup seem to be rather different philosophies of representing a text document. Turning PDF documents into html is no easy business either. But then, some sectioning of even a pdf document (and even of Word documents - sometimes :-) _is_ possible. Calling that a DOM is stretching the DOM concept a bit too much I think.
I'll take your word for it; I don't know much about it. I'm sure it's possible to do a minimal amount of parsing for properties, etc., or just to have a null formatter in place for future expansion.
What all documents do/could/should have (natively or added) are document properties (preferably conforming to the Dublin Core, for standardization). These should be extracted using DOM or COM (for Word documents) or via a PDF parser for PDF documents or whatever and added to the propertysheet. Or/and propertysheets could be filled by hand through the Zope Management interface.
We are working on making CatalogAwareness Dublin Core compliant; in fact, I think it's checked into CVS right now, but it might not be.
I take it that your proposal also leads to inclusion of documents in catalogs for fielded and fulltext searches? Yes, please?
This will be a separate mix-in class for portal-aware objects. For example, News Items, How-Tos, and Tips will all subclass both MFI and PortalObject. PortalObject defines portal behavior like membership, cataloging, Dublin Core properties, and reviewing.
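A sketch of the mix-in arrangement, assuming Python 3 cooperative multiple inheritance. MFI, PortalObject and NewsItem mirror the class names in the message, but everything inside them here (format_name, review_state, submit_for_review, the dublin dictionary) is hypothetical, and only a subset of the Dublin Core elements is shown.

```python
class MFI:
    """Hypothetical Multi-Format Interface mix-in: owns the body and its format."""

    def __init__(self, body: str = "", format_name: str = "dtml", **kwargs):
        super().__init__(**kwargs)  # pass remaining arguments along the MRO
        self.body = body
        self.format_name = format_name

    def set_format(self, format_name: str) -> None:
        self.format_name = format_name


class PortalObject:
    """Hypothetical portal mix-in: Dublin Core properties and review state."""

    # A subset of the Dublin Core elements, for illustration only.
    DUBLIN_CORE_FIELDS = ("title", "creator", "subject", "description", "date")

    def __init__(self, owner: str = "", **kwargs):
        super().__init__(**kwargs)
        self.owner = owner
        self.review_state = "private"
        self.dublin = dict.fromkeys(self.DUBLIN_CORE_FIELDS, "")

    def submit_for_review(self) -> None:
        self.review_state = "pending"


class NewsItem(MFI, PortalObject):
    """Gets format handling from MFI and portal behavior from PortalObject."""

    def __init__(self, title: str, body: str, owner: str, format_name: str = "stx"):
        super().__init__(body=body, format_name=format_name, owner=owner)
        self.dublin["title"] = title
        self.dublin["creator"] = owner


item = NewsItem("Example headline", "Some *structured* text", owner="rik")
item.submit_for_review()
print(item.format_name, item.review_state, item.dublin["creator"])  # -> stx pending rik
```

Because each __init__ forwards unused keyword arguments with super(), a How-To or Tip class could combine the same two mix-ins without either one knowing about the other.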
However, won't this be conceptually difficult beyond full text searches? The levels of access to the documents are so diverse. Compare the structured DOM access to XML documents to the (basically) mere word level access to pdf (and even html) documents.
ZCatalogs would either have to be instructed to understand the deeper structure of these types of documents, or the documents themselves (through PortalObject) will need to be told how to compose their 'content' into something the catalog can turn into values.
-Michel
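A sketch of the second option Michel describes, where each format decides how to flatten its own content for the catalog. It assumes xml.etree.ElementTree for the structured (XML) case and treats word-level formats such as text pulled from a PDF as plain text. All function names are hypothetical, and build_index is a toy inverted index, not the ZCatalog API.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def xml_catalog_text(source: str) -> str:
    """Structured format: walk the element tree and join every text node."""
    root = ET.fromstring(source)
    return " ".join(part.strip() for part in root.itertext() if part.strip())


def xml_fields(source: str, names) -> dict:
    """Fielded values: pull named child elements out for separate indexes."""
    root = ET.fromstring(source)
    return {name: root.findtext(name, default="") for name in names}


def plain_catalog_text(source: str) -> str:
    """Word-level format (e.g. text extracted from a PDF): use it as-is."""
    return source


def build_index(docs: dict) -> dict:
    """Toy full-text index mapping each lowercase word to a set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return dict(index)


howto = "<howto><title>Adding ZClasses</title><step>Create a property sheet</step></howto>"
docs = {
    "howto": xml_catalog_text(howto),
    "pdf": plain_catalog_text("Property sheets for PDF files"),
}
index = build_index(docs)
print(sorted(index["property"]))     # -> ['howto', 'pdf']
print(xml_fields(howto, ["title"]))  # -> {'title': 'Adding ZClasses'}
```

Both documents feed the same full-text index, but only the XML one can also supply fielded values such as its title, which is the asymmetry between DOM-level and word-level access raised in the question above.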