[Zope] - Finding recently changed pages

Jim Fulton jim@Digicool.com
Tue, 08 Dec 1998 22:45:55 +0000


Kent Polk wrote:
> 
> Hi A.M. (A.M. Kuchling), in <199812061624.LAA25965@207-172-39-232.s232.tnt10.ann.erols.com> on Dec 6 you wrote:
> 
> > At http://discuss.userland.com/msgReader$857 , Dave Winer suggests
> > that Web sites should have a top-level siteChanges.xml file that lists
> > URLs that have recently changed.  Search engines could then use this
> > information to only crawl pages that have changed recently, which
> > would let them keep up to date more easily.  How would one go about
> > implementing something like this under Zope?
> 
> This is an interesting issue that ties in with what I've been
> doing. I've built several 'document management' sites using
> the old Principia. All of them, up to this point, stored the
> bibliographic content in an SQL database. This bibliographic
> database made it real easy to implement a 'What's New in the
> last ___ days?' sort of thing as an sql query, which is terribly
> useful for people who use web site info as a 'library'. IMO,
> automatic information 'dating' should be available with all
> publishing models. As should be searching and versioning...
> (did I just shoot myself in the foot? :^)

Here, your application defined which object mattered wrt date:
the bibliographic record.
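
A minimal sketch of that kind of 'What's New in the last N days?'
query, assuming a hypothetical `biblio` table with `title` and
`modified` columns. SQLite and every name here are stand-ins for
illustration, not the schema of the sites described above:

```python
import sqlite3
from datetime import datetime, timedelta


def whats_new(conn, days, now=None):
    """Return (title, modified) rows changed within the last `days` days."""
    now = now or datetime.now()
    # ISO-style timestamps stored as text compare correctly as strings.
    cutoff = (now - timedelta(days=days)).isoformat(sep=" ", timespec="seconds")
    rows = conn.execute(
        "SELECT title, modified FROM biblio "
        "WHERE modified >= ? ORDER BY modified DESC",
        (cutoff,),
    )
    return rows.fetchall()


# Hypothetical schema and records, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE biblio (title TEXT, modified TEXT)")
conn.executemany(
    "INSERT INTO biblio VALUES (?, ?)",
    [
        ("Old report", "1998-11-01 09:00:00"),
        ("New report", "1998-12-07 15:30:00"),
    ],
)
recent = whats_new(conn, days=7, now=datetime(1998, 12, 8, 22, 45, 55))
```

The query only works because the application decided that the
bibliographic record is the thing whose date matters.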

> There are some very interesting issues wrt dating, searching
> and versioning for ZOPE that I think need to be addressed. I'm
> not exactly sure how to address them because the problem is fairly
> complex. I've gone through a number of scenarios and keep coming
> back to the concept of a 'multi-part' document Product that
> incorporates those three issues PLUS provides a mechanism to
> build, maintain, and view those documents. What do I mean? Good
> question, as I'm not sure myself. But let me try to explain...
> 
> Multi-part issues:
>  HTML documents are typically inherently multipart documents
>  as they often reference other types of data such as images,
>  spreadsheet tables, etc.  Many other types of documents also
>  are associated with data which is directly related but typically
>  don't have a linking mechanism to automatically access the
>  other parts.  Documents also often need to be stored and/or
>  published multiple ways.  Searching an MSWord document is
>  a complex problem; creating (or storing) a text version for
>  search purposes when the main document is loaded may be
>  appropriate.

I agree, but I use slightly different jargon that reflects my
different viewpoint.

URLs point to *objects*.  Objects may have multiple parts,
which may be simple properties, such as strings and numbers;
methods, whose role is to manage or present object information;
or complex subobjects.
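
To make the three kinds of parts concrete, here is a rough sketch in
plain Python. The `Report` class and the `traverse` helper are
hypothetical names invented for illustration; this is a loose model
of the idea, not Zope's actual publishing machinery:

```python
class Report:
    """A hypothetical object with the three kinds of parts listed above."""

    def __init__(self, title, page_count):
        # Simple properties: strings and numbers.
        self.title = title
        self.page_count = page_count
        # Complex subobjects, keyed by the name used in the URL path.
        self.parts = {}

    def add_part(self, name, obj):
        self.parts[name] = obj

    def summary(self):
        # A method whose role is to present object information.
        return "%s (%d pages, %d parts)" % (
            self.title, self.page_count, len(self.parts))


def traverse(root, path):
    """Resolve a URL-like path by walking named subobjects from root."""
    obj = root
    for name in path.strip("/").split("/"):
        if not name:
            continue  # an empty path refers to the root object itself
        obj = obj.parts[name]
    return obj
```

Under this model a path such as "/figure1" names a subobject of the
report rather than a separate file that happens to be linked to it.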
 
> Versioning issues:
>  Almost every document management system is required to have a
>  versioning mechanism, whereby old versions of documents can be
>  referenced and extracted.  (IMO, Versioning is similar to multi-
>  part document handling)

How so?  I'm not disagreeing.  I'm just curious how you see them
as similar?

> Dating issues:
>  All documents, including versions, should have a date associated
>  with them (often two or three dates... :^(

It's obviously hard to pin down which dates are important, at least
in general.  See my response to Andrew.
 
> 'Publishing' issues:
>  How to make multi-part documents available?  IMO, all of the parts
>  of a multi-part document are related to that document. A folderish
>  object seems to apply for wrapping multi-part documents.

Exactly.

>  This
>  object's primary method would present the latest version of the
>  main document part when requested. Other methods would present
>  the list of versions, parts, etc.

Yes.
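
One way to picture such a folderish wrapper, as a sketch only. The
class name `VersionedDocument` and all of its method names are
assumptions made up for this example, not an existing Zope Product:

```python
from datetime import datetime


class VersionedDocument:
    """Hypothetical folderish wrapper around versions and related parts."""

    def __init__(self, name):
        self.name = name
        self._versions = []  # list of (timestamp, body), oldest first
        self._parts = {}     # related parts such as images or tables

    def add_version(self, body, when=None):
        self._versions.append((when or datetime.now(), body))

    def add_part(self, part_name, data):
        self._parts[part_name] = data

    def __call__(self):
        """Primary method: present the latest version of the main part."""
        if not self._versions:
            raise LookupError("no versions of %r stored yet" % self.name)
        return self._versions[-1][1]

    def versions(self):
        """List the timestamps of all stored versions, oldest first."""
        return [when for when, _body in self._versions]

    def version(self, index):
        """Extract the body of one stored version by position."""
        return self._versions[index][1]

    def part_names(self):
        """List the names of the related parts."""
        return sorted(self._parts)


doc = VersionedDocument("spec")
doc.add_version("first draft", when=datetime(1998, 12, 1))
doc.add_version("second draft", when=datetime(1998, 12, 8))
doc.add_part("logo.gif", b"GIF89a")
latest = doc()
```

Calling the object gives the newest body, while the other methods
expose the version history and the part list.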

> Search issues:
>  Methods for indexing and date-searching to provide 'what's new?'
>  sorts of accesses. Note that 'What's New?' can easily be
>  constrained further by additional bibliographic information or
>  possibly client cookie mechanisms...

Yes, but this seems to be very application dependent.
 
> Creation and Editing issues:
>  I think this issue starts to come in line with the current discussion.
>  One problem I have with users adding documents to Principia is that
>  many operations have to be performed before the document is readily
>  available to be published within the constraints of the environment.

Could you elaborate?

>  (several issues that don't necessarily apply to everyone). Most of
>  my clients just want to make these documents available and don't
>  want to have to build the html reference pages to do so.
> 
>  TinyTables has recently met with great acclaim by my users because
>  I can provide them with a reference generator and they can just
>  edit the TinyTable data and not have to learn HTML or worry about
>  correct HTML syntax (typos, etc).
> 
>  I keep thinking that a hierarchy of Products to provide versioning,
>  parts, and editing/publishing would greatly assist this 'problem'
> 
>  Now, I haven't even touched on the issues of how/where the data is
>  to be stored, as I think there needs to be flexibility here and
>  several other issues, but Ty and I are probably going to take a
>  whack at this concept of multi-part documents to try to address
>  these issues and I thought it would be appropriate to see if it is
>  enough of a concern to anyone else such that at least the design
>  ought to be a community effort in order to make it more useful to
>  others.
> 
>  Any Comments?

I think there needs to be a way of more easily defining new kinds 
of objects in Zope.  Zope can't really know when a change to a folder
is meaningful for your application.  But if you could implement
your own application objects, using Zope building blocks like folders, 
documents, etc, then you could control what information was used
for determining things like "modification time".  You could also
provide management interfaces that make it easier for your customers
to manage your objects.
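
As a rough illustration of an application object that decides for
itself what counts as a modification, consider the sketch below. The
`AppDocument` class, its `significant` set, and the
`recently_changed` helper are all hypothetical; nothing here is an
existing Zope API:

```python
from datetime import datetime


class AppDocument:
    """Hypothetical application object that owns its modification time."""

    # Assumption: only these attributes represent meaningful changes.
    significant = frozenset(["title", "body"])

    def __init__(self, url, title, body, now):
        self.url = url
        self.title = title
        self.body = body
        self.hits = 0        # bookkeeping that should not count as a change
        self.modified = now

    def update(self, name, value, now):
        setattr(self, name, value)
        # Only application-significant changes move the modification time.
        if name in self.significant:
            self.modified = now


def recently_changed(docs, since):
    """Return URLs of documents modified at or after `since`, newest first."""
    changed = [d for d in docs if d.modified >= since]
    changed.sort(key=lambda d: d.modified, reverse=True)
    return [d.url for d in changed]


t0 = datetime(1998, 12, 1)
a = AppDocument("/a", "A", "old text", t0)
b = AppDocument("/b", "B", "old text", t0)
a.update("hits", 5, datetime(1998, 12, 8))          # not significant
b.update("body", "new text", datetime(1998, 12, 8))  # significant
urls = recently_changed([a, b], since=datetime(1998, 12, 5))
```

A list like `urls` is the sort of thing a recently-changed-pages
listing could be generated from, with the application rather than
the generic folder deciding what qualifies.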

A fairly high priority for me is to come up with a through-the-web
Zope class system that would make it pretty easy for you to implement
the sorts of "multi-part" documents you want.

Jim

--
Jim Fulton           mailto:jim@digicool.com
Technical Director   (540) 371-6909              Python Powered!
Digital Creations    http://www.digicool.com     http://www.python.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list without my
permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.