[Zope] Zope search engine
Kent Polk
kent@goathill.org
Fri, 19 Feb 1999 08:35:11 -0600 (CST)
Regarding search indexing...
The built-in Zope Find manager appears to provide searches
constrained by:
- object type
- ids
- full-text matches (containing)
- mod dates
- folder
I would like to be able to provide application-specific index/search
capabilities. Here's my view:
search by:
- object type. Be able to easily specify a new object type to
constrain searching. Seems Zope has that covered.
- ids. This is a search 'by name'. Covered?
- attribute. This could correspond to a bibliographic search.
Paul mentioned Brian's addition that lets one set object
attributes via tags in a document. One might also consider
providing an 'index_bibliography' or 'index_attribute' method
which, if it exists, would allow an application manager to
provide a way to index document bibliographies. This would make it
easier to index SQL document collections or filesystem document
collections without having to have the info directly stored in
the Zope database.
- full-text. The existing search appears to do exact matching on
internal documents only? I'd like to see the indexer look for an
additional 'index_body' method which, if it exists, would allow an
app developer to provide a way to index text documents that
possibly aren't in the Zope Database.
- query language. Needs one. Not sure how well featured it needs
to be.
- mod dates. Need a way to register mod-dates as an attribute for
query results, etc. Intentionally vague here as I don't know what
the answer should be. However, it seems to me to go completely
against the grain of Zope to only provide datestamp capabilities
for simple objects. The whole idea of Zope (to me) is to provide
automated site facilities to manage large document collections
and to be able to build those documents from content objects.
For example, 4 years ago I worked on a project for the NRC which
stored SGML'ed document content in an Oracle database. It was
almost REALLY cool as you could build a document by performing
an SQL query that collected the results, evaluated the SGML and
produced the report according to your desires (almost). It then
date-stamped the document according to deterministic rules. That
experience greatly shaped how I view information and it is at
odds with the way that most people think. Fortunately for me, I
think they are wrong. :^)
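To make the optional-hook idea from the list above a bit more concrete, here is a rough Python sketch. Only the method names 'index_body' and 'index_attribute' come from the proposal; the SimpleIndexer class, its search methods, and the SqlReport stand-in are hypothetical and are not part of Zope. It only shows an indexer that calls a hook when the object provides one and skips the object otherwise.

```python
class SimpleIndexer:
    """Toy in-memory indexer (hypothetical, not a Zope API)."""

    def __init__(self):
        self.words = {}       # word -> set of object ids
        self.attributes = {}  # (name, value) -> set of object ids

    def index_object(self, obj_id, obj):
        # Full text: only index objects that offer an index_body hook.
        body_hook = getattr(obj, "index_body", None)
        if callable(body_hook):
            for word in body_hook().lower().split():
                self.words.setdefault(word, set()).add(obj_id)

        # Attributes: same pattern, using an index_attribute hook
        # that is assumed here to return a dict of name -> value.
        attr_hook = getattr(obj, "index_attribute", None)
        if callable(attr_hook):
            for name, value in attr_hook().items():
                self.attributes.setdefault((name, value), set()).add(obj_id)

    def search_text(self, word):
        return sorted(self.words.get(word.lower(), ()))

    def search_attribute(self, name, value):
        return sorted(self.attributes.get((name, value), ()))


class SqlReport:
    """Stand-in for a document whose content lives outside the object database."""

    def __init__(self, title, author, text):
        self.title = title
        self.author = author
        self.text = text

    def index_body(self):
        return self.text

    def index_attribute(self):
        return {"title": self.title, "author": self.author}


class PlainObject:
    """An object with no hooks; the indexer simply ignores it."""


indexer = SimpleIndexer()
indexer.index_object(
    "report-1",
    SqlReport("Inspection", "A. Author", "Reactor coolant inspection report"),
)
indexer.index_object("plain-1", PlainObject())
```

The point of the design is that the indexer never needs to know where the text or the bibliographic data actually lives; the application supplies that through the hooks.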
The QRS reporting system that Ty and I developed with Principia is
similar to the NRC system in that you perform a query, a document
is built, edited, published in HTML/PDF/PostScript on the fly,
referenced, with a complete modification transaction history
available, etc. Each document has a variety of dates associated
with it and rules are available to set a moddate (expiration date
also, etc).
I'd like to see (again) a method available to set the moddate for
objects that aren't statically in the Zope object database, even
though their parts may be in there, such as Z Tables or even
TinyTables databased documents. I would like for the index method
to have access to the moddate. Not sure if it really needs to be
further a part of Zope... Yes, this moddate could just be another
attribute, and maybe it should be done that way, but I just want
to make sure the index engine has a way of accessing it and knowing
that it is the 'official' moddate for that object.
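One way the 'official' moddate lookup might work, sketched in Python. Everything named here is an assumption made for illustration: the 'index_moddate' hook, the 'stored_moddate' attribute, and the "newest part wins" rule are hypothetical and not existing Zope features. The sketch only shows the index engine preferring an object-supplied date and falling back to a stored one.

```python
from datetime import datetime


def official_moddate(obj):
    """Return the date the index engine should treat as official.

    Prefers a hypothetical index_moddate hook; otherwise falls back to
    a hypothetical stored_moddate attribute, or None if neither exists.
    """
    hook = getattr(obj, "index_moddate", None)
    if callable(hook):
        return hook()
    return getattr(obj, "stored_moddate", None)


class BuiltReport:
    """A document assembled from parts, each with its own date."""

    def __init__(self, part_dates):
        self.part_dates = part_dates

    def index_moddate(self):
        # One possible deterministic rule: the newest part wins.
        return max(self.part_dates)


class StaticDocument:
    """A document that only carries a stored timestamp."""

    def __init__(self, stored_moddate):
        self.stored_moddate = stored_moddate


report = BuiltReport([datetime(1999, 1, 5), datetime(1999, 2, 19)])
static = StaticDocument(datetime(1998, 12, 1))
```

With these objects, official_moddate(report) gives the date of the newest part and official_moddate(static) gives the stored timestamp.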
I've also been experimenting with publishing filesystem-stored
base documents. As another person just mentioned, I have a customer
who insists that the base document information be stored in the
filesystem. I can provide the mod-date, but again, I need a way
to let Zope call indexing methods which I could provide for these
different applications.
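For the filesystem case, the kind of indexing methods I have in mind might look roughly like the Python sketch below. The FilesystemDocument class and the 'index_moddate' hook name are hypothetical, and 'index_body' is used the way it was proposed earlier; none of this is an existing Zope interface. The file's modification time stands in for the mod-date I would provide.

```python
import os
import tempfile
from datetime import datetime


class FilesystemDocument:
    """Wraps a base document stored in the filesystem (hypothetical)."""

    def __init__(self, path):
        self.path = path

    def index_body(self):
        # Hand the indexer the text straight from the file.
        with open(self.path, encoding="utf-8") as handle:
            return handle.read()

    def index_moddate(self):
        # Use the file's modification time as the document's mod-date.
        return datetime.fromtimestamp(os.path.getmtime(self.path))


# Small usage example with a temporary file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "base.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("quarterly inspection summary")
    doc = FilesystemDocument(path)
    body = doc.index_body()
    moddate = doc.index_moddate()
```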
Comments? What are you DC guys thinking of for a search engine?
Thanks!
Kent