(I cc'ed the Zope list on this because there is some good information here for the community.) Jon Udell wrote:
This is a nifty concept: you set up rules that map search results into a Yahoo-like category tree. You can create these rules interactively or, since it's all expressed in XML, you could in principle derive a ruleset by some other means and then have Ultraseek CCE use it.
This can be done rather nicely with a general-purpose object index. Given objects (like documents) with various properties, you can designate a property, say 'keywords', that defines the set of nodes in a category hierarchy the object fits into. I envision you could do this two ways: as a set of singleton nodes (so that multiple 'paths' from the hierarchy root can be expressed in one keyword; caveat: you can't 'suppress' paths), or as a set of fully delimited paths, or 'ordered keywords'.
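To make the two options concrete, here is a minimal Python sketch. The category graph, the node names, and the functions `paths_to`, `build_category_index`, and `browse` are hypothetical illustrations, not part of ZCatalog or Ultraseek. It shows the caveat above: a singleton-node keyword files a document under every path that reaches that node, while a delimited-path keyword files it only where you say.

```python
from collections import defaultdict

# Hypothetical category graph: each node maps to its parent nodes.
# "Clinics" is reachable from the root by two different paths.
PARENTS = {
    "Health": ["Root"],
    "Local Government": ["Root"],
    "Clinics": ["Health", "Local Government"],
}

def paths_to(node):
    """Return every root-to-node path as a '/'-delimited string."""
    if node == "Root":
        return [""]
    result = []
    for parent in PARENTS[node]:
        for prefix in paths_to(parent):
            result.append(f"{prefix}/{node}" if prefix else node)
    return result

def build_category_index(docs, style):
    """Map each category path to the ids of documents filed there.

    style="nodes": keywords are singleton node names, expanded to all paths.
    style="paths": keywords are already fully delimited paths.
    """
    index = defaultdict(set)
    for doc_id, keywords in docs.items():
        for kw in keywords:
            for path in (paths_to(kw) if style == "nodes" else [kw]):
                index[path].add(doc_id)
    return index

def browse(index, path):
    """Documents filed at `path` or anywhere beneath it."""
    hits = set()
    for key, ids in index.items():
        if key == path or key.startswith(path + "/"):
            hits |= ids
    return hits

# Singleton node: the document shows up under both parents (no way to suppress one).
node_index = build_category_index({"clinic-list": {"Clinics"}}, "nodes")
# Delimited path: the document shows up only under Health.
path_index = build_category_index({"clinic-list": {"Health/Clinics"}}, "paths")
```

With these definitions, `browse(node_index, "Local Government")` returns `{"clinic-list"}`, while `browse(path_index, "Local Government")` returns an empty set.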
Here's the example that Ultraseek's site refers to: <http://search.state.mn.us/>.
Has anyone used Ultraseek with its CCE? Or a similar system (e.g. Verity Topic) that does rule-based mapping of results onto a category tree? I'd be curious to hear from someone who's wrestled, using tools like this, with a fairly large corpus of documents, and can speak to the issues involved in creating/maintaining the mappings from results space into category space.
I cannot give you examples with the software you mention, but I can give you an example that we are working on with ZCatalog. We are designing a 'Topic'-based system that works as a categorical hierarchy, à la Yahoo.

ZCatalog is an object index, much like what I describe above. ZCatalog indexes objects into an arbitrary set of indexes of various kinds. Each index is responsible for indexing one particular property of an object. If an object being indexed does not have the property that an index is looking for, the object is simply not indexed in that index (but it may have a property that another index is looking for, and therefore will be indexed in *that* index). The CVS version of ZCatalog uses three types of index (in 2.0, there are only the first two):

Field Index: property values are treated atomically. Indexes can be queried for all objects that match a value. Range searches can also be done on indexed values that support comparison (like numbers, dates, special-purpose 'length' objects, etc.). Indexes can also be queried for the set of unique values in the index; for example, you can ask for the set of unique 'meta_types' of all indexed objects. A good example of this is the search by 'type' on the Zope site (http://www.zope.org/SiteIndex/searchForm).

Text Index: property values are run through a lexicon object that stems, stops, and parses the value into a full-text index. The index may be queried with a simple boolean query language that supports 'and', 'or', phrases, parenthesized boolean expressions, and proximity matching. Relevance ranking is supported: a hit's score is the sum of the occurrences of all query terms in that hit. A score normalized from 0 to 100 over the whole result set is also provided.

Keyword Index: subclasses the field index and inherits all of its behavior, except that property values are treated as a sequence of keywords.
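A toy Python sketch of the field and keyword index behavior described above. `FieldIndex`, `KeywordIndex`, and their method names are hypothetical stand-ins, not ZCatalog's actual classes or API. It only illustrates exact-match, range, and unique-value queries, objects without the property being skipped, and a keyword index that reuses everything from the field index except how values are read.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from types import SimpleNamespace as Obj

class FieldIndex:
    """Toy field index: each property value is treated atomically."""

    def __init__(self, attr):
        self.attr = attr
        self._index = defaultdict(set)  # value -> set of object ids

    def _values(self, obj):
        return [getattr(obj, self.attr)]

    def index_object(self, oid, obj):
        if not hasattr(obj, self.attr):
            return  # objects lacking this property are simply not indexed here
        for value in self._values(obj):
            self._index[value].add(oid)

    def search(self, value):
        return set(self._index.get(value, set()))

    def search_range(self, lo, hi):
        """All objects whose value v satisfies lo <= v <= hi (values must be comparable)."""
        keys = sorted(self._index)
        hits = set()
        for key in keys[bisect_left(keys, lo):bisect_right(keys, hi)]:
            hits |= self._index[key]
        return hits

    def unique_values(self):
        return sorted(self._index)

class KeywordIndex(FieldIndex):
    """Same behavior, except the property holds a sequence of keywords."""

    def _values(self, obj):
        return list(getattr(obj, self.attr))

# Hypothetical sample objects.
docs = {
    "howto-1": Obj(meta_type="How-To", size=1200, keywords=["zcatalog", "dtml"]),
    "news-7": Obj(meta_type="News Item", size=300, keywords=["zcatalog"]),
    "folder": Obj(meta_type="Folder"),  # no size/keywords: skipped by those indexes
}
by_type, by_size, by_keyword = FieldIndex("meta_type"), FieldIndex("size"), KeywordIndex("keywords")
for oid, obj in docs.items():
    for index in (by_type, by_size, by_keyword):
        index.index_object(oid, obj)
```

Here `by_type.unique_values()` plays the role of the "unique meta_types" query, and `by_size.search_range(0, 1000)` finds only `"news-7"`.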
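The relevance ranking described for the Text Index can be sketched the same way. This is a simplified Python illustration, not ZCatalog's implementation: the tokenizer, the `STOP_WORDS` list, and `TextIndex.search_and` are hypothetical, there is no stemming, and only 'and' queries are handled. The 0 to 100 normalization is assumed here to be relative to the best hit in the result set (score / best * 100); the real index may normalize differently.

```python
import re
from collections import Counter, defaultdict
from types import SimpleNamespace as Obj

STOP_WORDS = {"a", "and", "of", "the", "to"}  # hypothetical stop list

def lexicon(text):
    """Crude stand-in for a lexicon: lower-case, split, drop stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

class TextIndex:
    def __init__(self, attr):
        self.attr = attr
        self._postings = defaultdict(dict)  # word -> {object id: occurrence count}

    def index_object(self, oid, obj):
        if not hasattr(obj, self.attr):
            return
        for word, count in Counter(lexicon(getattr(obj, self.attr))).items():
            self._postings[word][oid] = count

    def search_and(self, query):
        """'and' query: every term must occur. Returns {oid: (raw score, normalized score)}."""
        terms = lexicon(query)
        if not terms:
            return {}
        hits = set.intersection(*(set(self._postings.get(t, {})) for t in terms))
        # Raw score: sum of the occurrences of all query terms in the hit.
        raw = {oid: sum(self._postings[t][oid] for t in terms) for oid in hits}
        best = max(raw.values(), default=0)
        return {oid: (score, round(100 * score / best)) for oid, score in raw.items()}

index = TextIndex("text")
for oid, text in {"a": "Zope catalog catalog tips",
                  "b": "The catalog of Zope products",
                  "c": "DTML reference"}.items():
    index.index_object(oid, Obj(text=text))
```

For the query `"Zope catalog"`, document `a` scores 3 (normalized 100) and `b` scores 2 (normalized 67); `c` is not a hit.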
The ZCatalog can work in a UNIX 'find'-like fashion, where it spiders over the object hierarchy indexing objects, or Zope classes may subclass behavior that makes them Catalog Aware, allowing them to index and unindex themselves when their state changes.

The new Zope site is driven by ZCatalog. All of the product listings, member contributions, news items, links, tips, documentation, how-tos, etc. are catalog-aware objects that index themselves. As new content is added anywhere in the site, all of the various dynamically generated information is updated. Some users have the proper access credentials to review or immediately submit new content; other users must submit content for review before it is cataloged. At the bottom of every page there is a 'DTML Source' link that shows you the DTML that generated that page. There you can see the various clever ways the Catalog is used throughout the site.

-Michel
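The "catalog aware" approach can be sketched as a mixin that reindexes the object whenever its state changes, so a review workflow (pending, then published) is reflected in searches immediately. The names `Catalog`, `CatalogAware`, `NewsItem`, `catalog_object`, and `uncatalog_object` are loosely modeled on the description above but are hypothetical; this is not Zope's actual API, just a minimal single-property illustration.

```python
class Catalog:
    """Minimal stand-in catalog indexing one property, keyed by object path."""

    def __init__(self, attr):
        self.attr = attr
        self._index = {}    # property value -> set of paths
        self._by_path = {}  # path -> value currently indexed for it

    def catalog_object(self, obj, path):
        self.uncatalog_object(path)  # drop any stale entry first
        value = getattr(obj, self.attr, None)
        if value is not None:
            self._index.setdefault(value, set()).add(path)
            self._by_path[path] = value

    def uncatalog_object(self, path):
        value = self._by_path.pop(path, None)
        if value is not None:
            self._index[value].discard(path)
            if not self._index[value]:
                del self._index[value]

    def search(self, value):
        return set(self._index.get(value, ()))

class CatalogAware:
    """Hypothetical mixin: objects index themselves on creation and on every change."""

    def __init__(self, catalog, path):
        self._catalog = catalog
        self._path = path
        self._catalog.catalog_object(self, path)

    def edit(self, **changes):
        for name, value in changes.items():
            setattr(self, name, value)
        self._catalog.catalog_object(self, self._path)  # reindex after the state change

    def delete(self):
        self._catalog.uncatalog_object(self._path)

class NewsItem(CatalogAware):
    def __init__(self, catalog, path, status):
        self.status = status  # set before the mixin indexes the object
        super().__init__(catalog, path)

catalog = Catalog("status")
item = NewsItem(catalog, "/news/item1", status="pending")
item.edit(status="published")  # the catalog now finds it under "published" only
```

The alternative 'find'-style mode would instead walk the object tree periodically and call `catalog_object` on everything it visits; the catalog-aware mode trades that batch walk for a small cost on every edit.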
-- Jon Udell | <http://udell.roninhouse.com/> | 603-355-8980