[Zope3-dev] ANN: Zemantic: a Zope 3 Semantic Web Catalog
Michel Pelletier
michel at dialnetwork.com
Wed Dec 29 16:32:19 EST 2004
Zemantic is a prototype semantic web catalog for Zope 2 and 3. The
Zope 2 version is available from The Collective; the Zope 3 version is
not yet checked into a repo and is available as snapshots on
http://zemantic.org. I hope to get it checked into a repo soon.
Note that the Zope 3 version is tested against Zope X3 3.0.0 and *not*
the development version of Zope 3. Installation instructions are
included for installing it into a Zope X3 3.0.0 instance. Zemantic
also requires Daniel Krech's rdflib library from http://rdflib.net.
rdflib does most of the heavy lifting in Zemantic (XML parsing, triple
construction). Zemantic is just a persistent storage backend for this
library with some query enhancements and some Zope 3 glue code.
I feel Zemantic is far enough along to release it into the wild, and
would like to encourage other Zope 3 developers to try it out and send
me their feedback. It tries to follow the Zope 3 best practices as
outlined by the Zope 3 development process as much as possible, and
major documentation and testing enhancements are on the drawing board
for the next release.
Zemantic catalogs semantic data represented in the Resource
Description Framework (RDF). This data is communicated to the catalog
in the RDF XML format and can come from any internal or external
resource; any kind of metadata that can be expressed in XML about
anything with a URI can be cataloged into a Zemantic catalog. Links
to information on the semantic web and RDF can be found on
zemantic.org.
Zemantic comes with an example content object that implements
automatic cataloging and uncataloging via an event subscriber. The
content object describes itself in XML via an adapter and an XML page
template using the Dublin Core vocabulary (very easy thanks to Zope
3's built-in DC implementation). DC is just used here as an example;
Zemantic does not interpret the semantic meaning of any of the RDF
data it stores, so any vocabulary is possible (XMLS, RSS, OWL, custom,
or any mix thereof).
Zemantic can be used in many of the ways the ZCatalog from Zope 2 can
be used. More specifically, it can be used like Field and Keyword
indexes in Zope 2, but with much more flexibility, because Zemantic
has no need to create "indexes" of fixed predicates up front.
Instead, information about resources is stored in a semantic graph
that acts as a form of meta-index.
Zemantic is also meant to be used by application specific "agents"
that search the catalog attempting to satisfy some application
specific query or need. To assist the task of agents, Zemantic comes
with a collection of query tools to nest, intersect, and take the
union of results from any combination of semantic queries.
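As a rough illustration of what nesting, intersecting, and taking the
union of query results can look like, here is a minimal sketch in
plain Python. The names Pattern, Union, and Intersection are
hypothetical and are not Zemantic's actual query tools; the "catalog"
here is just a set of (subject, predicate, object) tuples.

```python
class Pattern:
    """Hypothetical query: match triples against (s, p, o); None is a wildcard."""

    def __init__(self, s=None, p=None, o=None):
        self.pattern = (s, p, o)

    def __call__(self, triples):
        # Return the subjects of every triple that fits the pattern.
        return {
            t[0]
            for t in triples
            if all(want is None or want == got
                   for want, got in zip(self.pattern, t))
        }


class Union:
    """Hypothetical combinator: subjects matched by any sub-query."""

    def __init__(self, *queries):
        self.queries = queries

    def __call__(self, triples):
        result = set()
        for query in self.queries:
            result |= query(triples)
        return result


class Intersection:
    """Hypothetical combinator: subjects matched by every sub-query."""

    def __init__(self, *queries):
        self.queries = queries

    def __call__(self, triples):
        results = [query(triples) for query in self.queries]
        return set.intersection(*results) if results else set()


# Combinators accept other combinators, so queries can be nested.
by_michel = Pattern(p="dc:creator", o="michel")
about_rdf = Pattern(p="dc:subject", o="rdf")
both = Intersection(by_michel, about_rdf)
either = Union(by_michel, about_rdf)
```

Because every query is just a callable that takes triples and returns
a set of subjects, any combination can be passed wherever a single
query is expected.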
In AI circles agents can use reasoning and inference based on RDF
data. There are several semantic web applications written in Python,
including the CWM tool and SWAP by Tim Berners-Lee, that explore these
ideas, and I have no intention of reproducing this body of work. With
some massaging they should work with Zope 3 and Zemantic; after all,
working with legacy Python modules is one of the key goals of Zope 3.
To explain Zemantic's operation briefly, any semantic graph can be
broken down into individual statements that contain a subject,
predicate, and object. These three components together are called a
*triple* and they form the basis of all RDF information theory.
Zemantic indexes triples in a three-dimensional persistent BTree so
that any pattern of subject, predicate, or object, can be queried out
of the catalog very quickly (see lib/ZODBBackend.py).
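To make the indexing idea concrete, here is a minimal in-memory
sketch. It is not the code in lib/ZODBBackend.py: plain nested dicts
stand in for the persistent BTrees, and the class and method names
(TripleIndex, add, triples) are assumptions made for illustration.
The point is only that keeping one nested index per rotation of
(subject, predicate, object) lets any pattern start from a direct
lookup instead of a full scan.

```python
class TripleIndex:
    """Toy triple store; nested dicts stand in for persistent BTrees."""

    def __init__(self):
        self.spo = {}  # subject -> predicate -> set of objects
        self.pos = {}  # predicate -> object -> set of subjects
        self.osp = {}  # object -> subject -> set of predicates

    def add(self, s, p, o):
        # Store the same statement under all three rotations.
        self.spo.setdefault(s, {}).setdefault(p, set()).add(o)
        self.pos.setdefault(p, {}).setdefault(o, set()).add(s)
        self.osp.setdefault(o, {}).setdefault(s, set()).add(p)

    def triples(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None is a wildcard."""
        if s is not None:
            preds = self.spo.get(s, {})
            for p2 in ([p] if p is not None else list(preds)):
                for o2 in preds.get(p2, ()):
                    if o is None or o2 == o:
                        yield (s, p2, o2)
        elif p is not None:
            objs = self.pos.get(p, {})
            for o2 in ([o] if o is not None else list(objs)):
                for s2 in objs.get(o2, ()):
                    yield (s2, p, o2)
        elif o is not None:
            for s2, preds in self.osp.get(o, {}).items():
                for p2 in preds:
                    yield (s2, p2, o)
        else:
            # No component is bound, so walk the whole store.
            for s2, preds in self.spo.items():
                for p2, objs in preds.items():
                    for o2 in objs:
                        yield (s2, p2, o2)


index = TripleIndex()
index.add("doc1", "dc:creator", "michel")
index.add("doc2", "dc:creator", "michel")
index.add("doc1", "dc:title", "Zemantic")
created_by_michel = sorted(index.triples(p="dc:creator", o="michel"))
```

The cost is that every statement is stored three times, which trades
space for fast lookups on any bound component.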
As the catalog is used it "learns" new statements about whatever you
feed it. For example, you do not need to create an index for the
"creator" predicate. Zemantic catalogs this relationship using the
common and well known DC creator element, identified with a URI in the
catalog. To query for any resources that match a given creator, you
could use the following Python:
from zemantic import Query, Any, dc, Literal
catalog.query(Query(Any, dc.creator, Literal("michel")))
This would be roughly equivalent to the ZCatalog query:
zcatalog(creator="michel")
Except that in Zemantic there is no "creator" index; predicates
(indexes), like dc.creator, are indexed into the triple store along
with their associated keys and values.
Notice that you pass the catalog the query object that
does the actual search (Query() is a helper class that provides for
simple subject, predicate, object searching). This illustrates that
*searching* the catalog is an application specific function and must
be provided by the application by passing in an object that implements
the IQuery interface.
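The division of labor described here can be sketched in a few lines
of plain Python. This is only an illustration under assumptions: the
Catalog class, the run() method, and the two query classes below are
hypothetical, and the real IQuery interface may define different
methods. The catalog only stores triples and hands them to whatever
query object the application passes in.

```python
class Catalog:
    """Toy catalog: stores triples, delegates all searching to the query."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def query(self, query):
        # The catalog does not interpret the query; it only hands over
        # its store and returns whatever the query object produces.
        return query.run(self._triples)


class ByCreator:
    """Simple application query: resources with a given creator."""

    def __init__(self, name):
        self.name = name

    def run(self, triples):
        return {s for (s, p, o) in triples
                if p == "dc:creator" and o == self.name}


class SameCreatorAs:
    """Two-step query: find a resource's creators, then their other resources."""

    def __init__(self, resource):
        self.resource = resource

    def run(self, triples):
        creators = {o for (s, p, o) in triples
                    if s == self.resource and p == "dc:creator"}
        return {s for (s, p, o) in triples
                if p == "dc:creator" and o in creators
                and s != self.resource}


catalog = Catalog()
catalog.add("doc1", "dc:creator", "michel")
catalog.add("doc2", "dc:creator", "michel")
catalog.add("doc3", "dc:creator", "daniel")
by_michel = catalog.query(ByCreator("michel"))
related = catalog.query(SameCreatorAs("doc1"))
```

The second query does its intermediate lookup inside a single call,
rather than making the application go back to the catalog for each
step.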
By analogy, it's like going into the library yourself to get a book
instead of having the librarian (like the zcatalog) go and get it. It
might be more complicated, since you have to understand how to navigate
a card catalog and library to find a "book", but if you need to do
research while you're inside the library to eventually decide what book you
really want, it's more efficient than having to send the librarian in
many times with research instructions.
Another way of thinking about agents is that you can send different
librarians into the library for you, each trained to look for books
using their own custom techniques. Some may use reasoning and
inference to get books, others might do keyword searching, one might
just throw darts at the shelves. The point is the application should
control the query logic, not the storage (the library itself).
Soon I hope to have some more online resources, like a repo and a
mailing list, to devote to the Zope 3 prototype and get
a real Zope 3 site up and running on Zemantic.org with a demo and
wiki. For now if you have any questions, please feel free to email
me.
Thanks,
-Michel