[Zope-dev] Re: RDF Musings and TinyTables

Thu, 20 Feb 2003 23:14:55 +0100

On Thursday, Feb 20, 2003, at 22:15 Europe/Paris, Shane Hathaway wrote:

[snip]

> With all this in mind, I just studied my Mozilla mimeTypes.rdf file
> again.  At first, this file looks nasty.  I've only defined handlers
> for two mime types, application/pdf and application/x-zope-edit, yet
> the string "application/pdf" shows up 8 times in the file!  I only
> typed it once. ;-)

Good news, though, it's *really* compressible. :^)  gzip can get 20:1.

> But if I think of RDF files as database export files (or maybe the
> results of a database query), it all makes sense.

I hate to say it, but I can actually read RDF now.

> - The order in which the RDF elements appear in the file doesn't
> matter, just like the physical order of inodes on a hard disk doesn't
> matter.

With Mozilla, it goes one step further.  A graph can have multiple
datasources, which inject data into the graph.  These datasources can
get resources from different servers, from different kinds of content
(IMAP, bookmarks, etc.)

Thus, not only does physical order not matter, but location doesn't
matter either.
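
A rough sketch of why neither order nor location matters, in plain
JavaScript.  The triple format, the function names and the example URNs
are all made up for illustration; this is not Mozilla's composite
datasource API, just the set idea behind it.

```javascript
// Hypothetical triple format: [subject, predicate, object], all strings.
// A graph is modelled as a Set of serialized triples, so the order in
// which sources are fed in cannot change the result.
function mergeGraphs(...sources) {
  const graph = new Set();
  for (const triples of sources) {
    for (const [s, p, o] of triples) {
      // Identical statements from different sources collapse into one.
      graph.add(JSON.stringify([s, p, o]));
    }
  }
  return graph;
}

// Two graphs are equal when they hold exactly the same statements.
function sameGraph(a, b) {
  if (a.size !== b.size) return false;
  for (const key of a) {
    if (!b.has(key)) return false;
  }
  return true;
}

// Two invented sources, standing in for e.g. IMAP and bookmarks.
const mail = [
  ["urn:example:msg1", "dc:title", "Hello"],
];
const bookmarks = [
  ["urn:example:bm1", "dc:title", "Zope"],
  ["urn:example:msg1", "dc:title", "Hello"], // same statement, other source
];

const ab = mergeGraphs(mail, bookmarks);
const ba = mergeGraphs(bookmarks, mail);
console.log(ab.size, sameGraph(ab, ba));
```

Feeding the sources in either order gives the same set of statements,
which is all "the same graph" means here.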

> - The obvious way to read this file is to search for XML elements that
> look like '<RDF:Description about="urn:mimetype:*">'.  But that's not
> the right way: that's like scanning filesystem inodes sequentially.
> Instead, there is a root URI, "urn:mimetypes", and the RDF elements
> make connections to other elements from there.

That's right.  The URN structure was a trap I fell into until very
recently.  I thought I would order my universe using a URN hierarchy.
But I realized that it had no meaning and no use.  When I next
refactor, I'll move to a flat model of
"urn:x-moztop:realmid:resourceid", where the ids are immutable SHA
calculations.

As an aside: Zope 3 should have an immutable, placeless object
identifier, but I lost that debate on #zope3-dev pretty badly. :^)

The ids should have no meaning.  All the meaning should go in the RDF
properties, so you can do something with it.
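
A minimal sketch of what meaningless ids could look like, for Node.js.
What actually goes into the SHA calculation is not decided above, so
hashing random bytes plus a timestamp is purely an assumption, as are
the function names and the property values.

```javascript
const crypto = require("crypto");

// Assumption: an id is a SHA-1 over random bytes and the creation time.
// The point is only that the id is opaque and never recomputed.
function opaqueId() {
  return crypto
    .createHash("sha1")
    .update(crypto.randomBytes(16))
    .update(String(Date.now()))
    .digest("hex");
}

// Flat model: no hierarchy is encoded in the URN itself.
function makeUrn(realmId, resourceId) {
  return "urn:x-moztop:" + realmId + ":" + resourceId;
}

const realm = opaqueId();
const doc = makeUrn(realm, opaqueId());

// Everything meaningful lives in properties (values invented here).
const triples = [
  [doc, "dc:title", "RDF Musings"],
  [doc, "site:resourcetype", "document"],
];

console.log(doc);
console.log(triples.length + " statements about it");
```

Renaming or moving the document changes a property value, never the id.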

This is another hidden meaning in RDF: properties are first-class
resources, in addition to property values.

Lately I've been thinking more about distributed content management and
mobile content management, so these kinds of things are more important
for me.  When you gather up a bunch of content from a bunch of
loosely-coupled places, how do you make sense of it?  If you have a
document on your laptop and on your website, should they be considered
the same logical document?

> - RDF is hard to read, but legibility by humans isn't its primary
> focus.  It's more concerned with providing a way to declare any
> relationship about anything.

Right.  That's what the graph tool at the W3C online validator is for.
:^)  Just throw it some RDF and let it draw a picture for you.

>> The ad-hoc part is, for me, the key.  Relational theory provided the
>> theoretical foundation for modern online transaction processing.  But
>> things like content management are a much different problem.  (One
>> analyst states that unstructured content is 80% of the information in
>> a business.)
>> RDF, in my view, is the equivalent of a "set theory", a formal
>> foundation, for content management.  Without it, everyone has to
>> build their own "framework" for stitching things together, for
>> connecting the dots.
>
> So RDF seems like a replacement for, or maybe enhancement of,
> relational theory.  But I wonder how object-oriented databases fit in
> the mix?

Good point.  IMO, classic OODBMS want you to know more in advance than
RDF.  Also, the relationships are programmed, not assembled (perhaps
that isn't clearly stated).

>>> Serialization of RDF into XML and the relationship between RDF and
>>> the Semantic Web are distinct concepts from RDF theory.
>> That's right.  I've always been surprised when I threw some RDF/XML
>> into Mozilla, then got a dump of the serialized results.  What I put
>> in doesn't look like what I get out.  That's because there is an
>> abstract model.  The XML can look a couple of different ways, and you
>> still have the same abstract model.
>
> How do you (1) throw RDF into Mozilla and (2) get a dump of the
> results?  Is there a utility for doing this?

Yes.  There's the hard way and the easy way.  For the hard way, you use
XPCOM to grab the datasource, get a component to serialize it, and run
some methods on it.  (I say "hard way", really, it's probably 5 lines
of JS.  Long lines.)

However, I'm using rdfds from XulPlanet (which is the best
documentation site for any project I've ever seen):

   http://www.xulplanet.com/tutorials/xultu/rdfds/

For a very quick and useful RDF introduction, read chapter six of the
XUL tutorial:

  http://www.xulplanet.com/tutorials/xultu/

With rdfds, getting a serialized version is simple:

   var ds = new RDFDataSource("http://www.zope.org/some.rdf");
   alert(ds.serializeToString());

> Are the results in RDF, and are they pretty much equivalent to "cat
> file1 file2"? :-)

The result of serializeToString is indeed RDF, and it is not even close
to cat file1 file2. :^)  Logically it is exactly the same.  The string
itself, though, will look quite different from the two input files.
The serializer can make up some URNs for anonymous resources, it can
give new namespace prefixes to namespaces you declare, etc.

More important, it likes to rearrange...

Wait, that's more detail than you probably want. :^)

>> It took me a while, but I learned how to take advantage of this.
>> With Moztop, I'm taking a pretty loose, distributed approach to
>> content management.  I collect RDF from a bunch of different servers,
>> throw it all into one big graph, and use this to draw widgets on the
>> screen.
>> The ability to make an assertion into a completely different part of
>> the tree is something you can't do in XML.
>>> This ad-hoc data storage made me think of TinyTables.  TinyTables is
>>> a good Zope product that fills the need for simple tables of data,
>>> but it needs attention.  What if it got replaced by some Zope
>>> product called
>> I will do everything in the universe to help such a project.  How is
>> that? :^)
>> I know the practical benefits that RDF can bring to content
>> management.  And it isn't esoteric Semantic Gibberish.  I'm unable,
>> though, to map it on the server side.  However, I'm having luck on
>> the client side:
>>   http://www.zope-europe.org/Members/paul/tmp/moztop-pinstripe.png
>
> I can see that the benefits on the client side would be enormous.  For
> interfacing clients to Zope, we've always thought in two directions:
> either connect the client via ZEO, or have the client call remote
> procedures that return lists and strings.  The ZEO client idea would
> be fast and easy, but the client would get unrestricted access to the
> whole database.  The remote procedures would be secure but potentially
> slow, since the client usually needs more than one list or string.

Right.

> But if the client requests RDF using a remote procedure call, the
> server can send back everything at once that it considers relevant.
> Hmmm... but I bet there's more to it.

Yes, but it's a good "more".

Right now I have a composite datasource (one that gets its data from
multiple locations).  It is getting fed by RDF from a CMF (where a page
template serializes the entire portal_catalog) and from a Zope 3.
Later I'll add a file system source.  All of these generate resources
and relationships that get thrown into a bucket.

I use a couple of RDF properties to hold things together: <nc:subitems>
to indicate tree containment, <dc:title> and <site:resourcetype> (site
is my own namespace) to provide connections into labels for the UI, and
then CSS styling for icons and whatnot.
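
To make that concrete, here is a small sketch in plain JavaScript.  It
is not how Moztop or XUL templates do it; the triple format, the
function names and the example resources are assumptions, and
nc:subitems is flattened to one statement per child where real RDF
would more likely point at a container.

```javascript
// Hypothetical flat triples: [subject, predicate, object].
const triples = [
  ["urn:example:root", "dc:title", "Site"],
  ["urn:example:root", "site:resourcetype", "folder"],
  ["urn:example:root", "nc:subitems", "urn:example:a"],
  ["urn:example:a", "dc:title", "Front page"],
  ["urn:example:a", "site:resourcetype", "document"],
];

// First value of a property on a resource, or undefined.
function literalOf(subject, predicate) {
  const hit = triples.find(([s, p]) => s === subject && p === predicate);
  return hit ? hit[2] : undefined;
}

// Children are whatever nc:subitems points at.
function childrenOf(subject) {
  return triples
    .filter(([s, p]) => s === subject && p === "nc:subitems")
    .map(([, , o]) => o);
}

// Walk outward from a root resource instead of pattern-matching on ids.
function render(subject, depth = 0, seen = new Set()) {
  if (seen.has(subject)) return []; // a graph may contain cycles
  seen.add(subject);
  const label =
    "  ".repeat(depth) +
    literalOf(subject, "dc:title") +
    " [" + literalOf(subject, "site:resourcetype") + "]";
  const below = childrenOf(subject).flatMap((c) => render(c, depth + 1, seen));
  return [label, ...below];
}

const lines = render("urn:example:root");
console.log(lines.join("\n"));
```

The resource type in brackets is where a UI would hang a CSS class for
the icon; the ids themselves are never inspected.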

> Here's a fantasy... the ability to write a template that can be
> processed either by Zope or by the client.  When the client is able to
> do the work, send the template and bunch of RDF.  When the client
> can't do it, preprocess it.  This is what XSLT always wanted to be
> able to do, but I couldn't see it getting there.  Maybe RDF can make
> this a reality? :-)

If all you really want is to take data and draw stuff on the screen,
XSLT can do it.  However, XSLT (and, I'm convinced, the XML underneath)
doesn't really help construct a complete interface.  And certainly not
one that is based on a rich content model.

> This email is getting big, so I'll cut it off here for now.  I'll
> study the XUL templates.

Cool.  Drop by #moztop sometime and say hi.

--Paul