[Zope] Re: Python Code Repository?
Paul Everitt
paul@digicool.com
Sat, 10 Jul 1999 10:14:21 -0400
"A.M. Kuchling" wrote:
> Michel Pelletier (the one at Digital Creations) was working on a
> ZTrove at one point, I believe. (For information on the whole idea of
> Trove, see http://www.tuxedo.org/~esr/trove/). I don't know what's
> happened to ZTrove; anyone from DC care to comment?
OK, lonnnggg post ahead.
Zope Product Library
With Zope2 and the Zope Portal Toolkit (PTK), we've made enough
progress to begin construction of a "Freshmeat"-style package
library for Zope software. (Note: the jargon of "product",
"package", "release", and "version" is awfully contentious.)
Since this is effectively the same ground being covered here, I'd
like to take a moment to describe what we are doing and see if folks
here on the Python list find it interesting. I'll also cross-post
to the Zope list. The desired outcome is for our experience and
software to provide a boost to a real effort. (Historical note: I
was the shepherd for the original ill-fated "Locator-SIG".)
Goals
o Fast location of desired software
o Keep central admin requirements *low* by distributing control
o Allow package authors to administer their entries
o Adhere to appropriate standards and initiatives
Zope Product Library
In our system, Authors create Products and provide Releases of
those Products. A Release is a version of a Product for a
platform (if binary) or a source version in some packaging scheme
(such as RPM). Thus, a Release is the bits and a Product is the
container for all information about those bits.
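
To make the Product/Release split concrete, here is a minimal sketch in plain modern Python. It is not how ZClasses are actually defined (those are built through the web), and every field name below (`packaging`, `platform`, `add_release`, and so on) is an assumption made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Release:
    """The bits: one version of a Product in one packaging scheme."""
    version: str
    packaging: str                  # hypothetical values, e.g. "source-tgz" or "rpm"
    platform: Optional[str] = None  # only meaningful for binary releases


@dataclass
class Product:
    """Container for all the information about those bits."""
    name: str
    author: str
    description: str = ""
    releases: List[Release] = field(default_factory=list)

    def add_release(self, release: Release) -> Release:
        # A Release always lives inside exactly one Product.
        self.releases.append(release)
        return release


product = Product(name="ExampleProduct", author="A. Author")
product.add_release(Release(version="1.0", packaging="source-tgz"))
product.add_release(Release(version="1.0", packaging="rpm", platform="linux-i386"))
```

The point of the split is that the same version can show up as several Releases (one per platform or packaging scheme) without duplicating the Product-level information.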
The things capitalized in the above are "ZClasses". For you
non-Zopistas, a ZClass is like a Python class that you build
through the web and store in the Zope object database. It is an
instance that can make instances. It also provides a simple way
to store "your kinds of things" in the Zope object database.
The private, under-construction new zope.org site has a membership
system where we give people a Zope home folder to stash their
stuff. They pick from a list of things they can add, such as
uploaded images, HTML documents, etc.
People with higher privilege -- such as an Author -- can also
choose "Product" from the list of things they can add to their
folder. When they do so, the Product gets added to the site
catalog.
A Product is like a folder -- you can stick a home page, other
docs, PDF files, etc. in it. More importantly, inside a Product
you can add a Release. This will provide the kind of thing seen
in Freshmeat.
Thanks to acquisition, much of the information about a Release
(author name and email, description, etc.) comes from the Product
and doesn't have to be repeated.
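
For non-Zopistas, the effect of acquisition can be imitated in a few lines of plain Python. This is only a toy: real Zope acquisition works through wrapper objects and has more rules than shown here. The class name `Acquirer` and the attribute names are hypothetical.

```python
class Acquirer:
    """Toy stand-in for acquisition: missing attributes come from the container."""

    def __init__(self, container=None, **attrs):
        self.container = container
        self.__dict__.update(attrs)

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal attribute lookup fails,
        # so attributes stored on the object itself always win.
        container = self.__dict__.get("container")
        if container is None:
            raise AttributeError(name)
        return getattr(container, name)


product = Acquirer(
    author_name="A. Author",
    author_email="author@example.com",
    description="An example product",
)
release = Acquirer(container=product, version="1.0")

print(release.version)      # stored on the Release itself
print(release.author_name)  # not on the Release, so taken from the Product
```

The Release never stores the author details; it finds them by asking its container, which is why they don't have to be repeated.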
The ZCatalog is awfully useful for this kind of system. First, it
is (almost) always up-to-date. Whenever you add a Product or
Release, or edit one, or delete one, the Zope transaction also
updates the catalog. Next, ZCatalog's indexing engine does both
fielded and full-text searches.
Another nice feature is that objects control how they are indexed,
because the Catalog calls methods on the site content to get at
indexed content. For instance, if you decide you'd like to untar
the release and index the docstrings, go right ahead, as long as
you can implement a Python method to do it. If you want the
full-text to actually come from four properties about a Product,
piece of cake. If you want to create a Remote Product listing,
where the content is retrieved by urllib for indexing, you can
quickly do that as well.
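
The idea that the content decides what gets indexed can be sketched as follows. This is a toy written for this post, not the ZCatalog API: `TinyCatalog`, `add`, `search`, and `searchable_text` are all made-up names, and updating or removing an already indexed object is left out.

```python
import re
from collections import defaultdict


class TinyCatalog:
    """Toy catalog: asks each object for the values to index."""

    def __init__(self, field_names, text_method="searchable_text"):
        self.field_names = field_names
        self.text_method = text_method
        self.fields = defaultdict(lambda: defaultdict(set))  # field -> value -> ids
        self.words = defaultdict(set)                        # word -> ids
        self.objects = {}

    @staticmethod
    def _value(obj, name):
        # Plain attribute or method: either way the object decides the answer.
        value = getattr(obj, name, None)
        return value() if callable(value) else value

    def add(self, uid, obj):
        self.objects[uid] = obj
        for name in self.field_names:
            value = self._value(obj, name)
            if value is not None:
                self.fields[name][value].add(uid)
        text = self._value(obj, self.text_method) or ""
        for word in re.findall(r"\w+", text.lower()):
            self.words[word].add(uid)

    def search(self, text=None, **field_query):
        # Start from everything, then narrow by each field and each word.
        hits = set(self.objects)
        for name, value in field_query.items():
            hits &= self.fields[name].get(value, set())
        if text:
            for word in re.findall(r"\w+", text.lower()):
                hits &= self.words.get(word, set())
        return sorted(hits)


class Product:
    def __init__(self, title, summary, platform):
        self.title = title
        self.summary = summary
        self.platform = platform

    def searchable_text(self):
        # Full text built from several properties of the Product.
        return " ".join([self.title, self.summary])


catalog = TinyCatalog(field_names=["platform"])
catalog.add("p1", Product("Mail Gateway", "Send mail from forms", "linux"))
catalog.add("p2", Product("Photo Album", "Browse uploaded images", "win32"))
catalog.add("p3", Product("Mail Archive", "Browse list mail", "win32"))

full_text_hits = catalog.search(text="mail")
combined_hits = catalog.search(text="mail", platform="win32")
```

Swapping in a different `searchable_text` (one that untars a release, or fetches remote content with urllib) changes what is indexed without touching the catalog itself.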
Once the content is in the ZCatalog, well, Bob's your uncle. You
can create multiple interactive search forms. You can query the
catalog to insert a tree on a page for browsing the contents. You
can create multiple presentations (by date like Slashdot and
Freshmeat, by category like ODP, by author, by platform, etc.)
More interestingly, you can get the Catalog to dish up requests as
RDF. Imagine browsing the package library using the tree control
in Mozilla, which gets fed by RDF. Or, imagine a command-line
Python script that acts like rpmfind.
What About Trove and ZTrove?
This isn't Trove. This *could* be Trove, but we are focused on
getting something in place fast on zope.org. We are solving our
own problems first.
Once we have made all our mistakes and learned from them, it would
definitely be time to talk about Trove and a Python package
locator.
ZCatalog
The main advance that allows a high-end package system is the
"ZCatalog",
http://www.zope.org/Download/ZCatalog/ZCatalogTutorial. This
tutorial provides a good rundown of all the neat things about the
catalog.
Does the Catalog scale? We think so, but that's a hard question.
We have a *lot* of painful, painful experience in this area, based
on a big consulting job we did a few years ago, and those lessons
are reflected in the software. Our indexing works very hard at
keeping memory footprint moderately in check and dishing up
incredibly fast searches.
As a datapoint, we took a 15 MB chunk of the rufus RPM database
and loaded it into the indexer. A combined fielded and
full-text search took 0.004 seconds on a PII-350. That figure is
the time inside the indexer; formatting the response and returning
it took a lot longer.
CPAN and rufus.w3.org
It appears to me that CPAN performs the following functions:
o Software repository for Perl code
o Mirrored worldwide
o Routes requests to a correct, available archive
A related system that I know more about is rufus.w3.org's RPM
database. It adds a particularly useful feature that I *think*
CPAN also has: a command-line tool to query repositories and fetch
packages for installation. I unfortunately don't know much about
the workflow: how package listings get into the repository, and
how they are then updated.
I'll observe that the searchability for both of these systems is
awfully weak. For us in Zopeland, this is a major opportunity.
The Mythical Python Locator
So, what exactly *is* being looked for here? A simple package
registry? Something integrated with distutils? How much should
this system scale? What standards should it piggy-back on? Is it
centralized or decentralized, both in implementation and
administration? What jargon do we agree to and what is the
metadata we will choose?
And most important, is anybody willing to actually work on it?
In closing, though I'm skeptical that this will move from "brief
flurry of activity" into accomplished project, I think this is an
area where a definite disadvantage can be turned into a clear
advantage. The goal shouldn't be just to catch up. The goal
should be to go farther, to show off and taunt, then to rub the
others' noses in it. :^)