[Zope] Re: Python Code Repository?

Sat, 10 Jul 1999 10:14:21 -0400

"A.M. Kuchling" wrote:
> Michel Pelletier (the one at Digital Creations) was working on a
> ZTrove at one point, I believe.  (For information on the whole idea of
> Trove, see http://www.tuxedo.org/~esr/trove/).  I don't know what's
> happened to ZTrove; anyone from DC care to comment?

OK, lonnnggg post ahead.

Zope Product Library

  With the progress made on Zope2 and the Zope Portal Toolkit (PTK),
  we are now far enough along to begin construction of a
  "Freshmeat"-style package library for Zope software.  (Note: the
  jargon of "product", "package", "release", and "version" is awfully
  contentious.)

  Since this is effectively the same ground being covered here, I'd
  like to take a moment to describe what we are doing and see if folks
  here on the Python list find it interesting.  I'll also cross-post
  to the Zope list.  The desired outcome is for our experience and
  software to provide a boost to a real effort.  (Historical note: I
  was the shepherd for the original ill-fated "Locator-SIG".)

  Goals

    o Fast location of desired software

    o Keep central admin requirements *low* by distributing control

    o Allow package authors to administer their entries

    o Adhere to appropriate standards and initiatives

  Zope Product Library

    In our system, Authors create Products and provide Releases of
    those Products.  A Release is a version of a Product for a
    platform (if binary) or a source version in some packaging scheme
    (such as RPM).  Thus, a Release is the bits and a Product is the
    container for all information about those bits.
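
    As a rough illustration of that split, here is a plain-Python
    sketch.  The class and field names are assumptions made for this
    example; the real objects are ZClasses built through the web, not
    dataclasses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Release:
    """The bits: one version of a Product in one packaging form."""
    version: str
    packaging: str                    # e.g. "tar.gz" or "RPM"
    platform: Optional[str] = None    # set only for binary releases

    @property
    def is_binary(self) -> bool:
        return self.platform is not None


@dataclass
class Product:
    """The container for all information about a set of Releases."""
    name: str
    author: str
    description: str = ""
    releases: List[Release] = field(default_factory=list)

    def add_release(self, release: Release) -> Release:
        self.releases.append(release)
        return release


# Hypothetical data, purely for illustration.
product = Product("ExampleProduct", author="A. Author")
product.add_release(Release("1.0", packaging="tar.gz"))
product.add_release(Release("1.0", packaging="RPM", platform="linux-i386"))
```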

    The things capitalized in the above are "ZClasses".  For you
    non-Zopistas, a ZClass is like a Python class that you build
    through the web and store in the Zope object database.  It is an
    instance that can make instances.  It also provides a simple way
    to store "your kinds of things" in the Zope object database.
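
    The "instance that can make instances" idea has a close analogy
    in plain Python, where a class is itself an object that can be
    built at runtime.  The sketch below is only that analogy: the
    helper make_class is made up for this example and says nothing
    about how ZClasses are actually implemented.

```python
def make_class(name, properties):
    """Build a new class at runtime from a mapping of property defaults."""

    def __init__(self, **values):
        # Every declared property gets either the supplied value
        # or its default.
        for key, default in properties.items():
            setattr(self, key, values.get(key, default))

    # type(name, bases, namespace) creates a brand-new class object.
    return type(name, (object,), {"__init__": __init__})


# The class itself is an instance (of type) ...
Product = make_class("Product", {"title": "", "author": ""})

# ... and it can make instances of its own.
p = Product(title="Example", author="A. Author")
```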

    The private, under-construction new zope.org site has a membership
    system where we give people a Zope home folder to stash their
    stuff.  They pick from a list of things they can add, such as
    uploaded images, HTML documents, etc.

    People with higher privilege -- such as an Author -- can also
    choose "Product" from the list of things they can add to their
    folder.  When they do so, the Product gets added to the site
    catalog.

    A Product is like a folder -- you can stick a home page, other
    docs, PDF files, etc. in it.  More importantly, inside a Product
    you can add a Release.  This will provide the kind of thing seen
    in Freshmeat.

    Thanks to acquisition, much of the information about a Release
    (author name and email, description, etc.) comes from the Product
    and doesn't have to be repeated.
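
    For readers new to acquisition, the effect can be approximated in
    plain Python by letting an object fall back to its container for
    attributes it lacks.  This is a simplified sketch with invented
    names (Acquirer, container); Zope's own acquisition machinery
    works differently under the hood.

```python
class Acquirer:
    """Looks up missing attributes on the containing object."""

    def __init__(self, container=None, **attrs):
        self.container = container
        self.__dict__.update(attrs)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        container = self.__dict__.get("container")
        if container is None:
            raise AttributeError(name)
        return getattr(container, name)


# Hypothetical data, purely for illustration.
product = Acquirer(author="A. Author",
                   email="author@example.org",
                   description="An example product")

# The Release stores only what is specific to it ...
release = Acquirer(container=product, version="1.0")

# ... and may override an inherited value when it needs to.
bugfix = Acquirer(container=product, version="1.1",
                  description="Bugfix release")
```

    Here release.author and release.email come from the Product,
    while bugfix.description is the Release's own value.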

    The ZCatalog is awfully useful for this kind of system.  First, it
    is (almost) always up-to-date.  Whenever you add a Product or
    Release, or edit one, or delete one, the Zope transaction also
    updates the catalog.  Next, ZCatalog's indexing engine does both
    fielded and full-text searches.
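
    To make "fielded and full-text" concrete, here is a toy in-memory
    catalog.  It is a sketch under stated assumptions, not ZCatalog's
    API: ToyCatalog and its method names are invented for this
    example, and reindexing is triggered by explicit calls rather
    than by a Zope transaction.

```python
from collections import defaultdict


class ToyCatalog:
    def __init__(self, fields):
        self.fields = fields
        self.records = {}                     # uid -> record dict
        self.field_index = defaultdict(set)   # (field, value) -> uids
        self.text_index = defaultdict(set)    # word -> uids

    def catalog(self, uid, record):
        """Index a record, replacing any earlier entry for uid."""
        self.uncatalog(uid)
        self.records[uid] = record
        for f in self.fields:
            self.field_index[(f, record.get(f))].add(uid)
        for word in record.get("text", "").lower().split():
            self.text_index[word].add(uid)

    def uncatalog(self, uid):
        """Remove a record and all of its index entries."""
        record = self.records.pop(uid, None)
        if record is None:
            return
        for f in self.fields:
            self.field_index[(f, record.get(f))].discard(uid)
        for word in record.get("text", "").lower().split():
            self.text_index[word].discard(uid)

    def search(self, text=None, **field_values):
        """Intersect exact field matches with full-text word matches."""
        hits = set(self.records)
        for f, value in field_values.items():
            hits &= self.field_index.get((f, value), set())
        if text:
            for word in text.lower().split():
                hits &= self.text_index.get(word, set())
        return sorted(hits)


catalog = ToyCatalog(fields=["platform"])
catalog.catalog("r1", {"platform": "linux", "text": "XML parser for Zope"})
catalog.catalog("r2", {"platform": "win32", "text": "XML parser for Zope"})

combined = catalog.search(text="xml parser", platform="linux")

# Editing r1 replaces its old index entries, so the catalog stays current.
catalog.catalog("r1", {"platform": "linux", "text": "SQL adapter"})
after_edit = catalog.search(text="xml")

# Deleting r2 removes it from every index.
catalog.uncatalog("r2")
after_delete = catalog.search(text="xml")
```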

    Another nice feature is that objects control how they are indexed,
    as the Catalog is calling methods on the site content to get at
    indexed content.  For instance, if you decide you'd like to untar
    the release and index the docstrings, go right ahead, as long as
    you can implement a Python method to do it.  If you want the
    full-text to actually come from four properties about a Product,
    piece of cake.  If you want to create a Remote Product listing,
    where the content is retrieved by urllib for indexing, you can
    quickly do that as well.
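
    A small sketch of that division of labour: the indexer asks each
    object for its text and never looks inside.  The method name
    searchable_text and the helper index_words are assumptions made
    for this example, not names taken from ZCatalog.

```python
class Product:
    def __init__(self, title, summary, keywords, author):
        self.title = title
        self.summary = summary
        self.keywords = keywords
        self.author = author

    def searchable_text(self):
        """Full text assembled from four properties of the Product."""
        return " ".join([self.title, self.summary,
                         " ".join(self.keywords), self.author])


class Release:
    def __init__(self, version, docstrings):
        self.version = version
        # Assume the docstrings were already extracted from the
        # unpacked archive by some other step.
        self.docstrings = docstrings

    def searchable_text(self):
        """Full text taken from the docstrings in the release."""
        return " ".join(self.docstrings)


def index_words(obj):
    """What an indexer might do: ask the object, then tokenize."""
    return set(obj.searchable_text().lower().split())


# Hypothetical data, purely for illustration.
product = Product("ExampleParser", "A small XML parser",
                  ["xml", "parsing"], "A. Author")
release = Release("1.0", ["Parse a document.", "Return the root node."])

product_words = index_words(product)
release_words = index_words(release)
```

    The indexer stays the same no matter where the text comes from;
    only the object's method changes.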

    Once the content is in the ZCatalog, well, Bob's your uncle.  You
    can create multiple interactive search forms.  You can query the
    catalog to insert a tree on a page for browsing the contents.  You
    can create multiple presentations (by date like Slashdot and
    Freshmeat, by category like ODP, by author, by platform, etc.)

    More interestingly, you can get the Catalog to dish up requests as
    RDF.  Imagine browsing the package library using the tree control
    in Mozilla, which gets fed by RDF.  Or, imagine a command-line
    Python script that acts like rpmfind.

  What About Trove and ZTrove?

    This isn't Trove.  This *could* be Trove, but we are focused on
    getting something in place fast on zope.org.  We are solving our
    own problems first.

    Once we have made all our mistakes and learned from them, it would
    definitely be time to talk about Trove and a Python package
    locator.

  ZCatalog

    The main advance that allows a high-end package system is the
    "ZCatalog",
    http://www.zope.org/Download/ZCatalog/ZCatalogTutorial.  This
    tutorial provides a good rundown of all the neat things about the
    catalog.

    Does the Catalog scale?  We think so, but that's a hard question.
    We have a *lot* of painful, painful experience in this area, based
    on a big consulting job we did a few years ago, and those lessons
    are reflected in the software.  Our indexing works very hard at
    keeping memory footprint moderately in check and dishing up
    incredibly fast searches.

    As a datapoint, we took a 15 Mb chunk of the rufus RPM database
    and loaded it into the indexer.  A combination fielded and
    full-text search took 0.004 seconds on a PII-350.  That figure
    is the time spent inside the indexer; formatting the response
    and returning it took a lot longer.

  CPAN and rufus.w3.org

    It appears to me that CPAN performs the following functions:

      o Software repository for Perl code

      o Mirrored worldwide

      o Routes requests to a correct, available archive

    A related system that I know more about is rufus.w3.org's RPM
    database.  It offers a particularly useful feature that I *think*
    CPAN also has: a command-line tool to query repositories and fetch
    packages for installation.  I unfortunately don't know much about
    the workflow of how package listings get into the repository, then
    how they get updated.

    I'll observe that the searchability for both of these systems is
    awfully weak.  For us in Zopeland, this is a major opportunity.

  The Mythical Python Locator

    So, what exactly *is* being looked for here?  A simple package
    registry?  Something integrated with dist-utils?  How much should
    this system scale?  What standards should it piggy-back?  Is it
    centralized or decentralized, both in implementation and
    administration?  What jargon do we agree to and what is the
    metadata we will choose?

    And most important, is anybody willing to actually work on it?

    In closing, though I'm skeptical that this will move from "brief
    flurry of activity" into accomplished project, I think this is an
    area where a definite disadvantage can be turned into a clear
    advantage.  The goal shouldn't be just to catch up.  The goal
    should be to go farther, to show off and taunt, then to rub the
    others' noses in it. :^)