[ZWeb] FYI: Readying the "Zope Download Center"

Guido van Rossum guido@python.org
Sun, 12 Jan 2003 11:24:25 -0500


> >>> Have you read my general mirroring proposal?  The help we need
> >>> right now would concern the architecture, design and
> >>> implementation of the system I propose there.  It shouldn't be
> >>> much code, since the essence of mirroring can be done using rsync
> >>> or ftpmirror.py, but we need tools to find downloadable bits, and
> >>> to collate download stats.
> >>
> >> Sure, though given the history of such efforts, I don't think we
> >> should wait for new software before fixing this problem.  We've
> >> talked for over two years about writing software to collect download
> >> stats, even before spreading download across multiple sites!
> >
> > Hm.  For python.org, we use something called Webalizer, which scans
> > Apache-style log files.  Since Zope writes those too, I'm not sure what
> > the problem is for a single site.  Integrating the results back into
> > Zope?
> 
> I did this back in July.  Here are the problems I discovered:
> 
> 1) URLs make it hard to accurately say "what is zope".  There are 
> different versions of Zope, there are patch files, there are the 
> different binaries.

I don't understand this.  Where do you need to know "what is Zope"
when scanning the log files?  Probably you skipped a step in your
reasoning.
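
Point 1 can be made concrete: the logs record only request paths, so before anything can be counted as a "Zope download", each path has to be mapped to a release and a kind of file (source, binary, patch). A minimal Python sketch of that mapping, assuming a hypothetical URL layout of the form `/Products/Zope/<version>/<filename>`; the real zope.org paths would differ, and `classify` and its categories are illustrative names only:

```python
import re
from typing import Optional, Tuple

# Hypothetical layout: /Products/Zope/<version>/<filename>.
# The real zope.org URL scheme may differ; adjust the pattern to match it.
DOWNLOAD_RE = re.compile(
    r"^/Products/Zope/(?P<version>\d[\d.]*[a-z0-9]*)/(?P<filename>[^/]+)$"
)


def classify(path: str) -> Optional[Tuple[str, str]]:
    """Map a request path to (version, kind), or None if it isn't a Zope download."""
    path = path.split("?", 1)[0]  # ignore any query string
    m = DOWNLOAD_RE.match(path)
    if m is None:
        return None
    name = m.group("filename").lower()
    # Illustrative rules for telling the file kinds apart.
    if name.endswith((".patch", ".diff")):
        kind = "patch"
    elif name.endswith((".exe", ".msi")):
        kind = "windows-binary"
    elif "linux" in name or "solaris" in name:
        kind = "unix-binary"
    elif name.endswith((".tgz", ".tar.gz")):
        kind = "source"
    else:
        kind = "other"
    return m.group("version"), kind
```

Counting per `(version, kind)` rather than per raw URL keeps the patch files and the different binaries apart without losing any of them.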

> 2) The log files are currently being munched by webalizer and then 
> thrown out.  Thus, if the URL you want isn't in the list of top 20 
> hits, you're screwed.  All the information is lost, since the original 
> data are deleted.  (This was the problem that blocked me, as I never 
> had access to the boxes where the munching was done).

That's an operational problem that can be fixed.

> I was able to extract some useful information and compile it into an 
> Excel spreadsheet last summer.  But it wasn't authoritative.

Sorry, you haven't proved that using Webalizer for zope.org won't
work.  All you have proved is that it didn't work under the
constraints you had last July.

> >>   One of the continuing problems: everyone says counters shouldn't be
> >> done inside the ZODB (though getting a clear answer on this isn't 
> >> easy).
[...]
> > Three solutions come to mind:
> >
> > - Use an auto-packing Berkeley storage
> 
> It would have to be on the ZEO side, so that all the clients put data 
> in the same place.  Which might mean a lot of ZEO messages, but I don't 
> know if that's a problem.

Perhaps; I don't know if ZEO is currently a bottleneck (though I
somehow doubt it).

> > - Use an undo-less separate storage for the counters
> 
> Ditto.

But it could run on a separate storage server.

> > - Don't update the counters directly in ZODB; nightly, compute
> >   counters from log files and update them in ZODB
> 
> Probably the best idea.

Agreed.
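
The third option amounts to a nightly batch job: read the day's Apache log, total the downloads per path, and write only those totals into ZODB, so the database sees one write per night instead of one per hit. A minimal Python sketch, assuming Apache's common/combined log format and assuming that a successful `GET` (status 200) of a file with one of the listed suffixes counts as one download; `count_downloads`, `merge_into`, and the suffix list are hypothetical, and a plain dict stands in for a persistent ZODB mapping (committing the transaction is not shown):

```python
import re
from collections import Counter
from typing import Iterable, MutableMapping

# Apache common/combined log format:
#   host ident user [date] "METHOD path PROTOCOL" status bytes ...
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b'
)

# Assumption: these suffixes identify downloadable files.
DOWNLOAD_SUFFIXES = (".tgz", ".tar.gz", ".zip", ".exe", ".patch")


def count_downloads(lines: Iterable[str]) -> Counter:
    """Count successful GETs of downloadable files, keyed by path."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip malformed or unexpected lines
        if m.group("method") != "GET" or m.group("status") != "200":
            continue
        path = m.group("path").split("?", 1)[0]  # drop the query string
        if path.endswith(DOWNLOAD_SUFFIXES):
            counts[path] += 1
    return counts


def merge_into(totals: MutableMapping[str, int], counts: Counter) -> None:
    """Add one night's counts to the running totals.

    A plain dict stands in for a persistent ZODB mapping here; in Zope the
    caller would commit the transaction once, after the merge.
    """
    for path, n in counts.items():
        totals[path] = totals.get(path, 0) + n
```

Run nightly, e.g. `merge_into(totals, count_downloads(open("access_log")))` against the previous day's rotated log.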

> All of these solutions suffer from the same issue: nobody wants to touch 
> the current zope.org software.  Thus, this would need to wait for nzo, 
> I imagine.

Of course it has to wait for NZO (see Sidnei's post).  I think we can
wait that much longer.  In the meantime, all I'm asking (and still
haven't had answered) is where on zope.org the pointers to the SF
downloads are.  If there aren't any, how are people going to find them?

And BTW, I wouldn't mind being added as an admin to the zope project
at SF, so I can help out with things.

> Perhaps we could at least avoid throwing out the original Apache data?

That would be a great start; I consider the historic Apache data from
python.org an essential resource.
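
Not throwing out the raw data can be as simple as compressing each rotated log into a dated archive before it is deleted. A minimal Python sketch; `archive_log`, the directory layout, and the `<name>-YYYY-MM-DD.gz` naming scheme are hypothetical, and when it runs would depend on the server's log-rotation setup:

```python
import gzip
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def archive_log(log_path: Path, archive_dir: Path, day: Optional[date] = None) -> Path:
    """Compress a rotated Apache log into archive_dir as <name>-YYYY-MM-DD.gz.

    The original file is left in place; deleting it is up to the caller,
    once the archive has been written.
    """
    day = day or date.today()
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{log_path.name}-{day.isoformat()}.gz"
    # "xb" mode raises FileExistsError instead of overwriting an older archive.
    with open(log_path, "rb") as src, gzip.open(target, "xb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Webalizer can keep munching the live logs as before; the archive just makes sure the original data survive for later re-analysis.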

I'll talk to our new sysadmin about this.  I imagine he'll be busy
though; please ping me in a few weeks.

--Guido van Rossum (home page: http://www.python.org/~guido/)