[ZWeb] FYI: Readying the "Zope Download Center"

Sun, 12 Jan 2003 15:22:59 +0100

On Sunday, Jan 12, 2003, at 15:02 Europe/Paris, Guido van Rossum 
wrote:

>>> Have you read my general mirroring proposal?  The help we need
>>> right now would concern the architecture, design and
>>> implementation of the system I propose there.  It shouldn't be
>>> much code, since the essence of mirroring can be done using rsync
>>> or ftpmirror.py, but we need tools to find downloadable bits, and
>>> to collate download stats.
>>
>> Sure, though given the history of such efforts, I don't think we
>> should wait for new software before fixing this problem.  We've
>> talked for over two years about writing software to collect download
>> stats, even before spreading download across multiple sites!
>
> Hm.  For python.org, we use something called Webalizer, which scans
> Apache-style log files.  Since Zope writes those too, I'm not sure what
> the problem is for a single site.  Integrating the results back into
> Zope?

I did this back in July.  Here are the problems I discovered:

1) URLs make it hard to accurately say "what is zope".  There are 
different versions of Zope, there are patch files, and there are the 
different binaries.

2) The log files are currently being munched by webalizer and then 
thrown out.  Thus, if the URL you want isn't in the list of top 20 
hits, you're screwed.  All the information is lost, since the original 
data are deleted.  (This was the problem that blocked me, as I never 
had access to the boxes where the munching was done).
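
For problem 1, one possible approach is to classify each requested path by release version and by source vs. binary before counting anything. The following is only a minimal sketch: the filename pattern (e.g. "Zope-2.6.0-src.tgz") and the function name classify_download are assumptions for illustration, not the actual zope.org layout, and patch files are left unclassified because their naming is not specified here.

```python
import re

# Assumed filename shape: Zope-<version>-<platform>.<ext>
# e.g. "Zope-2.6.0-src.tgz" or "Zope-2.6.0-win32-x86.exe".
# This pattern is a guess for illustration; real paths may differ.
_RELEASE = re.compile(
    r"^Zope-(?P<version>\d+(?:\.\d+)*(?:(?:a|b|rc)\d+)?)"
    r"-(?P<platform>.+)"
    r"\.(?:tgz|tar\.gz|zip|exe)$"
)


def classify_download(path):
    """Return (version, kind, platform) for a release file, else None.

    kind is "source" when the platform part is "src", otherwise "binary".
    Anything that does not look like a release file (patches, pages,
    images) returns None so the caller can decide how to bucket it.
    """
    # Drop any query string, then keep only the last path component.
    name = path.split("?", 1)[0].rsplit("/", 1)[-1]
    match = _RELEASE.match(name)
    if match is None:
        return None
    platform = match.group("platform")
    kind = "source" if platform == "src" else "binary"
    return (match.group("version"), kind, platform)
```

Grouping counts by the returned tuple, rather than by raw URL, would let "all downloads of 2.6.0" or "all source downloads" be answered from the same data.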

I was able to extract some useful information and compile it into an 
Excel spreadsheet last summer.  But it wasn't authoritative.

>> The SF-based Download Center will be a step forward.
>
> Definitely.
>
>> Improving a Zope-based download, based on new software and a new
>> mirror network, is also a good idea.
>
> Thanks.
>
>>> It would be best to coordinate with Sidnei, who's doing a
>>> "downloadable product" for NZO.
>>
>> Actually I wrote the original one, so I have a rough idea what it 
>> does.
>
> :-)
>
>>   One of the continuing problems: everyone says counters shouldn't be
>> done inside the ZODB (though getting a clear answer on this isn't 
>> easy).
>
> It's true.  (Anecdote: during my brief stint at BeOpen in 2000, we
> tried to use Zope for a commercial Python site.  The designers had
> made a very attractive layout, and some Zope contractors had
> implemented it all in Zope.  Unfortunately they had used a naive
> ad-serving product that used ZODB counters, and nobody understood why
> the site's Data.fs grew to unacceptable sizes until we called in Jim
> Fulton who quickly diagnosed the problem.)
>
> Three solutions come to mind:
>
> - Use an auto-packing Berkeley storage

It would have to be on the ZEO side, so that all the clients put data 
in the same place.  Which might mean a lot of ZEO messages, but I don't 
know if that's a problem.

> - Use an undo-less separate storage for the counters

Ditto.

> - Don't update the counters directly in ZODB; nightly, compute
>   counters from log files and update them in ZODB

Probably the best idea.
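
A rough sketch of what the nightly counting step might look like, assuming the logs are in Apache Common Log Format and that downloads live under a "/Products/" prefix (that prefix and the function name count_downloads are assumptions for illustration). It only produces the per-path totals; writing them into ZODB in a single transaction is left out.

```python
import re
from collections import Counter

# Apache Common Log Format:
#   host ident user [time] "METHOD path PROTOCOL" status size
_LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+'
)


def count_downloads(lines, prefix="/Products/"):
    """Count successful GET requests per path under the given prefix.

    lines is any iterable of log lines (an open file works).
    Lines that do not parse are skipped rather than raising.
    """
    counts = Counter()
    for line in lines:
        match = _LOG_LINE.match(line)
        if match is None:
            continue
        # Only count completed downloads: GET requests answered with 200.
        if match.group("method") != "GET" or match.group("status") != "200":
            continue
        # Ignore query strings so the same file is counted under one key.
        path = match.group("path").split("?", 1)[0]
        if path.startswith(prefix):
            counts[path] += 1
    return counts
```

Because this reads the raw log lines, it only works if the original logs are still around when the job runs.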

All of these solutions suffer from the same issue: nobody wants to touch 
the current zope.org software.  Thus, this would need to wait for NZO, 
I imagine.

Perhaps we could at least avoid throwing out the original Apache data?

--Paul