[Zope] making a static snapshot

Paul Winkler pw_lists@slinkp.com
Thu, 1 May 2003 09:08:42 -0400


On Thu, May 01, 2003 at 12:55:50PM +0200, martin f krafft wrote:
> i wonder if there are ways to create a static snapshot of a Zope
> site. standard mirroring tools exist, sure, but due to acquisition,
> one object can potentially have one million URLs, so standard
> mirroring won't work. has anyone else thought about this?

yup. It's a pain. May I suggest:
1) don't use relative URLs,
2) avoid using relative URLs, and
3) get rid of relative URLs.
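To see why relative URLs are the problem, look at how a crawler resolves them. Here is a small sketch. It assumes a hypothetical folder at http://www.example.com/site/images/ whose pages contain the relative link "images/". Each time the link is followed, urljoin produces a deeper URL. Zope acquisition will typically serve that URL as the same folder (it acquires "images" from the parent), so the crawler never runs out of "new" pages. A root-relative link resolves to the same place every time.

```python
from urllib.parse import urljoin


def follow_relative_link(start_url, relative_link, steps):
    """Return the URLs a crawler would visit by following the same
    link from each page it reaches (hypothetical example)."""
    urls = [start_url]
    for _ in range(steps):
        # Resolve the link against the page it was found on, as a browser or wget would.
        urls.append(urljoin(urls[-1], relative_link))
    return urls


# Relative link: every hop adds another "images/" segment.
for url in follow_relative_link("http://www.example.com/site/images/", "images/", 3):
    print(url)

# Root-relative link: every hop lands on the same URL.
print(set(follow_relative_link("http://www.example.com/site/images/", "/site/images/", 3)))
```

wget cannot tell that all those URLs are one object, which is why the recursion depth has to be capped.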

With some attention to that, a careful combination of wget
options works pretty well for me. --mirror alone is not adequate;
instead I use this (followed by the site's start URL):

wget -r -l8 -p -nH -nc -np -k -E 

or, more verbosely:

wget --recursive --level=8 --page-requisites \
 --no-host-directories --no-clobber --no-parent \
 --convert-links --html-extension

The "level" setting depends, of course, on how deep your site gets.
Note that -E (--html-extension) causes some renaming: HTML pages whose
names don't already end in .html get that suffix appended, so the output
includes things like index_html.html. But -k (--convert-links) rewrites
the links to match, so the result just works even if it's not 100%
identical to the original.
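If you want to script the snapshot, here is a minimal Python sketch. It runs the same wget options and saves wget's standard error (where wget writes its progress and error messages by default) to a log file for later checking. The names build_wget_command, snapshot, and wget.log and the example.com URL are hypothetical. The level parameter stands in for the "level" setting above. Newer wget releases may handle some of these options differently (for example combining --no-clobber with --convert-links), so check your version's manual.

```python
import subprocess


def build_wget_command(start_url, level=8):
    """Assemble the wget invocation from the post; `level` should match
    how deep the site goes."""
    return [
        "wget",
        "--recursive", f"--level={level}",
        "--page-requisites",
        "--no-host-directories", "--no-clobber", "--no-parent",
        "--convert-links", "--html-extension",
        start_url,
    ]


def snapshot(start_url, log_path="wget.log", level=8):
    """Run wget, saving its standard error to log_path; return wget's exit status."""
    with open(log_path, "w") as log:
        result = subprocess.run(build_wget_command(start_url, level), stderr=log)
    return result.returncode


if __name__ == "__main__":
    # Hypothetical start URL; substitute your own site's.
    print(" ".join(build_wget_command("http://www.example.com/")))
```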

By saving wget's standard error and grepping it for errors, I get
link-checking at the same time :-)
I also wrote a trivial Python script that searches through the stderr
log and counts downloads with the same basename, which helps me track
down problematic relative URLs: when "main.css" gets downloaded 200
times under different paths, that's a clue. :-)

-- 

Paul Winkler
home:  http://www.slinkp.com
"Muppet Labs, where the future is made - today!"