On Thu, May 01, 2003 at 12:55:50PM +0200, martin f krafft wrote:
> i wonder if there are ways to create a static snapshot of a Zope
> site. standard mirroring tools exist, sure, but due to acquisition,
> one object can potentially have one million URLs, so standard
> mirroring won't work. has anyone else thought about this?
yup. It's a pain. May I suggest:

1) don't use relative URLs,
2) avoid using relative URLs, and
3) get rid of relative URLs.

With some attention to that, a careful combination of wget options
works pretty well for me. --mirror is not adequate:

    wget -r -l8 -p -nH -nc -np -k -E

or, more verbosely:

    wget --recursive --level=8 --page-requisites \
         --no-host-directories --no-clobber --no-parent \
         --convert-links --html-extension

The "level" setting depends of course on how deep your site gets.

Note that this causes some renaming, so you may find that the output
includes things like index_html.html. But links are automatically
fixed so the result just works even if it's not 100% identical to the
original.

By saving the standard error from wget and grepping it for errors,
i get link-checking at the same time :-)

i also wrote a trivial python script that searches through the stderr
log and counts downloads with the same basename, which helps me track
down problematic relative URLs. When "main.css" gets downloaded 200
times, that's a clue. :-)

-- 
Paul Winkler
home: http://www.slinkp.com
"Muppet Labs, where the future is made - today!"
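For illustration, a basename-counting script of the kind described above could look roughly like this. It is a sketch, not the original script: the function names, the threshold, and the regular expression are all assumptions, and the regex assumes wget logs each request as a line of the form "--<timestamp>--  <URL>", which may differ between wget versions.

```python
import re
from collections import Counter
from posixpath import basename
from urllib.parse import urlsplit

# Assumed shape of a wget request line (may vary by wget version), e.g.
#   --12:55:50--  http://example.com/css/main.css
#   --2003-05-01 12:55:50--  http://example.com/css/main.css
REQUEST_LINE = re.compile(r"^--.*?--\s+(https?://\S+)")


def count_basenames(lines):
    """Count requested URLs per basename (the last path segment)."""
    counts = Counter()
    for line in lines:
        match = REQUEST_LINE.match(line)
        if match is None:
            continue  # not a request line; skip status and progress output
        path = urlsplit(match.group(1)).path
        # A URL ending in "/" has no basename; count it under "/".
        name = basename(path.rstrip("/")) or "/"
        counts[name] += 1
    return counts


def suspicious(counts, threshold=10):
    """Return (basename, count) pairs seen at least `threshold` times,
    most frequent first. The default threshold is arbitrary."""
    return [(name, n) for name, n in counts.most_common() if n >= threshold]
```

To try it, save wget's standard error to a file (for example by appending 2> wget.log to the command), then pass the open file to count_basenames() and print whatever suspicious() returns. A stylesheet or image that shows up under hundreds of different paths is a likely sign of a relative URL being resolved through acquisition.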