making a static snapshot
i wonder if there are ways to create a static snapshot of a Zope site. standard mirroring tools exist, sure, but due to acquisition, one object can potentially have one million URLs, so standard mirroring won't work. has anyone else thought about this?

thanks.

--
martin; (greetings from the heart of the sun.)
\____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck

keyserver problems? http://keyserver.kjsl.com/~jharris/keyserver.html
get my key here: http://madduck.net/me/gpg/publickey

eleventh law of acoustics: in a minimum-phase system there is an inextricable link between frequency response, phase response and transient response, as they are all merely transforms of one another. this combined with minimalization of open-loop errors in output amplifiers and correct compensation for non-linear passive crossover network loading can lead to a significant decrease in system resolution lost. however, of course, this all means jack when you listen to pink floyd.
On Thu, May 01, 2003 at 12:55:50PM +0200, martin f krafft wrote:
i wonder if there are ways to create a static snapshot of a Zope site. standard mirroring tools exist, sure, but due to acquisition, one object can potentially have one million URLs, so standard mirroring won't work. has anyone else thought about this?
yup. It's a pain. May I suggest:

1) don't use relative URLs,
2) avoid using relative URLs, and
3) get rid of relative URLs.

with some attention to that, a careful combination of wget options works pretty well for me. --mirror is not adequate:

    wget -r -l8 -p -nH -nc -np -k -E

or, more verbosely:

    wget --recursive --level=8 --page-requisites \
         --no-host-directories --no-clobber --no-parent \
         --convert-links --html-extension

The "level" setting depends of course on how deep your site gets. Note that this causes some renaming, so you may find that the output includes things like index_html.html. But links are automatically fixed, so the result just works even if it's not 100% identical to the original.

By saving the standard error from wget and grepping it for errors, i get link-checking at the same time :-)

i also wrote a trivial python script that searches through the stderr log and counts downloads with the same basename, which helps me track down problematic relative URLs. When "main.css" gets downloaded 200 times, that's a clue. :-)

--
Paul Winkler
home: http://www.slinkp.com
"Muppet Labs, where the future is made - today!"
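Paul's script is not included in the thread. As a rough sketch of the idea he describes, the following counts how often each basename is requested in a saved wget log and also collects the error lines. The function names are made up for illustration, and the log format is an assumption: it expects request lines of the form `--<timestamp>--  <URL>` and failures reported on lines containing `ERROR`, which may differ between wget versions.

```python
import posixpath
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed shape of a wget request line: "--<timestamp>--  <URL>".
REQUEST_LINE = re.compile(r"^--\S.*?--\s+(\S+)")


def count_basenames(log_lines):
    """Count requested URLs by the last component of their path."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_LINE.match(line)
        if not match:
            continue
        path = urlsplit(match.group(1)).path
        name = posixpath.basename(path.rstrip("/"))
        if name:  # skip the bare site root, which has no basename
            counts[name] += 1
    return counts


def error_lines(log_lines):
    """Return log lines that report a failed download (assumed to contain 'ERROR')."""
    return [line.rstrip("\n") for line in log_lines if "ERROR" in line]
```

With the log saved via something like `wget ... 2> wget.log`, calling `count_basenames(open("wget.log")).most_common(10)` would list the most frequently fetched names; a stylesheet near the top with a count in the hundreds points at a relative link being resolved from many acquisition paths.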
1) don't use relative URLs,
Good advice. Not only is your site hard to mirror, but it's a robot trap. IMHO, cyclic URLs are a sign that you're misusing acquisition.

Another note: Make sure that links to folders end in slashes. Some mirroring programs can't handle the BASE HREF that Zope uses to correct the usage.

Paul Winkler wrote:
On Thu, May 01, 2003 at 12:55:50PM +0200, martin f krafft wrote:
i wonder if there are ways to create a static snapshot of a Zope site. standard mirroring tools exist, sure, but due to acquisition, one object can potentially have one million URLs, so standard mirroring won't work. has anyone else thought about this?
yup. It's a pain. May I suggest: 1) don't use relative URLs, 2) avoid using relative URLs, and 3) get rid of relative URLs. ....
--
Steve McMahon
Reid-McMahon, LLC
steve@reidmcmahon.com
steve@dcn.org
On Thu, May 01, 2003 at 03:36:02PM -0700, Steve McMahon wrote:
1) don't use relative URLs,
Good advice. Not only is your site hard to mirror, but it's a robot trap. IMHO, cyclic URLs are a sign that you're misusing acquisition.
Another note: Make sure that links to folders end in slashes. Some mirroring programs can't handle the BASE HREF that Zope uses to correct the usage.
yeah, that was a problem. The combination of wget flags that i posted solves this by saving things like:

    index_html.html  (folder)
    index_html/foo   (something acquired relative to index_html)

--
Paul Winkler
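The trailing-slash issue and the "robot trap" can both be reproduced with plain URL resolution, independent of any mirroring tool. The sketch below uses Python's standard `urljoin`; the `example.org` URLs and the `resolve_chain` helper are invented for illustration. It only shows what a client computes from the links; whether the server actually answers the ever-longer URLs depends on acquisition, as described in the thread.

```python
from urllib.parse import urljoin


def resolve_chain(start, link, steps):
    """Follow the same relative link repeatedly, as a naive crawler would."""
    urls = [start]
    for _ in range(steps):
        urls.append(urljoin(urls[-1], link))
    return urls


# A link to a folder without the trailing slash: "foo" resolves next to
# the folder instead of inside it.
no_slash = urljoin("http://example.org/folder", "foo")
with_slash = urljoin("http://example.org/folder/", "foo")

# A relative link that repeats part of the current path grows with every hop.
chain = resolve_chain("http://example.org/a/b/", "a/b/", 3)

if __name__ == "__main__":
    print(no_slash)    # http://example.org/foo
    print(with_slash)  # http://example.org/folder/foo
    for url in chain:
        print(url)
```

If the site resolves each URL in such a chain to the same object, a recursive mirror keeps descending until it hits its depth limit, which is why the wget "level" setting matters.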
also sprach Paul Winkler <pw_lists@slinkp.com> [2003.05.01.1508 +0200]:
1) don't use relative URLs, 2) avoid using relative URLs, and 3) get rid of relative URLs.
so this (plone) portal I have can be seen through www.ailab.ch, but also through www.ifi.unizh.ch/newailab. The point is, the root of the tree may freely change, thanks to vhost monsters!

My problem currently is that I'd like to be able to create links in structured text, ideally using / to identify the root of the portal. However, "link":/people/krafft, when viewed through www.ifi.unizh.ch/newailab, will point to www.ifi.unizh.ch/people/krafft, which is, of course, a 404.

so in the above, i have to use relative urls, or can you think of another way to do this?

I tried "link":portal_url/people/krafft, and that works, but it makes the link show up as e.g. www.ifi.unizh.ch/newailab/research/portal_url/people/krafft, which is (a) ugly, (b) breaks navigation, and (c) makes bookmarking and other uses dangerous.

thanks for any insights!

--
martin
On Fri, May 02, 2003 at 02:59:22PM +0200, martin f krafft wrote:
also sprach Paul Winkler <pw_lists@slinkp.com> [2003.05.01.1508 +0200]:
1) don't use relative URLs, 2) avoid using relative URLs, and 3) get rid of relative URLs.
so this (plone) portal I have can be seen through www.ailab.ch, but also through www.ifi.unizh.ch/newailab. The point is, the root of the tree may freely change, thanks to vhost monsters!
My problem currently is that I'd like to be able to create links in structured text, ideally using / to identify the root of the portal. However, "link":/people/krafft, when viewed through www.ifi.unizh.ch/newailab will point to www.ifi.unizh.ch/people/krafft, which is, of course, a 404.
so in the above, i have to use relative urls, or can you think of another way to do this?
hmmm ok i should have been more specific and less dogmatic. :)

My current strategy is:

- relative URLs are OK in CMF content
- relative URLs are banned from CMF skin templates

Our CMF content is not typically viewed in many places via acquisition, so this works.

--
Paul Winkler
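The virtual-host problem martin describes, and the reason a relative link in content survives it, can be checked with standard URL resolution. This is only an illustration: the two host roots come from his message, while the assumption that the page carrying the link lives in a `research/` folder is taken from his example URL, and the names `PAGES` and `resolve_everywhere` are made up here.

```python
from urllib.parse import urljoin

# The same page as seen through the two virtual-host roots from the thread.
# The "research/" location of the page is an assumption for illustration.
PAGES = {
    "www.ailab.ch": "http://www.ailab.ch/research/",
    "www.ifi.unizh.ch/newailab": "http://www.ifi.unizh.ch/newailab/research/",
}


def resolve_everywhere(link):
    """Resolve one link against the page URL under each virtual host."""
    return {name: urljoin(base, link) for name, base in PAGES.items()}


# A root-relative link escapes the /newailab prefix on the second host.
root_relative = resolve_everywhere("/people/krafft")

# A page-relative link stays inside the portal under both roots.
page_relative = resolve_everywhere("../people/krafft")

if __name__ == "__main__":
    for label, result in (("root-relative", root_relative),
                          ("page-relative", page_relative)):
        for host, url in result.items():
            print(label, host, url)
```

The page-relative form works under both roots only as long as the content is viewed at a single place in the tree, which matches Paul's caveat that their CMF content is not typically reached through many acquisition paths.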
participants (3)
- martin f krafft
- Paul Winkler
- Steve McMahon