I am trying to use wget to make a static HTML copy of a Zope site. However, it seems that the 'view' files which contain the various views are incomplete. They seem to contain all the page formatting, but the actual content (say, the text in a document) is not there. They are not truncated; they are just not complete. Is it possible that there is a tag or something that my browser is following to get this part of the content, and wget does not? I would have expected that the 'view' file would be sent complete from the Zope server. Or? Or, maybe there is a better way to do this. I have a site that is jumping ship to a non-Zope location, and they want their content... -- Roger Oberholtzer <roger.oberholtzer@surbrunn.net>
--On Friday, 4 February 2005 21:31 +0100 Roger Oberholtzer <roger.oberholtzer@surbrunn.net> wrote:
Or, maybe there is a better way to do this. I have a site that is jumping ship to a non-Zope location, and they want their content...
Maybe you should not use Zope for producing static sites? This approach still appears to me as broken by design... it's like writing firmware for the injection of a Mercedes and then trying to run the same software for controlling an electric wheelchair. -aj
That is not what I am doing. The site is currently a happy dynamic Zope site. It is just that the site owners want to move elsewhere and no longer want the Zope site. But they want the existing content to put in their new static, boring site. Thus my use of wget. Another interesting thing about using wget with the Zope site is what happens if you have a calendar à la Plone. The links to each year are followed on and on. And, as each year is at the same level in the hierarchy, wget's level limiting has no effect. What happens is that wget can run forever, following the years in the calendar. On Sun, 2005-02-06 at 16:44 +0100, Andreas Jung wrote:
--On Friday, 4 February 2005 21:31 +0100 Roger Oberholtzer <roger.oberholtzer@surbrunn.net> wrote:
Or, maybe there is a better way to do this. I have a site that is jumping ship to a non-Zope location, and they want their content...
Maybe you should not use Zope for producing static sites? This approach still appears to me as broken by design... it's like writing firmware for the injection of a Mercedes and then trying to run the same software for controlling an electric wheelchair.
-aj
On Sun, Feb 06, 2005 at 05:15:50PM +0100, Roger Oberholtzer wrote:
That is not what I am doing. The site is currently a happy dynamic Zope site. It is just that the site owners want to move elsewhere and no longer want the Zope site. But they want the existing content to put in their new static, boring site. Thus my use of wget.
It should "just work". Having no knowledge of how your Zope site is put together, I for one have no idea what could be wrong. wget has a lot of options that are worth exploring. For producing a locally browsable static copy of Zope and CMF content, I eventually settled on this, which changes some file extensions and rewrites links to point to the local version:

wget -nc -r -l8 -p -nH --no-parent --convert-links --html-extension

It's not perfect: for a folder named "foo" you may end up with both foo.html and foo/index_html.html, both having the same content. It also helps if you don't have runaway URLs, i.e. relative links in your navigation that lead to wget traversing the same object over and over with URLs like http://foo/bar/baz/baz/baz/baz/baz/baz/...
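For anyone decoding the flags, here is Paul's command annotated option by option. The start URL is a placeholder, since the thread omits it:

```shell
# -nc              skip files that already exist locally (no-clobber)
# -r -l8           recurse, to at most 8 levels deep
# -p               also fetch page requisites (images, stylesheets)
# -nH              don't create a top-level directory named after the host
# --no-parent      never ascend above the starting URL
# --convert-links  rewrite links so the copy browses locally
# --html-extension save text/html responses with an .html suffix
wget -nc -r -l8 -p -nH --no-parent --convert-links --html-extension \
     http://www.example.com/
```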
Another interesting thing about using wget with the Zope site is what happens if you have a calendar à la Plone. The links to each year are followed on and on. And, as each year is at the same level in the hierarchy, the level limiting for wget has no effect. What happens is that wget can run forever, following the years in the calendar.
Maybe some work on robots.txt could help with this? Don't know. -- Paul Winkler http://www.slinkp.com
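Paul's robots.txt idea, sketched concretely. The /calendar path is hypothetical and would need to match the real calendar URL; wget honours robots.txt during recursive retrieval by default, and -X excludes a directory explicitly:

```shell
# Two assumed ways to keep wget out of the runaway calendar links.
# 1. A robots.txt served at the site root (wget obeys it by default):
#
#      User-agent: *
#      Disallow: /calendar
#
# 2. Or exclude the directory on the wget command line:
wget -r -X /calendar http://www.example.com/
```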
On Feb 4, 2005, at 3:31 PM, Roger Oberholtzer wrote:
Or, maybe there is a better way to do this. I have a site that is jumping ship to a non-Zope location, and they want their content...
We have all of our content produced in a large CMF-based application, but at the very last minute (or at some point soon afterwards) we had to change the delivery tier to a set of static HTML pages. The initial implementation of the process that feeds this static delivery tier was done with a spidering application. The way this spider was designed to work was, for every Zope URL (http://www.example.com/foo/bar), to create a directory $DOCROOT/foo/bar and then write the content into $DOCROOT/foo/bar/index.html. The Zope links would remain the same, but a request for /foo/bar would result in a redirect to /foo/bar/ before returning content. As for the problem you mentioned about missing content: are you using something like CookieCrumbler for authentication? Maybe your spider is not keeping and re-sending the cookies.
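The URL-to-directory mapping Andrew describes can be sketched in shell. $DOCROOT and the example URL come from his post; the parameter-expansion details are an assumed illustration, not the original spider:

```shell
# Sketch of the spider's URL-to-filesystem mapping described above.
DOCROOT="${DOCROOT:-/tmp/static-site}"
url="http://www.example.com/foo/bar"

# Strip the scheme, then the hostname, leaving the object path.
path="${url#*://}"   # www.example.com/foo/bar
path="${path#*/}"    # foo/bar

# Every Zope URL becomes a directory holding an index.html,
# so a request for /foo/bar can redirect to /foo/bar/.
mkdir -p "$DOCROOT/$path"
printf 'content for %s\n' "$url" > "$DOCROOT/$path/index.html"
```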
On Mon, 2005-02-07 at 11:13 -0500, Andrew Langmead wrote:
As for the problem you mentioned about missing content. Are you using something like cookie crumbler for authentication? Maybe your spider is not keeping and re-sending the cookies.
Nope. And the command (wget) implements cookies and uses them by default. It will even save and reload cookies across invocations (if you ask it to). Not all files had the content problem, so I think there will be a bit of cut and paste for the missing bits... -- Roger Oberholtzer <roger.oberholtzer@surbrunn.net>
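The cookie persistence Roger mentions maps onto these wget options (a sketch with placeholder URLs; --save-cookies, --load-cookies and --keep-session-cookies are the relevant flags):

```shell
# First run: fetch a page and persist cookies, including session cookies.
wget --save-cookies cookies.txt --keep-session-cookies \
     http://www.example.com/logged_in
# Later runs: reload the saved cookies so the spider stays authenticated.
wget --load-cookies cookies.txt -r http://www.example.com/
```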
participants (4)
- Andreas Jung
- Andrew Langmead
- Paul Winkler
- Roger Oberholtzer