[Zope] Dumping Zope (CMF) site to file system
Eugene
el-spam at yandex.ru
Thu Jul 8 05:19:34 EDT 2004
Hello David,
DCS> I want to set up a process for dumping my Zope CMF site to the
DCS> filesystem, to be served by Apache. I'm interested in anyone who's doing
DCS> this. What tools are you using? I'm trying Wget, but the main problem
DCS> is dealing with absolute URLs. I can use the Wget --convert-links
DCS> option, which removes the href attribute from the <base> tag and makes
DCS> internal links relative. However, I still have a problem with folders.
DCS> The absolute_url() method does not return a trailing slash for folders.
DCS> Wget downloads the URL folder_name as a file called folder_name, but it
DCS> downloads folder_name/ as folder_name/index.html. I have already written
DCS> a relativeURL() script based on portal_url.getRelativeUrl(), but it
DCS> doesn't return a trailing slash either, so I'll have to add one.
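The wget behaviour David describes is why the slash matters: a folder URL without it is saved as a plain file. A minimal sh sketch of appending the missing slash before a folder URL is handed to wget (for example when building a URL list); add_slash is a hypothetical helper name, not part of Zope or portal_url:

```shell
#!/bin/sh
# add_slash: hypothetical helper that appends a trailing slash to a
# folder URL unless it already has one, so wget saves it as
# folder_name/index.html instead of a file called folder_name.
add_slash() {
    case $1 in
        */) printf '%s\n' "$1" ;;   # already ends with a slash
        *)  printf '%s/\n' "$1" ;;  # append the missing slash
    esac
}

add_slash 'http://www.test/folder_name'
add_slash 'http://www.test/folder_name/'
```

Apply it only to folder URLs: a document URL such as .../page.html would otherwise become .../page.html/.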
I recently solved this problem.
Here is my solution:
1. Make all your URLs end with a slash.
I did this manually at first, by correcting some lists in portlets,
and later I found out how to redefine the absolute_url() function.
Please look for it here:
2. Run wget. I run it from my Zope in response to a user action,
but it could also be done with a shell script like the one below.
Convert the links in the downloaded files and erase the <base ...> tag.
I also edit the HTML files to delete 'index.html' from links, so that
every URL now ends with '/'. (*)
If you wish, you can also shrink the files by removing whitespace; I found
that whitespace takes up about 30-40% of an HTML file.
3. Publish your files.
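The 30-40% whitespace figure is easy to check on your own pages. A minimal sh sketch, assuming "whitespace" means spaces, tabs, carriage returns and newlines; ws_percent is a hypothetical helper name:

```shell
#!/bin/sh
# ws_percent: hypothetical helper that prints the share of a file's
# bytes that are whitespace, as an integer percentage (rounded down).
ws_percent() {
    # $(( )) tolerates the leading blanks some wc versions print
    total=$(( $(wc -c < "$1") ))
    # tr -cd deletes every byte NOT in the set, leaving only whitespace
    ws=$(( $(tr -cd ' \t\r\n' < "$1" | wc -c) ))
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $(( ws * 100 / total ))
    fi
}
```

For example, ws_percent index.html prints a single number such as 35.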
Here's the script:
el at test[<<debug-1/bin]%cat mirror.sh
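#!/bin/sh
# Mirror the Zope site with wget, then post-process the saved HTML files.
param=$1
if test -z "$param"; then
    # Default: recurse one level from the URLs listed in wget-list
    param='-r -l 1 -i ../etc/wget-list'
else
    param="http://www.test/$param"
fi
# $param is left unquoted on purpose so the default splits into options
wget -v -nH -k -p -X images -x -R index_html $param
find . -name '*.html' | while IFS= read -r i; do
    # Collapse runs of whitespace into single spaces (no glob expansion),
    # strip "index.html" from links and comment out the <base> tag.
    # '|' is the sed delimiter so the '/' inside [...] cannot end a pattern.
    tr -s ' \t\n' '   ' < "$i" |
    sed -e 's|href="\([a-zA-Z0-9._/-]*\)/index\.html"|href="\1/"|g' \
        -e 's|="index\.html"|="./"|g' \
        -e 's|<base href=""[^/]*/>|<!--here was base tag-->|' > "$i.tmp" &&
    mv "$i.tmp" "$i"
done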
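To see what the three sed rules in mirror.sh do, they can be run on one sample line. This sketch wraps them in a hypothetical function, rewrite_links, and uses '|' as the sed delimiter, because some sed implementations treat a '/' inside a bracket expression such as [a-zA-Z0-9._/-] as the end of the pattern:

```shell
#!/bin/sh
# rewrite_links: hypothetical wrapper around the mirror.sh clean-up rules.
# 1) "dir/index.html" links become "dir/"
# 2) a bare "index.html" link becomes "./"
# 3) an empty <base href=""/> tag is replaced by a comment
rewrite_links() {
    sed -e 's|href="\([a-zA-Z0-9._/-]*\)/index\.html"|href="\1/"|g' \
        -e 's|="index\.html"|="./"|g' \
        -e 's|<base href=""[^/]*/>|<!--here was base tag-->|'
}

printf '%s\n' '<base href="" /><a href="news/index.html">News</a> <a href="index.html">Home</a>' |
    rewrite_links
# prints: <!--here was base tag--><a href="news/">News</a> <a href="./">Home</a>
```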
======
The file wget-list contains the extra files that need to be downloaded:
el at test[<<debug-1/bin]%cat ../etc/wget-list
http://www.test/
http://www.test/xtra/head.css
http://www.test/xtra/default.css
http://www.test/xtra/inside.css
====
Addition:
(*) It's a mania of mine. I hate URLs full of junk like
http://site/print1.html?foo=bar&sid=4759436545&vasya=pupkine&junks=true&nothing=many-many....
The best URLs have the format proposed by Tim Berners-Lee:
http://site/section/subsection/page/
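The preferred style (no query string, trailing slash) can be checked mechanically, for instance over the URLs in wget-list. A minimal sh sketch; is_clean_url is a hypothetical helper, and "clean" here means only those two conditions:

```shell
#!/bin/sh
# is_clean_url: hypothetical check for the preferred URL style.
# Succeeds only if the URL has no query string and ends with a slash.
is_clean_url() {
    case $1 in
        *\?*) return 1 ;;  # contains '?', i.e. a query string
        */)   return 0 ;;  # ends with '/'
        *)    return 1 ;;
    esac
}

for u in 'http://site/section/subsection/page/' 'http://site/print1.html?foo=bar'; do
    if is_clean_url "$u"; then
        echo "clean:     $u"
    else
        echo "not clean: $u"
    fi
done
```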
--
Best regards,
Eugene mailto:el-spam at yandex.ru