Let Zope check an internal / external link
Hello,

Is it possible to let Zope check if a link is correct?

Example: I have a photo album and news items. These are connected with a link. But when I move a news item to an archive, the link becomes broken. How can I prevent this error?

Is it also possible to check external links? If a link doesn't exist anymore, there is still a broken link on the website. Question: can I prevent Zope from displaying broken links?

I know this feature from Tridion CMS (my second love ;-)

Thanks for the help.

Martin Koekenberg
<snip>
----- Original Message -----
From: Martin Koekenberg

Is it possible to let Zope check if a link is correct? Example: I have a photo album and news items. These are connected with a link. But when I move a news item to an archive, the link becomes broken. How can I prevent this error? Is it also possible to check external links? If a link doesn't exist anymore, there is still a broken link on the website.
<snip>

There are some good tools out there that can check HTML links for you, such as linklint (http://www.linklint.org/).

<snip>
Question: can I prevent Zope from displaying broken links?
<snip>

Zope doesn't know whether an off-site URL you are embedding in your HTML is live or not (unless you test each link before you create the HTML, which is very ugly, and who knows how long the link will stay good for?!).

Jonathan
Hi Martin,
Is it possible to let Zope check if a link is correct?

I think this code is the way to go: http://www.zopelabs.com/cookbook/1014075524
However, you could replace the following lines:

    lines = f.readlines()
    for n in lines:
        ### concatenate lines to string
        whole_page = whole_page + n

with:

    whole_page = f.read()

which is more efficient.

I also noticed that ZServer has a bad habit (I don't know if this happens with other web servers too): not-found pages have the title:

    <title>Zope</title>

So the title regular expression will think that it found a valid page. If you use urllib2 instead of urllib, this problem goes away, since an HTTPError is raised instead. However, you will have to modify the code to catch the right exceptions: URLError and HTTPError. I don't know if the part where the whole page is read and the title tag is parsed is still needed with this change.

If the title parsing is still needed, I found a problem with the regular expression on that site. It raises an IndexError when the title of the page is spread over several lines:

    <title>
     This is the title
    </title>

It always assumes that the title is on one line:

    <title>This is the title</title>

Test it on the Python console:
    >>> import re
    >>> title_tag_line = re.compile(r"(<title>.*?</title>)", re.I)
    >>> testTitle = '<title>\n This is the title\n</title>'
    >>> title = title_tag_line.findall(testTitle)
    >>> title
    []
This returns [], and what you really want is:

    [' This is the title\n']

I think the problem can be corrected with a valid regular expression, but I'm not an expert. Perhaps you or somebody else can figure it out.

Regards,
Josef
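For what it's worth, the two points above (catching HTTPError/URLError and matching a title that spans several lines) could be sketched roughly as follows. This is only a sketch under some assumptions: it is written for Python 3, where urllib2 was split into urllib.request and urllib.error; the names TITLE_RE, extract_title and check_link are made up for illustration and are not taken from the cookbook recipe; and the regular expression captures only the text inside the tag and strips surrounding whitespace.

```python
import re
import urllib.error
import urllib.request

# re.DOTALL lets "." match newlines, so a title spread over several
# lines is still found. The group captures only the text inside the tag.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the stripped page title, or None if there is no <title> tag."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


def check_link(url, opener=urllib.request.urlopen):
    """Return (ok, detail): the page title on success, the error otherwise.

    `opener` is a hypothetical hook so the function can be tried
    without network access; by default the real urlopen is used.
    """
    try:
        response = opener(url)
        try:
            page = response.read().decode("utf-8", "replace")
        finally:
            response.close()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        # HTTPError is a subclass of URLError, so it must be caught first.
        return False, "HTTP %d" % err.code
    except urllib.error.URLError as err:
        # The server could not be reached at all (DNS failure, refused connection).
        return False, str(err.reason)
    return True, extract_title(page)
```

Since a not-found page now ends up in the HTTPError branch, the title is no longer needed to decide whether the link is alive; it is returned only as extra information. A real checker would probably also want a timeout on the request.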
Hi,

On Tue, 21.09.2004 at 18:32, Martin Koekenberg wrote:
Hello,
Is it possible to let Zope check if a link is correct?
Example: I have a photo album and news items. These are connected with a link. But when I move a news item to an archive, the link becomes broken. How can I prevent this error?
Is it also possible to check external links? If a link doesn't exist anymore, there is still a broken link on the website. Question: can I prevent Zope from displaying broken links?
I know this feature from Tridion CMS (my second love ;-)
Thanks for the help.
Yes it is. You need a basic HTML parser (at least a regular expression), some scripts and/or a derived document object (from ZPT, for example), a ZCatalog and a crontab entry (for regular checks of external links). Depending on skills and experience, it should take you from 1 to 5 days.

Replace all links on upload/document change with a call to your internal link checker, like this:

    <a href="http://foo.bar.com/foo/bar">Foobar</a>

becomes:

    <a tal:condition="here/check/12345" href="http://foo.bar.com/foo/bar">Foobar</a>

where check must be an object which knows that 12345 is the key for the entry http://foo.bar.com/foo/bar, looks it up in the ZCatalog (see the "catalog almost everything" howto) and returns true or false, depending on the state in the catalog, from a __getitem__ method on this check object.

Ideally all your documents are catalog-aware so you can check their existence when you change something. External links have to be checked via a cron job and urllib, I'd say.

Regards
Tino
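To make the idea of the check object a bit more concrete, here is a rough sketch in plain Python. It is not Zope code: a dict stands in for the ZCatalog, the class name LinkChecker and the methods register and refresh are invented for illustration, and it uses Python 3's urllib.request rather than the urllib of the time. How such an object would be hooked into Zope traversal is left out.

```python
import urllib.request


def probe(url, timeout=10):
    """Return True if the URL answers without an HTTP or network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses in Python 3;
        # ValueError covers malformed URLs.
        return False


class LinkChecker:
    """Maps short keys to URLs and remembers whether each URL was alive."""

    def __init__(self):
        # Plain dict standing in for the ZCatalog: key -> [url, is_alive]
        self._links = {}

    def register(self, key, url):
        # Assume a new link is alive until the first check says otherwise.
        self._links[str(key)] = [url, True]

    def __getitem__(self, key):
        # Unknown keys count as broken, so the link is simply not rendered.
        entry = self._links.get(str(key))
        return bool(entry and entry[1])

    def refresh(self, probe_func=probe):
        """Re-test every registered link; meant to be run from a cron job."""
        for entry in self._links.values():
            entry[1] = probe_func(entry[0])
```

With this, checker["12345"] only looks up the stored state and never touches the network, so rendering a page stays cheap; the slow part happens in refresh(), which the cron job would call periodically.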
participants (4)
- Jonathan Hobbs
- Josef Meile
- Martin Koekenberg
- Tino Wildenhain