Hi Martin,
It is possible to let Zope check whether a link is correct. I think this code is the way to go: http://www.zopelabs.com/cookbook/1014075524
However, you could replace the following lines:

    lines = f.readlines()
    for n in lines:
        ### concatenate lines to string
        whole_page = whole_page + n

with:

    whole_page = f.read()

which is more efficient.

I also noticed that ZServer has a bad habit (I don't know if this happens with other web servers too): not-found pages have the title:

    <title>Zope</title>

So the title regular expression will think that it found a valid page. If you use urllib2 instead of urllib, this problem goes away, since an HTTPError is raised instead. However, you will have to modify the code to catch the right exceptions: URLError and HTTPError. I don't know if the part where the whole page is read and the title tag is parsed is still needed with this change.

If the title parsing is still needed, I found that there is a problem with the regular expression on that site. It raises an IndexError when the title of the page is spread over several lines:

    <title>
     This is the title
    </title>

It always assumes that the title is on one line:

    <title>This is the title</title>

Test it on the Python console:

    >>> import re
    >>> title_tag_line = re.compile(r"(<title>.*?</title>)", re.I)
    >>> testTitle = '<title>\n This is the title\n</title>'
    >>> title = title_tag_line.findall(testTitle)
    >>> title
    []

This returns [], whereas what you really want is:

    [' This is the title\n']

I think the problem can be corrected with a valid regular expression, but I'm not an expert. Perhaps you or somebody else can figure it out.

Regards,
Josef
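
P.S. Here is a minimal sketch of how the changes suggested above might fit together. It is not the cookbook recipe itself. It assumes Python 3, where urllib2 has become urllib.request and urllib.error. The names TITLE_RE, extract_title and check_link are made up for illustration. The regular expression adds the re.S (DOTALL) flag so that "." also matches newlines, which is one possible way to handle a title spread over several lines.

```python
import re
import urllib.error
import urllib.request

# re.S (DOTALL) lets "." match newlines, so a title spread over
# several lines is still found; re.I keeps the match case-insensitive.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.I | re.S)


def extract_title(html):
    """Return the title text without surrounding whitespace, or None."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    return match.group(1).strip()


def check_link(url, timeout=10):
    """Return (ok, detail) for url. Hypothetical helper for illustration."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as f:
            # Read the whole page in one call instead of line by line.
            whole_page = f.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTPError is a subclass of URLError, so catch it first.
        return False, "HTTP error %d" % err.code
    except urllib.error.URLError as err:
        return False, "URL error: %s" % err.reason
    return True, extract_title(whole_page)
```

With this, a not-found page should be reported through the HTTPError branch rather than being mistaken for a valid page titled "Zope", and extract_title returns None instead of raising an IndexError when no title tag is present.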