[Zope] Re: Let Zope check an internal / external link

Tue Sep 21 14:24:01 EDT 2004

Hi Martin,

> Is it possible to let Zope check if a link is correct?
I think this code is the way to go:
http://www.zopelabs.com/cookbook/1014075524

However, you could replace the following lines:

lines = f.readlines()
for n in lines: ### concatenate lines to string
   whole_page = whole_page + n

with:
whole_page = f.read()

which is more efficient, because it reads the whole page in one call
instead of building the string line by line.
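
A quick way to convince yourself that the two forms give the same
result is the small sketch below. It uses an in-memory io.StringIO
object as a stand-in for the opened page (that stand-in and the names
concatenated and read_at_once are my own, not from the cookbook code):

```python
import io

# Stand-in for the file-like object returned when the page is opened.
f = io.StringIO("<html>\n<title>Demo</title>\n</html>\n")

# Original approach: read the lines, then glue them back together.
concatenated = ""
for n in f.readlines():
    concatenated = concatenated + n

# Suggested approach: rewind and read everything in one call.
f.seek(0)
read_at_once = f.read()

print(concatenated == read_at_once)
```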

I also noticed that ZServer has a bad habit (I don't know if this
happens with other web servers too): its "not found" pages have the
title:

<title>Zope</title>

So the title regular expression will think that it found a valid page.
If you use urllib2 instead of urllib, this problem goes away, since an
HTTPError is raised instead. However, you will have to modify the code
to catch the right exceptions: URLError and HTTPError. I don't know
whether the part where the whole page is read and the title tag is
parsed is still needed after this change.
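
Roughly, the exception handling could look like the sketch below. This
is only an outline, not the cookbook code: the function name check_link
and its opener parameter are made up for illustration (the parameter is
just there so the function can be tried without a network). The import
fallback is there because urllib2 was later split into urllib.request
and urllib.error in Python 3, and the "except ... as" spelling needs
Python 2.6 or newer. HTTPError is a subclass of URLError, so it has to
be caught first.

```python
try:
    from urllib2 import urlopen, HTTPError, URLError  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3
    from urllib.error import HTTPError, URLError


def check_link(url, opener=urlopen):
    """Return (ok, detail) for url. `opener` is a hypothetical hook for testing."""
    try:
        response = opener(url)
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        return False, "HTTP error %d" % err.code
    except URLError as err:
        # The server could not be reached at all (bad host, refused, ...).
        return False, "unreachable: %s" % err.reason
    response.close()
    return True, "ok"
```

With something like this, a missing page is reported through the
HTTPError branch, so the <title>Zope</title> of the error page never
reaches the title check.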

If the title parsing is still needed, I found a problem with the
regular expression on that site. The code raises an IndexError (the
list of matches is empty) when the title of the page spans several
lines:

<title>
   This is the title
</title>

It always assumes that the title is on one line:

<title>This is the title</title>

Test it in the Python console:

 >>> import re
 >>> title_tag_line = re.compile(r"(<title>.*?</title>)",re.I)
 >>> testTitle='<title>\n  This is the title\n</title>'
 >>> title = title_tag_line.findall(testTitle)
 >>> title
[]

This returns [], whereas what you really want is:
['  This is the title\n']

I think the problem can be corrected with a valid regular expression, 
but I'm not an expert. Perhaps you or somebody else can figure it out.
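
For what it's worth, one possible fix, offered as a sketch: by default
"." does not match a newline, which is why the multi-line title above
is missed. Adding the re.S (DOTALL) flag changes that. The names
title_tag and extract_title are my own, and I moved the group inside
the tags so that only the title text is captured. It still assumes a
plain <title> tag without attributes.

```python
import re

# re.S lets "." match newlines, so a title may span several lines.
title_tag = re.compile(r"<title>(.*?)</title>", re.I | re.S)


def extract_title(html):
    """Return the stripped title text, or None if no title is found."""
    match = title_tag.search(html)
    if match is None:
        # No indexing into an empty result, so no IndexError.
        return None
    return match.group(1).strip()


print(extract_title("<title>\n  This is the title\n</title>"))
```

Stripping the whitespace is a choice on my part; drop the .strip() if
the surrounding spaces and newlines matter.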

Regards,
Josef