[Zope] Re: Let Zope check an internal / external link
Josef Meile
jmeile at hotmail.com
Tue Sep 21 14:24:01 EDT 2004
Hi Martin,
> Is it possible to let Zope check if a link is correct?
I think this code is the way to go:
http://www.zopelabs.com/cookbook/1014075524
However, you could replace the following lines:
    lines = f.readlines()
    for n in lines: ### concatenate lines to string
        whole_page = whole_page + n
with:
    whole_page = f.read()
which is more efficient.
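A quick way to convince yourself that both forms build the same string. This is only a sketch: io.StringIO is my stand-in for the file-like object f that the recipe gets from urllib, and the sample text is made up.

```python
import io

sample = "<html>\n<title>Test</title>\n</html>\n"  # made-up page content

# Original approach: read a list of lines, then concatenate them one by one.
f = io.StringIO(sample)  # stand-in for the file-like object from urlopen
whole_page_loop = ""
for n in f.readlines():
    whole_page_loop = whole_page_loop + n

# Suggested approach: read everything in a single call.
f = io.StringIO(sample)
whole_page_read = f.read()

print(whole_page_loop == whole_page_read)
```

The loop creates a new string on every iteration, while f.read() builds the result once.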
I also noticed that ZServer has a bad habit (I don't know if this
happens with other web servers too): its "not found" pages have the title:
<title>Zope</title>
So the title regular expression will think that it found a valid page.
If you use urllib2 instead of urllib, then this problem will go away,
since an HTTPError will be raised instead. However, you will have to
modify the code to catch the right exceptions: URLError and HTTPError. I
don't know whether the part where the whole page is read and the title
tag is parsed is still needed with this change.
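A minimal sketch of that exception handling, not a drop-in patch for the recipe. It is written for Python 3, where urllib2 was split into urllib.request and urllib.error; with urllib2 the corresponding names are urllib2.urlopen, urllib2.HTTPError and urllib2.URLError. The function name check_link, its opener parameter and the returned tuples are made up for illustration.

```python
import urllib.error
import urllib.request


def check_link(url, opener=urllib.request.urlopen):
    """Return (True, "OK") if the URL can be opened, else (False, reason).

    check_link and `opener` are hypothetical names; `opener` only exists
    so that the function can be tried without network access.
    """
    try:
        response = opener(url)
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404.
        # HTTPError is a subclass of URLError, so it must be caught first.
        return (False, "HTTP error %d" % e.code)
    except urllib.error.URLError as e:
        # No usable answer at all: unknown host, refused connection, ...
        return (False, "URL error: %s" % e.reason)
    response.close()
    return (True, "OK")
```

Calling check_link("http://www.zope.org/") would then give a (flag, message) pair that the rest of the script can act on, without looking at the page title at all.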
If the title parsing is still needed, I found that there is a problem
with the regular expression on that site. It leads to an IndexError when
the title of the page is spread over several lines:
<title>
This is the title
</title>
It always assumes that the title is on one line:
<title>This is the title</title>
Test it in the Python console:
>>> title_tag_line = re.compile(r"(<title>.*?</title>)",re.I)
>>> testTitle='<title>\n This is the title\n</title>'
>>> title = title_tag_line.findall(testTitle)
>>> title
[]
This returns [], while what you really want is the text between the tags:
['\n This is the title\n']
I think the problem can be corrected with a valid regular expression,
but I'm not an expert. Perhaps you or somebody else can figure it out.
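One possible fix, as a sketch rather than a tested patch for the recipe: "." does not match a newline unless the re.DOTALL (re.S) flag is set, so adding that flag lets the pattern span several lines. I also moved the parentheses so that only the text between the tags is captured, which is my own change and not what the recipe does.

```python
import re

# re.S (DOTALL) lets "." match newlines; re.I keeps the match case-insensitive.
title_tag = re.compile(r"<title>(.*?)</title>", re.I | re.S)

testTitle = '<title>\n This is the title\n</title>'
titles = title_tag.findall(testTitle)

# findall returns [] when there is no <title> at all, so check before
# indexing to avoid the IndexError.
if titles:
    title = titles[0].strip()
else:
    title = None

print(repr(title))
```

The strip() call removes the surrounding newlines and spaces, so a title written on one line and a title spread over three lines end up looking the same.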
Regards,
Josef