[Zope] Scan for URLs from string
Jo Meder
jo@meder.de
Sun, 8 Dec 2002 00:14:26 +0100
--rwEMma7ioTxnRzrJ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Am 07.12.2002, 20:14 Uhr
schryb Peter Bengtsson <mail@peterbe.com>:
> I would like to display piece of text where weblinks are converted to <A
> HREF...
> Just like when you're reading emails that contain links. Those parsing
> usually go by startswith("http://") or startswith("www.") I guess.
>
> Anybody got one? ...or has an idea of a module that already does this that
> I can nick.
We use the attached code (it doesn't use regular expressions, but it
works pretty well for us). It can add links to plain text or html,
doesn't destroy URLs inside tags (like in <param
codebase="http://somewhere...">) and recognizes "http://", "ftp://",
"www." and things containing "@".
Hope it helps.
Jo.
--
Internetmanufaktur Jo Meder ---------------------- Berlin, Germany
http://www.meder.de/ ------------------- fon: ++49-30-417 17 63 33
Kollwitzstr. 75 ------------------------ fax: ++49-30-417 17 63 45
10435 Berlin --------------------------- mob: ++49-170- 2 98 89 97
Public GnuPG-Key ---------- http://www.meder.de/keys/jo-pubkey.txt
--rwEMma7ioTxnRzrJ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="addhrefs.py"
import string
def minpositive(px,px1):
if px<0:
if px1>=0:
return px1
return -1
if px1<0:
return px
return min(px,px1)
def addhrefs(text):
tags=text.split("<")
newtags=[]
in_link=0
for tag in tags:
ltag=tag.lower()
if not tag:
newtags.append("")
continue
if ltag[0]=="a":
in_link=1
elif ltag[0:1]=="/a":
in_link=0
if not in_link:
p=tag.find(">")
if p<0:
p=0
tag_pre=tag[:p]
tag=tag[p:]
ltag=ltag[p:]
while tag:
is_maillink=0
has_address=ltag.find("www.")
has_address=minpositive(has_address,ltag.find("http://"))
has_address=minpositive(has_address,ltag.find("ftp://"))
has_address=minpositive(has_address,ltag.find("@"))
if has_address>=0:
address_start=has_address
while address_start>=p:
if tag[address_start] in (">"," ","\t","\r","\n",'"',"'"):
break
address_start-=1
if tag[address_start] in (">"," ","\t","\r","\n",'"',"'"):
address_start+=1
address_end=has_address
while address_end<len(tag):
if tag[address_end] in (" ","\t","\r","\n",'"',"'"):
break
address_end+=1
address=tag[address_start:address_end]
if address.find("@")>=0:
link="<a href='mailto:"+address+"'>"+address+"</a>"
else:
if address[:3].lower()=="www":
link="<a href='http://"+address+"'>"+address+"</a>"
else:
link="<a href='"+address+"'>"+address+"</a>"
tag_pre+=tag[:address_start]+link
tag=tag[address_end:]
ltag=ltag[address_end:]
else:
tag_pre+=tag
tag=ltag=""
tag=tag_pre
newtags.append(tag)
return string.join(newtags,"<")
--rwEMma7ioTxnRzrJ--