[Zope] Scan for URLs from string

Jo Meder jo@meder.de
Sun, 8 Dec 2002 00:14:26 +0100


--rwEMma7ioTxnRzrJ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Am 07.12.2002, 20:14 Uhr
	schryb Peter Bengtsson <mail@peterbe.com>:

> I would like to display piece of text where weblinks are converted to <A 
> HREF...
> Just like when you're reading emails that contain links. Those parsing 
> usually go by startswith("http://") or startswith("www.") I guess.
> 
> Anybody got one? ...or has an idea of a module that already does this that 
> I can nick.

We use the attached code (it doesn't use regular expressions, but it
works pretty well for us). It can add links to plain text or html,
doesn't destroy URLs inside tags (like in <param
codebase="http://somewhere...">) and recognizes "http://", "ftp://",
"www." and things containing "@".


Hope it helps.

	Jo.


-- 
Internetmanufaktur Jo Meder ---------------------- Berlin, Germany
http://www.meder.de/ ------------------- fon: ++49-30-417 17 63 33
Kollwitzstr. 75 ------------------------ fax: ++49-30-417 17 63 45
10435 Berlin --------------------------- mob: ++49-170- 2 98 89 97
Public GnuPG-Key ---------- http://www.meder.de/keys/jo-pubkey.txt

--rwEMma7ioTxnRzrJ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="addhrefs.py"



import string


def minpositive(px,px1):
   if px<0:
      if px1>=0:
         return px1
      return -1
   if px1<0:
      return px
   return min(px,px1)


def addhrefs(text):
   tags=text.split("<")
   newtags=[]
   in_link=0
   for tag in tags:
      ltag=tag.lower()
      if not tag:
         newtags.append("")
         continue
      if ltag[0]=="a":
         in_link=1
      elif ltag[0:1]=="/a":
         in_link=0
      if not in_link:
         p=tag.find(">")
         if p<0:
            p=0
         tag_pre=tag[:p]
         tag=tag[p:]
         ltag=ltag[p:]
         while tag:
            is_maillink=0
            has_address=ltag.find("www.")
            has_address=minpositive(has_address,ltag.find("http://"))
            has_address=minpositive(has_address,ltag.find("ftp://"))
            has_address=minpositive(has_address,ltag.find("@"))
            if has_address>=0:
               address_start=has_address
               while address_start>=p:
                  if tag[address_start] in (">"," ","\t","\r","\n",'"',"'"):
                     break
                  address_start-=1
               if tag[address_start] in (">"," ","\t","\r","\n",'"',"'"):
                  address_start+=1
               address_end=has_address
               while address_end<len(tag):
                  if tag[address_end] in (" ","\t","\r","\n",'"',"'"):
                     break
                  address_end+=1
               address=tag[address_start:address_end]
               if address.find("@")>=0:
                  link="<a href='mailto:"+address+"'>"+address+"</a>"
               else:
                  if address[:3].lower()=="www":
                     link="<a href='http://"+address+"'>"+address+"</a>"
                  else:
                     link="<a href='"+address+"'>"+address+"</a>"
               tag_pre+=tag[:address_start]+link
               tag=tag[address_end:]
               ltag=ltag[address_end:]
            else:
               tag_pre+=tag
               tag=ltag=""
         tag=tag_pre
      newtags.append(tag)
   return string.join(newtags,"<")


--rwEMma7ioTxnRzrJ--