[Zope] easy regular expression for URL fixup

Thomas B. Passin tpassin@mitretek.org
Wed, 6 Mar 2002 18:18:47 -0500


[Ed Colmar]


>
> Hey Tom
>
> Thanks for the reply...
>
> Those backslashes are for escaping the special characters (\w and .).  Do
> they need to be doubled in this case?
>
Yes, they are for escaping the special characters once they get to the
regular expression, but they have to get there first. In an ordinary string
literal they should be doubled; an alternative is

HTMLFILE=r'/\w*\.html'   # raw string: write each backslash only once
htmlfile=re.compile(HTMLFILE)

Here the "r" tells Python to use a "raw" string and not to interpret the
backslashes as escapes (at least it used to be this way - I'm not quite sure
about 2.2).
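
To make the two spellings concrete, here is a small sketch you can paste into
the interpreter. The variable names (doubled, raw, over_escaped) are mine,
purely for illustration. It assumes the pattern we have been discussing and
shows that doubling the backslashes inside a raw string asks for literal
backslashes, which the URL does not contain.

```python
import re

url = "http://www.the.net/bigfolder/somepage.html"

doubled = "/\\w*\\.html"   # ordinary string: each \\ becomes one backslash
raw = r"/\w*\.html"        # raw string: backslashes are passed through as-is

# Both spellings hand the very same pattern text to the re module.
same_pattern = (doubled == raw)
print(same_pattern)        # True

# Doubling inside a raw string makes the regex look for literal backslashes.
over_escaped = r"/\\w*\\.html"

found_raw = re.findall(raw, url)
found_over = re.findall(over_escaped, url)
print(found_raw)           # ['/somepage.html']
print(found_over)          # []
```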


> This still is not working for me
>
> ###  I want: http://www.the.net/bigfolder/ ###
> import re
> url = "http://www.the.net/bigfolder/somepage.html"
> htmlfile = re.compile("/\\w*\\.html")
> m = htmlfile.match(url)
> if m:
>    folder_url = htmlfile.sub(url, "/")
>
>
> I'm also trying different variations to try and get a match.  None of these
> are working either:
> htmlfile = re.compile("/.*$")   (this one should really be working yes?)
> htmlfile = re.compile("[a-z]*$")
> htmlfile = re.compile("\w*$")
>
> the only match I can make is this (which will match anything):
> htmlfile = re.compile(".*$")
>
I suggest you do

print(url)
matches=htmlfile.findall(url)
print(matches)

or

from pprint import pprint
pprint(matches)

This will show you exactly what the match found. You can best work this out
in plain Python, then copy the working code into your Zope script.

Regular expressions are notoriously hard to get working right (not Python's
fault, that's just how they are), so don't feel bad. You need to get more
systematic about debugging: check every step of the way to make sure you
understand what is going on, and read the docs for the re library.
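
As a rough sketch of what checking every step might look like, the following
runs each call separately on the URL from your example. It is only one way to
go about it, and the names at_start and anywhere are made up for the
illustration. Two things in the re docs are worth keeping in mind while
reading it: match() only tries the pattern at the very start of the string,
while search() scans through it, and sub() takes the replacement first and
the string to work on second.

```python
import re

url = "http://www.the.net/bigfolder/somepage.html"
htmlfile = re.compile(r"/\w*\.html")

# Step 1: see what the pattern finds anywhere in the string.
matches = htmlfile.findall(url)
print(matches)             # ['/somepage.html']

# Step 2: compare match() (start of string only) with search() (anywhere).
at_start = htmlfile.match(url)
anywhere = htmlfile.search(url)
print(at_start)            # None, because the URL starts with "h", not "/"
print(anywhere.group())    # /somepage.html

# Step 3: sub(replacement, string) returns the string with matches replaced.
folder_url = htmlfile.sub("/", url)
print(folder_url)          # http://www.the.net/bigfolder/
```

If match() is the call being tested, that would be consistent with only a
pattern like ".*$" succeeding, since it is the only one of those tried that
can match from the first character of the URL.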

Cheers,

Tom P