Re: [Zope] regex question

30 Nov 1999


      Sam Gendler wrote:
...
I have never been much of a regex master, and I am having difficulty
constructing one that should be fairly simple.  I want to find all the
text that is between <body> and </body> in a variable that may (and
probably will) contain newlines.  I am leaving case-insensitive matching
out of the following examples in order to keep the regexes simpler.
To find the opening <body> tag, I have '<[\t ]*body[\t ]*.*>',
which is almost correct, but not quite.  This expression finds
<bodystuff>, too, so I really need something that finds
'<[\t ]*body([\t ]+.*>)|(>)', but I can't find a construct that works.
Basically, it
needs at least one whitespace followed by stuff followed by '>', or else
it needs no whitespace followed by '>'
I can use similar code to find the </body> tag.
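
For what it's worth, here is a minimal sketch of the "whitespace plus stuff, or nothing at all" rule described above. It assumes the current `re` module rather than the older regex syntax used in this post. The names OPEN_BODY and find_open_body are made up for illustration, and `[^>]*` stands in for `.*` so the match stops at the first '>'.

```python
import re

# Opening tag: "<", optional blanks, "body", then EITHER
# (at least one blank followed by anything up to ">") OR nothing, then ">".
# The optional non-capturing group expresses the "or nothing" branch.
OPEN_BODY = re.compile(r'<[\t ]*body(?:[\t ]+[^>]*)?>')

def find_open_body(text):
    """Return the first opening body tag found in text, or None."""
    m = OPEN_BODY.search(text)
    return m.group(0) if m else None
```

With this pattern, <body> and <body bgcolor="white"> are accepted, while <bodystuff> is rejected because "body" must be followed by either a blank or '>'.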
However, putting those two together around a \(\(.*\n*\)*\), in order to
match all the text between the <body> and </body> tags, sends Python into
an infinite loop.  It doesn't like it when I try to match an unlimited
number of lines before the </body> tag.
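
A likely culprit is the nested group: both `.*` and `\n*` can match the empty string, so the outer `*` can repeat forever without consuming anything. As a rough sketch (again assuming the current `re` module, with BODY and extract_body as invented names), a single non-greedy group with the DOTALL flag lets '.' cross newlines and avoids the nesting:

```python
import re

BODY = re.compile(
    r'<[\t ]*body(?:[\t ]+[^>]*)?>'   # opening tag, with or without attributes
    r'(.*?)'                          # everything in between, as little as possible
    r'<[\t ]*/[\t ]*body[\t ]*>',     # closing tag
    re.DOTALL,                        # let "." match newlines too
)

def extract_body(html):
    """Return the text between the body tags, or None if they are not found."""
    m = BODY.search(html)
    return m.group(1) if m else None
```

The non-greedy `.*?` stops at the first closing tag, so text after </body> is never pulled in. Case-insensitive matching is left out here, as in the examples above.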
OK, I solved this one.  I can now determine the difference between
<bodykjhsd> and <body kjhsd>.
I gave up on doing it correctly.  I am now compiling two different regexes,
one that finds the <body> tag, and one that finds the </body> tag.  I then
use object.regs[index] to slice the string down to the correct substring.
UGLY.
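
For the record, the two-regex approach might look roughly like this with the current `re` module. This is a sketch, not the code I actually ran: it uses the match object's start() and end() methods in place of regs[index], and the names OPEN_BODY, CLOSE_BODY and body_between are invented for the example.

```python
import re

OPEN_BODY = re.compile(r'<[\t ]*body(?:[\t ]+[^>]*)?>')
CLOSE_BODY = re.compile(r'<[\t ]*/[\t ]*body[\t ]*>')

def body_between(html):
    """Slice out the text between the opening and closing body tags."""
    start = OPEN_BODY.search(html)
    if start is None:
        return None
    # Look for the closing tag only after the end of the opening tag.
    end = CLOSE_BODY.search(html, start.end())
    if end is None:
        return None
    return html[start.end():end.start()]
```

Neither pattern has to span lines, so newlines in the body never enter into the matching at all.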

--sam