[Zope] regex question
Sam Gendler
sgendler@teknolojix.com
Mon, 29 Nov 1999 23:32:34 -0800
Sam Gendler wrote:
> I have never been much of a regex master, and I am having difficulty
> constructing one that should be fairly simple. I want to find all the
> text that is between <body> and </body> in a variable that may (and
> probably will) contain newlines. I am removing case insensitive
> searches in the following examples in order to make the regex's simpler.
>
> To find the opening <body> tag, I have '<[\t ]*body[\t ]*.*>'
> which is almost correct, but not quite. This expression finds
> <bodystuff>, too, so I really need something like '<[\t ]*body([\t
> ]+.*>)|(>)', but I can't find a construct that works. Basically, it
> needs at least one whitespace followed by stuff followed by '>', or else
> it needs no whitespace followed by '>'.
>
> I can use similar code to find the </body> tag.
>
> However, putting those two together around a \(\(.*\n*\)*\), in order to
> match all the text between the <body> and </body> tags, sends Python into
> an infinite loop. It doesn't like it when I try to match an unlimited
> number of lines before the </body> tag.
OK, I solved this one. I can now tell the difference between
<bodykjhsd> and <body kjhsd>.
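
For the archive, here is a minimal sketch of one pattern that makes this distinction. It assumes Python's re module rather than the older regex module that the \( \) syntax above suggests, and the name OPEN_BODY is just for illustration. The idea is to make "whitespace plus attributes" a single optional group, so the tag name must be followed either by '>' or by at least one space or tab:

```python
import re

# "<", optional whitespace, "body", then optionally
# (whitespace + anything up to the first ">"), then ">".
# Case-insensitivity is left out, as in the original question.
OPEN_BODY = re.compile(r'<[\t ]*body(?:[\t ]+[^>]*)?>')

for tag in ('<body>', '<body kjhsd>', '<bodykjhsd>'):
    # search() returns a match object, or None when nothing matches
    print(tag, OPEN_BODY.search(tag) is not None)
```

Using [^>]* instead of .* keeps the match from running past the end of the tag.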
I gave up on matching the whole body with a single regex. I am now
compiling two different regexes, one that finds the <body> tag and one
that finds the </body> tag. I then use object.regs[index] to slice the
string down to the correct substring.
UGLY.
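
A rough sketch of the two-regex approach, again assuming the re module (match objects with start() and end()) instead of regs[index] from the regex module. The names OPEN_BODY, CLOSE_BODY and body_text are made up for this example. Slicing between the two matches means no pattern has to span an unlimited number of lines, which is likely what made the nested \(\(.*\n*\)*\) attempt hang:

```python
import re

# Same case-sensitive simplification as in the question.
OPEN_BODY = re.compile(r'<[\t ]*body(?:[\t ]+[^>]*)?>')
CLOSE_BODY = re.compile(r'<[\t ]*/[\t ]*body[\t ]*>')

def body_text(html):
    """Return the text between the body tags, or None if either tag is missing."""
    start = OPEN_BODY.search(html)
    if start is None:
        return None
    # Look for the closing tag only after the end of the opening tag.
    end = CLOSE_BODY.search(html, start.end())
    if end is None:
        return None
    # Newlines in between need no special handling: this is a plain slice.
    return html[start.end():end.start()]

page = '<html><body bgcolor="white">\nline one\nline two\n</body></html>'
print(repr(body_text(page)))
```

It is still two patterns, but the slice is one line and copes with any number of newlines.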
--sam