[Zope] regex question

Sam Gendler sgendler@teknolojix.com
Mon, 29 Nov 1999 22:36:13 -0800


I have never been much of a regex master, and I am having difficulty
constructing one that should be fairly simple.  I want to find all the
text between <body> and </body> in a variable that may (and
probably will) contain newlines.  I have left case-insensitive matching
out of the following examples to keep the regexes simpler.

To find the opening <body> tag, I have '<[\t ]*body[\t ]*.*>',
which is almost correct, but not quite.  This expression also matches
<bodystuff>, so I really need something like
'<[\t ]*body([\t ]+.*>)|(>)', but I can't find a construct that works.
Basically, after 'body' it needs at least one whitespace character
followed by other stuff followed by '>', or else no whitespace at all
followed directly by '>'.
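A minimal sketch of the opening-tag pattern, assuming Python's modern `re` module (the old `regex` module from this thread no longer ships with Python, so its `\(`/`\|` syntax is not used here). The optional non-capturing group demands whitespace before any attributes, and `[^>]*` (substituted for `.*`) stops at the first `>` instead of running on into a later tag. `BODY_OPEN` and `is_body_open` are hypothetical names.

```python
import re

# Hypothetical pattern, written for the modern `re` module.
# '<', optional whitespace, 'body', then EITHER at least one whitespace
# character followed by anything except '>', OR nothing; then the closing '>'.
BODY_OPEN = re.compile(r'<[\t ]*body(?:[\t ]+[^>]*)?>')

def is_body_open(tag):
    """Return True if `tag` is exactly an opening body tag."""
    return BODY_OPEN.fullmatch(tag) is not None

for tag in ['<body>', '<body bgcolor="white">', '< body >', '<bodystuff>']:
    print(tag, is_body_open(tag))
```

With this pattern `<bodystuff>` is rejected, because after `body` the next character must be whitespace or `>`.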

I can use similar code to find the </body> tag.

However, putting those two together around '\(\(.*\n*\)*\)', in order
to match all the text between the <body> and </body> tags, sends Python
into an infinite loop.  It doesn't like it when I try to match an
unlimited number of lines before the </body> tag.
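One way to avoid the runaway match, sketched with the modern `re` module: let `.` match newlines via `re.DOTALL` and use a non-greedy `.*?`, instead of nesting `\n*` inside a repeated group. A group like `(.*\n*)*` can match the empty string and can carve up the same text in many different ways, which is a likely (though unconfirmed) cause of the hang. `BODY_RE` and `body_text` are hypothetical names, and `re.IGNORECASE` restores the case-insensitivity left out above.

```python
import re

# Hypothetical pattern for the modern `re` module.
# re.DOTALL lets '.' match newlines, so no explicit '\n' handling is needed.
# The non-greedy '.*?' stops at the first closing tag, not the last one.
BODY_RE = re.compile(
    r'<[\t ]*body(?:[\t ]+[^>]*)?>(.*?)<[\t ]*/[\t ]*body[\t ]*>',
    re.DOTALL | re.IGNORECASE,
)

def body_text(html):
    """Return the text between the body tags, or None if they are absent."""
    m = BODY_RE.search(html)
    return m.group(1) if m else None

page = ('<html><head><title>t</title></head>\n'
        '<BODY bgcolor="white">\nline one\nline two\n</body></html>')
print(repr(body_text(page)))  # '\nline one\nline two\n'
```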

I could use two regsub.split calls to break the variable into its
respective parts (assuming I can sort out the first problem), but regsub
is written in Python, while regex is written in C (or so I am told), so
I would prefer to use regex.
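The two-split idea can be sketched the same way with `re.split`, assuming the modern `re` module (whose matching engine is written in C, so the speed concern about `regsub` does not carry over). `split_body` is a hypothetical helper that returns the text before, inside, and after the body, or `None` if either tag is missing.

```python
import re

# Hypothetical patterns; non-capturing groups keep re.split from
# inserting extra group text into its result list.
OPEN = re.compile(r'<[\t ]*body(?:[\t ]+[^>]*)?>', re.IGNORECASE)
CLOSE = re.compile(r'<[\t ]*/[\t ]*body[\t ]*>', re.IGNORECASE)

def split_body(html):
    """Return (before, body, after), or None if either tag is missing."""
    parts = OPEN.split(html, maxsplit=1)   # split once at the opening tag
    if len(parts) != 2:
        return None
    before, rest = parts
    parts = CLOSE.split(rest, maxsplit=1)  # split once at the closing tag
    if len(parts) != 2:
        return None
    body, after = parts
    return before, body, after

print(split_body('<html><body>\nhi\n</body></html>'))
```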

Please help.  Generally, whenever I ask a mailing list for regex help,
it turns out to be something boneheaded that I am missing, so try
not to laugh at me ;-)

--sam