[Zope] regex question
Sam Gendler
sgendler@teknolojix.com
Mon, 29 Nov 1999 22:36:13 -0800
I have never been much of a regex master, and I am having difficulty
constructing one that should be fairly simple. I want to find all the
text between <body> and </body> in a variable that may (and
probably will) contain newlines. I have left out case-insensitive
matching in the following examples to make the regexes simpler.
To find the opening <body> tag, I have '<[\t ]*body[\t ]*.*>',
which is almost correct, but not quite. This expression also matches
<bodystuff>, so I really need something like '<[\t ]*body([\t ]+.*>)|(>)',
but I can't find a construct that works. Basically, after 'body' it
needs either at least one whitespace character followed by other text
followed by '>', or no whitespace followed directly by '>'.
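A likely catch in the second attempt is that `|` has the lowest precedence of any regex operator, so '<[\t ]*body([\t ]+.*>)|(>)' means "'<[\t ]*body([\t ]+.*>)' or a lone '>'". Putting the choice inside a group after 'body' expresses the intended rule. The sketch below uses Python's current `re` module rather than the old `regex` module this post refers to, and the name `BODY_OPEN` is hypothetical. It makes the "whitespace plus attributes" part an optional group and uses `[^>]*` instead of `.*`, so the match cannot run past the tag's own '>' to a later one.

```python
import re

# Hypothetical name; written for the modern `re` module, not the old `regex` module.
# Structure: '<', optional whitespace, 'body', then EITHER one whitespace
# character followed by anything except '>' (the attributes) OR nothing,
# and finally the closing '>'.
BODY_OPEN = re.compile(r'<\s*body(?:\s[^>]*)?>', re.IGNORECASE)

# fullmatch() requires the whole string to be a single tag.
for tag in ['<body>', '<BODY bgcolor="white">', '< body\tonload="init()">', '<bodystuff>']:
    print(repr(tag), bool(BODY_OPEN.fullmatch(tag)))
```

The first three tags are accepted and '<bodystuff>' is rejected, because after 'body' the next character must be whitespace or '>'. Note that `\s` also accepts a newline inside the tag, which '[\t ]' would not.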
I can use similar code to find the </body> tag.
However, putting those two together around \(\(.*\n*\)*\), in order to
match all the text between the <body> and </body> tags, sends Python into
an infinite loop. It doesn't like it when I try to match an unlimited
number of lines before the </body> tag.
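The nested quantifier in \(\(.*\n*\)*\) is probably the source of the hang: the inner group can match the empty string and can divide the same text in an enormous number of ways, so a backtracking engine may keep retrying combinations whenever the overall match fails. A simpler approach, sketched here with Python's `re` module (the names `BODY_RE` and `body_text` are hypothetical), is a single non-greedy `.*?` with the `re.DOTALL` flag. DOTALL lets '.' match newlines, so no explicit loop over lines is needed, and the non-greedy form stops at the first closing tag.

```python
import re

# Hypothetical names; uses the modern `re` module.
BODY_RE = re.compile(
    r'<\s*body(?:\s[^>]*)?>'   # opening tag: '<body>' or '<body attrs...>'
    r'(.*?)'                   # content: as short as possible, newlines included
    r'<\s*/\s*body\s*>',       # closing tag: '</body>', tolerating spaces
    re.IGNORECASE | re.DOTALL,
)

def body_text(html):
    """Return the text between <body ...> and </body>, or None if absent."""
    m = BODY_RE.search(html)
    return m.group(1) if m else None

page = ('<html>\n<head><title>t</title></head>\n'
        '<BODY bgcolor="white">\nline one\nline two\n</body>\n</html>')
print(repr(body_text(page)))  # prints '\nline one\nline two\n'
```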
I could use two regsub.split calls to break the variable into its
parts (assuming I can sort out the first problem), but regsub
is written in Python, while regex is written in C (or so I am told), so
I would prefer to use regex.
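The `regex` and `regsub` modules have since been removed from Python in favor of `re`, whose matching engine is implemented in C, so the speed concern about a split-based approach should largely go away. A minimal sketch of the two-split idea follows, assuming a compiled pattern's `split` method with `maxsplit=1` stands in for `regsub.split`; the names `OPEN_RE`, `CLOSE_RE`, and `split_body` are hypothetical.

```python
import re

# Hypothetical names; uses the modern `re` module.
OPEN_RE = re.compile(r'<\s*body(?:\s[^>]*)?>', re.IGNORECASE)
CLOSE_RE = re.compile(r'<\s*/\s*body\s*>', re.IGNORECASE)

def split_body(html):
    """Split html into (before, body, after); the tags themselves are dropped.

    Returns None if either tag is missing.
    """
    # maxsplit=1 splits only at the first opening tag; if there is no
    # opening tag, the result is a one-element list.
    parts = OPEN_RE.split(html, maxsplit=1)
    if len(parts) != 2:
        return None
    head, rest = parts
    # Split the remainder at the first closing tag.
    parts = CLOSE_RE.split(rest, maxsplit=1)
    if len(parts) != 2:
        return None
    body, tail = parts
    return head, body, tail

print(split_body('<html><body>\nhi\n</body></html>'))
```

The opening-tag pattern uses a non-capturing `(?:...)` group on purpose: `split` inserts the text of any capturing groups into its result list, which would change the list's length and make the length checks fail.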
Please help. Generally, whenever I ask a mailing list for regex help,
it always turns out to be something boneheaded that I am missing, so try
not to laugh at me ;-)
--sam