Sam Gendler wrote:
I have never been much of a regex master, and I am having difficulty constructing one that should be fairly simple. I want to find all the text that is between <body> and </body> in a variable that may (and probably will) contain newlines. I am leaving case-insensitive matching out of the following examples in order to make the regexes simpler.
To find the opening <body> tag, I have '<[\t ]*body[\t ]*.*>', which is almost correct, but not quite. This expression finds <bodystuff>, too, so I really need something like '<[\t ]*body([\t ]+.*>)|(>)', but I can't find a construct that works. Basically, it needs at least one whitespace character followed by stuff followed by '>', or else no whitespace followed by '>'.
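One way to express "whitespace plus attributes, or nothing at all" is an optional group placed before the closing '>'. The following is only a sketch under assumptions: it uses the modern `re` module rather than the older regex syntax shown in the post, and the name OPEN_BODY is made up for illustration.

```python
import re

# OPEN_BODY is a hypothetical name, not something from the original post.
# After "body", either the tag closes immediately, or at least one space/tab
# follows and then anything up to (but not including) the closing '>'.
OPEN_BODY = re.compile(r'<[\t ]*body(?:[\t ]+[^>]*)?>')

print(OPEN_BODY.search('<body>') is not None)                  # plain tag
print(OPEN_BODY.search('<body bgcolor="white">') is not None)  # tag with attributes
print(OPEN_BODY.search('<bodystuff>') is None)                 # rejected
```

Using [^>]* instead of .* keeps the match from running past the end of the tag, and it also lets attributes span several lines.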
I can use similar code to find the </body> tag.
However, putting those two together around a \(\(.*\n*\)*\) in order to match all the text between the <body> and </body> tags sends Python into an infinite loop. It doesn't like it when I try to match an unlimited number of lines before the </body> tag.
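A likely cause of the loop is the nested repetition: the inner group .*\n* can match the empty string, and the outer * then repeats it without limit. A possible alternative, sketched here with the modern `re` module (an assumption, since the post uses the older syntax), is a single non-greedy .*? with the DOTALL flag so that '.' also matches newlines. The names BODY and body_text are hypothetical.

```python
import re

# BODY and body_text are illustrative names, not from the original post.
# re.DOTALL lets '.' match newlines; '.*?' stops at the first closing tag.
BODY = re.compile(
    r'<[\t ]*body(?:[\t ]+[^>]*)?>'   # opening tag, with optional attributes
    r'(.*?)'                          # everything in between, across lines
    r'<[\t ]*/[\t ]*body[\t ]*>',     # closing tag
    re.DOTALL,
)

def body_text(html):
    """Return the text between <body> and </body>, or None if not found."""
    m = BODY.search(html)
    return m.group(1) if m else None

sample = "<html><body bgcolor='white'>\nline one\nline two\n</body></html>"
print(repr(body_text(sample)))
```

Adding re.IGNORECASE to the flags would restore the case-insensitive matching that the post leaves out for simplicity.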
OK, I solved this one. I can now tell the difference between <bodykjhsd> and <body kjhsd>. I gave up on doing it correctly. I am now compiling two different regexes, one that finds the <body> tag and one that finds the </body> tag. I then use object.regs[index] to slice the string into the correct substring. UGLY. --sam
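For reference, the two-regex approach described above might look roughly like this with the modern `re` module. This is a sketch, not the poster's actual code: it uses the match object's start() and end() methods instead of object.regs[index], and the names OPEN_BODY, CLOSE_BODY, and slice_body are hypothetical.

```python
import re

# All three names below are illustrative assumptions.
OPEN_BODY = re.compile(r'<[\t ]*body(?:[\t ]+[^>]*)?>')
CLOSE_BODY = re.compile(r'<[\t ]*/[\t ]*body[\t ]*>')

def slice_body(html):
    """Find each tag separately, then slice out the text between them."""
    opening = OPEN_BODY.search(html)
    if opening is None:
        return None
    # Start looking for the closing tag only after the opening tag ends.
    closing = CLOSE_BODY.search(html, opening.end())
    if closing is None:
        return None
    return html[opening.end():closing.start()]

print(repr(slice_body("<body kjhsd>\nhello\n</body>")))
print(slice_body("<bodykjhsd>hello</body>"))
```

Because neither pattern has to span the body text, this version never asks the regex engine to match across lines at all.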