On Tue, Aug 05, 2003 at 06:39:11AM -0700, Dylan Reinhardt wrote:
So you can try something like:
-----
import re
style = re.compile('<style.*?>.*?</style>', re.I | re.S) script = re.compile('<script.*?>.*?</script>', re.I | re.S) tags = re.compile('<.*?>', re.S)
return tags.sub('', script.sub('', style.sub('', text)))
hmm... doesn't the tags pattern make the other two redundant? one problem with this approach is that it removes any xml or sgml markup from inside a <pre> block, which may not be what you want. Processing html with regular expressions is notoriously frustrating. I'd look in to using htmllib from the standard library, or fixing Strip-O-Gram to do what you want. -- Paul Winkler http://www.slinkp.com Look! Up in the sky! It's THE UNWORTHY SEEKER! (random hero from isometric.spaceninja.com)