Graham Chiu wrote:
I wish to allow users to enter comments into my database which are then viewable thru the browser.
Is there a Zope function that I can pass their text thru to remove all HTML?
Nope. There may be some standard python library module that I don't know about, however. Otherwise, you will have to write your own.
-Michel
Try htmllib. The following bit of python will strip all formatting from some HTML. It replaces all anchors with footnote style references and images with their alt text. If you want something a bit fancier you could add methods to the MyParser class to pass through particular tags (see the commented out methods as an example). It shouldn't be too hard to wrap something like this up in an external method (as presented it is a complete runnable program that retrieves a URL and displays the text). --------- File strip.py -------------- # Strip all HTML formatting. import sys,formatter,StringIO,htmllib,string from urllib import urlretrieve,urlcleanup class MyParser(htmllib.HTMLParser): def __init__(self): self.bodytext = StringIO.StringIO() writer = formatter.DumbWriter(self.bodytext) htmllib.HTMLParser.__init__(self, formatter.AbstractFormatter(writer)) def gettext(self): return self.bodytext.getvalue() # Uncomment these to pass through bold tags. # def start_b(self, attrs): # self.formatter.add_flowing_data('<b>') # # def end_b(self): # self.formatter.add_flowing_data('</b>') def GetPage(url): try: fn, h = urlretrieve(url) text = open(fn, "r").read() finally: urlcleanup() return text if __name__=='__main__': data = GetPage(sys.argv[1]) p = MyParser() p.feed(data) p.close() text = string.replace(p.gettext(), '\xa0', ' ') print text anchors = p.anchorlist for i in range(len(anchors)): print "[%d]: %s" % (i+1, anchors[i]) --------- end of strip.py -------------- -- Duncan Booth duncan@dales.rmplc.co.uk int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3" "\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure? http://dales.rmplc.co.uk/Duncan