I wish to allow users to enter comments into my database which are then viewable thru the browser. Is there a Zope function that I can pass their text thru to remove all HTML? ------- Regards, Graham Chiu gchiu<at>compkarori.co.nz http://www.compkarori.com/dynamo - The Homebuilt Dynamo http://www.compkarori.com/dbase - The dBase bulletin
Graham Chiu wrote:
I wish to allow users to enter comments into my database which are then viewable thru the browser.
Is there a Zope function that I can pass their text thru to remove all HTML?
Nope. There may be some standard python library module that I don't know about, however. Otherwise, you will have to write your own. -Michel
----- Original Message ----- From: Michel Pelletier <michel@digicool.com>
Graham Chiu wrote:
I wish to allow users to enter comments into my database which are then viewable thru the browser.
Is there a Zope function that I can pass their text thru to remove all HTML?
Nope. There may be some standard python library module that I don't know about, however. Otherwise, you will have to write your own.
On the other hand, if you tell the user up front that HTML tags are not allowed, you can simply html_quote the text. That will prevent any tags from rendering, and prevent normal uses of '<>&' characters from breaking anything. Cheers, Evan @ 4-am & digicool
Graham Chiu wrote:
I wish to allow users to enter comments into my database which are then viewable thru the browser.
Is there a Zope function that I can pass their text thru to remove all HTML?
Nope. There may be some standard python library module that I don't know about, however. Otherwise, you will have to write your own.
-Michel
Try htmllib. The following bit of python will strip all formatting from some HTML. It replaces all anchors with footnote style references and images with their alt text. If you want something a bit fancier you could add methods to the MyParser class to pass through particular tags (see the commented out methods as an example). It shouldn't be too hard to wrap something like this up in an external method (as presented it is a complete runnable program that retrieves a URL and displays the text). --------- File strip.py -------------- # Strip all HTML formatting. import sys,formatter,StringIO,htmllib,string from urllib import urlretrieve,urlcleanup class MyParser(htmllib.HTMLParser): def __init__(self): self.bodytext = StringIO.StringIO() writer = formatter.DumbWriter(self.bodytext) htmllib.HTMLParser.__init__(self, formatter.AbstractFormatter(writer)) def gettext(self): return self.bodytext.getvalue() # Uncomment these to pass through bold tags. # def start_b(self, attrs): # self.formatter.add_flowing_data('<b>') # # def end_b(self): # self.formatter.add_flowing_data('</b>') def GetPage(url): try: fn, h = urlretrieve(url) text = open(fn, "r").read() finally: urlcleanup() return text if __name__=='__main__': data = GetPage(sys.argv[1]) p = MyParser() p.feed(data) p.close() text = string.replace(p.gettext(), '\xa0', ' ') print text anchors = p.anchorlist for i in range(len(anchors)): print "[%d]: %s" % (i+1, anchors[i]) --------- end of strip.py -------------- -- Duncan Booth duncan@dales.rmplc.co.uk int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3" "\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure? http://dales.rmplc.co.uk/Duncan
participants (4)
-
Duncan Booth -
Evan Simpson -
Graham Chiu -
Michel Pelletier