[Zope] htmllib and my little remote webcache project
Ed Colmar
ed@sage.greengraphics.net
Wed, 20 Feb 2002 07:59:57 -0800 (PST)
Hey all!
So, I've started on my little remote webcache project. Thanks for all the
suggestions and ideas. Mostly I think it'll be a fun adventure rather
than anything with real practical value.
I'm messing around with htmllib, to parse out the page and (hopefully)
rewrite the URLs to what they should be.
I'm having trouble with it though. I haven't fully grasped how it works.
If someone could take a quick look at my code and see if I'm doing
anything obviously wrong, that would be great!
Essentially it's parsing ok, but I haven't figured out how to replace
values, or how to get the data back out when it's finished.
===========================
import htmllib
import formatter

class ImageParser(htmllib.HTMLParser):

    def __init__(self, verbose=0):
        # Make some kind of storage to hold the new values
        self.images = []
        # Buffer for the rewritten page (needs a starting value before += works)
        self.parsedfiletemp = ""
        # Not sure what this is for
        f = formatter.NullFormatter()
        # Initialize the parser
        htmllib.HTMLParser.__init__(self, f, verbose)

    def handle_image(self, src, alt, *args):
        """This is called every time we get an image through the parse."""
        print "[got image] %s" % src  # Yay
        # Get a unique id for this image (createId is not shown here)
        newimageid = createId()
        # Placeholder tag format; I haven't settled on the real replacement tag
        newimagetag = '<img src="%s" alt="%s">' % (newimageid, alt)
        # Move the new image data into the new file
        self.parsedfiletemp = self.parsedfiletemp + newimagetag

    def handle_data(self, data):
        """This is the data between the parsed tags."""
        print "got data %s" % data  # YAY!!
        # Move the data into the new file
        self.parsedfiletemp = str(self.parsedfiletemp) + str(data)

    def flush(self):
        """The end of the parse?"""
        print "[got Flush] "  # YAY!!
        # Save the data or something.
+++++++++++++++++++++++
Mostly this does what it should, but it gives:
Error Type: IOError
Error Value: [Errno 5] Input/output error
File /usr/local/dc/zope/Extensions/localpagecache.py, line 72, in handle_data
Is there another way that I should be keeping track of the data after it
goes through the parser?
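
A minimal sketch of one way to keep track of the output: append each
fragment to a list on the parser, rewrite image URLs as they go by, and
join the list once parsing is done. htmllib only exists in Python 2, so
this sketch swaps in html.parser from the Python 3 standard library.
ImageRewriter, rewrite_images and the rewrite_src callback are
hypothetical names, not part of the code above. It also keeps print out
of the handlers, on the unconfirmed assumption that the IOError comes
from writing to stdout in a process that has no usable one.

```python
from html import escape
from html.parser import HTMLParser


class ImageRewriter(HTMLParser):
    """Re-emit the page while parsing, swapping the src of each <img>."""

    def __init__(self, rewrite_src):
        # convert_charrefs=False so entities in text reach the handlers below
        # and can be written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.rewrite_src = rewrite_src  # hypothetical callback: old URL -> new URL
        self.images = []                # original src values, in document order
        self.out = []                   # output fragments, joined in result()

    def _emit_tag(self, tag, attrs, closing=">"):
        parts = [tag]
        for name, value in attrs:
            if tag == "img" and name == "src" and value is not None:
                self.images.append(value)
                value = self.rewrite_src(value)
            if value is None:
                parts.append(name)  # bare attribute such as "checked"
            else:
                # The parser unescapes attribute values, so escape them again.
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + closing)

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def result(self):
        """Return everything emitted so far as one string."""
        return "".join(self.out)


def rewrite_images(html, rewrite_src):
    """Parse html and return (rewritten page, list of original image URLs)."""
    parser = ImageRewriter(rewrite_src)
    parser.feed(html)
    parser.close()
    return parser.result(), parser.images
```

Calling rewrite_images(page, lambda url: "/cache/" + url) returns the
rewritten page together with the list of original image URLs, so nothing
has to be fished back out of the parser afterwards.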
THANKS!
-ed-