htmllib and my little remote webcache project
Hey all!

So, I've started on my little remote webcache project. Thanks for all the suggestions and ideas. Mostly I think it'll be a fun adventure, not because it has any real practical value.

I'm messing around with htmllib to parse the page and (hopefully) rewrite the URLs to what they should be. I'm having trouble with it, though; I haven't fully grasped how it works. If someone could take a quick look at my code and see if I'm doing anything obviously wrong, that would be great!

Essentially it's parsing OK, but I haven't figured out how to replace values, or how to get the data back out when it's finished.

===========================
class ImageParser(htmllib.HTMLParser):

    def __init__(self, verbose=0):
        self.images = []  # Make some kind of storage class to hold the new values
        f = formatter.NullFormatter()  # Not sure what this is for
        htmllib.HTMLParser.__init__(self, f, verbose)  # Initialize the parser

    def handle_image(self, src, alt, *args):
        """This is called every time we get an image through the parse"""
        print "[got image] %s" % src  # Yay
        newimageid = createId()  # Get a unique id for this image
        self.parsedfiletemp = self.parsedfiletemp + newimagetag  # move the new image data into the new file

    def handle_data(self, data):
        """This is the data between the parsed cells"""
        print "got data %s" % data  # YAY!!
        self.parsedfiletemp = str(self.parsedfiletemp) + str(data)  # move the data into the new file

    def flush(self):
        """The end of the parse?"""
        print "[got Flush] "  # YAY!!
        # save the data or something.
+++++++++++++++++++++++

Mostly this does what it should, but it gives:

Error Type: IOError
Error Value: [Errno 5] Input/output error
File /usr/local/dc/zope/Extensions/localpagecache.py, line 72, in handle_data

Is there another way that I should be keeping track of the data after it goes through the parser?

THANKS!

-ed-
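For illustration only, here is a minimal sketch of one way to keep track of the data as it goes through a parser and get it back out afterwards. It is not the code from the post: htmllib was removed in Python 3, so the sketch swaps in the standard library's html.parser module instead. The names ImageRewriter, cache_prefix and result(), the "/cache/" URL layout, and the counter standing in for createId() are all hypothetical. Output pieces are collected in a list that is created in __init__ and joined at the end. The handlers also avoid print, on the assumption (not confirmed in the post) that the IOError comes from writing to stdout in a server process with no terminal attached.

```python
from html import escape
from html.parser import HTMLParser
from itertools import count


class ImageRewriter(HTMLParser):
    """Re-emit the parsed page, swapping each <img src> for a cache URL."""

    def __init__(self, cache_prefix="/cache/"):
        # convert_charrefs=False passes entities such as &amp; through as-is.
        super().__init__(convert_charrefs=False)
        self.cache_prefix = cache_prefix  # hypothetical URL layout
        self.images = {}                  # new id -> original src
        self._ids = count(1)              # stand-in for createId()
        self._out = []                    # output pieces, joined in result()

    def _tag(self, tag, attrs, closing=""):
        # Rebuild the tag text, replacing the src of every <img>.
        parts = [tag]
        for name, value in attrs:
            if tag == "img" and name == "src" and value:
                new_id = "img%d" % next(self._ids)
                self.images[new_id] = value
                value = self.cache_prefix + new_id
            if value is None:
                parts.append(name)  # bare attribute, e.g. "checked"
            else:
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self._out.append("<" + " ".join(parts) + closing + ">")

    def handle_starttag(self, tag, attrs):
        self._tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        self._out.append("</%s>" % tag)

    def handle_data(self, data):
        self._out.append(data)

    def handle_entityref(self, name):
        self._out.append("&%s;" % name)

    def handle_charref(self, name):
        self._out.append("&#%s;" % name)

    def result(self):
        """Return everything collected so far as one string."""
        return "".join(self._out)


parser = ImageRewriter()
parser.feed('<p>Logo: <img src="http://example.com/logo.gif" alt="logo"> &amp; text</p>')
parser.close()
rewritten = parser.result()
```

After close(), parser.result() returns the rewritten page and parser.images maps each new id back to the original URL, so the caller can fetch and store the images separately. Comments and doctype declarations are dropped in this sketch; passing them through would need handle_comment and handle_decl as well.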
Ed Colmar