htmllib and my little remote webcache project

20 Feb 2002

Hey all!

So, I've started on my little remote webcache project.  Thanks for all the 
suggestions and ideas.  I expect it to be a fun adventure more than 
anything of real practical value.

I'm experimenting with htmllib to parse the page and (hopefully) rewrite 
its URLs to what they should be for the cache.
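One way to decide what a URL "should be" is to resolve each link against the address of the page it appeared on, then derive a stable cache name from the absolute result. A minimal sketch in Python 3, assuming a hypothetical `CACHE_PREFIX` where the cache serves its files; `cache_url` is a hypothetical name, and the SHA-1 hash stands in for an id generator like `createId()`:

```python
import hashlib
from urllib.parse import urljoin

# Hypothetical path under which the cache serves stored files.
CACHE_PREFIX = "/cache/"


def cache_url(page_url, link):
    """Resolve `link` relative to `page_url`, then map the absolute URL
    to a stable cache path built from its SHA-1 hex digest."""
    absolute = urljoin(page_url, link)  # "../img/a.gif" -> "http://host/img/a.gif"
    digest = hashlib.sha1(absolute.encode("utf-8")).hexdigest()
    return CACHE_PREFIX + digest
```

Hashing the absolute URL means the same image, whether referenced from different pages or by different relative paths, maps to a single cache entry.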

I'm having trouble with it, though; I haven't fully grasped how it works.  
If someone could take a quick look at my code and see whether I'm doing 
anything obviously wrong, that would be great!

The parsing itself seems to work, but I haven't figured out how to replace 
values, or how to get the data back out once the parse is finished.

===========================

import htmllib
import formatter

class ImageParser(htmllib.HTMLParser):
    def __init__(self, verbose=0):
        # Storage for what we find: the original image URLs, plus the
        # rewritten page built up as the parser runs.
        self.images = []
        self.parsedfiletemp = ""
        # htmllib always drives a formatter; NullFormatter discards its
        # output, which is fine because we collect the data ourselves.
        f = formatter.NullFormatter()
        # Initialize the base parser
        htmllib.HTMLParser.__init__(self, f, verbose)

    def handle_image(self, src, alt, *args):
        """Called every time the parser meets an <img> tag."""
        print "[got image] %s" % src
        self.images.append(src)
        # Get a unique id for this image (createId() is not shown here)
        newimageid = createId()
        # Assumed tag format: point src at the new id
        newimagetag = '<img src="%s" alt="%s">' % (newimageid, alt)
        # Move the new image tag into the new file
        self.parsedfiletemp = self.parsedfiletemp + newimagetag

    def handle_data(self, data):
        """Called with the text between tags."""
        print "[got data] %s" % data
        # Move the data into the new file
        self.parsedfiletemp = self.parsedfiletemp + data

    def close(self):
        """Called once, at the end of the parse (flush() is not)."""
        htmllib.HTMLParser.close(self)
        print "[got close]"
        # Save self.parsedfiletemp here.

+++++++++++++++++++++++

Mostly this does what it should, but it fails with:

Error Type: IOError
Error Value: [Errno 5] Input/output error
File /usr/local/dc/zope/Extensions/localpagecache.py, line 72, in handle_data
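One common cause of `[Errno 5] Input/output error` on a plain `print` is a server process whose stdout no longer has a usable terminal behind it (for example, a daemonized server whose controlling terminal has gone away). That is a guess, not a diagnosis, but it is cheap to test by sending debugging output to a file instead. A sketch using Python 3's standard `logging` module; `make_logger`, the logger name and the log path are hypothetical:

```python
import logging


def make_logger(path):
    """Return a logger that writes to the file at `path` instead of stdout."""
    log = logging.getLogger("localpagecache")  # hypothetical logger name
    log.setLevel(logging.DEBUG)
    if not log.handlers:  # avoid stacking handlers if called twice
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        log.addHandler(handler)
    log.propagate = False  # keep records away from the console/root logger
    return log
```

Inside a parser callback, a line such as `log.debug("[got data] %r", data)` would then replace the `print`.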

Is there another way that I should be keeping track of the data after it 
goes through the parser?
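The parser only reports what it sees; keeping the output is up to the subclass. In htmllib most tags are consumed by the parser's own `start_*`/`end_*` methods, so overriding `handle_data` alone collects the text but not the markup. A minimal sketch of one way to both replace values and get the page back out, written for Python 3's `html.parser` (a different module, since `htmllib` was removed in Python 3). It re-emits every tag and text run, swapping only `<img>` `src` values through a `rewrite` function; `ImageRewriter`, `getvalue` and `rewrite` are hypothetical names. Output pieces go into a list that is joined once at the end:

```python
from html import escape
from html.parser import HTMLParser


class ImageRewriter(HTMLParser):
    """Copy an HTML page through, replacing each <img src="..."> value
    with rewrite(old_src). Other markup is re-emitted as it arrives."""

    def __init__(self, rewrite):
        # Keep &amp; etc. as entity events so they can be copied back out.
        super().__init__(convert_charrefs=False)
        self.rewrite = rewrite
        self.images = []  # original image URLs, in document order
        self.out = []     # output pieces; joined once in getvalue()

    def _start(self, tag, attrs, slash):
        parts = [tag]
        for name, value in attrs:
            if tag == "img" and name == "src" and value is not None:
                self.images.append(value)
                value = self.rewrite(value)
            if value is None:
                parts.append(name)  # bare attribute such as "ismap"
            else:
                parts.append('%s="%s"' % (name, escape(value)))
        self.out.append("<%s%s>" % (" ".join(parts), slash))

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs, "")

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def getvalue(self):
        """Process any buffered input and return the rewritten page."""
        self.close()
        return "".join(self.out)
```

Usage: `p = ImageRewriter(new_url_for)`, then `p.feed(page)` and `p.getvalue()`. In this sketch tag and attribute names come back lowercased, and processing instructions and CDATA sections are not copied.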

THANKS!

-ed-

Ed Colmar
