[Zope] HTML parsers and Wget like function
Bakhtiar A Hamid
kedai at kedai.com.my
Thu Jul 1 11:36:39 EDT 2004
On Thu, 01 Jul 2004 20:02:02 +0900, Grant Morganryuuguu wrote
> I am considering Zope/python for a project and would like to get
> some pointers to see if this is a reasonable fit. I need to get a
> URL from the web, parse the HTML, extract some data from the page,
> rewrite the <a href> tags and display it on the website. I found the
> HTML parser in the library
> http://www.python.org/doc/current/lib/markup.html and
> http://www.crummy.com/software/BeautifulSoup/ (which is down now but
> was up a couple of days ago). Does anyone have any other suggestions
> for manipulating HTML in Zope/python? For getting the page from
> a URL, is there something like Wget (unix program) in Zope? I
> searched around the manual but did not see anything.
>
there's KebasData (http://www.zope.org/Members/kedai/KebasData).
it can scrape pages and parse them for whatever you need, but the regex can be
a bit of a head spinner, so a regex tool would help (kde has one, and there's
one for bash, iirc).
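to give an idea of the kind of regex involved, here is a minimal sketch in plain python (not KebasData's own code). the names TITLE_RE and extract_title are made up for this example, and pulling out the page title is just an assumed target; a real pattern depends entirely on the page being scraped.

```python
import re

# Hypothetical pattern: grabs whatever sits between <title> and </title>.
# Real scraping patterns depend on the markup of the page in question.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the stripped text of the first <title> element, or None."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    return match.group(1).strip()
```

a pattern like this breaks as soon as the page layout changes, which is why testing it in a regex tool first saves some pain.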
rewriting urls can be done in the render_method. it's a bit tricky, since the
original page can change at any time.
it's not great code, but it works for me.
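as a rough illustration of the rewriting step itself, independent of KebasData and its render_method, here is a sketch in plain python that turns the href of every <a> tag into an absolute url. HREF_RE and rewrite_hrefs are names invented for this example, and it assumes quoted href values; unquoted attributes or odd markup would need a real HTML parser.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: the opening part of an <a> tag up to href=,
# the quote character, and the quoted target.
HREF_RE = re.compile(
    r'(<a\b[^>]*?\bhref\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def rewrite_hrefs(html, base_url):
    """Resolve every quoted <a href> target against base_url."""

    def replace(match):
        prefix, quote, target = match.groups()
        # urljoin leaves absolute urls alone and resolves relative ones.
        return prefix + quote + urljoin(base_url, target) + quote

    return HREF_RE.sub(replace, html)
```

the same replace function could just as well point the links back at your own site instead of the original one; only the urljoin line would change.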
cookies are not supported yet, and neither is python's own socket timeout
(socket.setdefaulttimeout() and friends).
kebasdata was written a while back, when there was no timeout support in the
python core, so i used timeoutsocket to ..er.. timeout .. :P
soon, methinks
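for the wget-like part, a minimal sketch of fetching a page with a timeout using only the standard library of current python (this is not what KebasData does, and fetch_page is a made-up name). it assumes a python where urllib.request.urlopen takes a timeout argument, so no separate timeoutsocket module is needed.

```python
import urllib.request


def fetch_page(url, timeout=10.0):
    """Return the raw bytes at url, giving up after `timeout` seconds.

    A stalled connection raises an exception (typically socket.timeout
    or urllib.error.URLError) rather than hanging forever.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

the result is bytes, so it has to be decoded with the page's charset before any regex work on text.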
> Thanks,
> Grant
--
NSTP (M) BHD