[Zope] how to extract info from html?
Luke Tymowski
luke@seeto.com
Thu, 17 May 2001 13:05:40 -0400
hello,
i'm trying to automate the creation of RSS files.
my pages consist of a series of paragraphs with links and commentary (weblog).
i need to extract the url, the title, and description from each paragraph.
how do i do that?
i suspect it's simple but i'm not exactly a programmer.
for example,
<p>
Zope.org: <a href="some url">New Zope Release</a> "Zope 2.4.0a, the first alpha release of 2.4 is out now and features ..."
</p>
<p>
XML.com: <a href="another url">Zope and XML</a> "An interesting article on creating XML with Zope"
</p>
taking that, how do I end up with:
title: New Zope Release
link: some url
description: Zope 2.4.0a, the first alpha release of 2.4 is out now and features ...
title: Zope and XML
link: another url
description: An interesting article on creating XML with Zope
I suspect it would be much easier if I were using ZClasses to create each entry. but i'm not. :(
I can strip out the header and footer code so all i'm left with is balanced paragraphs. by balanced i mean each block of text is wrapped in opening and closing tags, <p> and </p>. but then i get stuck.
thanks,
luke
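
One possible approach to the extraction described above, as a rough sketch rather than a finished solution: walk the stripped-down HTML with a parser and collect one record per <p> block. This assumes a standalone Python 3 script using the standard library's html.parser (not a Zope method), and it assumes every paragraph looks like the two samples: some source text, one link, then a quoted description. The names EntryParser, extract_entries and SAMPLE are made up for illustration.

```python
from html.parser import HTMLParser

# Sample input copied from the message above.
SAMPLE = """<p>
Zope.org: <a href="some url">New Zope Release</a> "Zope 2.4.0a, the first alpha release of 2.4 is out now and features ..."
</p>
<p>
XML.com: <a href="another url">Zope and XML</a> "An interesting article on creating XML with Zope"
</p>"""


class EntryParser(HTMLParser):
    """Collect one {title, link, description} dict per <p>...</p> block."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._entry = None      # dict for the paragraph being read, or None
        self._state = "before"  # "before", "in_link" or "after" the first link
        self._title = []        # text chunks found inside the link
        self._tail = []         # text chunks found after the link

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._entry = {"title": "", "link": "", "description": ""}
            self._state = "before"
            self._title = []
            self._tail = []
        elif tag == "a" and self._entry is not None and self._state == "before":
            # Only the first link in a paragraph is treated as the entry link.
            self._entry["link"] = dict(attrs).get("href") or ""
            self._state = "in_link"

    def handle_endtag(self, tag):
        if tag == "a" and self._state == "in_link":
            self._state = "after"
        elif tag == "p" and self._entry is not None:
            self._entry["title"] = "".join(self._title).strip()
            # The description is the text after the link, minus the quotes.
            self._entry["description"] = "".join(self._tail).strip().strip('"')
            self.entries.append(self._entry)
            self._entry = None

    def handle_data(self, data):
        if self._entry is None:
            return  # text outside any paragraph is ignored
        if self._state == "in_link":
            self._title.append(data)
        elif self._state == "after":
            self._tail.append(data)
        # Text before the link (e.g. "Zope.org: ") is skipped.


def extract_entries(html):
    """Return a list of entry dicts, one per balanced <p> block in html."""
    parser = EntryParser()
    parser.feed(html)
    parser.close()
    return parser.entries


if __name__ == "__main__":
    for entry in extract_entries(SAMPLE):
        print("title:", entry["title"])
        print("link:", entry["link"])
        print("description:", entry["description"])
```

Run as a script, this prints the title, link and description lines for each sample paragraph. Paragraphs that stray from the assumed layout (no link, or a description without quotes) would still produce a record, just with empty or unquoted fields, so the results are worth checking by eye before they go into an RSS file.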