How to extract info from HTML?

17 May 2001

hello,

I'm trying to automate the creation of RSS files.
My pages consist of a series of paragraphs with links and commentary (a weblog).
I need to extract the URL, the title, and the description from each paragraph.

How do I do that?
I suspect it's simple, but I'm not exactly a programmer.

For example,

<p>
Zope.org: <a href="some url">New Zope Release</a> "Zope 2.4.0a, the first alpha release of 2.4 is out now and features ..."
</p>

<p>
XML.com: <a href="another url">Zope and XML</a> "An interesting article on creating XML with Zope"
</p>

Taking that, how do I end up with:

title: New Zope Release
link: some url
description: Zope 2.4.0a, the first alpha release of 2.4 is out now and features ...

title: Zope and XML
link: another url
description: An interesting article on creating XML with Zope

I suspect it would be much easier if I were using ZClasses to create each entry, but I'm not. :(

I can strip out the header and footer code so all I'm left with is balanced paragraphs. By balanced I mean each block of text is wrapped in opening and closing tags, <p> and </p>. But then I get stuck.
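
To make the goal concrete, here is a rough sketch of the kind of extraction described above, written in Python (the language Zope is built on) using only the standard library's html.parser module. The names EntryParser, extract_entries, and SAMPLE are made up for illustration, and the sketch assumes every paragraph contains exactly one link followed by a quoted description, as in the two examples. It is one possible approach, not necessarily the best one.

```python
from html.parser import HTMLParser


class EntryParser(HTMLParser):
    """Collect one title/link/description record per <p>...</p> block.

    Hypothetical helper: assumes each paragraph has a single <a> link
    followed by the description text in double quotes.
    """

    def __init__(self):
        super().__init__()
        self.entries = []
        self._entry = None       # record being built for the current <p>
        self._state = "before"   # "before", "in_link", or "after" the link

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._entry = {"title": "", "link": "", "description": ""}
            self._state = "before"
        elif tag == "a" and self._entry is not None and self._state == "before":
            self._entry["link"] = dict(attrs).get("href") or ""
            self._state = "in_link"

    def handle_endtag(self, tag):
        if tag == "a" and self._state == "in_link":
            self._state = "after"
        elif tag == "p" and self._entry is not None:
            # Drop surrounding whitespace, then the enclosing double quotes.
            text = self._entry["description"].strip().strip('"')
            self._entry["description"] = text
            self._entry["title"] = self._entry["title"].strip()
            self.entries.append(self._entry)
            self._entry = None

    def handle_data(self, data):
        if self._entry is None:
            return
        if self._state == "in_link":
            self._entry["title"] += data        # link text is the title
        elif self._state == "after":
            self._entry["description"] += data  # text after the link
        # Text before the link (the site name) is ignored.


def extract_entries(html):
    """Return a list of dicts, one per paragraph, in document order."""
    parser = EntryParser()
    parser.feed(html)
    parser.close()
    return parser.entries


SAMPLE = """
<p>
Zope.org: <a href="some url">New Zope Release</a> "Zope 2.4.0a, the first alpha release of 2.4 is out now and features ..."
</p>

<p>
XML.com: <a href="another url">Zope and XML</a> "An interesting article on creating XML with Zope"
</p>
"""

if __name__ == "__main__":
    for entry in extract_entries(SAMPLE):
        print("title:", entry["title"])
        print("link:", entry["link"])
        print("description:", entry["description"])
        print()
```

Each dictionary returned by extract_entries could then be written out as one RSS item. Paragraphs that do not follow the one-link pattern would need extra handling.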

thanks,

luke

Luke Tymowski
