How to extract info from HTML?
Hello,

I'm trying to automate the creation of RSS files. My pages consist of a series of paragraphs with links and commentary (a weblog). I need to extract the URL, the title, and the description from each paragraph. How do I do that? I suspect it's simple, but I'm not exactly a programmer.

For example:

<p> Zope.org: <a href="some url">New Zope Release</a> "Zope 2.4.0a, the first alpha release of 2.4 is out now and features ..." </p>
<p> XML.com: <a href="another url">Zope and XML</a> "An interesting article on creating XML with Zope" </p>

Taking that, how do I end up with:

title: New Zope Release
link: some url
description: Zope 2.4.0a, the first alpha release of 2.4 is out now and features ...

title: Zope and XML
link: another url
description: An interesting article on creating XML with Zope

I suspect it would be much easier if I were using ZClasses to create each entry, but I'm not. :(

I can strip out the header and footer code so that all I'm left with is balanced paragraphs. By balanced I mean that each block of text is wrapped in opening and closing tags, <p> and </p>. But then I get stuck.

Thanks,
Luke
Luke Tymowski
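
One possible approach to the extraction asked about above, sketched in Python with a regular expression. It assumes every entry has exactly the shape shown in the example (a <p> block containing one link followed by a description in double quotes) and will skip or mis-match paragraphs that deviate from it. The names ENTRY, SAMPLE, and extract_entries are made up for this illustration and do not come from Zope or any other library.

```python
import html
import re

# Assumed entry shape (taken from the example in the post):
#   <p> Source: <a href="URL">Title</a> "Description" </p>
ENTRY = re.compile(
    r'<p>.*?<a\s+href="(?P<link>[^"]*)"\s*>(?P<title>.*?)</a>\s*'
    r'"(?P<description>.*?)"\s*</p>',
    re.DOTALL | re.IGNORECASE,
)


def _clean(text):
    """Decode HTML entities and collapse runs of whitespace."""
    return " ".join(html.unescape(text).split())


def extract_entries(page):
    """Return one dict (title, link, description) per matching paragraph."""
    return [
        {
            "title": _clean(m.group("title")),
            "link": m.group("link").strip(),
            "description": _clean(m.group("description")),
        }
        for m in ENTRY.finditer(page)
    ]


# Sample input copied from the post; the URLs are the post's placeholders.
SAMPLE = (
    '<p> Zope.org: <a href="some url">New Zope Release</a> '
    '"Zope 2.4.0a, the first alpha release of 2.4 is out now '
    'and features ..." </p> '
    '<p> XML.com: <a href="another url">Zope and XML</a> '
    '"An interesting article on creating XML with Zope" </p>'
)

entries = extract_entries(SAMPLE)
for entry in entries:
    print("title:", entry["title"])
    print("link:", entry["link"])
    print("description:", entry["description"])
    print()
```

A regular expression is enough here only because the paragraphs are regular and balanced. If the markup varies (nested tags, several links per paragraph, unquoted descriptions), a real HTML parser such as the standard library's html.parser would likely be a safer choice.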