[Zope] Natural XML Parsing

Wed, 14 Mar 2001 10:49:33 +0100

On Tue, Mar 13, 2001 at 06:46:42PM -0500, Jeff Griffith wrote:
> Hey All,
> 
> While messing around with XML I became really frustrated at how unnatural it
> seemed to extract data out of documents.  I ended up writing my own python
> class to do exactly what felt natural.  The problem is the entire time I
> kept telling myself "this code has to already exist somewhere"  I just could
> never find it.
> 
> So if anyone is interested take a look at how this class works and tell me
> if
> 1) this seems useful
> 2) this is already supported by someone else
> 
> A brief example of how it is used
> --------------
> 
> <collection>
>   <comic issue="1">
>      <author>Stan Lee</author>
>   </comic>
> </collection>
> 
> can be accessed as
> 
> collection.comic.author.TEXT
> collection.comic.ATTR('issue')
> ---------------
> a more complete write-up w/source is available at
> http://www.people.hbs.edu/jgriffith/simplexmlobject.html

Have you looked at ParsedXML?

  http://www.zope.org/Wikis/DevSite/Projects/ParsedXML/Releases

It is a fully DOM Level 2 compliant XML object that lets you access your
XML through DOM calls. Those calls may seem less convenient than your
example, but they also cover the case where you have more than one 'comic'
element.

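Your example could translate to the following (assuming no whitespace
between the tags, so that childNodes[0] is an element rather than a text
node):

  xmlDoc.documentElement.childNodes[0].childNodes[0].childNodes[0].data
  xmlDoc.documentElement.childNodes[0].getAttribute('issue')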

Or, using tag names, the first example would become:

  comic = xmlDoc.documentElement.getElementsByTagName('comic')[0]
  authorName = comic.getElementsByTagName('author')[0].childNodes[0].data
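To make the "more than one 'comic' element" case concrete, here is a minimal sketch. It uses Python's standard xml.dom.minidom as a stand-in rather than ParsedXML itself, on the assumption that the DOM calls shown behave the same way; the second comic and its author are made up for illustration.

```python
from xml.dom.minidom import parseString

# The second <comic> and its author are invented purely for illustration.
SOURCE = (
    '<collection>'
    '<comic issue="1"><author>Stan Lee</author></comic>'
    '<comic issue="2"><author>Another Author</author></comic>'
    '</collection>'
)

xmlDoc = parseString(SOURCE)

# Map each issue number to the text of its first <author> element.
authors = {}
for comic in xmlDoc.documentElement.getElementsByTagName('comic'):
    issue = comic.getAttribute('issue')
    author = comic.getElementsByTagName('author')[0]
    # firstChild is the Text node; .data is the string it holds.
    authors[issue] = author.firstChild.data

print(authors)
```

Because getElementsByTagName returns every match, the same loop works for one comic or a hundred, which is the part the attribute-style access has to work around.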

XML ID support is planned for a future version: if you have a DTD that
tells the parser which attributes are IDs, you will be able to jump
straight to an element by its ID. Then you can use a document
like:

  <collection>
    <comic issue="1" id="first_issue">
       <author>Stan Lee</author>
    </comic>
  </collection>

and code like:

  comic = xmlDoc.getElementById('first_issue')
  authorName = comic.getElementsByTagName('author')[0].childNodes[0].data
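A rough way to try the ID lookup today is sketched below, again with Python's standard xml.dom.minidom rather than ParsedXML. Since this sketch has no DTD, it marks the 'id' attribute as an ID by hand with setIdAttribute; that step is an assumption standing in for the DTD declaration described above.

```python
from xml.dom.minidom import parseString

xmlDoc = parseString(
    '<collection>'
    '<comic issue="1" id="first_issue"><author>Stan Lee</author></comic>'
    '</collection>'
)

# No DTD in this sketch, so declare the 'id' attribute to be an ID by hand.
for comic in xmlDoc.getElementsByTagName('comic'):
    comic.setIdAttribute('id')

# Look the element up by ID instead of walking the tree.
comic = xmlDoc.getElementById('first_issue')
authorName = comic.getElementsByTagName('author')[0].firstChild.data

print(authorName)
```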

Hope this helps!

-- 
Martijn Pieters
| Software Engineer  mailto:mj@digicool.com
| Digital Creations  http://www.digicool.com/
| Creators of Zope   http://www.zope.org/
---------------------------------------------