IDs in one XML Document not unique?
Hello Amos,
since you are the creator of XML documents, I wanted to ask you whether the node IDs in the XML document tree are not necessarily unique. I have an XML document and I am trying to create an index. I planned on using the node IDs from the tree as my identifiers, but I noticed that the same ID appears twice. Yes, it is at a different level of the tree.
Is this a bug? Is there another way to accomplish my task?
Regards,
Stephan
-- Stephan Richter iXL - Software Designer and Engineer CBU - Physics, Computer Science and Chemistry Student
Stephan Richter wrote:
hello amos,
since you are the creator of XML documents, I wanted to ask you whether the node IDs in the XML document tree are not necessarily unique. I have a XML document and I try to create an index. I planned of using the node IDs from the tree as my identifiers. But I noticed that I have an ID twice. Yes, it is at a different level of the tree.
Is this a bug? Is there another way to accomplish my task?
Odd -- I haven't encountered this yet in my documents. All ids are unique there and I'm depending on it. Also I know from the code that they're supposed to be unique.. What *is* a problem currently (I think) is that you can't copy a branch from one XML document into another with the DOM methods. (the DOM doesn't provide for this) So there you can run into unique ID problems. Regards, Martijn
At 04:51 PM 9/5/99 -0500, Stephan Richter wrote:
since you are the creator of XML documents, I wanted to ask you whether the node IDs in the XML document tree are not necessarily unique. I have a XML document and I try to create an index. I planned of using the node IDs from the tree as my identifiers. But I noticed that I have an ID twice. Yes, it is at a different level of the tree.
Is this a bug? Is there another way to accomplish my task?
This is not a bug. It is possible to have a URL like this
e5/e5/e17
The normal way to get a unique identifier for Zope objects including Elements of an XML Document for the purposes of indexing is to use a full URL such as
FolderA/FolderB/myXMLDoc/e12/e15/e15
One important thing to note about XML Document elements is that when you re-parse the XML, the parser will create new nodes that probably won't have the same ids as the old ones. This means you'll have to rebuild your indexes when you re-parse your xml.
In the future I'll probably add something like
<foo zope:id="e15">blah</foo>
to help solve this problem. This is a little ways off though. I've got to get XML Document doing namespaces first, plus more thought (and discussion with Jim ;-) needs to go into this first.
Hope this helps.
-Amos
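The full-path approach Amos describes can be illustrated with a minimal sketch. The `Node` class and `node_path` helper below are hypothetical stand-ins written for this example, not part of the XML Document API; the sketch only assumes that each element knows its own id and its parent.

```python
class Node:
    """Hypothetical stand-in for an element; not the XML Document API."""

    def __init__(self, node_id, parent=None):
        self.node_id = node_id
        self.parent = parent


def node_path(node):
    """Join the ids from the top of the tree down to this node."""
    parts = []
    while node is not None:
        parts.append(node.node_id)
        node = node.parent
    return "/".join(reversed(parts))


doc = Node("myXMLDoc")
outer = Node("e5", doc)
inner = Node("e5", outer)  # same id as its parent, one level down
leaf = Node("e17", inner)

# Index by full path: the two "e5" nodes get distinct keys.
index = {node_path(n): n for n in (outer, inner, leaf)}
```

Keying the index on the path rather than the bare id keeps `outer` and `inner` apart even though both carry the id `e5`. As noted above, such an index would still have to be rebuilt after a re-parse, since the ids themselves may change.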
Amos Latteier wrote:
At 04:51 PM 9/5/99 -0500, Stephan Richter wrote:
since you are the creator of XML documents, I wanted to ask you whether the node IDs in the XML document tree are not necessarily unique. I have a XML document and I try to create an index. I planned of using the node IDs from the tree as my identifiers. But I noticed that I have an ID twice. Yes, it is at a different level of the tree.
Is this a bug? Is there another way to accomplish my task?
This is not a bug. It is possible to have a URL like this
e5/e5/e17
Hm, I'm confused. Doesn't the _make_id() function in Node.py make sure each node in an XML Document has a unique ID? How do duplicate ids arise this way? Also, why allow duplicate ids at all? Regards, Martijn
At 15:32 07/09/99 , Martijn Faassen wrote:
Amos Latteier wrote:
At 04:51 PM 9/5/99 -0500, Stephan Richter wrote:
since you are the creator of XML documents, I wanted to ask you whether the node IDs in the XML document tree are not necessarily unique. I have a XML document and I try to create an index. I planned of using the node IDs from the tree as my identifiers. But I noticed that I have an ID twice. Yes, it is at a different level of the tree.
Is this a bug? Is there another way to accomplish my task?
This is not a bug. It is possible to have a URL like this
e5/e5/e17
Hm, I'm confused. Doesn't the _make_id() function in Node.py make sure each node in an XML Document has an unique ID? How do duplicate ids arise this way? Also, why allow duplicate ids at all?
Regards,
Martijn
It is not the individual parts (the strings delimited by slashes) that are unique, but the URL as a whole. _make_id() makes sure there are no two elements with the same URL; it does so by making sure there are no two elements with the same id on one level of the URL. e5/e5/e17 addresses one element only, e5/e5/e18 another. So does e6/e5/e17.
-- Martijn Pieters, Web Developer | Antraciet http://www.antraciet.nl | Tel: +31-35-7502100 Fax: +31-35-7502111 | mailto:mj@antraciet.nl http://www.antraciet.nl/~mj | PGP: http://wwwkeys.nl.pgp.net:11371/pks/lookup?op=get&search=0xA8A32149 ------------------------------------------
Martijn Pieters wrote: [snip]
Hm, I'm confused. Doesn't the _make_id() function in Node.py make sure each node in an XML Document has an unique ID? How do duplicate ids arise this way? Also, why allow duplicate ids at all?
Not the individual parts strings delimited by slashes are unique, but the URL as a whole is.
I understood that part, but..
_make_id() makes sure there are no two elements with the same URL, it does so by making sure there are no two elements with the same id on one level of the URL.
I must be seriously misreading the code then! As I understand it, it goes up the XML tree from the current node until the parent node isn't an XML Node anymore. It would find the top node of the tree this way. Then it checks the _next_id attribute there and uses this for the new id. It also increases the _next_id attribute by 1. Wouldn't this guarantee a unique id each time it's called? Perhaps there's something in the way it's called that makes me misread it.. Or I misread the loop that I think goes up the tree? Regards, Martijn
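Martijn's reading of _make_id() can be restated as a minimal sketch. This is not the code from Node.py: the `Node` class, the `append_child` method and the "parent is None" test for the top of the tree are all assumptions made for illustration, following only the description above (walk up to the top node, take its `_next_id`, then increase it by 1).

```python
class Node:
    """Hypothetical node; a simplification, not the real Node.py."""

    def __init__(self):
        self.parent = None
        self.children = []
        self.node_id = None
        self._next_id = 0  # only consulted on the top node

    def _root(self):
        # Walk up until there is no parent left (assumed stop condition).
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def _make_id(self):
        # Take the counter from the top node and advance it by one.
        root = self._root()
        new_id = "e%d" % root._next_id
        root._next_id += 1
        return new_id

    def append_child(self, child):
        child.parent = self
        child.node_id = self._make_id()
        self.children.append(child)
        return child


root = Node()
a = root.append_child(Node())
b = a.append_child(Node())   # deeper level, same counter
c = root.append_child(Node())
ids = [a.node_id, b.node_id, c.node_id]
```

Under these assumptions every id comes from one counter per tree, so ids handed out this way cannot repeat within that tree, which is the guarantee Martijn expects.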
At 05:01 PM 9/7/99 +0200, Martijn Faassen wrote:
_make_id() makes sure there are no two elements with the same URL, it does so by making sure there are no two elements with the same id on one level of the URL.
I must be seriously misreading the code then! As I understand it, it goes up the XML tree from the current node until the parent node isn't an XML Node anymore. It would find the top node of the trees this way. Then it checks the next_id attribute there and uses this for the new id. It also increases the _next_id attribute by 1. Wouldn't this guarantee an unique id each time it's called? Perhaps there's something in the way it's called that makes me misread it.. Or I misread the loop that I think goes up the tree?
Dear Martijn,
I believe that this is in fact what _make_id does. However, I can imagine scenarios in which moving Nodes around, rebuilding nodes, and the like could allow duplicate ids to exist at different levels of the tree. At this point I don't think that XML Document guarantees that all Node ids in a tree are unique.
Let me repeat that XML Document is in *alpha*, which means that it may change significantly. If you have strong feelings about this issue, please write a short proposal/justification and send me patches to implement it.
I believe that this whole discussion started because someone wanted to identify Nodes in a sure-fire way. I still suggest that folks use a full path such as:
myDoc/e15/e27/e66
to identify Nodes of an XML Document. Even if XML Document guarantees that all Node ids in the tree are unique, I still think that is the right way to identify Nodes in most cases.
Thanks!
-Amos
Amos Latteier wrote:
At 05:01 PM 9/7/99 +0200, Martijn Faassen wrote:
_make_id() makes sure there are no two elements with the same URL, it does so by making sure there are no two elements with the same id on one level of the URL.
I must be seriously misreading the code then! As I understand it, it goes up the XML tree from the current node until the parent node isn't an XML Node anymore. It would find the top node of the trees this way. Then it checks the next_id attribute there and uses this for the new id. It also increases the _next_id attribute by 1. Wouldn't this guarantee an unique id each time it's called? Perhaps there's something in the way it's called that makes me misread it.. Or I misread the loop that I think goes up the tree?
I believe that this is in fact what _make_id does.
Ah, I see. I was starting to doubt my understanding of the whole thing, ability to read source, and so on, so I wanted to make sure I knew what was going on. Such a relief! :)
However, I can imagine scenarios in which moving Nodes around, rebuilding nodes, and the like could allow duplicate ids to exist at different levels of the tree. At this point I don't think that XML Document guarantees that all Node ids in a tree are unique.
I see. Let me think this through out loud here. Hm, I can't see how moving nodes around (in a _single_ tree) can by itself duplicate node ids. Reconstructing parts of the tree also seems to always result in new node ids to be generated (as far as I understand it _make_id is always called in the builder, as it uses appendChild). The only case I can think of right now is when the *root* of the tree (that keeps the _next_id) is somehow modified so that _next_id is wiped out. Could you describe another scenario where duplicate ids are generated?
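The one scenario raised in the paragraph above, a root whose _next_id is wiped out, can be shown with a small sketch. The `Root` class is hypothetical and the reset is forced by hand; nothing here claims that XML Document actually loses its counter this way.

```python
class Root:
    """Hypothetical root that hands out ids from a single counter."""

    def __init__(self):
        self._next_id = 0

    def make_id(self):
        new_id = "e%d" % self._next_id
        self._next_id += 1
        return new_id


root = Root()
first_batch = [root.make_id() for _ in range(3)]   # ids for existing nodes

root._next_id = 0  # assumed failure: the counter is lost or reset

second_batch = [root.make_id() for _ in range(2)]  # ids for new nodes

# Ids that now exist twice in the same tree.
duplicates = sorted(set(first_batch) & set(second_batch))
```

If the nodes from the first batch are still in the tree when the second batch is created, the tree ends up with repeated ids even though each individual call behaved correctly.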
Let me repeat that XML Document is in *alpha* which means that it may change significantly. If you have strong feelings about this issue, please write a short proposal/justification and send me patches to implement it.
I know it's in alpha, but I'm working on bashing it into beta eventually. :) This note is part of the friendly bashing (um, 'debate') of course. My strong feeling is that each node _should_ have a unique id (in the tree). It makes it easier to selectively render part of the tree differently. For instance, if I have a number of paragraphs, and I want one paragraph to be temporarily 'focused', I can re-render the entire thing with a special variable focused=<id of paragraph>. When the paragraph render method notices 'focused' is set to the current id, it will render the paragraph differently. This kind of thing is nice when you try to implement XML editors in Zope (which I am trying to do). I imagine this is solvable with absolute_url(), but it seems to be more difficult. Also you may run into trouble with acquisition this way. I think duplicate ids in the same tree may make acquisition of nodes in general less clean anyway, but I haven't thought this through yet.. I'd like to work on patches, but so far I haven't thought of a scenario yet that generates duplicate ids! There must be some, as that's what prompted Stephan's question. Perhaps Stephan has a simple scenario?
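The 'focused' rendering idea from this message can be sketched as follows. The function name, the (id, text) pairs and the HTML output are assumptions chosen for the example; they do not reflect how XML Document or any Zope render method is written.

```python
from html import escape


def render_paragraphs(paragraphs, focused=None):
    """Render (node_id, text) pairs, marking the one whose id is focused."""
    lines = []
    for node_id, text in paragraphs:
        if node_id == focused:
            lines.append('<p class="focused">%s</p>' % escape(text))
        else:
            lines.append("<p>%s</p>" % escape(text))
    return "\n".join(lines)


paragraphs = [("e3", "First"), ("e4", "Second"), ("e5", "Third")]
output = render_paragraphs(paragraphs, focused="e4")
```

The comparison `node_id == focused` is where uniqueness matters: if two paragraphs in the tree shared the id `e4`, both would be rendered as focused.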
I believe that this whole discussion started because someone wanted to identify Nodes in a sure fire way. I still suggest that folks use a full path such as:
myDoc/e15/e27/e66
To identify Nodes of an XML Document. Even if XML Document guarantees that all Node ids in the tree are unique, I still think that is the right way to identify Nodes in most cases.
Why so? It is fairly complicated extracting the right path, at least far more complicated than getting at the id. As I understand it, you need to use absolute_url(), somehow cut off the front part, and compare that, or is there a simpler way? Since nodes have ids anyway, and these ids are currently at least _approximately_ unique, wouldn't it be easier on all of us to make them completely unique? Regards, Martijn
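The "cut off the front part" step Martijn mentions could look roughly like this. It works on plain strings only; the helper name and the example URLs are hypothetical, and the sketch assumes the caller already has the absolute URLs of both the node and its document.

```python
def relative_node_path(node_url, document_url):
    """Return the part of node_url that lies below document_url."""
    prefix = document_url.rstrip("/") + "/"
    if not node_url.startswith(prefix):
        raise ValueError("node URL is not inside the document URL")
    return node_url[len(prefix):]


# Hypothetical URLs, used only for illustration.
document_url = "http://localhost:8080/FolderA/myXMLDoc"
node_url = "http://localhost:8080/FolderA/myXMLDoc/e15/e27/e66"
path = relative_node_path(node_url, document_url)
```

This is more work than reading a single id attribute, which is the point of the objection, but the resulting path identifies the node regardless of whether bare ids repeat at different levels.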
Well, you may have figured this out by the time I find a phone line to send this message but you two seem to be talking about different kinds of IDs. XML documents have a notion of "ID" (a document-unique element identifier).
Paul Prescod
Martijn Pieters wrote:
At 15:32 07/09/99 , Martijn Faassen wrote:
Amos Latteier wrote:
At 04:51 PM 9/5/99 -0500, Stephan Richter wrote:
since you are the creator of XML documents, I wanted to ask you whether the node IDs in the XML document tree are not necessarily unique. I have a XML document and I try to create an index. I planned of using the node IDs from the tree as my identifiers. But I noticed that I have an ID twice. Yes, it is at a different level of the tree.
Is this a bug? Is there another way to accomplish my task?
This is not a bug. It is possible to have a URL like this
e5/e5/e17
Hm, I'm confused. Doesn't the _make_id() function in Node.py make sure each node in an XML Document has an unique ID? How do duplicate ids arise this way? Also, why allow duplicate ids at all?
Regards,
Martijn
Not the individual parts strings delimited by slashes are unique, but the URL as a whole is.
_make_id() makes sure there are no two elements with the same URL, it does so by making sure there are no two elements with the same id on one level of the URL.
e5/e5/e17 addresses one element only, e5/e5/e18 another. So does e6/e5/e17.
_______________________________________________ Zope maillist - Zope@zope.org http://www.zope.org/mailman/listinfo/zope
(To receive general Zope announcements, see: http://www.zope.org/mailman/listinfo/zope-announce
For developer-specific issues, zope-dev@zope.org - http://www.zope.org/mailman/listinfo/zope-dev )
Paul Prescod wrote:
Well, you may have figured this out by the time I find a phone line to send this message but you two seem to be talking about different kinds of IDs. XML documents have a notion of "ID" (a document-unique element identifier).
I was talking about the internal Zope ids here, and I think Amos mentioned something about XML ids, though I think he got what I was talking about too. There doesn't seem to be any reason why the XML ID and Zope id shouldn't be the same thing, though, which is another argument to make sure node Zope ids are unique. :) Regards, Martijn
participants (5)
- Amos Latteier
- Martijn Faassen
- Martijn Pieters
- Paul Prescod
- Stephan Richter