[Zope-CMF] CMFUid isn't generating real uids
Gregoire Weber
gregweb at gmx.ch
Mon Aug 16 10:03:21 EDT 2004
Hi Christian,
Even though others have already mentioned some of the following points,
I'd like to answer them from my point of view as the originator of CMFUid:
1) CMFUid uids are guaranteed to be unique (per instance), really!
By the way: the first uid returned is 1 (one), not zero, to avoid
malfunctions if people (wrongly) write tests like 'if returned_uid:
do some stuff' instead of 'if returned_uid is not None: do some
stuff'.
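The truthiness pitfall is easy to reproduce. A minimal sketch, assuming a
hypothetical make_counter helper that stands in for the generator (it is
not CMFUid's actual code):

```python
import itertools

def make_counter(start):
    """Return a callable handing out consecutive integers from `start`.

    Hypothetical stand-in for a uid generator, for illustration only.
    """
    counter = itertools.count(start)
    return lambda: next(counter)

def has_uid_wrong(uid):
    # Buggy test: a uid of 0 is falsy and gets treated as "no uid".
    return bool(uid)

def has_uid_right(uid):
    # Correct test: only None means "no uid".
    return uid is not None

first_from_zero = make_counter(0)()
first_from_one = make_counter(1)()
```

With a counter starting at 0, the first uid fails the buggy test even
though it is a valid uid; starting at 1 keeps both tests in agreement.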
2) I looked for a mathematical proof that "md5.md5(data).hexdigest()"
encoding does not generate duplicates for the first 2^64 unique ids.
I didn't find such a proof, so I fell back to the simple unencoded
counter.
I had a look at AT's uuid implementation and felt bad about mimicking
this behaviour. I just didn't want to feel guilty in case someone
writes on the zope-cmf list that he has two content objects with
the same "unique id".
From my experience as a realtime systems programmer, everything with
even a negligible probability happens sooner or later. You can bet your
life on that. Sorry, that's life, that's not me!
Due to my lack of mathematical foundations, the idea was not to write a
sophisticated unique id generation algorithm (this should be done by
mathematicians!).
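For a sense of scale, the standard birthday approximation estimates the
chance of at least one duplicate among n values drawn uniformly from
2^bits possibilities. The sketch below assumes md5 output behaves like
uniformly random 128-bit values, which is a modelling assumption and not
a proof; the function name is hypothetical.

```python
import math

def collision_probability(n, bits=128):
    """Birthday approximation: P(duplicate) ~= 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes the n values are independent and uniform over 2**bits outcomes.
    """
    exponent = n * (n - 1) / 2 ** (bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)

p_small = collision_probability(10 ** 9)   # a billion ids
p_large = collision_probability(2 ** 64)   # the 2^64 ids mentioned above
```

Under that assumption the estimate is negligible for a billion ids but
roughly 39% at 2^64 ids, whereas a plain counter has no duplicates by
construction.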
3) I played with the idea of letting people prepend or append a kind
of site identifier through the ZMI, simply to make the ids "more"
universally unique (I do see the problem with "more"). By giving people
control over this 'site identifier' we hand them the responsibility
for choosing the "right" policy.
Because of lack of time, I couldn't do that. Contributions
welcome!
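The site identifier idea could look roughly like this. It is only a
sketch: the class name, the separator and the 'site-a'/'site-b' values
are hypothetical, and nothing like this exists in CMFUid as described
here.

```python
import itertools

class PrefixedUidGenerator:
    """Hypothetical generator combining a site identifier with a local counter."""

    def __init__(self, site_id, start=1):
        self.site_id = site_id
        self._counter = itertools.count(start)

    def __call__(self):
        # Uniqueness across sites depends entirely on the chosen site_id.
        return "%s-%d" % (self.site_id, next(self._counter))

gen_a = PrefixedUidGenerator("site-a")
gen_b = PrefixedUidGenerator("site-b")
uids = [gen_a(), gen_a(), gen_b()]
```

Two sites configured with the same identifier would still produce the
same ids, which is exactly the policy question left to the site manager.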
4) Architecture (IMPORTANT):
In the spirit of Zope3 the main goal of the design of CMFUid was to
write an extensible mini framework where everybody can replace parts
of the functionality if needed without the need to replace parts which
already do the job "as expected".
I know this point stands in contrast to the design of AT's uid/reference
engines, where everything is addressed through the bloated Archetypes tool.
This architectural problem was one of the main reasons to write a totally
new implementation.
So:
'portal_uidgenerator' is one of many possible implementations. Replacing it
with your own should not affect applications using it as long as the
interface contract is met.
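A rough sketch of what "replaceable as long as the contract is met"
means in practice. The contract assumed here is simply "calling the
generator returns a new uid"; the class and function names are
hypothetical and do not reproduce CMFUid's real interfaces.

```python
import itertools
import uuid

class CounterGenerator:
    """Hypothetical generator: short ids, unique per instance."""

    def __init__(self):
        self._counter = itertools.count(1)

    def __call__(self):
        return next(self._counter)

class Uuid4Generator:
    """Hypothetical replacement: random UUIDs from the standard library."""

    def __call__(self):
        return uuid.uuid4().hex

def register(registry, generator, name):
    """Application code that relies only on the assumed contract."""
    uid = generator()
    registry[uid] = name
    return uid

registry = {}
first = register(registry, CounterGenerator(), "doc1")
second = register(registry, Uuid4Generator(), "doc2")
```

The register function never needs to know which generator it was given,
so swapping one for the other leaves the calling code untouched.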
5) I first thought I would build a 'portal_uidgen' registry where different
uid and uuid implementations could be plugged in. But I didn't have enough
time to do this. For the moment I'm not even sure this is a good idea!
See additional comments below:
At 17:34 14.08.2004 +0200, Christian Heimes wrote:
>Hello everybody!
>
>I had a short look at CMFUid because I hoped we could replace the AT uid generator and lookup tool with a more general tool that also works with non-Archetypes types.
That was one of the main intentions.
>I was disappointed, even shocked, when I saw how the uids are generated.
Hope you recovered well ;-)
>It's just an incremental counter which is more likely a very simplistic id generator but not a real uid generator.
simple/simplistic: Yes
not a real unique id generator: Sorry, it is!
>It's not likely that other portals are using the same uid, it's a FACT because every portal is starting with id 0. IMO that's not unique.
CMFUid does not ensure universally unique ids and does not encode uids in
any manner because:
- I didn't feel capable of ensuring universality
- Some people like short unique ids like 'article/1876'
>In Archetypes we are also including information about the machine in the uid to create a uuid (universally unique id). The uuid is created from the local host name, a time stamp, a random value and arbitrary arguments, and the output of md5.md5(data).hexdigest() is used as the uid. This makes it mathematically nearly impossible to have one uid twice over all cmf based sites on earth.
I read the words 'mathematically nearly' and feel bad (Sorry about that).
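For readers unfamiliar with the approach quoted above, a hash-based uid
built from those ingredients might look like the following. This is a
sketch under assumptions, not the actual Archetypes code: the function
name, the separator and the example arguments are hypothetical, and
hashlib.md5 is used as the current standard-library equivalent of the
md5.md5 call mentioned in the quote.

```python
import hashlib
import random
import socket
import time

def md5_uid(*args):
    """Sketch of a hash-based uid; not the actual Archetypes code."""
    # Ingredients named in the quote: host name, time stamp, random value.
    parts = [socket.gethostname(), repr(time.time()), repr(random.random())]
    # Plus arbitrary caller-supplied arguments.
    parts.extend(str(a) for a in args)
    data = "|".join(parts).encode("utf-8")
    return hashlib.md5(data).hexdigest()

uid_a = md5_uid("Document", "/site/doc1")
```

The result is a 32-character hexadecimal string. Its uniqueness is
probabilistic: nothing in the construction rules out two calls producing
the same digest, it is only expected to be extremely rare.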
>A good uid should follow these rules (from the manual of uuidgen)
>DESCRIPTION
> The uuidgen program creates a new universally unique identifier (UUID)
> using the libuuid(3) library. The new UUID can reasonably be considered
> unique among all UUIDs created on the local system, and among UUIDs
> created on other systems in the past and in the future
From http://www.die.net/doc/linux/man/man3/libuuid.3.html:
The UUIDs generated by this library can be reasonably expected to be
unique within a system, and unique across all systems. They could be
used, for instance, to generate unique HTTP cookies across multiple
web servers without communication between the servers, and without
fear of a name clash.
I don't feel much better about 'be reasonably expected', but it's perhaps
a better solution than the Archetypes one.
IMHO: The requirements need not be as strict for shop sites as for sites
holding content over years or decades.
It's not so important (for me) that perhaps some Chinese computer
user's harddisk uuid is the same as the one of my harddisk. So
less sophisticated algorithms are OK.
Gregoire