[Zope-CMF] CMFUid isn't generating real uids

Mon Aug 16 10:03:21 EDT 2004

Hi Christian,

Even though others have already mentioned some of the following points,
as the originator of CMFUid I'd like to answer them from my point of view:

   1) CMFUid uids are guaranteed to be unique (per instance), really!
      By the way: the first uid returned is 1 (one), not zero, to avoid
      malfunctions if people (wrongly) write tests like 'if returned_uid:
      do some stuff' instead of 'if returned_uid is not None: do some
      stuff'.

   2) I looked for a mathematical proof that the "md5.md5(data).hexdigest()"
      encoding does not generate duplicates for the first 2^64 unique ids.
      I didn't find such a proof, so I fell back to the simple unencoded
      counter.

      I had a look at AT's uuid implementation and felt bad about mimicking
      this behaviour. I just didn't want to feel guilty in case someone
      writes on the zope-cmf list that he has two content objects with
      the same "unique id".

      From my experience as a real-time systems programmer, everything, even
      with negligible probability, happens sooner or later. You can bet your
      life on that. Sorry, that's life, that's not me!

      Given my lack of mathematical background, the idea was not to write a
      sophisticated unique id generation algorithm (that should be done by
      mathematicians!).

   3) I played with the idea of letting people prepend or append a kind
      of site identifier through the ZMI, simply to make the ids "more"
      universally unique (I do see the problem with "more"). By giving
      people control over this 'site identifier' we also hand them the
      responsibility for choosing the "right" policy.

      For lack of time, I couldn't do that. Contributions welcome!

   4) Architecture (IMPORTANT):

      In the spirit of Zope3, the main goal of the design of CMFUid was to
      write an extensible mini framework where everybody can replace parts
      of the functionality if needed, without having to replace parts that
      already do the job "as expected".

      I know this point stands in contrast to the design of AT's uid/reference
      engine, where everything is addressed through the bloated Archetypes tool.
      This architectural problem was one of the main reasons to write a totally
      new implementation.

      So:

      'portal_uidgenerator' is one of many possible implementations. Replacing it
      with your own should not affect applications using it, as long as the
      interface contract is met.

   5) I first thought I would build a 'portal_uidgen' registry where different
      uid and uuid implementations could be plugged in. But I didn't have enough
      time to do this. For the moment I'm not even sure this is a good idea!
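
To make points 1 and 4 concrete, here is a minimal sketch in plain Python.
It is not the CMFUid source: the class name CounterUidGenerator, the helper
tag_object and the "call it, get a uid back" contract are assumptions made
for illustration only.

```python
import itertools


class CounterUidGenerator:
    """Hypothetical stand-in for a 'portal_uidgenerator'-like component.

    Hands out 1, 2, 3, ... so every uid is unique within this instance.
    """

    def __init__(self):
        # Start at 1 (not 0) so that the very first uid is truthy.
        self._counter = itertools.count(1)

    def __call__(self):
        return next(self._counter)


def has_uid_wrong(uid):
    # Wrong test: a uid of 0 would be treated like "no uid".
    return bool(uid)


def has_uid_right(uid):
    # Correct test: only None means "no uid".
    return uid is not None


def tag_object(obj, generator):
    # Client code relies only on the callable contract, so any other
    # generator honouring it could be swapped in without changes here.
    obj["uid"] = generator()
    return obj["uid"]


generator = CounterUidGenerator()
document = {"title": "hypothetical content object"}
first_uid = tag_object(document, generator)
```

Starting at 1 only protects sloppy callers; the 'is not None' test remains
the one that is correct for any uid value.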

See additional comments below:

At 17:34 14.08.2004 +0200, Christian Heimes wrote:
>Hello everybody!
>
>I had a short look at CMFUid because I hoped we could replace the AT uid generator and lookup tool with a more general tool that also works with non-Archetypes types. 

That was one of the main intentions.

>I was disappointed, even shocked, when I saw how the uids are generated.

Hope you recovered well ;-) 

>It's just an incremental counter which is more likely a very simplistic id generator but not a real uid generator. 

simple/simplistic: Yes
not a real unique id generator: Sorry, it is!

>It's not just likely that other portals are using the same uid, it's a FACT, because every portal starts with id 0. IMO that's not unique.

CMFUid does not ensure universally unique ids and does not encode uids in any
manner because:

   - I didn't feel capable of ensuring universality

   - Some people like short unique ids like 'article/1876'
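
The site identifier idea from point 3 would be one way to keep ids short
while making them "more" universal. A rough sketch, assuming a hypothetical
factory make_site_uid_generator and made-up site names (nothing like this
exists in CMFUid today):

```python
import itertools


def make_site_uid_generator(site_id, start=1):
    """Return a callable producing uids like 'site-a.example.org/1'.

    `site_id` is a hypothetical, administrator-chosen string. Picking a
    value that no other site uses is the administrator's responsibility.
    """
    counter = itertools.count(start)

    def generate():
        return "%s/%d" % (site_id, next(counter))

    return generate


site_a = make_site_uid_generator("site-a.example.org")
site_b = make_site_uid_generator("site-b.example.org")

uid_a = site_a()  # first uid of site A
uid_b = site_b()  # first uid of site B, same counter value, different uid
```

The uids are only as universal as the chosen site identifiers are distinct,
which is exactly the policy question left to the site owner.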

>In Archetypes we also include information about the machine in the uid to create a uuid (universally unique id). The uuid is created from the local host name, a time stamp, a random value and arbitrary arguments, and the output of md5.md5(data).hexdigest() is used as the uid. This makes it mathematically nearly impossible to have one uid twice over all cmf based sites on earth.

I read the words 'mathematically nearly' and feel bad (sorry about that).
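
To put a number on "nearly": a small sketch using the standard birthday
approximation. It assumes the 128-bit md5 output behaves like uniformly
random values, which is an idealisation and not a proven property of md5.
The function name collision_probability is made up for this example.

```python
import math


def collision_probability(n_ids, hash_bits=128):
    """Approximate chance that at least two of n_ids hashes coincide.

    Birthday approximation: 1 - exp(-n^2 / 2^(hash_bits + 1)).
    """
    exponent = -(n_ids ** 2) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(exponent)


p_million = collision_probability(10 ** 6)   # a large site
p_two_to_64 = collision_probability(2 ** 64)  # the 2^64 ids mentioned above
```

Under that assumption the chance is far below anything a real site would
notice for a million ids, but it is not zero, and for 2^64 ids it comes out
at roughly 0.39.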

>A good uid should follow these rules (from the manual of uuidgen)
>DESCRIPTION
>       The uuidgen program creates a new universally unique identifier
>       (UUID) using the libuuid(3) library. The new UUID can reasonably
>       be considered unique among all UUIDs created on the local system,
>       and among UUIDs created on other systems in the past and in the
>       future.

From http://www.die.net/doc/linux/man/man3/libuuid.3.html:

   The UUIDs generated by this library can be reasonably expected to be 
   unique within a system, and unique across all systems. They could be 
   used, for instance, to generate unique HTTP cookies across multiple 
   web servers without communication between the servers, and without 
   fear of a name clash.
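
For comparison, a short sketch of generating such a UUID from Python. It
uses the standard-library uuid module of current Python versions as a
stand-in; that this behaves like libuuid for the purposes discussed here
is my assumption.

```python
import uuid

# uuid4() is built from random bits; uuid1() would instead combine a
# timestamp with the host's node id (often the MAC address).
random_uid = uuid.uuid4()
another_uid = uuid.uuid4()

print(random_uid)          # 36 characters, hex digits and four hyphens
print(random_uid.version)  # 4
```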

I don't feel much better about 'can be reasonably expected', but it's perhaps
a better solution than the Archetypes one.

IMHO: The requirements need not be as strict for shop sites as for sites
      holding content for years or decades.

      It's not so important (for me) that perhaps some Chinese computer
      user's hard disk uuid is the same as that of my hard disk. So
      less sophisticated algorithms are ok.

Gregoire