[Zope] Re:Re: eliminating dupes in a list (Tres Seaver)

5 Apr 2000

      sathya <linuxcraft@redspice.com> asked:
...
I have a list to pass in as a parameter to dtml-in, but before doing that I
would like to eliminate duplicates from the list.
I.e., in ['1','2','1'] I want to skip the duplicate '1'. Is there a Zope hack for
this, or do I have to use an external method?
This requires some Python expression trickery which can't (currently) be done
within DTML (filter and map aren't available to DTML).  You are probably better
off using a PythonMethod for such logic.  For grins, I used the Python
interpreter to bang out the following Python expression:
filter( None, map( lambda i, d={}:
                       ( i, None )[ d.has_key(i) or d.update( {i: 1} ) or 0 ]
                  , foo ) )
This is too convoluted to use in production code (and it strips out 0 values,
too) -- much better a nice, straightforward, "Pythonic" solution, a Python
method 'uniq' taking a single argument, 'items':
d = {}
for item in items:
    if not d.has_key( item ):
       d.update( { item: 1 } )
return d.keys()
Call from DTML:
<dtml-in "uniq( myItems )" sort>
   ...
 </dtml-in>
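
To illustrate the caveat about the filter/map expression above, here is a rough rendering for current Python 3. This is only a sketch: has_key no longer exists there, so the `in` operator is substituted, and the wrapper name uniq_expr is made up for the example. It shows that filter(None, ...) drops every false value, so 0 vanishes along with the duplicates.

```python
def uniq_expr(foo):
    # Hypothetical wrapper around the one-liner; `i in d` stands in for
    # d.has_key(i), which is not available in Python 3.
    # The default argument d={} is created afresh each time the lambda
    # is defined, so every call starts with an empty "seen" dictionary.
    return list(filter(None, map(
        lambda i, d={}: (i, None)[i in d or d.update({i: 1}) or 0],
        foo)))

print(uniq_expr(['1', '2', '1']))   # ['1', '2']
print(uniq_expr([1, 0, 2, 1, 0]))   # [1, 2]  (the 0 is lost as well)
```

The index is 1 (True) for an item already seen, which selects None, and 0 for a new item, which selects the item itself; filter then discards the None entries and anything else that is false.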
-- 
=========================================================
Tres Seaver  tseaver@digicool.com   tseaver@palladion.com
Your uniq method is not as fast as it could be. The call to has_key is superfluous, and the update call has to create a dictionary which then gets thrown away.

All you need to do is:
def uniq2(items):
    d = {}
    for item in items:
        d[item]=1
    return d.keys()

This saves creating a temporary dictionary and hashing the key twice for every item. In the test below it runs roughly 2.5 to 3 times faster.
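
For anyone trying this on a current Python 3, the two variants might look like the following. This is a sketch under assumptions, not the code that was timed: `item not in d` replaces has_key (which is gone in Python 3), and keys() is wrapped in list() because it now returns a view rather than a list.

```python
def uniq1(items):
    # Original approach: membership test, then update() with a
    # one-entry dictionary that is built and discarded every time.
    d = {}
    for item in items:
        if item not in d:          # stands in for d.has_key(item)
            d.update({item: 1})
    return list(d.keys())

def uniq2(items):
    # Suggested approach: a plain assignment, so each key is hashed
    # once and no temporary dictionary is created.
    d = {}
    for item in items:
        d[item] = 1
    return list(d.keys())

items = ['1', '2', '1']
print(uniq1(items))   # ['1', '2'] on Python 3.7+, where dicts keep insertion order
print(uniq2(items))   # ['1', '2'] on Python 3.7+
```

Both return the same set of keys; only the amount of work per item differs.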

Test
Inputs: a list of 10000 random integers, and the words from a text file.
Results show the number of unique keys and the elapsed time in seconds.
For string keys there is a small advantage to not being the first run.

bash-2.02$ python uniq.py
Integer keys 
uniq1 6214 0.0955042775547
uniq2 6214 0.0308158706009

String keys
14650 words in file
uniq1 3521 0.0922106302846
uniq2 3521 0.036688347093
Second time around is a bit faster
uniq1 3521 0.088270029681
uniq2 3521 0.036336634824

Test code
(specify a text file to read into a list of words)
===========

import time
import whrandom
import string

def uniq1(items):
    d = {}
    for item in items:
        if not d.has_key( item ):
            d.update( { item: 1 } )
    return d.keys()

def uniq2(items):
    d = {}
    for item in items:
        d[item]=1
    return d.keys()

generator = whrandom.whrandom()

# read in some text file; this one is a Zope mailing list file of about 90K
listofwords = string.split(open('c:/temp/message.txt').read())

listofitems = []
for item in range(10000):
    listofitems.append(generator.randint(0,9500))

print 'Integer keys'
starttime = time.clock()
l = uniq1(listofitems)
stoptime = time.clock()
print 'uniq1', len(l), stoptime-starttime

# reset the start time so uniq2 is not charged for uniq1's run
starttime = time.clock()
l = uniq2(listofitems)
stoptime = time.clock()
print 'uniq2', len(l), stoptime-starttime

print
print 'String keys'
print len(listofwords), 'words in file'

starttime = time.clock()
l = uniq1(listofwords)
stoptime = time.clock()
print 'uniq1', len(l), stoptime-starttime

starttime = time.clock()
l = uniq2(listofwords)
stoptime = time.clock()
print 'uniq2', len(l), stoptime-starttime

print 'Second time around is a bit faster'
starttime = time.clock()
l = uniq1(listofwords)
stoptime = time.clock()
print 'uniq1', len(l), stoptime-starttime

starttime = time.clock()
l = uniq2(listofwords)
stoptime = time.clock()
print 'uniq2', len(l), stoptime-starttime
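
The script above relies on whrandom, string.split, time.clock and the print statement, none of which exist in current Python 3. A minimal sketch of the integer-key part of the same measurement might look like this. The helper name timed, the use of time.perf_counter, and the fixed random seed are all assumptions made for the example, not part of the original test, and the timings it prints will differ from the figures quoted earlier.

```python
import random
import time

def uniq1(items):
    d = {}
    for item in items:
        if item not in d:          # stands in for d.has_key(item)
            d.update({item: 1})
    return list(d)

def uniq2(items):
    d = {}
    for item in items:
        d[item] = 1
    return list(d)

def timed(func, items):
    """Return (number of unique items, elapsed seconds) for one call."""
    start = time.perf_counter()    # fresh start time for every measurement
    result = func(items)
    elapsed = time.perf_counter() - start
    return len(result), elapsed

rng = random.Random(0)             # fixed seed, chosen only for repeatability
list_of_items = [rng.randint(0, 9500) for _ in range(10000)]

print('Integer keys')
for func in (uniq1, uniq2):
    count, elapsed = timed(func, list_of_items)
    print(func.__name__, count, elapsed)
```

Taking the start time inside the helper means no measurement can accidentally include the run before it.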


Robert Roy