[Zope] Re:Re: eliminating dupes in a list (Tres Seaver)
Robert Roy
rjroy@takingcontrol.com
Wed, 05 Apr 2000 19:04:27 -0400
sathya <linuxcraft@redspice.com> asked:
>sathya <linuxcraft@redspice.com> asked:
>
>> I have a list to pass in as a parameter to dtml-in but before doing that I
>> would like to eliminate duplicates form the list.
>> ie in ['1','2','1'] I want to skip the duplicate 1. is there a zope hack for
>> this or do I have to use an external method
>
>This requires some Python expression trickery which can't (currently) be done
>within DTML (filter and map aren't available to DTML). You are probably better
>off using a PythonMethod for such logic. For grins, I used the Python
>interpreter to bang out the following Python expression:
>
> filter( None, map( lambda i, d={}:
> ( i, None )[ d.has_key(i) or d.update( {i: 1} ) or 0 ]
> , foo ) )
>
>This is too convoluted to use in production code (and it strips out 0 values,
>too) -- much better a nice, straightforward, "Pythonic" solution, a Python
>method 'uniq' taking a single argument, 'items':
>
> d = {}
> for item in items:
> if not d.has_key( item ):
> d.update( { item: 1 } )
> return d.keys()
>
>Call from DTML:
>
> <dtml-in "uniq( myItems )" sort>
> ...
> </dtml-in>
>--
>=========================================================
>Tres Seaver tseaver@digicool.com tseaver@palladion.com
>
>
You're uniq method is not as fast as it could be. The call to has_key is superfluous and the update call has to creates a dictionary which then gets thrown away.
All you need to do is:
def uniq2(items):
d = {}
for item in items:
d[item]=1
return d.keys()
This saves creating a dictionary, and having to hash the key twice for every item. It runs about 2-3 times faster
test
# list of 10000 random integers
results show unique keys and time
there is a small advantage to not being the first run for string keys
bash-2.02$ python uniq.py
Integer keys
uniq1 6214 0.0955042775547
uniq2 6214 0.0308158706009
String keys
14650 words in file
uniq1 3521 0.0922106302846
uniq2 3521 0.036688347093
Second time around is a bit faster
uniq1 3521 0.088270029681
uniq2 3521 0.036336634824
test code
specify a text file to read into list of words
===========
import time
import whrandom
import string
def uniq1(items):
d = {}
for item in items:
if not d.has_key( item ):
d.update( { item: 1 } )
return d.keys()
def uniq2(items):
d = {}
for item in items:
d[item]=1
return d.keys()
generator = whrandom.whrandom()
# get some text file this is a zope mailing list file of about 90K
listofwords = string.split(open('c:/temp/message.txt').read())
listofitems = []
for item in range(10000):
listofitems.append(generator.randint(0,9500))
print 'Integer keys'
starttime = time.clock()
starttime = time.clock()
l = uniq1(listofitems)
stoptime = time.clock()
print 'uniq1', len(l), stoptime-starttime
l = uniq2(listofitems)
stoptime = time.clock()
print 'uniq2', len(l), stoptime-starttime
print
print 'String keys'
print len(listofwords), 'words in file'
starttime = time.clock()
l = uniq1(listofwords)
stoptime = time.clock()
print 'uniq1', len(l), stoptime-starttime
starttime = time.clock()
l = uniq2(listofwords)
stoptime = time.clock()
print 'uniq2', len(l), stoptime-starttime
print 'Second time around is a bit faster'
starttime = time.clock()
l = uniq1(listofwords)
stoptime = time.clock()
print 'uniq1', len(l), stoptime-starttime
starttime = time.clock()
l = uniq2(listofwords)
stoptime = time.clock()
print 'uniq2', len(l), stoptime-starttime