[Zope-CMF] CatalogTool: validity check extremely inefficient

Dieter Maurer dieter at handshake.de
Fri Jun 25 16:25:42 EDT 2004


Today, I have analysed why CatalogTool searches take an incredible
long time. The "validity" check ("effective <= now <= expires")
turned out to be the culprit: more precisely: the "effective <= now" part.

The reason: "Unindex.Unindex._apply_index" uses successive
"union"s to combine the document lists for the individual
terms in the range. This algorithm has quadratic runtime
behaviour in the number of terms!
Instead, it should use "multiunion" which has linear runtime
in the number of terms!


To demonstrate the difference run the following script.
The "multiunion" variant is orders of magnitude faster
than the current implementation for portals with several
ten thousand objects...


from time import time
from BTrees.IIBTree import IITreeSet, union, multiunion
from random import randint

def u(n):
  s=time()
  r = None
  for i in range(n): r = union(r,l[i])
  print 'union', n, time() - s

def m(n):
  sl = l[:n]
  s=time()
  multiunion(sl)
  print 'multiunion', n, time()-s

l = [IITreeSet([randint(0,1<<30)]) for i in range(20000)]

# 5.000 terms
u(5000)
# 0.647865056992
m(5000)

# 10.000 terms
# 0.00777697563171
u(10000)
# 2.45900988579
m(10000)
# 0.0156350135803

# 20.000 terms
u(20000)
# 9.70094203949
m(20000)
# 0.0316338539124


-- 
Dieter


More information about the Zope-CMF mailing list