[Zope-CMF] Re: CatalogTool: validity check extremely inefficient

Andreas Jung lists at zopyx.com
Sat Jun 26 02:38:42 EDT 2004


I am sure you are having a patch :-)

Andreas

--On Freitag, 25. Juni 2004 22:25 Uhr +0200 Dieter Maurer 
<dieter at handshake.de> wrote:

> Today, I have analysed why CatalogTool searches take an incredible
> long time. The "validity" check ("effective <= now <= expires")
> turned out to be the culprit: more precisely: the "effective <= now" part.
>
> The reason: "Unindex.Unindex._apply_index" uses successive
> "union"s to combine the document lists for the individual
> terms in the range. This algorithm has quadratic runtime
> behaviour in the number of terms!
> Instead, it should use "multiunion" which has linear runtime
> in the number of terms!
>
>
> To demonstrate the difference run the following script.
> The "multiunion" variant is orders of magnitude faster
> than the current implementation for portals with several
> ten thousand objects...
>
>
> from time import time
> from BTrees.IIBTree import IITreeSet, union, multiunion
> from random import randint
>
> def u(n):
>   s=time()
>   r = None
>   for i in range(n): r = union(r,l[i])
>   print 'union', n, time() - s
>
> def m(n):
>   sl = l[:n]
>   s=time()
>   multiunion(sl)
>   print 'multiunion', n, time()-s
>
> l = [IITreeSet([randint(0,1<<30)]) for i in range(20000)]
>
># 5.000 terms
> u(5000)
># 0.647865056992
> m(5000)
>
># 10.000 terms
># 0.00777697563171
> u(10000)
># 2.45900988579
> m(10000)
># 0.0156350135803
>
># 20.000 terms
> u(20000)
># 9.70094203949
> m(20000)
># 0.0316338539124
>
>
> --
> Dieter






More information about the Zope-CMF mailing list