[Zope-CMF] cmf zpt_generic/search - sequence_length is very slow for large collections, len() is much faster.

Henryk Paluch hpaluch at gitus.cz
Mon Jan 3 02:56:37 EST 2005


Florent Guillaume wrote:

>>items = ctool.searchResults(kw)
>>batch_obj = Batch(items, 25, b_start, orphan=1)
>>length = batch_obj.sequence_length
>>
>>### my guess: len(items)==batch_obj.sequence_length
>>###    - but len(items) is much faster.
>>    
>>
>
>Well then that's strange, because in the end batch_obj.sequence_length
>does just call lazily len(items). From reading the code I think there
>shouldn't be any speed difference. Are you sure about your measurements?
>  
>
Hi!
  For me, it matters very much for large collections ( > 5000 objects) - 
the results are reproducible for both BDB and standard FileStorage.
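
To reproduce it, the only prerequisite is a portal with a lot of
Documents. A throwaway Script (Python) roughly like the following can
populate a test instance - the id scheme and the count are arbitrary,
this is only a sketch, not part of my skin:
----------------------------------------------------------------------
# Fill the current folder with `count` empty Documents so that
# portal_catalog holds enough objects to show the timing difference.
count = 5000
existing = context.objectIds()
for i in range(count):
    id = 'test-doc-%06d' % i
    if id not in existing:
        context.invokeFactory('Document', id, title='Test document %d' % i)
return 'done, %d ids checked' % count
----------------------------------------------------------------------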

 Here is my current portal_skins/custom/search:
----------------------------------------------------------------------
from Products.PythonScripts.standard import thousands_commas
from ZTUtils import Batch
from Products.CMFCore.utils import getToolByName

ctool = getToolByName(script, 'portal_catalog')
utool = getToolByName(script, 'portal_url')
portal_url = utool()
epoch = DateTime('1970/01/01 00:00:01 GMT')


options = {}

target = '%s/search' % portal_url
kw = context.REQUEST.form.copy()
for k, v in kw.items():
    if k in ('review_state', 'Title', 'Subject', 'Description',
             'portal_type', 'listCreators'):
        if same_type(v, []):
            v = filter(None, v)
        if not v:
            del kw[k]
    elif k in ('created',):
        if v['query'] == epoch and v['range'] == 'min':
            del kw[k]
        else:
            # work around problems with DateTime in records
            kw[k] = v.copy()
    elif k in ('go', 'go.x', 'go.y', 'b_start'):
        del kw[k]

t1 = float(DateTime())
container.log_me('initiating search for %s %f' % (str(kw), t1))
items = ctool.searchResults(kw)
container.log_me('search took %f' % (float(DateTime()) - t1))
batch_obj = Batch(items, 25, b_start, orphan=1)
container.log_me('batch1 %f' % (float(DateTime()) - t1))
#length = batch_obj.sequence_length
length = len(items)

container.log_me('batch2 %f' % (float(DateTime()) - t1))
summary = { 'length': length and thousands_commas(length) or '',
            'type': (length == 1) and 'item' or 'items',
            'match': kw.get('SearchableText') }
container.log_me('batch3 %f' % (float(DateTime()) - t1))
navigation = context.getBatchNavigation(batch_obj, target, **kw)
container.log_me('batch4 %f' % (float(DateTime()) - t1))

timeSpent = '%f' % (float(DateTime()) - t1)
options['batch'] = { 'summary': summary,
                     'listItemBrains': batch_obj,
                     'navigation': navigation }
options['timeSpent'] = timeSpent

return context.search_results_template(**options)
----------------------------------------------------------------
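
The log_me helper used above is not shown - it is just a small wrapper
around Zope's logger so the timings end up in the event log. An External
Method roughly like the following would produce entries in the format
quoted below (the exact implementation does not matter):
----------------------------------------------------------------------
# Extensions/log_me.py - assumed implementation of the log_me helper
from zLOG import LOG, ERROR

def log_me(self, msg):
    # one line in the Zope event log at ERROR (200) severity;
    # the timestamps in the excerpts below come from these entries
    LOG('log_me', ERROR, msg)
    return msg
----------------------------------------------------------------------
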
With len(items) I get very nice timings:
2005-01-03T08:54:15 ERROR(200) log_me initiating search for 
{'portal_type': ['Document'], 'SearchableText': ''} 1104738855.015000
------
2005-01-03T08:54:24 ERROR(200) log_me search took 9.605000
------
2005-01-03T08:54:24 ERROR(200) log_me batch1 9.705000
------
2005-01-03T08:54:24 ERROR(200) log_me batch2 9.707000
------
2005-01-03T08:54:24 ERROR(200) log_me batch3 9.710000
------
2005-01-03T08:54:24 ERROR(200) log_me batch4 9.756000
---------------------------------------------------------------------------
Please note that len(items) took only 9.707 - 9.705 = 0.002 s. The search 
itself took 9.7 s the first time, but only about 0.1 s on subsequent 
(cached) runs.

but for
length = batch_obj.sequence_length
I get horrible timings:

2005-01-03T08:59:15 ERROR(200) log_me initiating search for 
{'SearchableText': ''} 1104739155.636000
------
2005-01-03T08:59:15 ERROR(200) log_me search took 0.103000
------
2005-01-03T08:59:15 ERROR(200) log_me batch1 0.108000
------
2005-01-03T09:01:19 ERROR(200) log_me batch2 123.474000
------
2005-01-03T09:01:19 ERROR(200) log_me batch3 123.477000
------
2005-01-03T09:01:19 ERROR(200) log_me batch4 123.482000
------------------------------
that is, 123.474 - 0.108 = 123 s (!) just for batch_obj.sequence_length!
Although I don't understand why, the performance impact is very 
noticeable - it looks like sequence_length instantiates the objects for 
some mysterious reason...
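
If that guess is right, the effect is easy to model outside Zope:
whenever a "length" is computed by walking the lazy result set instead
of asking for its precomputed count, every single item has to be
materialized first. A toy illustration in plain Python (this is NOT the
ZTUtils/Lazy code, just the effect I suspect):
----------------------------------------------------------------------
import time

class LazySeq:
    """Length is known up front, but items are expensive to produce
    (think of waking tens of thousands of objects from ZODB)."""
    def __init__(self, n):
        self._count = n
    def __len__(self):
        return self._count          # cheap, like len(items)
    def __getitem__(self, i):
        if i >= self._count:
            raise IndexError(i)
        time.sleep(0.001)           # pretend each item costs a load
        return i

seq = LazySeq(5000)

t = time.time()
n = len(seq)                        # uses the stored count
print 'len():     %d items, %.3f s' % (n, time.time() - t)

t = time.time()
n = 0
for item in seq:                    # iteration touches every item
    n = n + 1
print 'iteration: %d items, %.3f s' % (n, time.time() - t)
----------------------------------------------------------------------
Here len() returns immediately while the loop takes about five seconds -
the same kind of gap I see between len(items) and sequence_length above.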

The results were obtained for 92627 Document objects in CMF portal.

-- 
---(c)--------------------------------------------------
GITUS, s.r.o. Spitalska 2a, 190 00 Praha 9, CZ
Henryk Paluch - analytik/programator
mailto: hpaluch at gitus.cz  http://www.gitus.cz
--------------------------------------------------------


