[Zope-CMF] cmf zpt_generic/search - sequence_length is very slow
for large collections, len () is much faster.
Henryk Paluch
hpaluch at gitus.cz
Mon Jan 3 02:56:37 EST 2005
Florent Guillaume wrote:
>>items = ctool.searchResults(kw)
>>batch_obj = Batch(items, 25, b_start, orphan=1)
>>length = batch_obj.sequence_length
>>
>>### my guess: len(items)==batch_obj.sequence_length
>>### - but len(items) is much faster.
>>
>>
>
>Well then that's strange, because in the end batch_obj.sequence_length
>does just call lazily len(items). From reading the code I think there
>shouldn't be any speed difference. Are you sure about your measurements?
>
>
Hi!
For me, it matters very much for large collections ( > 5000 objects) -
the results are reproducible for both BDB and standard FileStorage.
Here is my current portal_skins/custom/search:
----------------------------------------------------------------------
from Products.PythonScripts.standard import thousands_commas
from ZTUtils import Batch
from Products.CMFCore.utils import getToolByName
ctool = getToolByName(script, 'portal_catalog')
utool = getToolByName(script, 'portal_url')
portal_url = utool()
epoch = DateTime('1970/01/01 00:00:01 GMT')
options = {}
target = '%s/search' % portal_url
kw = context.REQUEST.form.copy()
for k, v in kw.items():
if k in ('review_state', 'Title', 'Subject', 'Description',
'portal_type',
'listCreators'):
if same_type(v, []):
v = filter(None, v)
if not v:
del kw[k]
elif k in ('created',):
if v['query'] == epoch and v['range'] == 'min':
del kw[k]
else:
# work around problems with DateTime in records
kw[k] = v.copy()
elif k in ('go', 'go.x', 'go.y', 'b_start'):
del kw[k]
t1 = float(DateTime())
container.log_me('initating search for %s %f' % ( str(kw),t1 ) )
items = ctool.searchResults(kw)
container.log_me('search took %f' % (float(DateTime()) - t1) )
batch_obj = Batch(items, 25, b_start, orphan=1)
container.log_me('batch1 %f' % (float(DateTime()) - t1) )
#length = batch_obj.sequence_length
length = len (items)
container.log_me('batch2 %f' % (float(DateTime()) - t1) )
summary = { 'length': length and thousands_commas(length) or '',
'type': (length == 1) and 'item' or 'items',
'match': kw.get('SearchableText') }
container.log_me('batch3 %f' % (float(DateTime()) - t1) )
navigation = context.getBatchNavigation(batch_obj, target, **kw)
container.log_me('batch4 %f' % (float(DateTime()) - t1) )
timeSpent = '%f' % (float(DateTime())-t1)
options['batch'] = { 'summary': summary,
'listItemBrains': batch_obj,
'navigation': navigation
}
options['timeSpent'] = timeSpent
return context.search_results_template(**options)
----------------------------------------------------------------
For my len(items) I get very nice timings:
2005-01-03T08:54:15 ERROR(200) log_me initating search for
{'portal_type': ['Document'], 'SearchableText': ''} 1104738855.015000
------
2005-01-03T08:54:24 ERROR(200) log_me search took 9.605000
------
2005-01-03T08:54:24 ERROR(200) log_me batch1 9.705000
------
2005-01-03T08:54:24 ERROR(200) log_me batch2 9.707000
------
2005-01-03T08:54:24 ERROR(200) log_me batch3 9.710000
------
2005-01-03T08:54:24 ERROR(200) log_me batch4 9.756000
---------------------------------------------------------------------------
Please note, that len(items) took 9.707-9.705=0.002s only. Search itself
took 9.7s first time, but only about 0.1 s next time(s) (cached).
but for
length = batch_obj.sequence_length
one get horrible timings:
2005-01-03T08:59:15 ERROR(200) log_me initating search for
{'SearchableText': ''} 1104739155.636000
------
2005-01-03T08:59:15 ERROR(200) log_me search took 0.103000
------
2005-01-03T08:59:15 ERROR(200) log_me batch1 0.108000
------
2005-01-03T09:01:19 ERROR(200) log_me batch2 123.474000
------
2005-01-03T09:01:19 ERROR(200) log_me batch3 123.477000
------
2005-01-03T09:01:19 ERROR(200) log_me batch4 123.482000
------------------------------
that is 123.474 - 0.108 = 123 s (!) just for batch_obj.sequence_length
! Although I don't understand it - there is very noticable performance
impact - it looks like that sequence_length instantiate objects for
some mysterious reason...
The results were obtained for 92627 Document objects in CMF portal.
--
---(c)--------------------------------------------------
GITUS, s.r.o. Spitalska 2a, 190 00 Praha 9, CZ
Henryk Paluch - analytik/programator
mailto: hpaluch at gitus.cz http://www.gitus.cz
--------------------------------------------------------
More information about the Zope-CMF
mailing list