Re: [Zope] Major problems with slow catalog querie
The catalog search times mentioned below are the time spent in the catalog when it is called from a Python script like this:

    context.do_nothing_script()
    results = context.portal_catalog( search_dict )
    context.do_nothing_script()

The 'do_nothing_script' does just that, but it lets me see exactly how long the catalog takes using the CallProfiler Product. The 'search_dict' is a dictionary of the search queries for the catalog indexes; I found that passing this, rather than the whole REQUEST dict, improved the catalog speed a little. (This is being run on a PIII 900MHz with Win2000, Zope 2.5.1, Python 2.1.3.)
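For comparison, the same kind of measurement can be taken without a profiler by bracketing the call with timestamps. This is only a minimal standalone sketch in modern Python: `timed_call` and `fake_catalog` are hypothetical names, and `fake_catalog` merely stands in for `context.portal_catalog`. On the Python 2.1.3 mentioned above, `time.time()` would be the closest equivalent of `time.perf_counter()`.

```python
import time


def timed_call(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_catalog(query):
    """Hypothetical stand-in for context.portal_catalog(search_dict)."""
    return sorted(query)


search_dict = {'Type': ['Classified'], 'review_state': 'published'}
results, seconds = timed_call(fake_catalog, search_dict)
print("query took %.6f s, %d results" % (seconds, len(results)))
```

Timing only the catalog call this way keeps template rendering and other script work out of the number.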
If you can isolate a case where searching the catalog alone (with no other operations being performed) takes minutes I would like a chance to analyse it closer.
I am digging around my code at the moment trying to find problems I have created myself :-). I have discovered a few errors and fixing them has helped with the speed. I have two search forms:

1. Basic search, which searches on: 'SearchableText' (textindex), 'Subject' (keywordindex), 'Type' (keywordindex), 'sort_on', and 'sort_order'. I have managed to improve the catalog search to around 0.5 to 1 second. The 'sort_on' parameter seems to chew up most of this time. Is there any way to improve the 'sort_on' speed?

2. Advanced search, which can search on one or more of the following: 'SearchableText' (textindex), 'Subject' (keywordindex), 'Type' (keywordindex), 'type' (keywordindex), 'modified' (fieldindex {passing 'query':'blah' and 'range':'min' in a dict} ), 'item_size' (fieldindex), 'value' (fieldindex {passing 'query':['q_min','q_max'] and 'range':'minmax' in dict} ), 'Creator' (fieldindex), 'state' (fieldindex), 'country' (fieldindex), 'sort_on', and 'sort_order'.

I have noticed that the 'modified' parameter causes a huge hit on the catalog's performance. Is this normal? Are there things I can do to improve it?

Passing the following to the portal_catalog causes the catalog to take around 35-45 seconds on average (these queries return 400 results from the 10,500 items cataloged):

    {'modified': {'range': 'min', 'query': DateTime('1995/01/01')}, 'Type': ['Classified'], 'sort_on': 'modified', 'review_state': 'published', 'sort_order': 'reverse'}

Taking out the 'modified' key and passing the following reduces the catalog search time to approx 0.5 to 1 second on average:

    {'Type': ['Classified'], 'sort_on': 'modified', 'review_state': 'published', 'sort_order': 'reverse'}

The catalog returns results within 0.5 to 1 second with any combination of the indexes listed for the advanced search above. But as soon as the 'modified' query is put into the search_dict it all goes out the window.
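To illustrate the idea of passing a trimmed query dictionary instead of the whole request, here is a small sketch. It assumes the form data behaves like a plain dict; `build_search_dict` and `ALLOWED_KEYS` are hypothetical names, and the key list is simply copied from the advanced search description above (the example queries also use 'review_state', so the list would need adjusting to the real index names).

```python
# Keys copied from the advanced search form described above (an assumption:
# adjust to the indexes that actually exist in the catalog).
ALLOWED_KEYS = (
    'SearchableText', 'Subject', 'Type', 'type', 'modified', 'item_size',
    'value', 'Creator', 'state', 'country', 'sort_on', 'sort_order',
)


def build_search_dict(form):
    """Keep only known index keys whose values are not empty."""
    search_dict = {}
    for key in ALLOWED_KEYS:
        value = form.get(key)
        if value in (None, '', [], ()):
            continue  # skip fields the user left blank
        search_dict[key] = value
    return search_dict


# A plain dict standing in for the submitted form / REQUEST.
form = {
    'Type': ['Classified'],
    'SearchableText': '',
    'sort_on': 'modified',
    'sort_order': 'reverse',
    'submit': 'Search',
}
print(build_search_dict(form))
```

Blank fields and unrelated request keys such as the submit button never reach the catalog this way.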
I had experimented with creating an index which cataloged the modified date as a floating point number. Searching on the floating point number (modified date) made no difference to the search speed. As I wrote this I wondered whether cataloging the date as an integer would be any different. Searching the modified date as an integer has reduced the search time from an average of 35-45 seconds down to 1.5 - 5 seconds. Still not as fast as I would like, but a whole lot better. Does this suggest I have a problem with Zope or the CMF? I have run out of ideas to improve it further.
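One way to picture the integer approach is to map each date to a whole-number bucket before it is indexed. The sketch below is an assumption-laden illustration in standalone modern Python, not what was actually done here: the message does not say which granularity was used, so the one-minute bucket, the function name `date_to_int`, and the index name 'modified_int' are all hypothetical.

```python
from datetime import datetime, timezone


def date_to_int(dt, granularity_seconds=60):
    """Map a timezone-aware datetime to an integer bucket number.

    The default of one-minute buckets is an assumption for illustration;
    coarser buckets mean fewer distinct values in the index.
    """
    return int(dt.timestamp()) // granularity_seconds


cutoff = datetime(1995, 1, 1, tzinfo=timezone.utc)

# Range query in the same {'query': ..., 'range': 'min'} shape used above,
# against a hypothetical integer index named 'modified_int'.
search_dict = {
    'modified_int': {'query': date_to_int(cutoff), 'range': 'min'},
    'Type': ['Classified'],
}
print(search_dict)
```

The value stored for each object would have to be produced by the same function so that indexed values and query values are comparable.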
Some questions for you first: How many objects are in your catalog, 11,000?
At this stage there are around 10,500 objects. These are stored in 8 'Portal Folder' folders in a generic member account. One of these folders holds approx 5600 of these items; could this be causing a problem for the catalog? Future content will be created by individual members in the CMF. The generic account won't have more content added to it. Content will more than likely be slowly deleted or moved from this generic account to another member account for someone to take over. The objects are instances of Products which inherit from either Skinned_Folder or Link from the CMFDefault. And all objects inherit DefaultDublinCoreImpl from CMFDefault.
How many indexes are you searching simultaneously? What kinds of indexes are these? Do the searches involve globbing? (* and ? wildcards for text searches).
The Vocabulary in use is the default 'ZopeSplitter' with globbing enabled. I haven't got as far as testing search speeds with wildcards as yet.
Globbing searches can be achingly slow using TextIndex/Vocabulary (The vocabulary is at fault). ZCTextIndex seems to perform much better in this regard.
Any help you can give, or pointers to examples/information on the performance of similar Zope/CMF sites as a comparison of what I should be able to squeeze out of Zope, would be a great help. Thanks for your time, it is very much appreciated.

Richard
Lea Smith writes:
The catalog search times mentioned below are the time the query is with the catalog called from a python script like...
    context.do_nothing_script()
    results = context.portal_catalog( search_dict )
    context.do_nothing_script()

The 'do_nothing_script' does just that, but it shows me exactly the time the catalog takes using the CallProfiler Product.

How does it do this? It looks quite strange to me.
... Passing the following to the portal_catalog causes the catalog to take around 35-45 seconds on average (These queries return 400 results from the 10,500 items cataloged)...
{'modified': {'range': 'min', 'query': DateTime('1995/01/01')}, 'Type': ['Classified'], 'sort_on': 'modified', 'review_state': 'published', 'sort_order': 'reverse'}
Taking out the 'modified' key and passing the following reduces the catalog search time to approx 0.5 to 1 second on average...
{'Type': ['Classified'], 'sort_on': 'modified', 'review_state': 'published', 'sort_order': 'reverse'}

I expect you do not have documents modified before 1995/01/01?

This would mean that the "modified" subquery will return all documents. However, it will construct the result set piecemeal, with each value of "modified" adding only one or a few documents (the chance that two documents were modified in the same second is not too large). You must expect quadratic runtime behaviour (in the number of documents) in this case.
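The quadratic growth can be illustrated with a toy model. This is not the actual ZCatalog or BTrees code; it only assumes that every union step builds a new result containing everything accumulated so far, and it counts how many elements get copied. The names `piecemeal_union_cost` and `groups_of` are hypothetical.

```python
def piecemeal_union_cost(groups):
    """Union the groups one at a time, building a new result at each step.

    Returns (result, elements_copied), where elements_copied adds up the
    size of every intermediate result. This is a toy cost model only.
    """
    result = frozenset()
    copied = 0
    for group in groups:
        result = result | frozenset(group)  # new set built at each step
        copied += len(result)
    return result, copied


def groups_of(n_docs, per_value):
    """Split document ids 0..n_docs-1 into groups of per_value documents,
    one group per distinct index value."""
    return [range(i, min(i + per_value, n_docs))
            for i in range(0, n_docs, per_value)]


for n in (1000, 2000, 4000):
    # One document per distinct "modified" value: the worst case.
    _, cost = piecemeal_union_cost(groups_of(n, 1))
    print(n, cost)
```

With one document per distinct value, doubling the number of documents roughly quadruples the copying; with many documents per value the same model grows far more slowly.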
... As I wrote this I thought whether cataloging the date as an integer would be any different. Searching the modified date as an integer has reduced the search time from average of 35-45 seconds down to 1.5 - 5 seconds. Still not as fast as I would like, but a whole lot better.

Surprising!
Did I not already suggest that you use Zope's profile support ("Control_Panel --> Debug information --> Profile Information")? Do it, and you will clearly see what causes the large time...

Dieter
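For readers who want to see what such a profile looks like outside Zope, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. It is a generic modern-Python stand-in, not the Control_Panel profiler itself; `profile_call` and `slow_query` are hypothetical names, and `slow_query` only simulates some work.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text_report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the ten entries with the largest cumulative time.
    stats.sort_stats('cumulative').print_stats(10)
    return result, stream.getvalue()


def slow_query(n):
    """Hypothetical stand-in for an expensive catalog query."""
    return sum(i * i for i in range(n))


result, report = profile_call(slow_query, 100000)
print(report)
```

Sorting by cumulative time puts the calls that dominate the request at the top of the report, which is the information needed to see where the 35-45 seconds go.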
participants (2)
- Dieter Maurer
- Lea Smith