Catalog reindexing: field by field or not?
I notice that when calling manage_reindexIndex with several indexes, it will simply call reindexIndex once for each index. The question is, is there a performance penalty for reindexing several fields separately like this, or will it always be faster to only reindex the relevant indexes? It seems to me that at some point it should be faster to just reindex the whole catalog, since that just traverses through the objects once... (This is Zope 2.6, btw, maybe it's changed in 2.7?) //Lennart
On Fri, 20 Feb 2004 11:50:46 +0100 Lennart Regebro <lennart@regebro.nu> wrote:
I notice that when calling manage_reindexIndex with several indexes, it will simply call reindexIndex once for each index.
The question is, is there a performance penalty for reindexing several fields separately like this, or will it always be faster to only reindex the relevant indexes?
I'm not sure I understand the question. It will definitely be faster to update fewer indexes. Calling it repeatedly with one index is probably not much different than calling it with several at once AFAICT. It looks to me like this could be optimized by only doing one pass through the objects and calling catalog_object() with multiple indexes instead of the current implementation which does one pass over all objects per index.
It seems to me that at some point it should be faster to just reindex the whole catalog, since that just traverses through the objects once...
Perhaps, except that reindexing the whole catalog also updates metadata whereas reindexing indexes individually does not. So it would depend how much metadata you have.
(This is Zope 2.6, btw, maybe it's changed in 2.7?)
This has not changed much from 2.6 to 2.7 that I know of. -Casey
From: "Casey Duncan" <casey@zope.com>
I'm not sure I understand the question.
OK, I'll try to clarify. If you update all of the catalog, you'll go through all the objects once, and reindex them. If you reindex, say, five indexes, you'll go through all objects five times. My question is basically: Which of these is faster? It seems to me that the first one should be. And then the optimization of mentioning which indexes to reindex in fact makes it slower. I just want to make sure that this is the case, and that I haven't misunderstood everything.
It looks to me like this could be optimized by only doing one pass through the objects and calling catalog_object() with multiple indexes instead of the current implementation which does one pass over all objects per index.
Yup. But at the moment I'm talking as a Zope user, and not a Zope developer, so from my current point of view that's not an option. ;-)
Perhaps, except that reindexing the whole catalog also updates metadata whereas reindexing indexes individually does not. So it would depend how much metadata you have.
Ah. Well, that makes it a bit better. But I'll probably still update the whole catalog as soon as more than one index needs to be reindexed. It should make things much faster on a big instance, since what takes time there would be to load all the objects twice...
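The trade-off discussed in this message can be illustrated with a toy model that only counts object loads. This is a sketch, not ZCatalog code: `ToyStore`, `reindex_per_index`, and `reindex_all` are hypothetical names, and the model assumes that each pass loads every object once and that loading dominates the cost.

```python
class ToyStore:
    """Hypothetical stand-in for a ZODB-backed catalog; it only counts loads."""

    def __init__(self, n_objects):
        self.n_objects = n_objects
        self.loads = 0

    def load(self, i):
        # Assumption: every access pays the full cost of loading the object.
        self.loads += 1
        return {"id": i}


def reindex_per_index(store, index_names):
    """One full pass over all objects for each index, as described in the thread."""
    for _name in index_names:
        for i in range(store.n_objects):
            store.load(i)


def reindex_all(store):
    """A single pass over all objects, as a whole-catalog update would do."""
    for i in range(store.n_objects):
        store.load(i)


per_index = ToyStore(1000)
reindex_per_index(per_index, ["title", "modified", "path", "creator", "subject"])

whole = ToyStore(1000)
reindex_all(whole)

print(per_index.loads, whole.loads)  # 5000 1000
```

The model ignores ZODB's object cache and the extra metadata work a full update does, so it only shows why the number of passes matters, not where the break-even point lies.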
Lennart Regebro wrote at 2004-2-20 11:50 +0100:
I notice that when calling manage_reindexIndex with several indexes, it will simply call reindexIndex once for each index.
The question is, is there a performance penalty for reindexing several fields separately like this, or will it always be faster to only reindex the relevant indexes?
It is, as the objects are loaded over and over again from ZODB. The implementation of "reindexIndex" is really stupid. In our Zope installation, I changed it to:

    def manage_reindexIndex(self, ids=None, REQUEST=None, RESPONSE=None, URL1=None):
        """Reindex indexe(s) from a ZCatalog"""
        if not ids:
            return MessageDialog(title='No items specified',
                message='No items were specified!',
                action = "./manage_catalogIndexes",)
        if isinstance(ids, types.StringType): ids = (ids,)
        # DM: what a stupid implementation -- optimize!
        # for name in ids:
        #     self.reindexIndex(name, REQUEST)
        self.reindexIndex(ids, REQUEST)
        ....

    def reindexIndex(self, name, REQUEST):
        # DM: optimize!
        idxs = isinstance(name, types.StringType) and [name] or name
        ....

-- Dieter
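The idea behind the change above, accepting either one index name or a sequence of names and walking the objects only once, can be sketched outside Zope. This is a minimal Python 3 illustration with hypothetical names (`ToyCatalog`, `reindex`), not the elided ZCatalog code; `isinstance(names, str)` stands in for the Python 2 `types.StringType` check.

```python
class ToyCatalog:
    """Hypothetical in-memory catalog: one dict per index, keyed by object id."""

    def __init__(self, objects, index_names):
        self.objects = objects  # object id -> dict of attributes
        self.indexes = {name: {} for name in index_names}
        self.loads = 0  # counts how many objects have been visited

    def reindex(self, names):
        # Accept a single index name or a sequence of names.
        names = [names] if isinstance(names, str) else list(names)
        for oid, obj in self.objects.items():
            self.loads += 1  # each object is visited once per call
            for name in names:
                self.indexes[name][oid] = obj.get(name)


catalog = ToyCatalog(
    {1: {"title": "a", "creator": "x"}, 2: {"title": "b", "creator": "y"}},
    ["title", "creator"],
)
catalog.reindex(["title", "creator"])  # two indexes, one pass over two objects
```

With this shape, the number of object visits per call depends on the number of objects, not on how many index names are passed in.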