[Zope-CMF] reindexing optimizations

Alec Mitchell apm13 at columbia.edu
Fri Nov 18 12:17:18 EST 2005


Howdy CMFers,

So, Sidnei has been plugging away at the "AT reindexes things an obscene 
number of times" issue today, and appears to have fixed many of the 
AT-triggered indexing redundancies.  There are, however, still a few places 
in CMF where some cataloging redundancy might be avoided.  One obvious place 
is during object creation, where the following happens:

*) TypesTool.constructInstance() is triggered
    **) A _setObject call results in CMFCatalogAware.manage_afterAdd(), which 
        triggers a full indexObject().
    *) This is shortly followed by TypesTool._finishConstruction()
        *) Which calls CMFCatalogAware.notifyWorkflowCreated()
            *) Which in turn calls WorkflowTool._reindexWorkflowVariables()
                **) Which does a CMFCatalogAware.reindexObject([idxs]) on 
                    workflow-specific variables (with a full metadata update)
                *) And calls CMFCatalogAware.reindexObjectSecurity(), which 
                   reindexes the object only on the security index, and 
                   doesn't touch metadata.
        **) TypesTool._finishConstruction() then does another 
            CMFCatalogAware.reindexObject().

So we have two full reindexes, and three metadata updates.  The last reindex 
appears to be there only to catch the change to 'portal_type' in 
_finishConstruction.  So this final reindexObject might safely be changed 
to reindexObject(['portal_type', 'Type']), though the possibility exists 
that other indexed attributes added by 3rd parties may depend on the value 
of portal_type (say, I use an autogenerated Title which includes the Type).  
Additionally, almost immediately before this last reindexObject call, 
another reindexObject call has happened in notifyWorkflowCreated, which 
included a full catalog metadata update.  As a result, updating the catalog 
metadata here is certainly redundant.  Unfortunately, the 
CMFCatalogAware.reindexObject method provides no means of avoiding the 
duplicate metadata update, though one would be trivial to add and to use 
here.
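
To make the numbers concrete, here is a rough toy model of the creation 
sequence above.  It is plain Python with no Zope imports: CountingCatalog and 
Content are made-up stand-ins, the update_metadata argument on reindexObject 
is the hypothetical addition (it does not exist on CMFCatalogAware today), 
and 'review_state' and 'allowedRolesAndUsers' are only illustrative index 
names.  Treat it as a sketch of the bookkeeping, not of the real code paths.

```python
class CountingCatalog:
    """Toy stand-in for the catalog; it only counts what it is asked to do."""

    def __init__(self):
        self.full_reindexes = 0
        self.partial_reindexes = 0
        self.metadata_updates = 0

    def catalog_object(self, obj, idxs=None, update_metadata=True):
        # No index list means "reindex everything".
        if idxs:
            self.partial_reindexes += 1
        else:
            self.full_reindexes += 1
        if update_metadata:
            self.metadata_updates += 1


class Content:
    """Toy stand-in for a CMFCatalogAware object."""

    def __init__(self, catalog):
        self.catalog = catalog

    def indexObject(self):
        self.catalog.catalog_object(self)

    def reindexObject(self, idxs=None, update_metadata=True):
        # update_metadata is the hypothetical new argument.
        self.catalog.catalog_object(self, idxs, update_metadata)

    def reindexObjectSecurity(self):
        # Security index only, metadata untouched.
        self.catalog.catalog_object(
            self, ['allowedRolesAndUsers'], update_metadata=False)


def construct(catalog, optimized=False):
    """Replay the calls made while constructing one object."""
    obj = Content(catalog)
    obj.indexObject()                    # manage_afterAdd
    obj.reindexObject(['review_state'])  # _reindexWorkflowVariables
    obj.reindexObjectSecurity()
    if optimized:
        # Proposed final step: only the type indexes, no metadata update.
        obj.reindexObject(['portal_type', 'Type'], update_metadata=False)
    else:
        # Current final step in _finishConstruction.
        obj.reindexObject()
    return obj


current = CountingCatalog()
construct(current)
print(current.full_reindexes, current.metadata_updates)    # 2 3

proposed = CountingCatalog()
construct(proposed, optimized=True)
print(proposed.full_reindexes, proposed.metadata_updates)  # 1 2
```

Note that the optimized branch still carries the caveat mentioned above: any 
third-party index that derives its value from portal_type would be left 
stale by the restricted index list.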

Another option, suggested by Sidnei on IRC, would avoid the potential 
issues with limiting the variables indexed in the final reindex.  It would be 
to let CMFCatalogAware.manage_afterAdd know (presumably via some state 
variable) that it is being invoked through constructInstance/invokeFactory, 
in which case it could safely skip the initial indexing and allow 
_finishConstruction to take care of indexing the object fully on its own at 
the end.  In the long term we will probably be better served by delaying all 
indexing to transaction boundaries, though it will be a fair bit harder to 
implement, and may irk some developers who depend on immediate changes to 
the catalog on reindex.
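
A minimal sketch of the state-variable idea, again as a toy model rather than 
real CMF code.  The _constructing flag, the Content class and the 
construct_instance function are all hypothetical names, and the workflow 
reindexing steps are left out to keep it short.  How the flag would actually 
be set before _setObject fires is exactly the part that needs working out.

```python
class CountingCatalog:
    """Toy stand-in for the catalog; counts full reindexes only."""

    def __init__(self):
        self.full_reindexes = 0

    def catalog_object(self, obj):
        self.full_reindexes += 1


class Content:
    """Toy stand-in for a CMFCatalogAware object."""

    # Hypothetical state variable: true while a factory is building us.
    _constructing = False

    def __init__(self, catalog):
        self.catalog = catalog

    def indexObject(self):
        self.catalog.catalog_object(self)

    def reindexObject(self):
        self.catalog.catalog_object(self)

    def manage_afterAdd(self):
        # Skip the initial indexing when the factory will index us later.
        if self._constructing:
            return
        self.indexObject()


def construct_instance(catalog):
    """Stand-in for constructInstance followed by _finishConstruction."""
    obj = Content(catalog)
    obj._constructing = True
    try:
        obj.manage_afterAdd()  # would be triggered by _setObject
    finally:
        obj._constructing = False
    obj.reindexObject()        # the single full index at the end
    return obj


via_factory = CountingCatalog()
construct_instance(via_factory)
print(via_factory.full_reindexes)  # 1

added_directly = CountingCatalog()
Content(added_directly).manage_afterAdd()
print(added_directly.full_reindexes)  # 1
```

Objects added without going through the factory still get indexed by 
manage_afterAdd, so only the constructInstance path changes.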

Alec

