Jonathan Hobbs wrote:
From: "Johan Carlsson" <johanc@easypublisher.com>
Why would it be smaller? You still need to load the indexes when you do a search, right? Or do you intend to index different objects in different catalogs? In that case, couldn't you use the idxs argument of ZCatalog::catalog_object(self, obj, uid=None, idxs=None, update_metadata=1)?
Moving only the ZCTextIndex (and its Lexicon) into a separate ZCatalog should result in a smaller ZCatalog, as the space required by the other 4 indexes (3 Field Indexes and another ZCTextIndex) will be located in a different folder - I am going to load the ZCatalog containing the main ZCTextIndex into a Temporary Folder (to hold it in memory).
You could also create an Id Generator Service external to ZCatalog, which both ZCatalogs/catalogs can access to get a unique RID that is usable in both. That would skip the problem with longs, and potentially the problem of supporting a modified version of BTrees.
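To make the idea concrete, here is a minimal in-memory sketch of a shared ID generator feeding two catalogs. The names IdGenerator, ToyCatalog and new_id are hypothetical stand-ins and not part of ZCatalog; a real service inside Zope would also have to be persistent and cope with write conflicts, which this sketch ignores.

```python
import threading


class IdGenerator:
    """Hypothetical shared RID source; not part of ZCatalog."""

    def __init__(self, start=0):
        self._lock = threading.Lock()
        self._last = start

    def new_id(self):
        # The lock keeps two threads from being handed the same RID.
        with self._lock:
            self._last += 1
            return self._last


class ToyCatalog:
    """Stand-in for a catalog that accepts an externally supplied RID."""

    def __init__(self):
        self.data = {}

    def catalog_object(self, rid, obj):
        self.data[rid] = obj


generator = IdGenerator()
text_catalog = ToyCatalog()    # would hold the ZCTextIndex
field_catalog = ToyCatalog()   # would hold the remaining indexes

for doc in ("doc-a", "doc-b", "doc-c"):
    rid = generator.new_id()
    # The same RID goes into both catalogs, so result sets can be matched up.
    text_catalog.catalog_object(rid, doc)
    field_catalog.catalog_object(rid, doc)
```

Because the generator hands out small sequential integers, the RIDs stay ordinary ints rather than longs, which is the point of keeping it outside the catalogs.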
There's some code for making a transition-aware counter that you might want to have a look at. I guess it needs some improvements, though.
# This is borrowed from Zope 2.4.3 ZODB.tests.ConflictResolution
from Persistence import Persistent
from ZODB.POSException import ConflictError

# This PCounter doesn't provide a unique ID.
# It does increment once per call (even if several threads collide),
# but colliding threads can be handed the same value while the
# stored value ends up +2.
class PCounter(Persistent):
    _value = 0

    def __init__(self, val=None):
        if val is not None:
            if isinstance(val, int):
                self._value = val
            elif hasattr(val, '_count'):
                self._value = getattr(val, '_count', 0)
            else:
                self._value = 0

    def __repr__(self):
        # __repr__ must return a string, not the raw integer
        return repr(self._value)

    def getUniqueId(self):
        self._value = self._value + 1
        return self._value

    def _p_resolveConflict(self, oldState, savedState, newState):
        # Merge both increments relative to the common starting state
        savedDiff = savedState['_value'] - oldState['_value']
        newDiff = newState['_value'] - oldState['_value']
        oldState['_value'] = oldState['_value'] + savedDiff + newDiff
        return oldState

class PCounter2(PCounter):
    def _p_resolveConflict(self, oldState, savedState, newState):
        # Refuse to merge, so the losing transaction has to be retried
        raise ConflictError
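The merge arithmetic in _p_resolveConflict can be tried out without a ZODB by running it on plain state dictionaries. This is only a sketch: resolve_conflict is a hypothetical free function that copies the logic above, and the starting value 10 is made up for illustration.

```python
def resolve_conflict(old_state, saved_state, new_state):
    """Same arithmetic as PCounter._p_resolveConflict, on plain dicts."""
    saved_diff = saved_state['_value'] - old_state['_value']
    new_diff = new_state['_value'] - old_state['_value']
    resolved = dict(old_state)
    resolved['_value'] = old_state['_value'] + saved_diff + new_diff
    return resolved


old = {'_value': 10}

# Two transactions start from the same snapshot and each call
# getUniqueId() once, so each sees old value + 1.
id_seen_by_a = old['_value'] + 1
id_seen_by_b = old['_value'] + 1

saved = {'_value': id_seen_by_a}   # transaction A committed first
new = {'_value': id_seen_by_b}     # transaction B commits and conflicts

resolved = resolve_conflict(old, saved, new)
```

Here the stored counter ends up at 12, yet both callers were handed 11, which is why PCounter alone is not a safe source of unique IDs. PCounter2 avoids that by raising ConflictError, so the conflicting transaction is retried against the new state.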
Thanks for the 'heads-up'. I had hoped to use OIDs instead of RIDs, but hadn't considered the 64/32-bit problem. I'll have to see if I can find a 64-bit BTrees package and, failing that, try to modify the existing package to use long ints - this just keeps getting better and better :)
Cool! I'd love to hear how this turns out, so please keep me posted :-)
After some more digging around this was the approach I was going to try:

1) Build and populate a standard ZCatalog, then get the RIDs from the catalog for each entry.
2) Modify 'catalog_object' (and the underlying routines) to accept an optional RID parameter (use the passed RID instead of generating one internally).
3) Build the second ZCatalog, passing the RIDs from the first catalog.
4) Modify the Lazy class to include a new routine LazyInt, which would be similar to LazyCat, but would do an intersection instead of a join (this would be the tricky bit).
5) Modify ZCatalog's 'searchResults' (and underlying 'search') routines to accept an optional parameter 'resultSet'. resultSet would be a lazy sequence returned from a previous ZCatalog search (the initial ZCatalog search would not pass a 'resultSet' parameter). This optional resultSet, if present, would be LazyInt'd with the result set generated by the current search.

In theory (ha!) this should allow us to do two separate searches on two separate catalogs, then use the existing search machinery (aside from the new LazyInt) to marshal the results and present us with a normal lazy result set.

But then we came up with a much MUCH simpler solution...

We are going to encode all of the index data from the 4 other index fields and append them to the full-text field. We are then going to eliminate the other 4 indexes and only use the ZCTextIndex. Just before calling searchResults, we will programmatically (and transparently to the user) append the encoded fields we want the search to include. The intermediate result sets (created for each search term/word) are 'joined' by the existing search machinery. This will (in theory, yet again) give us a type of index search within ZCTextIndex.

This allows us (hopefully) to maintain the functionality we need, reduce the index size/overhead, and improve search performance without having to hack ZCatalog (yeah!)

I'll let you know if it actually works :-)

Jonathan
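For illustration, the encode-and-append idea could look roughly like the sketch below. Everything in it is an assumption rather than the code actually used: the helper names, the 'zz' token format and the example field names are invented, tokens are kept purely alphanumeric on the assumption that the lexicon's splitter would otherwise break them apart, and the query string assumes the text index's query parser accepts AND and parentheses.

```python
def encode_field(name, value):
    """Hypothetical encoding: one alphanumeric token per field/value pair."""
    cleaned = "".join(ch for ch in str(value).lower() if ch.isalnum())
    return "zz%szz%s" % (name.lower(), cleaned)


def build_index_text(body, fields):
    """Text handed to the full-text index: body plus encoded field tokens."""
    tokens = [encode_field(k, v) for k, v in sorted(fields.items())]
    return body + " " + " ".join(tokens)


def build_query(user_query, filters):
    """Append the encoded filters to the user's query before searching."""
    tokens = [encode_field(k, v) for k, v in sorted(filters.items())]
    return " AND ".join(["(%s)" % user_query] + tokens)


# Made-up field names and values, purely for illustration.
indexed_text = build_index_text(
    "zope catalog tuning", {"category": "News", "year": 2003})
query = build_query("catalog tuning", {"year": 2003})
```

The resulting query string would then be passed as the value for the ZCTextIndex when calling searchResults, so the field restriction is handled as just another search term.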