Hello, fellow zope developers :) I made a "nadako-sorting" branch of zope.index to add sorting support for zope indexes. I defined a new IIndexSort interface that specifies a "sort" method that takes document ids and returns an iterable of those ids, sorted by indexed value. It supports optional "limit" and "reverse" arguments. It also raises KeyError if you pass a docid that is not indexed by this index. Can someone review the interface definition and make any suggestions/objections? I also made an implementation for the FieldIndex that may not be too optimal, but I'm currently most interested in clean and universal IIndexSort definition that any index could efficiently implement. If there's no objections, I'd like to merge it to trunk and start adding sorting features for zope.app.catalog/z3c.indexer, as well as zc.catalog indexes. Thanks in advance! -- WBR, Dan Korostelev
Dan Korostelev wrote:
Hello, fellow zope developers :)
I made a "nadako-sorting" branch of zope.index to add sorting support for zope indexes. I defined a new IIndexSort interface that specifies a "sort" method that takes document ids and returns an iterable of those ids, sorted by indexed value. It supports optional "limit" and "reverse" arguments. It also raises KeyError if you pass a docid that is not indexed by this index.
Can someone review the interface definition and make any suggestions/objections?
I also made an implementation for the FieldIndex that may not be too optimal, but I'm currently most interested in clean and universal IIndexSort definition that any index could efficiently implement.
If there's no objections, I'd like to merge it to trunk and start adding sorting features for zope.app.catalog/z3c.indexer, as well as zc.catalog indexes.
Thanks in advance!
Hi Dan, I've done this work too... but outside zope.index... please see (for example): http://svn.repoze.org/repoze.catalog/trunk/repoze/catalog/indexes/field.py It's reasonably optimized. - C
2008/12/27 Chris McDonough <chrism@plope.com>:
Dan Korostelev wrote:
I also made an implementation for the FieldIndex that may not be too optimal, but I'm currently most interested in clean and universal IIndexSort definition that any index could efficiently implement.
I've done this work too... but outside zope.index... please see (for example):
http://svn.repoze.org/repoze.catalog/trunk/repoze/catalog/indexes/field.py
It's reasonably optimized.
Thanks for the point. That's something I was going to write to add optimizations for FieldIndex sorting, now I only need to adapt your code and that's great! :-) However, the more important thing for now is the IIndexSort interface declaration. Is it okay and fits any possible sortable index? ;-) -- WBR, Dan Korostelev
Dan Korostelev wrote:
2008/12/27 Chris McDonough <chrism@plope.com>:
Dan Korostelev wrote:
I also made an implementation for the FieldIndex that may not be too optimal, but I'm currently most interested in clean and universal IIndexSort definition that any index could efficiently implement. I've done this work too... but outside zope.index... please see (for example):
http://svn.repoze.org/repoze.catalog/trunk/repoze/catalog/indexes/field.py
It's reasonably optimized.
Thanks for the point. That's something I was going to write to add optimizations for FieldIndex sorting, now I only need to adapt your code and that's great! :-)
There are many tests in there too. Note that the algorithms came mostly from the Zope 2 Catalog code. The only things I'm less than sure about in there is: - the computation of when to use n-best and lazy (there are constants in there stolen from Zope 2). - The fact that when we need to sort in reverse order we can't be lazy. I.e. in the branch that reads: # If the result set is not much larger than the number # of documents in this index, or if we need to sort in # reverse order, use a non-lazy sort. BTree values can't be iterated in reverse order, that's why we don't try the lazy case here if it's a reverse sort. There may be a better way to do this.
However, the more important thing for now is the IIndexSort interface declaration. Is it okay and fits any possible sortable index? ;-)
It looks like we came up with the same interface independently; I think that's a good sign. - C
Chris McDonough wrote:
Dan Korostelev wrote:
2008/12/27 Chris McDonough <chrism@plope.com>:
Dan Korostelev wrote:
I also made an implementation for the FieldIndex that may not be too optimal, but I'm currently most interested in clean and universal IIndexSort definition that any index could efficiently implement. I've done this work too... but outside zope.index... please see (for example):
http://svn.repoze.org/repoze.catalog/trunk/repoze/catalog/indexes/field.py
It's reasonably optimized. Thanks for the point. That's something I was going to write to add optimizations for FieldIndex sorting, now I only need to adapt your code and that's great! :-)
There are many tests in there too. Note that the algorithms came mostly from the Zope 2 Catalog code. The only things I'm less than sure about in there is:
- the computation of when to use n-best and lazy (there are constants in there stolen from Zope 2).
- The fact that when we need to sort in reverse order we can't be lazy. I.e. in the branch that reads:
# If the result set is not much larger than the number # of documents in this index, or if we need to sort in # reverse order, use a non-lazy sort.
BTree values can't be iterated in reverse order, that's why we don't try the lazy case here if it's a reverse sort. There may be a better way to do this.
However, the more important thing for now is the IIndexSort interface declaration. Is it okay and fits any possible sortable index? ;-)
It looks like we came up with the same interface independently; I think that's a good sign.
One difference is that the repoze.catalog implementation skips docids that aren't in the index rather than raising a KeyError. I think I like the KeyError better.
- C
Hi, Chris and other developers. I adapted repoze.catalog field index sorting code and tests for standard zope.index FieldIndex and looks like it works. :) Though I like raising the KeyError when trying to sort docids that are not indexed, I had to remove it from interface declaration, and change it to skipping values as in ZCatalog/repoze.catalog. The problem is that it's not trivial to efficiently implement that checking in some cases. For example in FieldIndex sort method, in the use_lazy branch, set intersections are used and I can't see a way to check if some of docids are not indexed besides storing some additional variable which sucks. :) I also adapted repoze.catalog's KeywordIndex fixes/optimizations to zope.index KeywordIndex and TopicIndex. There is now also a "nadako-sorting" branch of zope.app.catalog. It adds sorting/limiting/reversing features to the "searchResults" method of catalog. It also introduces KeywordIndex for catalog, because I added support for IIndexSearch to zope.index KeywordIndex. :) I request some review from developers and if there won't be any objections, I'll merge this code to trunks of zope.index and zope.app.catalog. Thanks in advance. -- WBR, Dan Korostelev
Dan Korostelev wrote:
Hi, Chris and other developers.
I adapted repoze.catalog field index sorting code and tests for standard zope.index FieldIndex and looks like it works. :)
Though I like raising the KeyError when trying to sort docids that are not indexed, I had to remove it from interface declaration, and change it to skipping values as in ZCatalog/repoze.catalog. The problem is that it's not trivial to efficiently implement that checking in some cases. For example in FieldIndex sort method, in the use_lazy branch, set intersections are used and I can't see a way to check if some of docids are not indexed besides storing some additional variable which sucks. :)
I also adapted repoze.catalog's KeywordIndex fixes/optimizations to zope.index KeywordIndex and TopicIndex.
There is now also a "nadako-sorting" branch of zope.app.catalog. It adds sorting/limiting/reversing features to the "searchResults" method of catalog. It also introduces KeywordIndex for catalog, because I added support for IIndexSearch to zope.index KeywordIndex. :)
I request some review from developers and if there won't be any objections, I'll merge this code to trunks of zope.index and zope.app.catalog.
Looks good! A couple of things: - I'm tempted to ditch the case normalization feature of keyword indexes. This is actually the application's job; there's no guarantee that values that are passed by the app will be strings. I'm not sure that anyone has actually used keyword indexes, as their results could not until now be intersected with the results from any other index so there don't appear to be any backwards compatibility goals to service here (KeywordIndexes appeared to be abandonware until we paid attention to them). - Likewise, I don't think there are are any consumers of topicindexes, for the same reasons, so the "read on write" upgrade in FilteredSetBase should probably go. Users that have pickles around which are still IISets can do a migration. Thanks! - C
2008/12/28 Chris McDonough <chrism@plope.com>:
Dan Korostelev wrote:
- I'm tempted to ditch the case normalization feature of keyword indexes. This is actually the application's job; there's no guarantee that values that are passed by the app will be strings. I'm not sure that anyone has actually used keyword indexes, as their results could not until now be intersected with the results from any other index so there don't appear to be any backwards compatibility goals to service here (KeywordIndexes appeared to be abandonware until we paid attention to them).
I agree that the normalization feature of KeywordIndex can stay in the way sometimes, but I personally find it useful for simple cases. I'd made a base class that doesn't do any normalization and a subclass that does (it's the reverse currently). I doubt as well that there's someone who are using KeywordIndex in its current state, but I'm not sure that we can just refactor things so it will break backward compatibility even in this case. I'd like to hear more community thoughts on that.
- Likewise, I don't think there are are any consumers of topicindexes, for the same reasons, so the "read on write" upgrade in FilteredSetBase should probably go. Users that have pickles around which are still IISets can do a migration.
Here's the same question as above. I really don't like the write on read repickle that I did in FilteredSetBase for migrating to IFSet. We could write a paragraph explaining those changes and how to adapt current data/code for those people who uses current KeywordIndex/TopicIndex, if there are any of them. -- WBR, Dan Korostelev
Dan Korostelev wrote at 2008-12-27 17:10 +0300:
I made a "nadako-sorting" branch of zope.index to add sorting support for zope indexes. I defined a new IIndexSort interface that specifies a "sort" method that takes document ids and returns an iterable of those ids, sorted by indexed value. It supports optional "limit" and "reverse" arguments. It also raises KeyError if you pass a docid that is not indexed by this index.
A better behaviour might be to sort them at the bottom or let the behaviour be controlled by an additional (optional) parameter. Options are: your solution (raise an exception) silently ignore objects not indexed (how the Zope "ZCatalog" works) sort at the bottom. Further potential interest: Often you want nested sorting: sort according to one index; for documents not distinguished by this index, sort on the next level by another index and so on. To implement this, the index must not only sort the documents but also indicate which documents it could not distinguish (by e.g. returning a generator or sequences). -- Dieter
participants (3)
-
Chris McDonough -
Dan Korostelev -
Dieter Maurer