ZCatalog Indexes tab crawls...
Hi, has anyone noticed that the ZCatalog Indexes tab crawls if you have loads of objects indexed? My guess is that some types of index take way too long to figure out how many objects are indexed. Anyone know which index types those could be? BTW, would anyone object if I removed that object count, since it's not often very useful... cheers, Chris
Chris Withers wrote:
My guess is that some types of index take way too long to figure out how many objects are indexed.
This was confirmed by commenting out <dtml-var numObjects missing="n/a"> in catalogIndexes.dtml.
BTW, would anyone object if I removed that object count, since it's not often very useful...
so... would anyone mind? cheers, Chris
so... would anyone mind?
Well, I've often been interested to note the numbers. It gave me a feeling for which indexes are heavily used. Sure, I could figure this out without looking at this page, but the (lack of) speed hasn't bugged me. -- Jean Jordaan http://www.upfrontsystems.co.za
--On Thursday, 17 July 2003 12:26 +0200 Jean Jordaan <jean@upfrontsystems.co.za> wrote:
so... would anyone mind?
Well, I've often been interested to note the numbers. It gave me a feeling for which indexes are heavily used. Sure, I could figure this out without looking at this page, but the (lack of) speed hasn't bugged me.
The problem is caused by calling len() on the indexes' BTrees. Instead, a counter implemented with BTrees.Length should be used in the future. -aj
Chris Withers wrote at 2003-7-17 11:12 +0100:
Has anyone noticed that the ZCatalog Indexes tab crawls if you have loads of objects indexed?
My guess is that some types of index take way too long to figure out how many objects are indexed. Anyone know which index types those could be?
The ones that provide the correct number of indexed objects (rather than just the number of indexed terms). Because the same object can be indexed under several terms, determining the number of indexed objects requires building the union of all the index values. This almost surely has quadratic (worst-case) runtime characteristics.
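To make the cost difference concrete, here is a small sketch with plain Python sets standing in for the BTrees-based structures (an assumption; the data and function names are hypothetical). A keyword-style index maps each term to the documents indexed under it, so counting terms only needs the number of keys, while counting distinct documents needs the union of all posting sets:

```python
def count_terms(index):
    """Number of index terms: just the number of keys."""
    return len(index)


def count_objects_by_union(index):
    """Number of distinct documents, found by merging every posting set.

    Every (term, docid) pair is visited. With hash sets this is linear in
    the number of postings, but merging sorted sets pairwise into a
    growing result costs O(terms x documents) in the worst case.
    """
    seen = set()
    for docids in index.values():
        seen |= docids
    return len(seen)


# Hypothetical keyword index: term -> set of document ids.
keywords = {
    "zope": {1, 2, 3},
    "catalog": {2, 3},
    "btrees": {3, 4},
}
print(count_terms(keywords))             # 3
print(count_objects_by_union(keywords))  # 4
```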
BTW, would anyone object if I removed that object count, since it's not often very useful...
You probably should replace it with the size of the index (i.e. the number of index terms).
Formerly, the index overview displayed this information, but under a buggy "# objects" title. Someone fixed this for most indexes; they now show the number of objects, but at a high price.
I suggest changing the title to "# index terms" and reverting the indexes to the old behaviour.
Others pointed out that determining the size of an index may also be expensive. However, it is at most linear in the number of index terms (rather than quadratic), and all recently created indexes now use "BTrees.Length" to maintain their size (which gives constant time).
Having a feeling for how large an index is is valuable information.
Dieter
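A sketch of how "# index terms" could be maintained in constant time, assuming a simplified keyword index on Python dicts (hypothetical names; a plain int plays the role of the "BTrees.Length" counter Dieter mentions). The counter moves only when a term gains its first document or loses its last one:

```python
class TermCountingIndex:
    """Hypothetical keyword index that maintains '# index terms' in O(1).

    A plain int plays the role of a BTrees.Length.Length counter.
    """

    def __init__(self):
        self._index = {}      # term -> set of document ids
        self._unindex = {}    # document id -> tuple of terms
        self._num_terms = 0

    def index_object(self, docid, terms):
        self.unindex_object(docid)          # drop any previous entries
        terms = tuple(set(terms))
        for term in terms:
            docids = self._index.get(term)
            if docids is None:
                docids = self._index[term] = set()
                self._num_terms += 1        # a new term appeared
            docids.add(docid)
        if terms:
            self._unindex[docid] = terms

    def unindex_object(self, docid):
        for term in self._unindex.pop(docid, ()):
            docids = self._index[term]
            docids.discard(docid)
            if not docids:
                del self._index[term]
                self._num_terms -= 1        # last document for this term is gone

    def numIndexTerms(self):
        return self._num_terms
```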
Actually I regard the current behavior as a feature. Using a stopwatch and a slide rule, I can estimate to within 100 objects how many values are indexed in a catalog by measuring the time it takes to draw the indexes page. Please do not remove this most valued feature! -Casey
On Thursday 17 July 2003 04:35 pm, Dieter Maurer wrote:
Chris Withers wrote at 2003-7-17 11:12 +0100:
Has anyone noticed that the ZCatalog Indexes tab crawls if you have loads of objects indexed?
My guess is that some types of index take way too long to figure out how many objects are indexed. Anyone know which index types those could be?
The ones that provide the correct number of indexed objects (rather than just the number of indexed terms).
Because the same object can be indexed under several terms, determining the number of indexed objects requires building the union of all the index values. This almost surely has quadratic (worst-case) runtime characteristics.
BTW, would anyone object if I removed that object count, since it's not often very useful...
You probably should replace it with the size of the index (i.e. the number of index terms).
Formerly, the index overview displayed this information, but under a buggy "# objects" title. Someone fixed this for most indexes; they now show the number of objects, but at a high price.
I suggest changing the title to "# index terms" and reverting the indexes to the old behaviour.
Others pointed out that determining the size of an index may also be expensive. However, it is at most linear in the number of index terms (rather than quadratic), and all recently created indexes now use "BTrees.Length" to maintain their size (which gives constant time).
Having a feeling for how large an index is is valuable information.
Dieter
_______________________________________________ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
--On Thursday, 17 July 2003 18:22 -0400 Casey Duncan <casey@zope.com> wrote:
Actually I regard the current behavior as a feature. Using a stopwatch and a slide rule, I can estimate to within 100 objects how many values are indexed in a catalog by measuring the time it takes to draw the indexes page.
Please do not remove this most valued feature!
I agree, but the current implementation sux. Switching to a counter-based solution would solve the problem. The only problem I see is keeping the code fully backward compatible. -aj
Andreas Jung wrote:
I agree, but the current implementation sux. Switching to a counter-based solution would solve the problem. The only problem I see is keeping the code fully backward compatible.
if there's no counter present:
    create one, do a count of the docs, initialise the counter
display counter
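Anthony's steps could look roughly like this for an index instance stored before the counter existed. This is a sketch under assumptions: the class, the `_length` attribute name and the dict-based `_unindex` are illustrative, not ZCatalog's actual code:

```python
class LegacyIndex:
    """Hypothetical stand-in for an index pickled before counters existed.

    Only the reverse mapping (document id -> value) is present; there is
    no '_length' attribute, just as on old persistent instances.
    """

    def __init__(self, unindex):
        self._unindex = dict(unindex)

    def numObjects(self):
        length = getattr(self, "_length", None)
        if length is None:
            # No counter yet: count the documents once (linear) and keep
            # the result. In Zope this would presumably be
            # BTrees.Length.Length(len(self._unindex)).
            length = len(self._unindex)
            self._length = length
        # Later calls just display the stored counter.
        return length
```

From then on, the indexing and unindexing code would have to keep `_length` up to date. One consequence worth weighing: on a persistent object, creating the counter while the Indexes tab renders turns that page view into a ZODB write.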
Anthony Baxter wrote:
if there's no counter present: create one, do a count of the docs, initialise the counter
display counter
Sounds good. What needs to happen to make this happen? Since this is a bug fix, can it go on the 2.6 branch? cheers, Chris
--On Friday, 18 July 2003 13:53 +0100 Chris Withers <chrisw@nipltd.com> wrote:
Anthony Baxter wrote:
if there's no counter present: create one, do a count of the docs, initialise the counter
display counter
Sounds good, what needs to happen to make this happen?
Since this is a bug fix, can it go on the 2.6 branch?
First write the fix and then let's see if it might break something :-) -aj
Anthony Baxter wrote at 2003-7-18 15:14 +1000:
Andreas Jung wrote:
I agree, but the current implementation sux. Switching to a counter-based solution would solve the problem. The only problem I see is keeping the code fully backward compatible.
if there's no counter present: create one, do a count of the docs, initialise the counter
We can use the size of the "_unindex".
However, is it really worth it?
"#objects" suggests that it is the number of objects indexed by this index. Who is interested in this information?
Unless one has inhomogeneous objects, almost all objects are indexed by every index. Thus, "#objects" is likely to be similar for many indexes.
A much more interesting piece of information would be the size of the index, measured by the number of index terms.
Dieter
On Friday 18 July 2003 01:29 pm, Dieter Maurer wrote:
Anthony Baxter wrote at 2003-7-18 15:14 +1000:
Andreas Jung wrote:
I agree, but the current implementation sux. Switching to a counter-based solution would solve the problem. The only problem I see is keeping the code fully backward compatible.
if there's no counter present: create one, do a count of the docs, initialise the counter
We can use the size of the "_unindex".
However, is it really worth it?
"#objects" suggests that it is the number of objects indexed by this index. Who is interested in this information?
Unless one has inhomogeneous objects, almost all objects are indexed by every index. Thus, "#objects" is likely to be similar for many indexes.
A much more interesting piece of information would be the size of the index, measured by the number of index terms.
I agree. And as a plus, it's a minor change to the software... -Casey
Dieter Maurer wrote:
"#objects" suggests that it is the number of objects indexed by this index. Who is interested in this information?
Well, it's been useful to me on several occasions when I've seen that one index has fewer objects in it than another...
Unless one has inhomogeneous objects, almost all objects are indexed by every index. Thus, "#objects" is likely to be similar for many indexes.
Hmmm... I use ZCatalogs a _lot_ for searching over inhomogeneous sets of objects. For example, that's its primary role in the CMF... cheers, Chris
Chris Withers wrote at 2003-7-21 08:22 +0100:
Dieter Maurer wrote:
"#objects" suggests that it is the number of objects indexed by this index. Who is interested in this information?
Well, it's been useful to me on several occasions when I've seen that one index has fewer objects in it than another...
Unless one has inhomogeneous objects, almost all objects are indexed by every index. Thus, "#objects" is likely to be similar for many indexes.
Hmmm... I use ZCatalogs a _lot_ for searching over inhomogeneous sets of objects. For example, that's its primary role in the CMF...
CMF's catalog is highly standardized, thanks to Dublin Core. All standard CMF content types define DC attributes. Therefore, each CMF content object is indexed under each DC field index. The "Subjects" index may lack some objects (because they do not define any "Subjects"). A text index may lack a few objects (because some objects may have both an empty "Title" and an empty "Description"). But overall, unless you have special (non-DC-derived) indexes, all "#objects" should be very similar. Dieter
Dieter Maurer wrote:
But overall, unless you have special (non-DC-derived) indexes,
That can well be the case... Anyway, what are we going to do about this crawling tab? Chris
Casey Duncan wrote:
Actually I regard the current behavior as a feature. Using a stopwatch and a slide rule, I can estimate to within 100 objects how many values are indexed in a catalog by measuring the time it takes to draw the indexes page.
Please do not remove this most valued feature!
I see no winks, so I am scared ;-) Seriously though, it is kinda problematic when you want to get to the ZMI of an index and have to guess the URL 'cos hitting the indexes page cripples the server... cheers, Chris
Dieter Maurer wrote:
I suggest changing the title to "# index terms" and reverting the indexes to the old behaviour.
If that'll make it quicker, cool :-)
Others pointed out that determining the size of an index may also be expensive. However, it is at most linear in the number of index terms (rather than quadratic), and all recently created indexes now use "BTrees.Length" to maintain their size (which gives constant time).
Having a feeling for how large an index is is valuable information.
Indeed... Chris
--On Friday, 18 July 2003 13:52 +0100 Chris Withers <chrisw@nipltd.com> wrote:
Dieter Maurer wrote:
I suggest changing the title to "# index terms" and reverting the indexes to the old behaviour.
If that'll make it quicker, cool :-)
I am usually not interested in the number of index terms but in the number of documents that are indexed. This is much more meaningful. -aj
participants (6)
- Andreas Jung
- Anthony Baxter
- Casey Duncan
- Chris Withers
- Dieter Maurer
- Jean Jordaan