Hi, Michael. Sorry for taking so long to get back to you. The programmer solving our problems with the post codes has solved it in a different way than what I would've done (his method is way superior), so we're not ending up adding all addresses as Zope Objects. Therefore, I don't have any benchmark tests available. We are going to transfer some 10GB of data at a later stage though (within a month), and that could result in some tests being done - if so, I'll send you an email. :-)
Erik Enge wrote:
The programmer solving our problems with the post codes has solved it in a different way than what I would've done (his method is way superior), so we're not ending up adding all addresses as Zope Objects.
Oh well. Does anyone else have any setups that store truly massive (50k, 100k, 1M, you know, *lots*) numbers of objects? Preferably stored in a BTree of some sort (ZPatterns Rack, BTree folder, etc.). The objects can be simple ZClasses, or almost anything else. I'm trying to find out if there is a point where you start getting non-linear performance penalties for additional objects (storing, retrieving, or indexing). Meanwhile Erik, what approach *did* your programmer take?
Therefore, I don't have any benchmark tests available. We are going to transfer some 10GB of data at a later stage though (within a month), and that could result in some tests being done - if so, I'll send you an email. :-)
I'll look forward to it. Cheers, Michael Bernstein.
I did try that but gave up when it got very unwieldy. Mind you, that was nearly a year ago, before BTree folders and before I knew what the heck I was actually doing in Zope. Cheers. -- Andy McKay. ----- Original Message ----- From: "Michael R. Bernstein" <webmaven@lvcm.com> To: "Erik Enge" <erik@thingamy.net> Cc: <zope-dev@zope.org> Sent: Thursday, April 05, 2001 7:24 AM Subject: Re: [Zope-dev] 27 million objects.
Erik Enge wrote:
The programmer solving our problems with the post codes has solved it in a different way than what I would've done (his method is way superior), so we're not ending up adding all addresses as Zope Objects.
Oh well. Does anyone else have any setups that store truly massive (50k, 100k, 1M, you know, *lots*) numbers of objects? Preferably stored in a BTree of some sort (ZPatterns Rack, BTree folder, etc.). The objects can be simple ZClasses, or almost anything else. I'm trying to find out if there is a point where you start getting non-linear performance penalties for additional objects (storing, retrieving, or indexing).
Meanwhile Erik, what approach *did* your programmer take?
Therefore, I don't have any benchmark tests available. We are going to transfer some 10GB of data at a later stage though (within a month), and that could result in some tests being done - if so, I'll send you an email. :-)
I'll look forward to it.
Cheers,
Michael Bernstein.
_______________________________________________ Zope-Dev maillist - Zope-Dev@zope.org http://lists.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope )
On Thu, 5 Apr 2001, Michael R. Bernstein wrote:
I'm trying to find out if there is a point where you start getting non-linear performance penalties for additional objects (storing, retrieving, or indexing).
I don't know, but I feel that is the case. Actually, I know it is the case, but I don't know what is causing it. I know what isn't helping though; CatalogAwareness. I added 2000 objects with XML-RPC. No other queries were done during that period. For each object about 70 DTML Methods/Documents were added. The first couple of hundred went at a pace of 2-3 seconds per object. After that it started to get really slow, and when I reached about 500 I was down to 5 seconds per object. I killed that script, rewrote it to only add 20-25 DTML Methods/Documents and removed the CatalogAwareness and whoosh! Under 1 second for each object, and it stayed like that for the entire 2000 objects. The server is a 1GHz thingy with 1GB RAM. It didn't seem to be working too hard.
Meanwhile Erik, what approach *did* your programmer take?
Well, the obviously more correct one. :) He just made the files (that I was going to index in a Catalog) stay on the filesystem and wrote some nice regexps to do the searching I thought I needed the speed of the Catalog for (yeah, yeah, I'm a rookie). Thanks Jim! :)
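That filesystem-plus-regexp approach can be sketched in a few lines of Python. This is an assumed shape for illustration only; the actual script isn't shown in the thread, and the file names and sample data here are made up:

```python
# Sketch of searching plain files on disk with a regexp instead of a
# ZCatalog. The directory layout and sample records are hypothetical.
import re
import tempfile
from pathlib import Path

def search(root, pattern):
    """Yield (filename, line) for every line under root matching pattern."""
    rx = re.compile(pattern)
    for path in sorted(Path(root).rglob("*.txt")):
        for line in path.read_text().splitlines():
            if rx.search(line):
                yield path.name, line

# Dry run against two made-up address files:
root = Path(tempfile.mkdtemp())
(root / "0001.txt").write_text("postcode 7030 Trondheim\n")
(root / "0002.txt").write_text("postcode 0150 Oslo\n")

hits = list(search(root, r"\b7030\b"))
print(hits)  # → [('0001.txt', 'postcode 7030 Trondheim')]
```

For lookups that hit only a small fraction of the files, this trades the Catalog's index maintenance cost for a linear scan at query time.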
I'll look forward to it.
Ok, and you know what to do if you haven't heard from me and the year is not 2001 any more ;)
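The XML-RPC bulk-add Erik mentions has roughly this shape. Below is a self-contained sketch with a local stub server standing in for Zope; the `add_object` method is hypothetical (Zope's real XML-RPC interface exposes its own object-creation methods), and the count is reduced from 2000 to keep the demo quick:

```python
# Hypothetical sketch of bulk-adding objects over XML-RPC. A stub
# server stands in for Zope; add_object is an invented method name.
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

store = {}

def add_object(oid):
    """Stub for the remote call that would create one Zope object."""
    store[oid] = True
    return oid

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add_object, "add_object")
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = ServerProxy(f"http://{host}:{port}")
for i in range(20):          # 2000 in the real run; 20 keeps the demo quick
    proxy.add_object(i)

server.shutdown()
print(len(store))  # → 20
```

Each call is a full HTTP round trip, so per-object overhead on the server side (such as CatalogAwareness reindexing) is paid once per request.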
I don't know, but I feel that is the case. Actually, I know it is the case, but I don't know what is causing it. I know what isn't helping though; CatalogAwareness. I added 2000 objects with XML-RPC. No other queries were done during that period. For each object about 70 DTML Methods/Documents were added. The first couple of hundred went at a pace of 2-3 seconds per object. After that it started to get really slow, and when I reached about 500 I was down to 5 seconds per object. I killed that script, rewrote it to only add 20-25 DTML Methods/Documents and removed the CatalogAwareness and whoosh! Under 1 second for each object, and it stayed like that for the entire 2000 objects.
Hear, hear. The cost of the incremental cataloguing is horrific. When we started with Zope a year ago, one of our developers wrote a great script that imported a bunch, restarted Zope, packed it, restarted it, and imported a bunch more to get optimal performance. We don't do it like that any more ;) -- Andy McKay
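The cost difference Andy describes can be illustrated with a toy inverted index in plain Python (a stand-in, not ZCatalog itself): committing after every object re-persists every posting bucket that transaction touched, while indexing everything and committing once writes each bucket a single time.

```python
# Toy sketch (not Zope's ZCatalog): count how many index "buckets" get
# persisted when committing after every object vs. once at the end.
from collections import defaultdict

def index_doc(index, doc_id, text):
    """Add one document to an inverted index; return the words touched."""
    words = set(text.split())
    for word in words:
        index[word].add(doc_id)
    return words

docs = {i: f"msg {i} zope catalog" for i in range(1000)}

# Incremental: commit after every object, so every posting bucket
# touched in that transaction is written out again.
incremental_index = defaultdict(set)
incremental_writes = 0
for doc_id, text in docs.items():
    incremental_writes += len(index_doc(incremental_index, doc_id, text))

# Batch: index everything first, commit once; each bucket written once.
batch_index = defaultdict(set)
touched = set()
for doc_id, text in docs.items():
    touched |= index_doc(batch_index, doc_id, text)
batch_writes = len(touched)

print(incremental_writes, batch_writes)  # → 4000 1003
```

The two runs end with identical indexes; only the write amplification differs, which is also why the incremental run bloats the Data.fs until it is packed.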
Zopista wrote:
Hear, hear. The cost of the incremental cataloguing is horrific. When we started with Zope a year ago, one of our developers wrote a great script that imported a bunch, restarted Zope, packed it, restarted it, and imported a bunch more to get optimal performance. We don't do it like that any more ;)
Doesn't the new catalog/btree implementation fix this? cheers, Chris
As I said this was a year ago... but still incremental cataloging is very expensive. -- Andy McKay. ----- Original Message ----- From: "Chris Withers" <chrisw@nipltd.com> To: "Zopista" <zopista@zopezen.org> Cc: "Erik Enge" <erik@thingamy.net>; "Michael R. Bernstein" <webmaven@lvcm.com>; <zope-dev@zope.org> Sent: Monday, April 09, 2001 7:31 AM Subject: Re: [Zope-dev] 27 million objects.
Zopista wrote:
Hear, hear. The cost of the incremental cataloguing is horrific. When we started with Zope a year ago, one of our developers wrote a great script that imported a bunch, restarted Zope, packed it, restarted it, and imported a bunch more to get optimal performance. We don't do it like that any more ;)
Doesn't the new catalog/btree implementation fix this?
cheers,
Chris
Andy McKay wrote:
As I said this was a year ago... but still incremental cataloging is very expensive.
How come? I always thought this was one of Zope's strong points as opposed to, say, Lotus Notes' batch view building paradigm... cheers, Chris
Any cataloguing and un-cataloguing of an object is expensive, c'mon you are changing all the indices, vocabulary and so on. You never notice it normally for 1 - 10 things, but run an import script of 10000 and catalog each object as it gets added (rather than all of them at the end) and you'll notice the difference. (This script was cataloguing 250,000 mail messages, one at a time. Big no-no) -- Andy McKay. ----- Original Message ----- From: "Chris Withers" <chrisw@nipltd.com> To: "Andy McKay" <andym@activestate.com> Cc: "Erik Enge" <erik@thingamy.net>; "Michael R. Bernstein" <webmaven@lvcm.com>; <zope-dev@zope.org> Sent: Monday, April 09, 2001 10:52 AM Subject: Re: [Zope-dev] 27 million objects.
Andy McKay wrote:
As I said this was a year ago... but still incremental cataloging is very expensive.
How come? I always thought this was one of Zope's strong points as opposed to, say, Lotus Notes' batch view building paradigm...
cheers,
Chris
Andy McKay wrote:
Any cataloguing and un-cataloguing of an object is expensive, c'mon you are changing all the indices, vocabulary and so on. You never notice it normally for 1 - 10 things, but run an import script of 10000 and catalog each object as it gets added (rather than all of them at the end) and you'll notice the difference. (This script was cataloguing 250,000 mail messages, one at a time. Big no-no)
Perhaps I expressed myself poorly. What I am watching out for is evidence that adding, indexing, reindexing, or retrieving *a single object* (or a small set of objects) takes longer if there are more objects stored/indexed already. In other words, does the time to store/index/reindex/retrieve an object change (for the worse) depending on whether there are 10,000 objects, 100,000 objects or 10,000,000 objects stored/cataloged in the ZODB/ZCatalog? Previously, it came to light that searching performance suffered depending on a combination of the total number of objects and the size of the result set (irrespective of the batch size, apparently); that has apparently been fixed, and searching performance now scales with the number of cataloged objects. So, are there any non-linear gotchas waiting for me? Michael Bernstein.
In other words, does the time to store/index/reindex/retrieve an object change (for the worse) depending on whether there are 10,000 objects, 100,000 objects or 10,000,000 objects stored/cataloged in the ZODB/ZCatalog?
I don't think it makes much of a difference. At least not a big one.
Previously, the fact that searching performance suffered depending on a combination of number of total objects and the size of the result set (irrespective of the batch size, apparently), came to light, and has apparently been fixed. Now searching performance scales with the number of cataloged objects.
I think searching performance has improved significantly... for instance, searching an indexed term inside the text of copies of the Zope mail list is very fast now. It wasn't actually that bad before, but it's faster now.
So, are there any non-linear gotchas waiting for me?
I don't think so...
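Michael's question lends itself to a small timing harness: fill the container to a checkpoint, then time a fixed probe batch of inserts at each size. A minimal sketch, with a plain dict standing in for the ZODB/BTree container (a real test would swap in a BTree and a Catalog):

```python
# Benchmark shape for "does per-object cost grow with the number of
# objects already stored?". A dict stands in for the real container.
import time

def time_inserts(container, start, count):
    """Insert `count` objects and return the average seconds per object."""
    t0 = time.perf_counter()
    for i in range(start, start + count):
        container[i] = {"id": i}
    return (time.perf_counter() - t0) / count

store = {}
results = {}
for checkpoint in (10_000, 100_000):
    while len(store) < checkpoint:          # fill up to the checkpoint
        store[len(store)] = {"id": len(store)}
    results[checkpoint] = time_inserts(store, len(store), 1_000)

# Roughly flat per-object times across checkpoints suggest linear
# scaling; times that grow with the checkpoint would be the non-linear
# penalty Michael is asking about.
```

For a BTree-backed container the expected cost per insert is O(log n), so the probe times should creep up only very slowly with size.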
Andy McKay wrote:
Any cataloguing and un-cataloguing of an object is expensive, c'mon you are changing all the indices, vocabulary and so on. You never notice it normally for 1 - 10 things, but run an import script of 10000 and catalog each object as it gets added (rather than all of them at the end) and you'll notice the difference. (This script was cataloguing 250,000 mail messages, one at a time. Big no-no)
Perhaps I expressed myself poorly.
Yeah, I think Chris and I have wandered off the point. Since my experience with a large catalog was a year ago, I will shut up until I have something valuable to add :) Cheers. -- Andy McKay.
The improvements to incremental indexing in the recently released catalog have been more about reducing the churn that results in out-of-control ZODB size growth than about improving speed. ----- Original Message ----- From: "Andy McKay" <andym@ActiveState.com> To: "Chris Withers" <chrisw@nipltd.com> Cc: "Erik Enge" <erik@thingamy.net>; "Michael R. Bernstein" <webmaven@lvcm.com>; <zope-dev@zope.org> Sent: Monday, April 09, 2001 1:59 PM Subject: Re: [Zope-dev] 27 million objects.
Any cataloguing and un-cataloguing of an object is expensive, c'mon you are changing all the indices, vocabulary and so on. You never notice it normally for 1 - 10 things, but run an import script of 10000 and catalog each object as it gets added (rather than all of them at the end) and you'll notice the difference. (This script was cataloguing 250,000 mail messages, one at a time. Big no-no) -- Andy McKay.
----- Original Message ----- From: "Chris Withers" <chrisw@nipltd.com> To: "Andy McKay" <andym@activestate.com> Cc: "Erik Enge" <erik@thingamy.net>; "Michael R. Bernstein" <webmaven@lvcm.com>; <zope-dev@zope.org> Sent: Monday, April 09, 2001 10:52 AM Subject: Re: [Zope-dev] 27 million objects.
Andy McKay wrote:
As I said this was a year ago... but still incremental cataloging is very expensive.
How come? I always thought this was one of Zope's strong points as opposed to, say, Lotus Notes' batch view building paradigm...
cheers,
Chris
Andy McKay wrote:
Any cataloguing and un-cataloguing of an object is expensive, c'mon you are changing all the indices, vocabulary and so on.
Yup...
You never notice it normally for 1 - 10 things, but run an import script of 10000 and catalog each object as it gets added (rather than all of them at the end) and you'll notice the difference. (This script was cataloguing 250,000 mail messages, one at a time. Big no-no)
Hmmm... I see your point now, and I think it's along the lines of "indexing 250,000 objects is slow; it's just a question of when you take the performance hit". Correct? If so, then I don't really understand:
Hear, hear. The cost of the incremental cataloguing is horrific.
How does this cost differ from non-incremental cataloguing? Erm... actually... what is non-incremental cataloguing and how do you do it? cheers, Chris
On Thu, 5 Apr 2001, Michael R. Bernstein wrote:
I'm trying to find out if there is a point where you start getting non-linear performance penalties for additional objects (storing, retrieving, or indexing).
I've just finished adding a somewhat small number of objects: 5000. For every 1000th object, the Data.fs seemed to grow to about 900MB; that's when things started going slow, in a non-linear fashion (this is more a hunch than something I paid much attention to). I paused the script (fancy Unix command: "^Z") for every 1000th object, packed the database (which shrunk to 19.5MB! Hmpf.) and restarted the script (again, fancy Unix command: "fg"). Then I was back to the same speed as I initially had. Does ZODB have a problem with big Data.fs files? Not that I know. However, I do have a really fast SCSI subsystem here, so that shouldn't be a big problem either. I did some copying around with a couple of gigs, and it seems that my hunch is right: ZODB does not have a problem with big Data.fs files, the hardware does. This could be caused indirectly by ZODB if it does too many operations on the file, but I'm not too concerned about that. I.e. a solution could be to have ZODB play around with the Data.fs at a less frequent pace, or do it in another fashion. However, that's not really solving any problems, unless ZODB is a total maniac with the filesystem. I'm converting to ReiserFS this afternoon; maybe that will improve things a bit. Someone told me that ZEO and bulk-adding could be a thing to look at...
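The pause-pack-resume routine Erik describes can also be expressed as a loop. A sketch of the pattern, where `add()` and `pack()` are stand-ins (in a real run `add()` would create the Zope object and `pack()` would pack the Data.fs; neither real call is shown here):

```python
# Sketch of the "pack every N objects" bulk-load pattern. The add() and
# pack() callables are stand-ins for the real object creation and
# database pack; nothing Zope-specific is invoked here.
def bulk_add(objects, add, pack, pack_every=1000):
    for n, obj in enumerate(objects, 1):
        add(obj)
        if n % pack_every == 0:
            pack()  # shrink the grown Data.fs before continuing

# Dry run with recording stand-ins:
added, pack_points = [], []
bulk_add(range(5000), added.append, lambda: pack_points.append(len(added)))
print(pack_points)  # → [1000, 2000, 3000, 4000, 5000]
```

This automates exactly what the manual ^Z / pack / fg cycle does: bounding the size the storage file reaches between packs.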
Erik Enge wrote:
On Thu, 5 Apr 2001, Michael R. Bernstein wrote:
I'm trying to find out if there is a point where you start getting non-linear performance penalties for additional objects (storing, retrieving, or indexing).
I've just finished adding a somewhat small number of objects: 5000. For every 1000th object, the Data.fs seemed to grow to about 900MB; that's when things started going slow, in a non-linear fashion (this is more a hunch than something I paid much attention to).
I paused the script (fancy Unix-command: "^Z") for every 1000th object, packed the database (which shrunk to 19.5MB! Hmpf.) and restarted the script (again, fancy Unix-command: "fg"). Then I was back to the same speed as I initially had.
This level of growth doesn't seem like a sane level of growth... what Zope version are you using?
Does ZODB have a problem with big Data.fs files? Not that I know. However, I do have a really fast SCSI-subsystem here so that shouldn't be a big problem either.
I did some copying around with a couple of gigs, and it seems that my hunch is right: ZODB does not have a problem with big Data.fs files, the hardware does.
This could be caused indirectly by ZODB if it does too many operations on the file, but I'm not too concerned about that. I.e. a solution could be to have ZODB play around with the Data.fs at a less frequent pace, or do it in another fashion. However, that's not really solving any problems, unless ZODB is a total maniac with the filesystem.
I'm converting to ReiserFS this afternoon, maybe that will improve things a bit.
Someone told me that ZEO and bulk-adding could be a thing to look at...
Isn't bulk-adding what you're doing now?
On Thu, 26 Apr 2001, Chris McDonough wrote:
This level of growth doesn't seem like a sane level of growth... what Zope version are you using?
Zope 2.3.1b1
Someone told me that ZEO and bulk-adding could be a thing to look at...
Isn't bulk-adding what you're doing now?
It is, but I'm not using ZEO.
participants (6)
- Andy McKay
- Chris McDonough
- Chris Withers
- Erik Enge
- Michael R. Bernstein
- Zopista