Thousands of Objects - how do they manifest? (newbie)
I'm thinking of developing an auction system using Zope and I'm going to be dealing with a lot of objects (if these things are represented that way rather than as records in a database). But, I'm wondering about a few things:
1) Are all objects in Zope persistent? ZClass instances? Python objects?
2) Are all object instances visible as entries in a Zope UI folder? Is it possible to have persistent objects without them being visible? Or does it not matter that the UI will attempt to list them all (millions or more)?
3) If I am expecting a very large number of objects would I always be better off using an external database? or are there circumstances where ZODB is quite able to look after them?
4) If I need to be assured of being able to backup/restore the database and/or rebuild the site in the event of catastrophe does an external database provide any greater facility/reliability for achieving this? Or is ZEO the thing to use?
Thanks for answers. :)
On Monday 06 January 2003 10:04 am, Crosbie Fitch wrote:
I'm thinking of developing an auction system using Zope and I'm going to be dealing with a lot of objects (if these things are represented that way rather than as records in a database).
But, I'm wondering about a few things:
1) Are all objects in Zope persistent? ZClass instances? Python objects?
No, you can create regular Python objects (and even ZClass instances) that exist only in memory. Just don't mix in the Persistent base class, or don't store the instance in another persistent object, such as a folder.
2) Are all object instances visible as entries in a Zope UI folder?
No, you can create hidden objects all over the place if you want.
Is it possible to have persistent objects without them being visible? Or does it not matter that the UI will attempt to list them all (millions or more)?
Regular folders are not designed for this, but there is a product called BTreeFolder2 which can contain huge numbers of objects efficiently. You can also create your own data structures using BTrees to contain arbitrarily large collections of objects efficiently.
3) If I am expecting a very large number of objects would I always be better off using an external database? or are there circumstances where ZODB is quite able to look after them?
Possibly. It really depends on the nature of the data, the amount of write concurrency, and your query requirements among other things. If the data maps well to records and you expect many users to write to the database at the same time and you need very flexible query/reporting then an RDBMS may be better. If OTOH, your objects are blobish or highly variable, it's a read-heavy application (most web apps are) and you have pretty straightforward query requirements, then ZODB storage may be better.
That said, the current FileStorage (the default used with Zope) has a memory resident index which grows linearly with the number of objects in your database. This will consume a lot of memory if you have millions of objects (something like 12-20 bytes per object). I am actually currently working on a solution to this though... You could also use a different type of storage such as DirectoryStorage or BerkeleyStorage which do not have memory resident object indexes.
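To put the 12-20 bytes per object figure in perspective, here is a back-of-the-envelope sketch. It only multiplies the per-object range quoted above by an object count; the function names are made up for illustration, and the real overhead of any particular FileStorage version is not something this calculation can confirm.

```python
def index_memory_bytes(num_objects, bytes_per_object):
    """Size of an index that grows linearly with the number of objects."""
    return num_objects * bytes_per_object


def estimate_range(num_objects, low=12, high=20):
    """Return (low, high) estimates in bytes, using the 12-20 bytes/object figure."""
    return (
        index_memory_bytes(num_objects, low),
        index_memory_bytes(num_objects, high),
    )


if __name__ == "__main__":
    for count in (1_000_000, 10_000_000, 100_000_000):
        low, high = estimate_range(count)
        print(f"{count:>11,} objects: {low / 2**20:8.1f} to {high / 2**20:8.1f} MiB")
```

So one million objects comes to roughly 12 to 20 million bytes of index, and the figure scales in proportion from there.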
4) If I need to be assured of being able to backup/restore the database and/or rebuild the site in the event of catastrophe does an external database provide any greater facility/reliability for achieving this? Or is ZEO the thing to use?
ZEO is not replication/redundancy of data. It is for scalability/availability by allowing multiple app servers to share the same storage. There is a commercial product called ZRS (Zope Replication Service) which can be used to replicate ZODB data in real time. This can be used as an automatic backup as well AFAIK. FileStorage stores everything in a single file, which can be challenging to back up when it gets really large. Again, other storages (like DirectoryStorage and BerkeleyStorage) can make this easier. As for restoration, FileStorage is pretty resilient to data loss due to the fact that it only appends transactions to the file. So you generally only lose the last few transactions if the server fails catastrophically. There are several Python utilities that come with Zope/ZODB that can be used to check the file consistency, recover transaction data, etc. hth, -Casey
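The point about append-only files losing only the last few transactions can be illustrated with a toy log. This is a sketch under loud assumptions: the length-prefixed record layout and every function name below are invented for the example, and this is not FileStorage's actual on-disk format or recovery code. It only shows why a torn write at the end of an append-only file leaves the earlier records readable.

```python
import os
import struct
import tempfile

HEADER = struct.Struct(">I")  # toy format: 4-byte big-endian length prefix


def append_record(path, payload):
    """Append one length-prefixed record; existing bytes are never rewritten."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload)) + payload)


def read_complete_records(path):
    """Return every record that was fully written, ignoring a torn tail."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # clean end of file, or a header that was cut short
            (length,) = HEADER.unpack(header)
            payload = f.read(length)
            if len(payload) < length:
                break  # the last record was cut off mid-write
            records.append(payload)
    return records


def simulate_crash(path, bytes_lost):
    """Chop bytes off the end of the file, as if the last write never finished."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.truncate(max(0, size - bytes_lost))


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        for payload in (b"txn-1", b"txn-2", b"txn-3"):
            append_record(path, payload)
        simulate_crash(path, bytes_lost=2)  # damages only the final record
        return read_complete_records(path)
    finally:
        os.remove(path)


recovered = demo()
print(recovered)
```

Only the record that was being written at the time of the simulated crash is lost; everything before it is untouched because nothing in the middle of the file was ever rewritten.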
On Mon, Jan 06, 2003 at 12:48:22PM -0500, Casey Duncan wrote:
As for restoration, FileStorage is pretty resilient to data loss due to the fact that it only appends transactions to the file.
The last time this subject came up, there was a quibble about this that made me perhaps overly cautious about assuming the above is true. I've dug it out of the archives and read it more closely now. Summary: NORMALLY Zope and FileStorage only append to the file, but it is conceivable that a third-party Zope product might alter this. I don't know which, if any, Products actually do this. Here's the message from the zope@zope.org archive: Toby Dickenson wrote:
On Monday 28 October 2002 1:41 am, Jens Vagelpohl wrote:
FileStorage only appends at the end of the file.
Not entirely true. FileStorage still supports a non-transactional undo mechanism that writes bytes to the middle of files. This mechanism is not normally used by Zope, however it might be used by other non-Zope ZODB applications, or custom products.
(could it be exploited by an attacker who wanted to break your backups? hmmmm)
Proving that this type of live backup is safe requires knowledge about how the backup program will read the file. The obvious approach of reading from start to end is compatible with FileStorage's append-only approach, but not all backup programs operate that way. I prefer to take a copy of the data.fs using 'cp' (which I know to be safe), and back up that.
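The copy-first approach described above might be scripted roughly as follows. This is a minimal sketch, not a vetted backup tool: the `snapshot` function, the timestamped naming scheme, and the directory layout are all assumptions made for the example. It reads the source strictly from start to end, which is the access pattern the message describes as compatible with append-only writes, and it does nothing to protect against the non-append undo writes mentioned earlier.

```python
import shutil
import tempfile
import time
from pathlib import Path


def snapshot(data_fs, snapshot_dir):
    """Copy data_fs sequentially into snapshot_dir and return the copy's path.

    The backup program is then pointed at the copy, never at the live file.
    """
    data_fs = Path(data_fs)
    snapshot_dir = Path(snapshot_dir)
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = snapshot_dir / f"{data_fs.name}.{stamp}"
    with open(data_fs, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # reads the source front to back in chunks
    return target


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp) / "Data.fs"  # stand-in file, not a real storage
        live.write_bytes(b"committed transactions")
        copy = snapshot(live, Path(tmp) / "snapshots")
        return copy.name.startswith("Data.fs."), copy.read_bytes()


name_ok, copied_bytes = demo()
print(name_ok, copied_bytes)
```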
-- Paul Winkler http://www.slinkp.com Look! Up in the sky! It's THE CROTCH! (courtesy of isometric.spaceninja.com)
On Monday 06 January 2003 5:48 pm, Casey Duncan wrote:
That said, the current FileStorage (the default used with Zope) has a memory resident index which grows linearly with the number of objects in your database. This will consume a lot of memory if you have millions of objects (something like 12-20 bytes per object). I am actually currently working on a solution to this though...
Interesting. Anything public?
4) If I need to be assured of being able to backup/restore the database and/or rebuild the site in the event of catastrophe does an external database provide any greater facility/reliability for achieving this? Or is ZEO the thing to use?
ZEO is not replication/redundancy of data. It is for scalability/availability by allowing multiple app servers to share the same storage.
There is a commercial product called ZRS (Zope Replication Service) which can be used to replicate ZODB data in real time. This can be used as an automatic backup as well AFAIK.
You may also be interested in an alpha-quality replication tool for DirectoryStorage 1.1: http://dirstorage.sourceforge.net/replica.html -- Toby Dickenson http://www.geminidataloggers.com/people/tdickenson
On Wednesday 08 January 2003 10:32 am, Toby Dickenson wrote:
On Monday 06 January 2003 5:48 pm, Casey Duncan wrote:
That said, the current FileStorage (the default used with Zope) has a memory resident index which grows linearly with the number of objects in your database. This will consume a lot of memory if you have millions of objects (something like 12-20 bytes per object). I am actually currently working on a solution to this though...
Interesting. Anything public?
Not yet... it will be initially an alternate implementation of FileStorage that has an index that is not memory resident. The format of Data.fs would be unchanged, but the index file is completely different. I am planning to write an initial implementation soon to gauge the memory/performance tradeoff and see if my idea is worthwhile in general ;^) -Casey
On Mon, Jan 06, 2003 at 03:04:39PM -0000, Crosbie Fitch wrote:
1) Are all objects in Zope persistent? ZClass instances? Python objects?
No, usually, and not necessarily. Objects are persistent if their classes inherit (directly or indirectly) from Persistent, and if the object is added to another object which is an ObjectManager, e.g. a Folder instance. Otherwise they are not persistent. In order to meet these requirements, the object should be constructed either as a ZClass or as a Python Product. You can define and instantiate classes in an External Method, but making them persistent would be Wrong. The reason I said ZClass instances are "usually" persistent is that you might add one to e.g. a TemporaryFolder whose sub-objects are never written to disk. But in most cases, you add instances to a Folder and they are persistent.
2) Are all object instances visible as entries in a Zope UI folder?
Yes.
Is it possible to have persistent objects without them being visible?
Not with a standard Folder.
Or does it not matter that the UI will attempt to list them all (millions or more)?
It would be very hard to use. :) Have a look at Shane's BTreeFolder2, which is designed to solve this kind of problem. http://hathaway.freezope.org/Software/BTreeFolder2
3) If I am expecting a very large number of objects would I always be better off using an external database? or are there circumstances where ZODB is quite able to look after them?
Don't know, sorry. I've never pushed a Zope folder beyond 100 or so items. My biggest ZODB at the moment contains 150727 items for a total of 2.4 GB, according to the control panel.
4) If I need to be assured of being able to backup/restore the database and/or rebuild the site in the event of catastrophe does an external database provide any greater facility/reliability for achieving this?
Depends on the database I guess. You can back up Zope just by copying the ZODB, but that gets interesting when it's very large. I've heard on this list that copying a live ZODB might be problematic if Zope writes to it while you're backing it up. I don't know if it leads to simply missing some updates in the copy, or worse, corruption. So I'm stuck with some downtime while making the backup copy, which is not insignificant with 2.4 GB of data. One strategy I'm looking at to mitigate this would be:
1) restart Zope in read-only mode.
2) make a copy of the ZODB.
3) restart Zope in normal read-write mode.
So I'd have only two very brief outages instead of one long one.
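The three-step strategy above can be sketched as a small wrapper. Everything here is hypothetical: the three callables stand in for whatever commands actually restart Zope and copy the database on a given installation, and no real Zope control command is assumed. The one design point worth showing is the `try`/`finally`, which returns the server to read-write mode even if the copy step fails.

```python
def backup_with_brief_outages(restart_read_only, copy_database, restart_read_write):
    """Run the three steps in order, always returning to read-write mode."""
    restart_read_only()          # step 1: first brief outage
    try:
        return copy_database()   # step 2: copy while nothing can write
    finally:
        restart_read_write()     # step 3: second brief outage, even on failure


# Stand-in callables that only record the order of operations.
log = []
result = backup_with_brief_outages(
    restart_read_only=lambda: log.append("read-only"),
    copy_database=lambda: log.append("copy") or "backup-done",
    restart_read_write=lambda: log.append("read-write"),
)
print(log, result)
```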
Or is ZEO the thing to use?
ZEO does not address backup issues at all. It just allows you to run multiple Zope servers from one ZODB. -- Paul Winkler http://www.slinkp.com Look! Up in the sky! It's PERVERT EPSILON! (courtesy of isometric.spaceninja.com)
On Mon, Jan 06, 2003 at 09:56:31AM -0800, Paul Winkler wrote:
You can back up zope just by copying the ZODB, but that gets interesting when it's very large. I've heard on this list that copying a live ZODB might be problematic if zope writes to it while you're backing it up.
I should have been clear: this is referring to the single-file FileStorage that Zope uses by default, where everything goes in the Data.fs file. I plan to use DirectoryStorage when it reaches a stable 1.0 release. -- Paul Winkler http://www.slinkp.com Look! Up in the sky! It's THE ABSORBABLE KID! (courtesy of isometric.spaceninja.com)
On Monday 06 January 2003 3:04 pm, Crosbie Fitch wrote:
But, I'm wondering about a few things:
1) Are all objects in Zope persistent? ZClass instances? Python objects?
No. For example a ZCatalog has one UI 'presence' but is made up of thousands of persistent objects.
3) If I am expecting a very large number of objects would I always be better off using an external database? or are there circumstances where ZODB is quite able to look after them?
If your stuff looks like a table then use a relational db. If your stuff looks like objects then ZODB will be ok up to several gigabytes.
4) If I need to be assured of being able to backup/restore the database and/or rebuild the site in the event of catastrophe does an external database provide any greater facility/reliability for achieving this?
There are homebrew options for FileStorage. BerkeleyStorage and DirectoryStorage have standard tools.
Or is ZEO the thing to use?
Yes, but it doesn't help with backup. -- Toby Dickenson http://www.geminidataloggers.com/people/tdickenson
On Monday 06 January 2003 6:41 pm, Toby Dickenson wrote: Urgh, my quoting was bad. Here is the answer paired up with the right question:
2) Are all object instances visible as entries in a Zope UI folder? Is it possible to have persistent objects without them being visible?
No. For example a ZCatalog has one UI 'presence' but is made up of thousands of persistent objects.
-- Toby Dickenson http://www.geminidataloggers.com/people/tdickenson
From: Toby Dickenson On Monday 06 January 2003 3:04 pm, Crosbie Fitch wrote:
But, I'm wondering about a few things: 1) Are all objects in Zope persistent? ZClass instances? Python objects?
No. For example a ZCatalog has one UI 'presence' but is made up of thousands of persistent objects.
Ah, so a ZClass can create instances of 'private' classes that have mixed in the persistence functionality - and these persistent objects won't manifest as entries in Zope folders?
If your stuff looks like a table then use a relational db. If your stuff looks like objects then ZODB will be ok up to several gigabytes.
Well, I can make it look like either very easily. Although, naturally, I don't expect immediate success, I don't want to rue the day I picked a non-scalable choice. If, as far as future scalability is concerned, organising my object relationships is better done within a relational database than directly as persistent objects, then that's what I'll do. The thing is I've heard so much good stuff about Zope that I don't want to use an external database if the built-in system, ZODB, is far better suited. As far as I could surmise, external databases may only have been required for incorporation of legacy systems, but I'm beginning to suspect that an external database may be a valid choice for a new system, given scalability requirements. I had wondered if perhaps ZEO was what you added when ZODB ran into its ceiling, but it seems that ZEO is for load balancing rather than scaling capacity.
On Monday 06 January 2003 7:04 pm, Crosbie Fitch wrote:
For example a ZCatalog has one UI 'presence' but is made up of thousands of persistent objects.
Ah, so a ZClass can create instances of 'private' classes that have mixed in the persistence functionality - and these persistent objects won't manifest as entries in Zope folders?
Correct, although I suspect a Python product would be more manageable than a ZClass for this type of work. ZClass == developed through a browser, and ZCatalog is definitely not a ZClass. -- Toby Dickenson http://www.geminidataloggers.com/people/tdickenson
participants (4): Casey Duncan, Crosbie Fitch, Paul Winkler, Toby Dickenson