To all-

Two questions:

1. Zope/ZEO Scaling.

I spent several hours over the past 3 days reading about Zope scalability with ZEO. I am now looking for numbers. Two questions (nearly one and the same) which I did not find answered:

a. How well does ZEO scale in DATA SIZE (not availability)? I.e., how many items can I have in a folder? How many users/objects can it handle?

b. Numbers. I found no specific configuration numbers. How many objects? How many users? How much data?

I'm looking to be able to handle at least 40,000 user objects and a similar number of other, larger data objects of various sizes (2k - 20k+). Users could be pushed off to an LDAP server or some other environment via the LoginManager Mod etc.; however, I would still have 40,000 user folders in a single folder. Much of my application is of course generate once, serve a million times, but almost all objects can expect regular changes.

Currently I have implemented the bulk of my objects in an SQL database which I am serving through a Zope front end. A pure Zope object setup, however, offers some conveniences over my setup in terms of possibly simpler access controls and, if ZEO lives up to its claim, simpler scalability.

I would be interested in hearing users' comments on ZEO's ability to scale Zope and any specific numbers from various Zope/ZEO setups. Numbers from single Zope installs are also very welcome.

2. Groups.

I have also written my own authentication system built on an SQL database to accommodate groups and group-based access controls. I have read the documentation on Zope3's plan for Groups of principals; however, seeing as Zope3 has not even released a build yet, I believe I will need to roll my own in this area. I have done some Google searching but not found any other Zope modules to handle group-based authentication. If anyone knows of such a module I would be very interested in knowing more.

Thank you all for your time.

Sincerely,
Eric Seidel
Eric Seidel writes:
1. Zope/ZEO Scaling. I spent several hours over the past 3 days reading about Zope scalability with ZEO. I am now looking for numbers. Two questions (nearly one and the same) which I did not find answered:
This is not a question about ZEO but about the "Storage" used.
Toby Dickenson (name may be misspelled) recently gave some estimates for the standard storage, "FileStorage". Please search the archives...
.... How many items can I have in a folder?
Normal folders store their content in a tuple. If you access the folder, the complete tuple is fetched into memory. You do not want this for large numbers of items. Use a BTreeFolder in this case. As the name suggests, it uses a tree structure to store the content. Access is far more fine-grained than with standard Folders.
b. Numbers. There were no specific configuration numbers which I found. How many objects? How many users? How much data?
Search for Toby's message...
I'm looking to be able to handle at least 40,000 user objects and a similar number of other larger data objects of various sizes (2k - 20k+). Users could be pushed off to an ldap server or some other environment via the LoginManager Mod etc, however I would still have 40,000 user folders in a single folder.
With a BTreeFolder, I would not be worried about this number. ...
Dieter
(cc Shane too) On Wednesday 10 Jul 2002 7:01 pm, Dieter Maurer wrote:
How many items can I have in a folder?
Normal folders store their content in a tuple. If you access the folder, the complete tuple is fetched into memory.
Hmm, not quite. It stores the sub-objects in the folder's __dict__. It does have a tuple which stores the objects' ids (so it can know which attributes are ObjectManager-managed, and which are not) and a cached copy of each meta-type.
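The layout described in this reply can be pictured with a toy class. This is only a sketch and not the real ObjectManager code: `ToyFolder`, `add` and `object_ids` are hypothetical names, and the `_objects` attribute with `id` / `meta_type` records is an assumption about how the tuple is organised.

```python
class ToyFolder:
    """Toy stand-in for a standard Folder; not the real ObjectManager."""

    def __init__(self):
        # Tuple of {'id': ..., 'meta_type': ...} records, one per managed sub-object.
        self._objects = ()

    def add(self, id, obj, meta_type):
        # The sub-object itself becomes an instance attribute (it lives in __dict__).
        setattr(self, id, obj)
        # Tuples are immutable, so every add builds a complete new tuple.
        self._objects = self._objects + ({'id': id, 'meta_type': meta_type},)

    def object_ids(self):
        # Only ids recorded in _objects count as managed sub-objects.
        return [info['id'] for info in self._objects]


folder = ToyFolder()
folder.title = 'Users'                       # ordinary attribute, not managed
folder.add('eric', object(), 'User Folder')  # managed sub-object
print(folder.object_ids())                   # ['eric']
print(sorted(folder.__dict__))               # ['_objects', 'eric', 'title']
```

The point of the sketch is that the sub-objects and the bookkeeping tuple are separate things: the tuple only tells the folder which of its attributes are managed sub-objects.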
You do not want this for large numbers of items. Use a BTreeFolder in this case. As the name suggests, it uses a tree structure to store the content. Access is far more fine-grained than with standard Folders.
BTreeFolder was definitely a huge advantage before Zope 2.6, because the old ZODB cache did not cope well with the fact that the __dict__ loaded all 40,000 sub-objects into memory as ghosts. Zope 2.6 has a different cache manager that does not panic when it is given huge numbers of ghosts. As a rough guess each ghost adds 100 bytes, so BTreeFolder is saving you 4 MB of RAM (per worker thread). Not bad, but maybe not enough to justify installing a separate product. BTreeFolder does give you a more scalable user interface as standard, but with that many objects you still might want to think about replacing it with something customised to your data.
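The 4 MB figure follows directly from the numbers in the message. A minimal sketch of the arithmetic, where the 100 bytes per ghost is the rough guess quoted above rather than a measurement, and the four-thread case is an assumed configuration added for illustration:

```python
# Back-of-the-envelope estimate of ghost overhead, using the figures quoted above.
GHOST_BYTES = 100      # rough guess from the message, not a measured value
SUBOBJECTS = 40_000    # size of the folder under discussion


def ghost_overhead_bytes(subobjects, worker_threads=1, ghost_bytes=GHOST_BYTES):
    """The cost is paid once per worker thread, so it multiplies with thread count."""
    return subobjects * ghost_bytes * worker_threads


per_thread = ghost_overhead_bytes(SUBOBJECTS)
four_threads = ghost_overhead_bytes(SUBOBJECTS, worker_threads=4)  # assumed thread count
print(per_thread / 1_000_000)    # 4.0  (MB per worker thread)
print(four_threads / 1_000_000)  # 16.0 (MB across four threads)
```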
I'm looking to be able to handle at least 40,000 user objects and a similar number of other larger data objects of various sizes (2k - 20k+). Users could be pushed off to an ldap server or some other environment via the LoginManager Mod etc, however I would still have 40,000 user folders in a single folder.
With a BTreeFolder, I would not be worried about this number.
BTreeFolder has a problem in that it doesn't store *all* of its data in the BTree. It still has the tuple caching id and meta-type, thanks to its ObjectManager base class. In this case it is a 40,000-element tuple. That would be enough to get me worried. (PS: I worry easily.)
Toby Dickenson wrote:
BTreeFolder has a problem in that it doesn't store *all* of its data in the BTree. It still has the tuple caching id and meta-type, thanks to its ObjectManager base class. In this case it is a 40,000-element tuple. That would be enough to get me worried.
(PS: I worry easily.)
I've been updating BTreeFolder lately. The latest code is called BTreeFolder2, though the reason I changed the name is now gone and I may decide to rename it back to BTreeFolder. It's available at cvs.zope.org under /Products.

The newest code stores *all* subobject data in BTrees; no more giant tuple. That giant tuple turned out to be a bit of a problem for very large folders, since every time you add or remove an item, a new multi-megabyte pickle is generated, transferred, and appended to a file. That problem is gone now. The newest code also features unit tests, a CMF-friendly version, conflict prevention, and a unique ID generation utility.

BTreeFolder is useful where you need something like a big dictionary that rarely gets exposed to the user, but you want it to remain discoverable and easily manipulated in emergencies. It doesn't always take the place of something customized, but it's definitely more scalable than ever.

Shane
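The cost of the giant tuple can be illustrated with the standard `pickle` module. This is a rough sketch under stated assumptions, not a model of ZODB itself: the record layout, the `user%05d` ids and `BUCKET_SIZE` are all hypothetical, the buckets are plain dicts standing in for BTree buckets, and the byte counts from a modern Python will differ from those of a 2002 Zope.

```python
import pickle

N = 40_000
records = tuple({'id': 'user%05d' % i, 'meta_type': 'User Folder'} for i in range(N))
new_record = {'id': 'new', 'meta_type': 'User Folder'}

# Layout 1: one giant tuple. Adding an item means pickling the whole tuple again.
whole_tuple_bytes = len(pickle.dumps(records + (new_record,)))

# Layout 2: records split into small buckets, a crude stand-in for BTree buckets.
BUCKET_SIZE = 100  # hypothetical; not the real BTree bucket size
buckets = [
    {r['id']: r['meta_type'] for r in records[i:i + BUCKET_SIZE]}
    for i in range(0, N, BUCKET_SIZE)
]
buckets[-1][new_record['id']] = new_record['meta_type']  # only one bucket changes
one_bucket_bytes = len(pickle.dumps(buckets[-1]))

# Bytes rewritten per add: the whole tuple versus a single bucket.
print(whole_tuple_bytes, one_bucket_bytes)
```

With the tuple, the bytes written per change grow with the size of the folder; with buckets, they stay roughly proportional to one bucket regardless of how many items the folder holds.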
Shane Hathaway wrote:
The newest code stores *all* subobject data in BTrees; no more giant tuple. That giant tuple turned out to be a bit of a problem for very large folders, since every time you add or remove an item, a new multi-megabyte pickle is generated, transferred, and appended to a file. That problem is gone now. The newest code also features unit tests, a CMF-friendly version, conflict prevention, and a unique ID generation utility.
Would you consider the code suitable for production use yet? cheers, Chris
Chris Withers wrote:
Shane Hathaway wrote:
The newest code stores *all* subobject data in BTrees; no more giant tuple. That giant tuple turned out to be a bit of a problem for very large folders, since every time you add or remove an item, a new multi-megabyte pickle is generated, transferred, and appended to a file. That problem is gone now. The newest code also features unit tests, a CMF-friendly version, conflict prevention, and a unique ID generation utility.
Would you consider the code suitable for production use yet?
Yes, a version of it is in production. I still expect to change it, though. ;-) Shane
The newest code stores *all* subobject data in BTrees; no more giant tuple. That giant tuple turned out to be a bit of a problem for very large folders, since every time you add or remove an item, a new multi-megabyte pickle is generated, transferred, and appended to a file. That problem is gone now. The newest code also features unit tests, a CMF-friendly version, conflict prevention, and a unique ID generation utility.
Would you consider the code suitable for production use yet?
Yes, a version of it is in production. I still expect to change it, though. ;-)
What's the reason for not using this as the default, instead of the current folder code? Size? /Magnus
Magnus Heino wrote:
What's the reason for not using this as the default, instead of the current folder code? Size?
Are you talking about making all folders hold their items in BTrees? That's kind of hard for Zope 2 because of all the backward compatibility issues, but I think it's on the plan for Zope 3. Shane
On Monday 15 Jul 2002 2:40 pm, Shane Hathaway wrote:
Magnus Heino wrote:
What's the reason for not using this as the default, instead of the current folder code? Size?
Are you talking about making all folders hold their items in BTrees? That's kind of hard for Zope 2 because of all the backward compatibility issues, but I think it's on the plan for Zope 3.
I don't think we would want to use BTrees for *every* folder; they have a significant overhead compared to ordinary Folders, which makes them unattractive for folders that hold a small number of subobjects. I guess the break-even point is around where the BTree doesn't all fit into one bucket, which is several hundred items if I remember correctly. It would be nice to have Folders that automagically upgraded themselves to BTreeFolders once some size threshold was exceeded.
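The self-upgrading folder idea could look roughly like the following. This is a hypothetical sketch, not existing Zope code: `AutoUpgradingFolder` and its `threshold` default are invented for illustration, the break-even point would need measuring, and a real persistent folder would have to handle ZODB persistence and backward compatibility as well. The sketch uses `OOBTree` from the `BTrees` package when it is installed and falls back to a dict subclass otherwise so that it still runs.

```python
try:
    from BTrees.OOBTree import OOBTree  # real BTree, if the BTrees package is installed
except ImportError:
    class OOBTree(dict):
        """Fallback stand-in so the sketch runs without BTrees installed."""


class AutoUpgradingFolder:
    """Hypothetical folder: plain dict while small, BTree once it grows."""

    def __init__(self, threshold=300):
        self.threshold = threshold  # assumed break-even point; needs measuring
        self._items = {}

    def is_upgraded(self):
        return isinstance(self._items, OOBTree)

    def add(self, id, obj):
        self._items[id] = obj
        if not self.is_upgraded() and len(self._items) > self.threshold:
            tree = OOBTree()
            tree.update(self._items)  # one-off copy into the tree structure
            self._items = tree

    def get(self, id, default=None):
        return self._items.get(id, default)

    def __len__(self):
        return len(self._items)


folder = AutoUpgradingFolder(threshold=3)
for name in ('a', 'b', 'c'):
    folder.add(name, name.upper())
print(folder.is_upgraded())  # False: still at the threshold
folder.add('d', 'D')
print(folder.is_upgraded())  # True: the fourth item crossed the threshold
```

Callers only see `add`, `get` and `len`, so the switch of storage is invisible to them, which is the property the message asks for.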
Toby Dickenson wrote:
On Monday 15 Jul 2002 2:40 pm, Shane Hathaway wrote:
Magnus Heino wrote:
What's the reason for not using this as the default, instead of the current folder code? Size?
Are you talking about making all folders hold their items in BTrees? That's kind of hard for Zope 2 because of all the backward compatibility issues, but I think it's on the plan for Zope 3.
I don't think we would want to use BTrees for *every* folder; they have a significant overhead compared to ordinary Folders, which makes them unattractive for folders that hold a small number of subobjects. I guess the break-even point is around where the BTree doesn't all fit into one bucket, which is several hundred items if I remember correctly.
It would be nice to have Folders that automagically upgraded themselves to BTreeFolders once some size threshold was exceeded.
Agreed. We'll need to do measurements at some point to find out whether the overhead really is significant, and where the transition point should be (since we don't want to make end users think about such things. :-) ) Shane
participants (6)
- Chris Withers
- Dieter Maurer
- Eric Seidel
- Magnus Heino
- Shane Hathaway
- Toby Dickenson