Request for comments: Directory storage
Hello all,
You probably saw my post yesterday with the first alpha of ReiserStorage. One of the questions that people tend to ask about it is whether they can use it without reiserfs. There are two problems with not using reiserfs:
1. ReiserStorage (now renamed to DirectoryStorage) stores each object in a separate file and *all* the files in a single directory. This was done in order to let the filesystem do what it was meant to do: store and retrieve files quickly. While reiserfs is *extremely* good at this (it uses a btree to store directory entries), most other filesystems do linear searches when finding a file, so performance is very bad when you have many files in a single directory. This problem can be solved by splitting files into multiple directories when not using reiserfs. This would add a little overhead but it is tolerable.
2. Waste of space. Typical block-allocation filesystems like ext2 and FAT will waste a lot of space in the usage pattern of DirectoryStorage. ReiserFS packs small files together in the btree, so it solves the problem, but I have no idea how this could be fixed easily on the other fs's.
Comments ? Suggestions ?
PS: a new DirectoryStorage release will be done today, with bugfixes and new features.
-Petru
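The two points above can be sketched in Python. This is only an illustration under assumptions, not DirectoryStorage's actual code: `sharded_path` and `slack_bytes` are hypothetical names, using the trailing hex digits of the object id as the subdirectory name is just one possible splitting scheme, and the 4096-byte block size is an assumed example value.

```python
import os


def sharded_path(root, oid, levels=1):
    """Map an object id (bytes) to root/<xx>/.../<hex oid>.

    Hypothetical layout: each directory level is named after two hex
    digits taken from the end of the oid, so consecutive oids land in
    different subdirectories.
    """
    name = oid.hex()
    end = len(name)
    parts = [name[end - 2 * (i + 1):end - 2 * i] for i in range(levels)]
    return os.path.join(root, *parts, name)


def slack_bytes(file_sizes, block_size=4096):
    """Bytes lost when every file is rounded up to whole blocks."""
    allocated = sum(
        ((size + block_size - 1) // block_size) * block_size
        for size in file_sizes
    )
    return allocated - sum(file_sizes)


if __name__ == "__main__":
    oid = (1).to_bytes(8, "big")
    print(sharded_path("data", oid))
    # Three small objects on a filesystem with 4096-byte blocks.
    print(slack_bytes([100, 5000, 4096]))
```

With one level, consecutive 8-byte ids spread over 256 subdirectories, so a million objects would leave roughly 3,900 files per directory instead of a million in one. The slack estimate ignores inode and directory-entry overhead; it only counts the unused tail of each file's last block.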
Petru Paler: This is the embodiment of my MultiFileStorage thingy on Jim's ZODB Wiki. I dropped it (never picked it up) when Mountable Storage was announced. I'll create a ReiserFS partition some time this week and try it out. Excellent!
_______________________________________________ Zope-Dev maillist - Zope-Dev@zope.org http://lists.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope )
Jason Spisak CIO HireTechs.com 6151 West Century Boulevard Suite 900 Los Angeles, CA 90045 P. 310.665.3444 F. 310.665.3544 Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
This is the embodiment of my MultiFileStorage thingy on Jim's ZODB Wiki. I
Oops, didn't see the ZODB Wiki until now :)
dropped it (never picked it up) when Mountable Storage was announced. I'll create a ReiserFS partition some time this week and try it out. Excellent!
Glad you find it interesting :) One thing I should mention is that you should use the rupasov hash option on the partition. teahash will also work, but rupasov hash is designed for consecutive file names and DirectoryStorage specifically exploits that feature. -Petru
Petru Paler writes:
Glad you find it interesting :)
Very. I think the line between filesystem and database gets more blurred every day.
One thing I should mention is that you should use the rupasov hash option on the partition. teahash will also work, but rupasov hash is designed for consecutive file names and DirectoryStorage specifically exploits that feature.
Thanks for the heads up. I have to patch my kernel up to 2.2.11 first, then I'll be recompiling, then creating. It may take me a bit. But I sure as heck want to try it. Thanks again.
All my best,
Jason Spisak
Very. I think the line between filesystem and database gets more blurred every day.
That's what Hans Reiser says all the time :)
Thanks for the heads up. I have to patch my kernel up to 2.2.11 first, then I'll be recompiling, then creating. It may take me a bit. But I sure as heck want to try it. Thanks again.
2.2.11 ? You should really use the latest kernel (2.2.15 at this time, 2.2.16 due to be out soon). -Petru
Petru Paler writes:
2.2.11 ? You should really use the latest kernel (2.2.15 at this time, 2.2.16 due to be out soon).
-Petru
If I am going through the trouble, I guess I should. ;)
Jason Spisak
From my naive understanding, would this help with the problem ZODB has with regard to folders with many objects? Would a person who is using DirectoryStorage not necessarily be required to partition their objects into an artificially derived hierarchical directory structure?
In other words can it be a possible solution to http://www.zope.org/Wikis/zope-dev/ReallyBigFolders ? Ooo, if so any idea on ETA?
Thanks,
Jimmie Houchin
From my naive understanding, would this help with the problem ZODB has with regard to folders with many objects? Would a person who is using DirectoryStorage not necessarily be required to partition their objects into an artificially derived hierarchical directory structure?
No, these are unrelated. DirectoryStorage is a storage for ZODB and doesn't really care about what the application (Zope in this case) stores in it.
In other words can it be a possible solution to http://www.zope.org/Wikis/zope-dev/ReallyBigFolders ?
Ooo, if so any idea on ETA?
I don't know. Michel Pelletier has released an alpha BTreeFolder, but I haven't had time to look at it. -Petru
Jimmie Houchin wrote:
From my naive understanding, would this help with the problem ZODB has with regard to folders with many objects? Would a person who is using DirectoryStorage not necessarily be required to partition their objects into an artificially derived hierarchical directory structure?
In other words can it be a possible solution to http://www.zope.org/Wikis/zope-dev/ReallyBigFolders ?
No. The problem with ReallyBigFolders is that currently folders store their children in a Python dictionary. When one object is accessed from the folder, the entire dictionary is loaded into memory. This problem is independent of the storage. I'm not sure how ReiserStorage works, but storages in general know nothing about the containment relationships of the objects they store. The storage just considers them 'records' with a certain id.
The solution to making ReallyBigFolders is very similar to RFS though: BTrees. In the case of RFS, records are stored as files which are stored efficiently as btrees, and in the case of ReallyBigFolders, sub-objects are stored as nodes in BTrees (which eventually become records which are stored as files which are stored as nodes in BTrees...)
--
-Michel Pelletier
http://www.zope.org/Members/michel/MyWiki
Visit WikiCentral for the latest Zen: http://www.zope.org/Members/WikiCentral
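The difference Michel describes can be illustrated with a toy model. This is a sketch under assumptions, not the ZODB BTrees implementation or BTreeFolder: `BucketedFolder` is a hypothetical class with only two levels, and "loading" is modelled simply by counting how many child entries sit in the record that has to be read.

```python
import bisect


class BucketedFolder:
    """Toy two-level index over a folder's children.

    Children are split into sorted buckets of at most `bucket_size`
    keys; each bucket stands for one separately stored record.
    """

    def __init__(self, items, bucket_size=50):
        ordered = sorted(items.items())
        self.buckets = [
            dict(ordered[i:i + bucket_size])
            for i in range(0, len(ordered), bucket_size)
        ]
        # Smallest key of each bucket, used to pick the bucket to load.
        self.first_keys = [min(bucket) for bucket in self.buckets]

    def get(self, key):
        """Return (value, number of child entries that were loaded)."""
        index = bisect.bisect_right(self.first_keys, key) - 1
        if index < 0:
            raise KeyError(key)
        bucket = self.buckets[index]  # only this one record is read
        return bucket[key], len(bucket)


if __name__ == "__main__":
    children = {"doc%05d" % i: i for i in range(10000)}
    # A plain dictionary folder is one record holding all 10000 entries.
    print("dict folder loads", len(children), "entries")
    folder = BucketedFolder(children)
    value, loaded = folder.get("doc01234")
    print("bucketed folder loads", loaded, "entries to find", value)
```

In this model a lookup touches one bucket of 50 entries rather than the whole 10000-entry dictionary, and that holds whichever storage ends up writing the buckets to disk.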
On Wed, 7 Jun 2000 18:00:31 +0300 (EEST), Petru Paler <ppetru@coltronix.com> wrote:
Hello all,
You probably saw my yesterday post with the first alpha of ReiserStorage.
Woohoo! I'd like to try this on NT with NTFS too (which has similar performance characteristics with big directories and small files). Do you think this is worth a try, or does your ReiserStorage use other Unix-specific tricks?
Toby Dickenson tdickenson@geminidataloggers.com
I'd like to try this on NT with NTFS too (which has similar performance characteristics with big directories and small files). Do you think this is worth a try, or does your ReiserStorage use other Unix-specific tricks?
The version I sent to the list assumes a '/' directory separator. I'm releasing a fixed version (a snapshot of my working code) in a couple of seconds. I do think it's worth a try, but I doubt NTFS is as good as ReiserFS :) -Petru
On Wed, 7 Jun 2000, Petru Paler wrote:
Comments ? Suggestions ?
PS: a new DirectoryStorage release will be done today, with bugfixes and new features.
I'd love some sort of benchmarking tool for this (and possibly other Storages). I guess the best way would be a Python script that uses urllib. Something that would algorithmically pump up the DB to > 1GB in size and retrieve the URLs. Any volunteers or am I doing it in my copious spare time (tm)? I've got a nice NetApp here to run some tests on.
--
Stuart Bishop Work: zen@cs.rmit.edu.au Senior Systems Alchemist Play: zen@shangri-la.dropbear.id.au Computer Science, RMIT University
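The read half of such a tool might look roughly like the sketch below. It is not zouch.py, which is not reproduced in this thread: `fetch` and `time_reads` are hypothetical names, the URL in the example is a placeholder for a local Zope server, and the sketch uses the current `urllib.request` module rather than the urllib of the time.

```python
import time
import urllib.request


def fetch(url):
    """Fetch one URL and return the size of the response body in bytes."""
    with urllib.request.urlopen(url) as response:
        return len(response.read())


def time_reads(urls, fetch=fetch):
    """Fetch every URL once; return (elapsed seconds, total bytes read)."""
    start = time.perf_counter()
    total = sum(fetch(url) for url in urls)
    return time.perf_counter() - start, total


if __name__ == "__main__":
    # Placeholder URL: point this at documents on the server under test.
    urls = ["http://localhost:8080/index_html"] * 10
    elapsed, total = time_reads(urls)
    print("%d bytes in %.2f seconds" % (total, elapsed))
```

Passing `fetch` as a parameter lets the same timing loop be pointed at something other than HTTP, for instance a function that loads objects from a storage directly.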
I'd love some sort of benchmarking tool for this (and possibly other Storages). I guess the best way would be a Python script that uses urllib. Something that would algorithmically pump up the DB to > 1GB in size and retrieve the URLs. Any volunteers or am I doing it in my copious spare time (tm)?
It would be great if you could do it, but beware that you will be benchmarking a lot of overhead if you only plan to measure storage performance. Why not use ZODB directly ?
I've got a nice NetApp here to run some tests on.
What filesystem does that use ? -Petru
On Fri, 9 Jun 2000, Petru Paler wrote:
It would be great if you could do it, but beware that you will be benchmarking a lot of overhead if you only plan to measure storage performance. Why not use ZODB directly ?
If I talk HTTP, it measures things fully - Python's interpreter lock will mean a storage system written in python will benchmark better without having to compete with ZServer, and vice versa for storage systems with non-pythonic bits.
I've got a nice NetApp here to run some tests on.
What filesystem does that use ?
No idea :-) Something log based that is very fast and handles huge directories happily. It also appears that another member of this list has an EMC Symmetrix box to test on, which I believe is the next (and highest) level up from a NetApp.
I've attached a prerelease alpha of zouch.py for giggles. Not even a command line yet, so you will need to edit some code at the bottom. The current settings generate about 360 directories and about 36000 files, and proceed to make about 180000 reads. This bloated my test ZODB to just over 200MB and took about 2.6 hours attacking my development Zope server from another host on my LAN.
Todo:
- tidy and vet ugly code
- command line interface
- dynamic option (do more intensive DTML stuff - currently just standard_html_header/standard_html_footer)
- catalog option (since DTML Documents aren't catalog aware, will need to make two calls to make a new document)
- upload larger documents and some binaries (200MB isn't great for benchmarking when you might have a gig of ram doing caching for you)
- standard test suite
- better reporting
- spinning doohickey so we know it hasn't hung without having to look at log files
--
Stuart Bishop Work: zen@cs.rmit.edu.au Senior Systems Alchemist Play: zen@shangri-la.dropbear.id.au Computer Science, RMIT University
It would be great if you could do it, but beware that you will be benchmarking a lot of overhead if you only plan to measure storage performance. Why not use ZODB directly ?
If I talk HTTP, it measures things fully - Python's interpreter lock will mean a storage system written in python will benchmark better without having to compete with ZServer, and vice versa for storage systems with non-pythonic bits.
Yes, you are right.
What filesystem does that use ?
No idea :-) Something log based that is very fast and handles huge directories happily. It also appears that another member of this list has an EMC Symmetrix box to test on, which I believe is the next (and highest) level up from a Netapp.
Mmmm... I heard that Network Appliance hired a couple of the SGI engineers that designed XFS ?
I've attached a prerelease alpha of zouch.py for giggles. Not even a command line yet, so you will need to edit some code at the bottom. The current settings generate about 360 directories and about 36000 files, and proceed to make about 180000 reads. This bloated my test ZODB to just over 200MB and took about 2.6 hours attacking my development Zope server from another host on my LAN.
Cool :) Thanks for writing this, it will be very useful for benchmarking. -Petru
participants (6)
- Jason Spisak
- Jimmie Houchin
- Michel Pelletier
- Petru Paler
- Stuart 'Zen' Bishop
- Toby Dickenson