[ZODB-Dev] Major refactoring of the ZEO ClientStorage Blob Cache

Christian Theune ct at gocept.com
Wed Dec 3 01:50:21 EST 2008


Hi,

On Tue, 2008-12-02 at 12:03 -0500, Jim Fulton wrote:
> ZEO has two modes for dealing with client blob data: shared and
> non-shared.  In shared mode, a distributed file system is used to share a
> blob directory with a ZEO server.  This requires management of a  
> distributed file system, in addition to the ZEO protocol.  Any caching  
> is provided by the distributed file system.
> 
> In non-shared mode, blob data are downloaded to the ZEO client using  
> the ZEO protocol.  No distributed file-system is needed and blob files  
> are cached locally. Unfortunately, the current implementation provides  
> no facilities for managing the client cache. There are no provisions  
> in the ZEO client software for removing unused blob files and the blob  
> implementation makes almost no provision for blob file removal.
> 
> I'm working on refactoring ClientStorage's handling of non-shared blob  
> data.  I'm implementing a mechanism for periodically cleaning out  
> files that haven't been accessed in a while. As part of this, I'm  
> going to radically change the layout of the ClientStorage's non-shared  
> blob directory.
> 
> Currently, the bushy layout, with deeply nested directories, is used.
> While I think this layout makes some sense on the server, I don't  
> think it makes much sense on the client.  Cleaning up unused blob  
> files is complicated by the need to clean up directories too.  I'm  
> going to go for a fairly flat layout.  There will be a small number  
> (997) of directories and blob files will reside directly in these  
> directories.  (The directory will be chosen by taking the remainder of  
> dividing an oid by 997.)

Any specific reason for this specific number?
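
(For reference, 997 is the largest prime below 1000; the thread does not
say whether that is why it was picked. A minimal sketch of the directory
selection described above, assuming oids are the usual 8-byte big-endian
strings. The helper names and the use of the decimal remainder as the
directory name are hypothetical, not taken from the ZEO code.)

```python
import os
import struct

NUM_DIRS = 997  # number of cache directories, as given in the proposal


def oid_to_int(oid: bytes) -> int:
    """Interpret an 8-byte oid as an unsigned big-endian integer."""
    return struct.unpack(">Q", oid)[0]


def cache_dir_for(blob_dir: str, oid: bytes) -> str:
    """Return the directory a blob for `oid` would live in.

    Naming the directory after the decimal remainder is an assumption
    made for this sketch only.
    """
    return os.path.join(blob_dir, str(oid_to_int(oid) % NUM_DIRS))


if __name__ == "__main__":
    oid = (1000).to_bytes(8, "big")
    print(cache_dir_for("blobs", oid))  # 1000 % 997 == 3
```

With a modulus this small every oid maps to one of at most 997
directories, so a cleanup pass never has to remove or recreate nested
directories.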

> It appears that modern operating systems can  
> handle large directories just fine.  I've created directories with 1  
> million files on Linux/Ext, Mac OS X/HFS+, and Windows XP/NTFS and saw  
> no degradation in performance as the number of files in a directory
> increased.

FTR: The bushy layout was introduced because of limits on the number of
subdirectories a directory can contain, which seem to be separate from
the limit on the number of files it can contain. At least on ext3 I
can't create more than 65k directories in a directory, while I can still
create a lot more files in the same directory. Wikipedia has a generally
good overview and comparison of file systems, but it doesn't cover the
maximum number of subdirectories per directory.

> I plan to have ClientStorage use the file layout mentioned above.  The  
> ClientStorage constructor will fail if an older layout is found. An  
> alternative is to just log a warning and ignore the existing  
> directories, as the new directories will have non-overlapping names.
> 
> I mention this both as a heads up and to see if anyone can point out a  
> problem with my approach.  I have a feeling that no one is using
> non-shared client blob directories for anything important yet, so I assume
> the change won't have much effect.

I am. I'd prefer if you'd fail on the directory structure instead of
mixing it with the new approach.
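
(A rough sketch of the "fail on an older layout" behaviour, under the
assumption that the new layout contains only directories named 0..996.
The function names are hypothetical and this is not the ClientStorage
code; a real check would probably also have to allow for things like a
temporary directory.)

```python
import os

NUM_DIRS = 997  # number of cache directories, as given in the proposal


def uses_flat_layout(blob_dir: str) -> bool:
    """True if every entry looks like a flat-layout directory name."""
    for name in os.listdir(blob_dir):
        is_number = name.isascii() and name.isdigit()
        if not (is_number and int(name) < NUM_DIRS):
            return False
    return True


def open_blob_cache(blob_dir: str) -> str:
    """Create the cache directory if needed; refuse to mix layouts."""
    os.makedirs(blob_dir, exist_ok=True)
    if not uses_flat_layout(blob_dir):
        raise ValueError(
            "%r contains entries from another layout; "
            "move or remove them before starting the client" % blob_dir
        )
    return blob_dir
```

Failing early like this keeps old and new files from ending up side by
side in one directory, at the cost of a manual cleanup step on upgrade.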

Christian

-- 
Christian Theune · ct at gocept.com
gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany
http://gocept.com · tel +49 345 1229889 7 · fax +49 345 1229889 1
Zope and Plone consulting and development