[ZODB-Dev] zc.zlibstorage and IMVCCStorage (RelStorage) -- No Compression

jason.madden at nextthought.com jason.madden at nextthought.com
Sat Mar 16 20:47:17 UTC 2013


Hi all,

We recently ran into some surprising behaviour when layering
zc.zlibstorage on top of RelStorage (i.e., you don't get compressed
records in the SQL database), and I was wondering if anyone else had
noticed the same thing. We came up with what seems to be a workaround
that has held up for our use cases so far, and I'm curious whether it
seems like a good idea to others or whether there are reasons to
avoid it.

Background
----------

We noticed this when we had some FileStorage ZODB databases that were
compressed with zc.zlibstorage and we uploaded them to a RelStorage
database (also configured with zc.zlibstorage) using zodbconvert, with
a configuration like so:

	%import zc.zlibstorage
	<zlibstorage source>
	    <filestorage source>
	        path ...data/data.fs
	    </filestorage>
	</zlibstorage>
	<zlibstorage destination>
	    <relstorage destination>
	        ...
	    </relstorage>
	</zlibstorage>

When we started our application against the RelStorage database (using
a similar RelStorage + zc.zlibstorage configuration), we were
surprised to be greeted with "UnpicklingError: bad pickle data." A
quick peek at the pickled data (in the SQL database's object_state
table) showed that it was compressed by zc.zlibstorage: it had the
leading '.z' record marker.
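
For anyone who wants to run the same check on their own rows, here is
a minimal sketch of the marker test. It assumes the record layout is
the two-byte '.z' prefix followed by an ordinary zlib stream; the
helper names `looks_compressed` and `maybe_decompress` are made up for
illustration and are not part of zc.zlibstorage.

```python
import zlib

MARKER = b'.z'  # prefix seen on the compressed records


def looks_compressed(record):
    """Return True if the raw record bytes start with the '.z' marker."""
    return record[:2] == MARKER


def maybe_decompress(record):
    """Strip the marker and inflate; pass other records through unchanged."""
    if looks_compressed(record):
        return zlib.decompress(record[2:])
    return record


# Build a sample record the same way (marker + zlib stream) and round-trip it.
sample = MARKER + zlib.compress(b'x' * 100)
restored = maybe_decompress(sample)
```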

Setting a breakpoint in ZODB.Connection showed us that, despite the
configuration, Connection's `_storage` was not a ZlibStorage after
all, but a plain RelStorage. However, the ZODB.DB instance's `storage`
*was* a ZlibStorage (as expected).

IMVCCStorage
------------

The issue seems to be that when a ZODB.Connection is constructed with
a storage that provides `IMVCCStorage`, it does not retain and use
that storage object; it uses the result of calling
`IMVCCStorage.new_instance()` instead. As a storage wrapper,
ZlibStorage claims to provide all the same interfaces as the
underlying storage, and any attribute that ZlibStorage itself does
not define is delegated to the underlying storage through
`__getattr__`. ZlibStorage does not define `new_instance`, so the
call falls through to the underlying RelStorage, which creates and
returns a raw RelStorage instance. The net result is that the
Connection uses the raw RelStorage, never the wrapping ZlibStorage,
and so can never read or write compressed records. (zodbconvert uses
the storage objects directly, without a Connection or `new_instance`,
which is why it was able to write compressed records initially.)
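
The delegation problem can be reproduced without ZODB at all. The
sketch below uses two stand-in classes, `FakeMVCCStorage` and
`FakeWrapper` (both hypothetical, standing in for RelStorage and
ZlibStorage respectively), and only models the `__getattr__`
fall-through, not any real storage behaviour.

```python
class FakeMVCCStorage(object):
    """Stand-in for an IMVCCStorage such as RelStorage (hypothetical)."""

    def new_instance(self):
        # Returns a fresh *unwrapped* storage, as the base storage would.
        return FakeMVCCStorage()


class FakeWrapper(object):
    """Stand-in for a wrapper that delegates anything it doesn't define."""

    def __init__(self, base):
        self.base = base

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails on the wrapper.
        return getattr(self.base, name)


wrapper = FakeWrapper(FakeMVCCStorage())

# The wrapper has no new_instance of its own, so this call is delegated
# to the base and hands back a bare base storage.
per_connection = wrapper.new_instance()
wrapper_lost = not isinstance(per_connection, FakeWrapper)
```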

Solution
--------

The solution that we're testing now (and which so far seems to be
working: our application can read and write the compressed database
records) was to patch a new_instance method into ZlibStorage so that
it wraps the underlying storage again:

	from zc.zlibstorage import ZlibStorage

	def new_instance(self):
	    # Build an uninitialized wrapper; __init__ is deliberately skipped
	    new_self = type(self).__new__(type(self))
	    # Preserve _transform, etc
	    new_self.__dict__ = self.__dict__.copy()
	    new_self.base = self.base.new_instance()
	    # Because these are bound methods, we must re-copy
	    # them or ivars might be wrong, like _transaction
	    for name in self.copied_methods:
	        v = getattr(new_self.base, name, None)
	        if v is not None:
	            setattr(new_self, name, v)
	    return new_self

	ZlibStorage.new_instance = new_instance
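
To show what the patch is meant to achieve, here is the same logic
exercised against stand-in classes rather than the real storages.
`FakeBase` and `FakeWrapper` are hypothetical, and `getName` is used
only as an example of a copied bound method; this is a sketch of the
idea, not a test of zc.zlibstorage or RelStorage.

```python
class FakeBase(object):
    """Stand-in for the underlying IMVCCStorage (hypothetical)."""

    def __init__(self, name):
        self.name = name

    def getName(self):
        return self.name

    def new_instance(self):
        return FakeBase(self.name + "-copy")


class FakeWrapper(object):
    """Stand-in wrapper that copies bound methods from its base."""

    copied_methods = ('getName',)

    def __init__(self, base):
        self.base = base
        for name in self.copied_methods:
            setattr(self, name, getattr(base, name))

    def __getattr__(self, name):
        return getattr(self.base, name)

    def new_instance(self):
        new_self = type(self).__new__(type(self))
        new_self.__dict__ = self.__dict__.copy()
        new_self.base = self.base.new_instance()
        # Rebind the copied methods so they point at the new base,
        # not at the base of the wrapper we were cloned from.
        for name in self.copied_methods:
            v = getattr(new_self.base, name, None)
            if v is not None:
                setattr(new_self, name, v)
        return new_self


original = FakeWrapper(FakeBase("db"))
clone = original.new_instance()
```

Here `clone` is still a wrapper, and its copied `getName` is bound to
the new base; without the rebinding loop it would still be bound to
the original base.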

Does anyone have any comments on this? (My google-fu didn't find this
mentioned before, but my google-fu is sometimes weak.) The only IMVCCStorage 
I've worked with so far has been RelStorage; is this likely to break with
others or are we likely to run into trouble with RelStorage down the line?
Does this seem like something that could make it into the tree, or is it
special-purpose enough that we should continue patching?

Thanks,
Jason

