[ZODB-Dev] Using zodb and blobs
Tres Seaver
tseaver at palladion.com
Tue Apr 13 19:58:11 EDT 2010
Nitro wrote:
> Hello Tres,
>
> thanks for your detailed answers!
>
> Am 12.04.2010, 22:42 Uhr, schrieb Tres Seaver <tseaver at palladion.com>:
>
>>> Additionally I made some quick performance tests. I committed 1kb sized
>>> objects and I can do about 40 transactions/s if one object is changed
>>> per transaction. For 100kb objects it's also around 40 transactions/s.
>>> Only for object sizes bigger than that does the raw I/O throughput seem
>>> to matter.
>> 40 tps sounds low: are you pushing blob content over the wire somehow?
>
> No, that test was with a plain file storage. Just a plain Persistent
> object with a differently sized string and an integer attribute. I did
> something like
>
> 1) create object with attribute x (integer) and y (variably sized string)
> 2) for i in range(100): obj.x = i; transaction.commit()
> 3) Measure time taken for step 2
>
>>> Still don't know the answers to these:
>>>
>>> - Does it make sense to use ZODB in this scenario? My data is not suited
>>> well for an RDBMS.
>> YMMV. I still default to using ZODB for anything at all, unless the
>> problem smells very strongly relational.
>
> Ok, the problem at hand certainly doesn't smell relational. It is more
> about storing lots of different data than querying it extensively. It's a
> mixture of digital asset management (the blobs are useful for this part)
> and "projects" which reference the assets. The projects are shared between
> the clients and will consist of a big tree with Persistent objects hooked
> up to it.
I have seen the ZEO storage committing transactions at least an order of
magnitude faster than that (e.g., when processing incoming newswire
feeds). I would guess that there could have been some other latencies
involved in your setup (e.g., that 0-100ms lag you mention below).
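To see how per-commit latency (a disk sync, or a network round trip once
ZEO is involved) caps throughput: if each synchronous commit waits out one
such delay, the arithmetic is simple. Assuming an illustrative 25 ms per
commit:

```python
# Upper bound on commit throughput when every commit waits for one
# synchronous round trip or sync.  25 ms is an illustrative value,
# not a measurement.
latency_seconds = 0.025
max_tps = 1.0 / latency_seconds
print(max_tps)  # 40.0
```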
>>> - Are there more complications to blobs other than a slightly different
>>> backup procedure?
>> You need to think about how the blob data is shared between ZEO clients
>> (your appserver) and the ZEO storage server: opinions vary here, but I
>> would prefer to have the blobs living in a writable shared filesystem,
>> in order to avoid the necessity of fetching their data over ZEO on the
>> individual clients which were not the one "pushing" the blob into the
>> database.
>
> The zeo server and clients will be in different physical locations, so I'd
> probably have to employ some shared filesystem which can deal with that.
> Speaking of locations of server and clients, is it a problem - as in zeo
> will perform very badly under these circumstances as it was not designed
> for this - if they are not in the same location (typical latency 0-100ms)?
That depends on the mix of reads and writes in your application. I have
personally witnessed a case where the clients stayed up, serving
pages over a whole weekend in a clusterfsck where both the ZEO server
and the monitoring infrastructure went belly up. This was for a large
corporate intranet, in case that helps: the problem surfaced
mid-morning on Monday when the employee in charge of updating the lunch
menu for the week couldn't save the changes.
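For the shared-filesystem arrangement I described, each client points its
'zeoclient' section at the shared blob directory and marks it shared, so
blob data is read directly from the filesystem instead of being fetched
over ZEO. A sketch (hostname, port, and paths are placeholders):

```
%import ZEO

<zodb main>
  <zeoclient>
    server zeo.example.com:8100
    blob-dir /mnt/shared/blobs
    shared-blob-dir true
  </zeoclient>
</zodb>
```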
>>> - Are there any performance penalties by using very large invalidation
>>> queues (i.e. 300,000 objects) to reduce client cache verification time?
>> At a minimum, RAM occupied by that queue might be better used elsewhere.
>> I just don't use persistent caches, and tend to reboot appservers in
>> rotation after the ZEO storage has been down for any significant period
>> (almost never happens).
>
> In my case the clients might be down for a couple of days (typically 1 or
> 2 days) and they should not spend 30 mins in cache verification time each
> time they reconnect. So if these 300k objects take up 1k each, then they
> occupy 300 MB of ram which I am fine with.
If the client is disconnected for any period of time, it is far more
likely that just dumping the cache and starting over fresh will be a
win. The 'invalidation_queue' is primarily to support clients which
remain up while the storage server is down or unreachable.
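For reference, that queue is sized on the server side in zeo.conf; a
sketch, with the address, path, and the 300k figure from your message all
illustrative:

```
<zeo>
  address 8100
  invalidation-queue-size 300000
</zeo>

<filestorage 1>
  path /var/zeo/Data.fs
</filestorage>
```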
>>> From what I've read it only seems to consume memory.
>> Note that the ZEO storage server makes copies of that queue to avoid
>> race conditions.
>
> Ok, I can see how copying and storing 300k objects is slow and can take up
> excessive amounts of memory.
Tres.
--
===================================================================
Tres Seaver +1 540-429-0999 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com