[ZODB-Dev] ZEO's stats.py and simul.py
Tim Peters
tim at zope.com
Tue Apr 5 14:49:18 EDT 2005
[Chris Withers, on ZEO caches]
> Yeah, I saw Jeremy's wiki page about that. It seemed you guys made a lot
> of progress, did anything get taken forward to a release?
The ZODB 3.2 ZEO cache got tweaked as a result. The ZODB 3.3 ZEO cache is
entirely different, but is still more of a first cut than one of the
advanced designs we were looking at.
> ...
> OK, that makes me think that maybe simul.py has just got out of sync, it
> seems to have attracted less attention...
I don't remember anything about the simul.py in Zope 2.7.2 (which you said
you were using) -- too many releases ago. Some version of simul.py was very
heavily used by Jeremy and me, but I don't recall where it lived (maybe it
was even on a now-forgotten branch).
simul.py should be fixed, but doing so isn't in my foreseeable plans.
>> Try various sizes and judge results against whatever function you're
>> trying to optimize.
> Urg. simul.py was supposed to provide an alternative *schniff*
If theoretical hit rate is all that matters to you, yes. That's all
simul.py can do when it works. It can't model effects due to your OS file
caching gimmicks, competition for RAM, competition for L1 and L2 HW memory
caches, competition for disk I/O, competition for CPU cycles, competition
for network bandwidth ... nothing "real world", just theoretical hit rate.
Even that ignores that some objects are much bigger than others, and so also
more expensive to refetch from the server. "A hit" on a 128-byte object is
treated the same as "a hit" on a million-byte object, and same for "a miss".
Etc. It's gross and unrealistic. Quite possibly "better than nothing", but
certainly worse than _trying_ changes.
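To make the "a hit is a hit" point concrete, here's a toy sketch. It is not simul.py and is not how simul.py is structured. It replays a made-up trace of (oid, size) accesses through a byte-bounded LRU cache and reports both the plain hit rate and a byte-weighted hit rate. The function name `simulate`, the trace, and the LRU policy are all illustrative assumptions.

```python
from collections import OrderedDict


def simulate(trace, capacity_bytes):
    """Replay (oid, size) accesses through a byte-bounded LRU cache.

    Returns (object_hit_rate, byte_hit_rate).  Toy model only: like any
    hit-rate simulator it ignores OS caching, RAM, disk, CPU and network.
    """
    cache = OrderedDict()  # oid -> size, least recently used first
    used = 0
    hits = misses = 0
    hit_bytes = miss_bytes = 0
    for oid, size in trace:
        if oid in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(oid)  # mark as most recently used
            continue
        misses += 1
        miss_bytes += size
        if size > capacity_bytes:
            continue  # too big to ever fit; every access misses
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict LRU entry
            used -= evicted_size
        cache[oid] = size
        used += size
    return hits / (hits + misses), hit_bytes / (hit_bytes + miss_bytes)


# Hypothetical trace: a 128-byte object read 10 times, then a
# million-byte object read twice, against a 1000-byte cache.
trace = [("a", 128)] * 10 + [("b", 1_000_000)] * 2
obj_rate, byte_rate = simulate(trace, capacity_bytes=1000)
print(f"{obj_rate:.2f} {byte_rate:.4f}")  # 0.75 0.0006
```

Three of every four requests hit, yet almost none of the bytes did, because the big object never fit. A count-based hit rate alone hides that.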
>> The obvious one is more disk space required. If you use a persistent
>> ZEO cache, then cache verification time at ZEO client connect/reconnect
>> times may also increase proportionately. Other than those, bigger is
>> probably better, and the 20MB (? whatever) default size is much smaller
>> than usually desirable (it's left over from days when typical disks were
>> much smaller than they are now). Try, e.g., 200MB. Like the results
>> better? Iterate.
>>
>> Note that while the ZEO cache is disk-based, it does have in-memory
>> index structures taking space proportional to the number of objects
>> cached. I suppose that if the cache file were big enough to hold
>> millions of objects, the RAM consumed by those indices could become
>> burdensome. Haven't heard of that happening in real life, though.
> OK, so what would you recommend for achieving best "zodb speed" (ie:
> don't care about disk or memory usage, unless they affect speed) on ZEO
> client servers that are dual processor boxes and have one zeo client per
> processor? How about 2 clients per processor?
The only realistic approach is what I already suggested: change the size
and measure results, on your data, your HW, your OS, your app's object
access patterns, and using your idea of what "better" means.
If you're serious, you also need to play with changing the target number of
objects in your ZODB (Connection; in-memory; "pickle") caches. If you have
enough RAM, boosting that can have a much bigger primary effect on "ZODB
speed" than fiddling the ZEO cache. All object requests go to the ZODB
cache first. The ZEO cache is consulted only when the ZODB cache misses.
The ZODB cache also has a semi-intelligent replacement strategy (LRU); the
ZEO cache's replacement strategy (whether in 3.2 or 3.3) is more an artifact
of what's reasonably easy to implement using a total of one or two disk
files than it is a theoretically desirable strategy.
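A simplified sketch of the lookup order described above. An object cache with a target object count and LRU eviction is checked first, a dict standing in for the ZEO cache second, and the server last. All class and method names here are hypothetical. The dict is unbounded, whereas the real ZEO cache is bounded and uses a different replacement scheme, and none of this is ZODB's actual Connection cache code.

```python
from collections import OrderedDict

_MISSING = object()


class LRUObjectCache:
    """Holds at most `target` objects, evicting the least recently used."""

    def __init__(self, target):
        self.target = target
        self._data = OrderedDict()  # oid -> object, oldest first

    def get(self, oid, default=_MISSING):
        if oid not in self._data:
            return default
        self._data.move_to_end(oid)  # mark as most recently used
        return self._data[oid]

    def put(self, oid, obj):
        self._data[oid] = obj
        self._data.move_to_end(oid)
        while len(self._data) > self.target:
            self._data.popitem(last=False)  # drop least recently used


class TwoLevelLoader:
    """Object cache first, then a ZEO-cache stand-in, then the server."""

    def __init__(self, object_cache, zeo_cache, fetch_from_server):
        self.object_cache = object_cache
        self.zeo_cache = zeo_cache  # plain, unbounded dict (simplification)
        self.fetch_from_server = fetch_from_server  # callable: oid -> object
        self.counts = {"object_cache": 0, "zeo_cache": 0, "server": 0}

    def load(self, oid):
        obj = self.object_cache.get(oid)
        if obj is not _MISSING:
            self.counts["object_cache"] += 1
            return obj
        # Object-cache miss: only now is the ZEO cache consulted.
        if oid in self.zeo_cache:
            self.counts["zeo_cache"] += 1
            obj = self.zeo_cache[oid]
        else:
            self.counts["server"] += 1
            obj = self.fetch_from_server(oid)
            self.zeo_cache[oid] = obj
        self.object_cache.put(oid, obj)
        return obj


loader = TwoLevelLoader(LRUObjectCache(target=2), {}, lambda oid: f"obj-{oid}")
for oid in [1, 2, 1, 3, 2, 1]:
    loader.load(oid)
print(loader.counts)  # {'object_cache': 1, 'zeo_cache': 2, 'server': 3}
```

Raising `target` to 3 in this toy run turns the last two loads into object-cache hits, a small illustration of why boosting the object cache target can matter more than the ZEO cache size.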