[Zope-dev] Re: [Zope-DB] KeyError from Shared.DC.DA.DA
Chris Withers
chris at simplistix.co.uk
Mon Sep 25 14:04:24 EDT 2006
This probably merits the Zope Dev list too, since this is part of "core
zope"...
cheers,
Chris
FAIRS, Dan, GBM wrote:
> Hi,
>
> Under load, we're seeing KeyErrors from the _cached_result method of the DA
> class in the code used to clear out cached results. We're using Python
> 2.3.5, Zope 2.8.5, mxODBCZopeDA-1.0.8 running on RHEL3 connecting to a
> remote Microsoft SQL Server 2000 database.
>
> Note this is a different issue to Collector #1745.
>
> Here's the salient part of the traceback:
>
> File "/opt/intranet/cbfmnet/zope_home/Products/Membership/Member.py", line
> 118, in _getDBValue
> values = self._getDBValues(attributes=[attribute])
> File "/opt/intranet/cbfmnet/zope_home/Products/Membership/Member.py", line
> 96, in _getDBValues
> , userid=login_elements[1]
> File "/opt/intranet/cbfmnet/zope/lib/python/Shared/DC/ZRDB/DA.py", line
> 453, in __call__
> result=self._cached_result(DB__, (query, self.max_rows_))
> File "/opt/intranet/cbfmnet/zope/lib/python/Shared/DC/ZRDB/DA.py", line
> 375, in _cached_result
> if int(cache[q][0]) == key:
> KeyError: ("execute dbo.EmployeeByNT 'DOMAIN', 'user'\n", 1000, '\nDBConnId:
> None')
>
> The ZSQL Method EmployeeByNT looks like this (it's actually an FSZSQL method
> in a skin):
>
> <dtml-comment>
> connection_id: DB_Connection
> arguments:domain userid
> cache_time: 60
> max_cache: 50000
> </dtml-comment>
> execute dbo.EmployeeByNT <dtml-sqlvar domain type="string">, <dtml-sqlvar
> userid type="string">
>
> We do have other ZSQL methods, but this one is by far the most heavily used.
>
>
> Here's (most of) the offending method, _cached_result (I've snipped some
> code which just appends the db connection id to the query. Note also that
> Bucket is also defined as lambda: {} at the top of DA.py):
>
> # Try to fetch from cache
> if hasattr(self,'_v_cache'): cache=self._v_cache
> else: cache=self._v_cache={}, Bucket()
> cache, tcache = cache
> max_cache=self.max_cache_
> now=time()
> t=now-self.cache_time_
> if len(cache) > max_cache / 2:
> keys=tcache.keys()
> keys.reverse()
> while keys and (len(keys) > max_cache or keys[-1] < t):
> key=keys[-1]
> q=tcache[key]
> del tcache[key]
> if int(cache[q][0]) == key:
> del cache[q]
> del keys[-1]
>
> if cache.has_key(query):
> k, r = cache[query]
> if k > t: return r
>
> # call the pure query
> result=apply(DB__.query, pure_query)
> if self.cache_time_ > 0:
> tcache[int(now)]=query
> cache[query]= now, result
>
> return result
>
> In this code, 'tcache' is a dict which maps integers representing the time
> to queries (the result of calling int(t) where t is the floating-point
> result from time.time()) , and 'cache' maps queries to a 2-tuple consisting
> of the floating-point timestamp of when the method was called, and the
> result of the query.
>
> I've been trying to figure out a circumstance which could cause the KeyError
> we're seeing in production, without success.
>
> Here are some observations which may give some clues:
>
> The result of time.time() is converted to an integer and used as a key into
> tcache. AFAICS, this means that only 1 entry per second will be stored in
> tcache, even though every query will be stored in cache. It also looks like
> cache could be the source of a memory leak.
>
> For example:
> - 'SELECT a FROM b' and 'SELECT c FROM b' run in the same 1-second window
> - 'SELECT c FROM b' will end up stored in tcache.
> - Both queries end up in cache.
> - 'SELECT a FROM b' will only ever be removed from cache if 'SELECT a FROM
> b' is executed once more later that second - and then 'SELECT c FROM b' will
> remain.
>
> We only see the KeyError after 'some time' in the application. Presumably
> this is the time it takes for the caches to fill and start actively removing
> entries as per the while loop. (We restarted the cluster last night, and
> have yet to see the errors return).
>
> The SQL query we are running maps an NT credential (domain and userid) to an
> internal user id from our corporate database. To give an idea of the usage
> pattern of this ZSQL method, it gets run at least once per request, with
> application serving approximately 150,000 - 180,000 requests per day, with
> 12,000 - 14,000 distinct users (and hence different arguments to the ZSQL
> method).
>
> I'm going to put some logging code around the line that causes the KeyError
> next time we do a release. However - has anyone seen this before? Can anyone
> see what might be going wrong here?
>
> Cheers,
> Dan
--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
More information about the Zope-Dev
mailing list