Re: [Zope-dev] SVN: Zope/branches/2.13/ fix `LazyMap` to avoid unnecessary function calls when not accessing items in order (fixes http://dev.plone.org/plone/ticket/9018)
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12/14/2010 09:12 AM, Andreas Zeidler wrote:
Log message for revision 118863: fix `LazyMap` to avoid unnecessary function calls when not accessing items in order (fixes http://dev.plone.org/plone/ticket/9018)
First, thanks for looking into this issue.
Changed: U Zope/branches/2.13/doc/CHANGES.rst U Zope/branches/2.13/src/Products/ZCatalog/Lazy.py U Zope/branches/2.13/src/Products/ZCatalog/tests/test_lazy.py
-=- Modified: Zope/branches/2.13/doc/CHANGES.rst =================================================================== --- Zope/branches/2.13/doc/CHANGES.rst 2010-12-14 13:24:24 UTC (rev 118862) +++ Zope/branches/2.13/doc/CHANGES.rst 2010-12-14 14:12:11 UTC (rev 118863) @@ -11,6 +11,8 @@ Bugs Fixed ++++++++++
+- Fix `LazyMap` to avoid unnecessary function calls. + - LP 686664: WebDAV Lock Manager ZMI view wasn't accessible.
2.13.1 (2010-12-07)
Modified: Zope/branches/2.13/src/Products/ZCatalog/Lazy.py =================================================================== --- Zope/branches/2.13/src/Products/ZCatalog/Lazy.py 2010-12-14 13:24:24 UTC (rev 118862) +++ Zope/branches/2.13/src/Products/ZCatalog/Lazy.py 2010-12-14 14:12:11 UTC (rev 118863) @@ -145,44 +145,28 @@ # Don't access data until necessary
def __init__(self, func, seq, length=None): - self._seq = seq - self._data = [] - self._func = func + self._seq=seq + self._func=func if length is not None: - self._len = length + self._len=length
Please don't un-PEP8 a cleaned-up module: leave spaces around operators. If that was unintentional (maybe you pasted from a copy of a file made before the module was cleaned up?), then you still need to review the diff and edit out unintended changes first.
else: self._len = len(seq) + self._marker = object() + self._data = [self._marker] * self._len
Have you measured the impact of pre-allocating here for very large sequences? You are trading off space in return for speed for the (relatively) rare case of random-access indexing off-the end. I'm *sure* that is a not a good trade-off for all cases: the catalog is often used for queries which return result sets in the order of 10^5 or more objects, and *very* rarely is it accessed randomly (I had never seen it before reading the bug entry you cite). The normal case is to take the first few entries (batch_size * batch_number) from the huge set. I think you would be better off finding a way to inject your style of LazyMap into the catalog for the random-access case, e.g., by passing '_lazy_map' into the methods you are using. BTW, another interesting alternative impplementation might be to use a dictionary instead of a list for 'data'. You could then avoid any pre-allocation at all.
- def __getitem__(self, index): - data = self._data + def __getitem__(self,index): + data=self._data try: - s = self._seq + s=self._seq except AttributeError: return data[index]
While we're at it, the 'try: ... except:' here is wasteful and slow. Lazy objects aren't persistent, and the '_seq' attribute is added unconditionally in '__init__'.
- i = index - if i < 0: - i = len(self) + i - if i < 0: - raise IndexError(index) + value = data[index] + if value is self._marker: + value = data[index] = self._func(s[index]) + return value - ind = len(data) - if i < ind: - return data[i] - ind = ind - 1
- func = self._func - while i > ind: - try: - ind = ind + 1 - data.append(func(s[ind])) - except IndexError: - del self._func - del self._seq - raise IndexError(index) - return data[i] - -
Finally, whatever cleanup we settle on also needs to be forward-ported to the trunk. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk0HpV0ACgkQ+gerLs4ltQ4FsQCg2zpfEn+Hih6lNBqboQJDDBl5 7cIAn1rnqKtXsBv7j/s+PlcK0Nals7m1 =b/uG -----END PGP SIGNATURE-----
hi tres, On 14.12.2010, at 18:11, Tres Seaver wrote:
On 12/14/2010 09:12 AM, Andreas Zeidler wrote:
Modified: Zope/branches/2.13/src/Products/ZCatalog/Lazy.py =================================================================== --- Zope/branches/2.13/src/Products/ZCatalog/Lazy.py 2010-12-14 13:24:24 UTC (rev 118862) +++ Zope/branches/2.13/src/Products/ZCatalog/Lazy.py 2010-12-14 14:12:11 UTC (rev 118863) @@ -145,44 +145,28 @@ # Don't access data until necessary
def __init__(self, func, seq, length=None): - self._seq = seq - self._data = [] - self._func = func + self._seq=seq + self._func=func if length is not None: - self._len = length + self._len=length
Please don't un-PEP8 a cleaned-up module: leave spaces around operators. If that was unintentional (maybe you pasted from a copy of a file made before the module was cleaned up?), then you still need to review the diff and edit out unintended changes first.
yes, in was indeed unintentional, or rather unfortunate. i've originally worked on the 2.12 branch (in a plone 4 buildout) where i think hanno's pep8-related changes haven't been applied/back-ported yet. after that i pasted the version into 2.13 (and yes, the commit order was the other way round :)) and was in fact wondering about the longer diff when hanno (who was actually sitting next to me) pointed out he had changed the one-line if: and else: expressions on that branch. however, my alias for "svn diff" tells diff to not show white space changes so i missed the now again removed spaces around the =. anyway, it's fixed in c118895.
else: self._len = len(seq) + self._marker = object() + self._data = [self._marker] * self._len
Have you measured the impact of pre-allocating here for very large sequences?
i had not, but have now. turns out the allocation makes a difference earlier than i would have expected: accessing the first 20 items of the map is slower with the new version starting at sequences of a little more than 2000 items. and of course it (relatively) gets slower the longer the sequence is. setting up a 10^5 "empty" sequence takes about 0.5 ms on my machine.
I'm *sure* that is a not a good trade-off for all cases: the catalog is often used for queries which return result sets in the order of 10^5 or more objects, and *very* rarely is it accessed randomly (I had never seen it before reading the bug entry you cite). The normal case is to take the first few entries (batch_size * batch_number) from the huge set.
true, but the fix wasn't only about this use-case. it's mainly about avoiding extra function calls when fetching a batch other than the first. so far the map function wall called for every item up to the one accessed, e.g. for the 5th batch of 20 items each, you'd end up with 80 unnecessary (and possible expensive) calls. or iow, the old version wasn't really as lazy as it should have been… :)
I think you would be better off finding a way to inject your style of LazyMap into the catalog for the random-access case, e.g., by passing '_lazy_map' into the methods you are using.
i disagree. i think it should be fixed properly upstream. we should really avoid all that monkey-patching all over the place. but see below…
BTW, another interesting alternative impplementation might be to use a dictionary instead of a list for 'data'. You could then avoid any pre-allocation at all.
with the benchmark in place, i was curious and tried that as well: it beats the other versions independently of the sequence length and the number of accessed items, both sequentially and repeatedly over a single batch. in the benchmark they're accessed from the start, but skipping a few batches would make it even faster anyway. hence i've updated the code again in c118896. please let me know what you think, or if you want the test script i've been using.
While we're at it, the 'try: ... except:' here is wasteful and slow. Lazy objects aren't persistent, and the '_seq' attribute is added unconditionally in '__init__'.
you're right, this should be cleaned up as well. but since this isn't "my code" i was trying to keep the diff minimal (like on 2.12, i.e. in c118864, it obviously didn't work in the other one :)). in any case, it's also gone with c118896. cheers, andi -- pgp key at http://zitc.de/pgp - http://wwwkeys.de.pgp.net/ plone 4.0 released! -- http://plone.org/4
participants (2)
-
Andreas Zeidler -
Tres Seaver