After seeing Chris McDonough's excellent paper at the Plone conference on optimizing content delivery using the new IStreamIterator interface, I began experimenting with implementing such an optimization in ZopePageTemplates.

I played around with having 'ZopePageTemplate._exec' request that it receive an iterator, instead of the usual big string. I hoped that such a change might enable greater concurrency and a smaller memory footprint by avoiding creation of the big string altogether; instead, medusa would push out the "chunk stream" represented by the StringIO buflist, while the appserver would be free to handle a new request without needing to malloc / copy the data.

Here are the timings I have seen so far, using 'zopectl debug', with the following template:

--------------- Template source --------------------------------
<html>
<body>
<div tal:repeat="item python:[x for x in range(1000)]"
     tal:content="item">ITEM</div>
</body>
</html>
----------------------------------------------------------------

--------------- Before the patch -------------------------------
Zope.debug('/test_iter', t=1)
250.2 milliseconds
Zope.debug('/test_iter', t=1)
106.7 milliseconds
Zope.debug('/test_iter', t=1)
106.5 milliseconds
Zope.debug('/test_iter', t=1)
124.6 milliseconds
----------------------------------------------------------------
--------------- After the patch --------------------------------
Zope.debug('/test_iter', t=1)
249.2 milliseconds
Zope.debug('/test_iter', t=1)
107.2 milliseconds
Zope.debug('/test_iter', t=1)
125.0 milliseconds
Zope.debug('/test_iter', t=1)
162.1 milliseconds
----------------------------------------------------------------
Given that the performance looks similar in this context, which doesn't benefit from the medusa / concurrency intent of the patch, it seems as though it might be a win (of *course* there aren't any tests for it!).

I am attaching the patch I have so far for review and comment.

Tres.

--
===============================================================
Tres Seaver                                tseaver@zope.com
Zope Corporation      "Zope Dealers"       http://www.zope.com
------------------------------------------------------------------------
This patch causes ZopePageTemplates to return an IStreamIterator when called (published), potentially allowing medusa to return the same response payload (via the iterator) without needing to have the appserver thread join it into a big string.
--- PageTemplate.py	2004-09-25 21:39:50.595938847 -0400
+++ PageTemplate.py.new	2004-09-25 21:39:31.983326567 -0400
@@ -83,7 +84,7 @@
         c['root'] = self
         return c

-    def pt_render(self, source=0, extra_context={}):
+    def pt_render(self, source=0, extra_context={}, iter_handler=None):
         """Render this Page Template"""
         if not self._v_cooked:
             self._cook()
@@ -100,7 +101,10 @@
                 getEngine().getContext(c), output, tal=not source,
                 strictinsert=0)()
-        return output.getvalue()
+        if iter_handler is not None:
+            return iter_handler(output)
+        else:
+            return output.getvalue()

     def __call__(self, *args, **kwargs):
         if not kwargs.has_key('args'):
--- ZopePageTemplate.py	2004-09-25 21:37:07.956803011 -0400
+++ ZopePageTemplate.py.new	2004-09-25 21:37:23.024870003 -0400
@@ -34,6 +34,8 @@
 from OFS.Cache import Cacheable
 from OFS.Traversable import Traversable
 from OFS.PropertyManager import PropertyManager
+from ZPublisher.Iterators import IStreamIterator
+
 from PageTemplate import PageTemplate
 from Expressions import SecureModuleImporter
 from PageTemplateFile import PageTemplateFile
@@ -253,7 +255,8 @@
         # Execute the template in a new security context.
         security.addContext(self)
         try:
-            result = self.pt_render(extra_context=bound_names)
+            result = self.pt_render(extra_context=bound_names,
+                                    iter_handler=StringIOIterator)
             if keyset is not None:
                 # Store the result in the cache.
                 self.ZCacheable_set(result, keywords=keyset)
@@ -331,6 +334,26 @@
 setattr(ZopePageTemplate, 'source.xml', ZopePageTemplate.source_dot_xml)
 setattr(ZopePageTemplate, 'source.html', ZopePageTemplate.source_dot_xml)

+
+class StringIOIterator:
+    """ Adapt a StringIO object to IStreamIterator.
+    """
+
+    __implements__ = (IStreamIterator,)
+
+    def __init__(self, stringio):
+        self._buflist = stringio.buflist
+        self._index = 0
+
+    def next(self):
+
+        if self._index >= len(self._buflist):
+            raise StopIteration
+
+        data, self._index = self._buflist[self._index], self._index + 1
+
+        return data
+
 # Product registration and Add support
 manage_addPageTemplateForm = PageTemplateFile(
     'www/ptAdd', globals(), __name__='manage_addPageTemplateForm')
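For readers outside the Zope 2 codebase, the adaptation idea behind StringIOIterator can be sketched standalone. This is an illustration in modern Python, not the patch itself: the classic StringIO module's internal 'buflist' no longer exists in Python 3, so the sketch iterates an explicit chunk list, and it adds a '__len__' (an assumption on my part, so that a server could derive a Content-Length without joining the chunks).

```python
class ChunkIterator:
    """Iterate over a list of string chunks without joining them.

    A sketch of the patch's adaptation idea; the original wrapped
    StringIO.buflist, which is gone in Python 3, so an explicit
    chunk list stands in for it here.
    """

    def __init__(self, chunks):
        self._chunks = chunks
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._chunks):
            raise StopIteration
        data = self._chunks[self._index]
        self._index += 1
        return data

    def __len__(self):
        # Total payload size, so a server could set Content-Length
        # without first joining the chunks into one big string.
        return sum(len(c) for c in self._chunks)


chunks = ["<html>", "<body>", "hello", "</body>", "</html>"]
assert "".join(ChunkIterator(chunks)) == "<html><body>hello</body></html>"
assert len(ChunkIterator(chunks)) == 31
```

The server drains the iterator chunk by chunk; no single allocation ever holds the whole response body.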
------------------------------------------------------------------------
On Sat, 25 Sep 2004 22:08:06 -0400, Tres Seaver <tseaver@zope.com> wrote:
--------------- Before the patch -------------------------------
Zope.debug('/test_iter', t=1)
250.2 milliseconds
Zope.debug('/test_iter', t=1)
106.7 milliseconds
Zope.debug('/test_iter', t=1)
106.5 milliseconds
Zope.debug('/test_iter', t=1)
124.6 milliseconds
--------------- After the patch --------------------------------
Zope.debug('/test_iter', t=1)
249.2 milliseconds
Zope.debug('/test_iter', t=1)
107.2 milliseconds
Zope.debug('/test_iter', t=1)
125.0 milliseconds
Zope.debug('/test_iter', t=1)
162.1 milliseconds
Are these properly labelled?  The way I read this is that things are just slightly faster without the patch.

Is there anything to indicate that memory usage is actually improved?

  -Fred

--
Fred L. Drake, Jr.    <fdrake at gmail.com>
Zope Corporation
Fred Drake wrote:
Are these properly labelled? The way I read this is that things are just slightly faster without the patch.
Yes, you read it correctly (although I think the real answer is "identical performance, within margin of error").
Is there anything to indicate that memory usage is actually improved?
I'm not there yet.  This is essentially a "zero-copy" optimization; if I can show that the functionality is equivalent, with approximately the same performance under unloaded sites, *then* I can look into whether it helps.

I envision two possible benefits, which will materialize only "at scale":

- Removing the need for the appserver thread to malloc and copy the Big String *may* be a win on a memory-constrained system (large allocations can induce funny non-linearities on the underlying malloc implementation).  Not having to copy values, even in C, should be a "pure" win, in any case.

- Returning the iterator to medusa means that the application thread becomes available to service other threads that much more quickly.  Even if the malloc issues can't be demonstrated, the increase in concurrency *should* be a win here.

Tres.

--
===============================================================
Tres Seaver                                tseaver@zope.com
Zope Corporation      "Zope Dealers"       http://www.zope.com
Tres Seaver wrote at 2004-9-27 11:25 -0400:
...

- Returning the iterator to medusa means that the application thread becomes available to service other threads that much more quickly.  Even if the malloc issues can't be demonstrated, the increase in concurrency *should* be a win here.
Unless you generate huge results (which you probably should not do in the first place, as the browser, too, will need even more memory), allocation and copying will be so fast that you should not see a significant effect.

On my computer, allocating and copying a string cost:

   10 MB    0.048 s
  100 MB    0.487 s

measured with the following function:
    from time import time
    def copy(n):
        s = 'a'*n
        st = time(); sx = s[:-1]; return time()-st
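The same measurement can be redone with the stdlib 'timeit' module, which handles the clock and repetition for you. A sketch only; absolute numbers are machine-dependent and will differ from the 2004 figures above.

```python
import timeit

def copy_cost(n_bytes, repeats=3):
    # Time a single slice-copy of an n_bytes string, like the
    # copy() function above, taking the best of several runs.
    setup = "s = 'a' * %d" % n_bytes
    return min(timeit.repeat("sx = s[:-1]", setup=setup,
                             number=1, repeat=repeats))

# e.g. copy_cost(10 * 1024 * 1024) gives the seconds needed to
# copy roughly 10 MB on the machine at hand.
```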
You do not need this optimization for PageTemplates -- only for large files.

--
Dieter
Dieter Maurer wrote:
Tres Seaver wrote at 2004-9-27 11:25 -0400:
...

- Returning the iterator to medusa means that the application thread becomes available to service other threads that much more quickly.  Even if the malloc issues can't be demonstrated, the increase in concurrency *should* be a win here.
Unless you generate huge results (which you probably should not do in the first place as the browser, too, will need even more memory) allocation and copying will be so fast that you should not see a significant effect.
On my computer, allocating and copying a string cost

   10 MB    0.048 s
  100 MB    0.487 s

measured with the following function:

    from time import time
    def copy(n):
        s = 'a'*n
        st = time(); sx = s[:-1]; return time()-st
You do not need this optimization for PageTemplates -- only for large files.
As I said earlier in the thread, I don't believe the win will show up at all in the "easy test rig" case: only systems which are already either memory-constrained or else have fragmented heaps show the non-linearity for allocating the large blocks.

Tres.

--
===============================================================
Tres Seaver                                tseaver@zope.com
Zope Corporation      "Zope Dealers"       http://www.zope.com
On Mon, 27 Sep 2004 11:25:16 -0400, Tres Seaver <tseaver@zope.com> wrote:
Yes, you read it correctly (although I think the real answer is "identical performance, within margin of error").
With other benefits, that's good enough, too.
Is there anything to indicate that memory usage is actually improved?
I'm not there yet. This is essentially a "zero-copy" optimization; if I can show that the functionality is equivalent, with approximately the same performance under unloaded sites, *then* I can look into whether it helps.
Ok; understood.
I envision two possible benefits, which will materialize only "at scale":
- Removing the need for the appserver thread to malloc and copy the Big String *may* be a win on a memory-constrained system (large allocations can induce funny non-linearities on the underlying malloc implementation). Not having to copy values, even in C, should be a "pure" win, in any case.
Agreed. Doing less is always a win, eventually. Often in combination with doing just a little less in lots of places, but that's ok.
- Returning the iterator to medusa means that the application thread becomes available to service other threads that much more quickly. Even if the malloc issues can't be demonstrated, the increase in concurrency *should* be a win here.
This is interesting.  If you're pushing content out the socket earlier, you need to guard against failure much earlier, or be willing to deal with it in different ways.  This is likely less of an issue for sites that have been well-tested and are considered ready for production.

I'll need to review the patch more carefully, especially since I don't know much about the Medusa layer.  Does the modified code deal properly with errors encountered late in the transformation of a template?

  -Fred

--
Fred L. Drake, Jr.    <fdrake at gmail.com>
Zope Corporation
Fred Drake wrote:
On Mon, 27 Sep 2004 11:25:16 -0400, Tres Seaver <tseaver@zope.com> wrote:
Yes, you read it correctly (although I think the real answer is "identical performance, within margin of error").
With other benefits, that's good enough, too.
Is there anything to indicate that memory usage is actually improved?
I'm not there yet. This is essentially a "zero-copy" optimization; if I can show that the functionality is equivalent, with approximately the same performance under unloaded sites, *then* I can look into whether it helps.
Ok; understood.
I envision two possible benefits, which will materialize only "at scale":
- Removing the need for the appserver thread to malloc and copy the Big String *may* be a win on a memory-constrained system (large allocations can induce funny non-linearities on the underlying malloc implementation). Not having to copy values, even in C, should be a "pure" win, in any case.
Agreed. Doing less is always a win, eventually. Often in combination with doing just a little less in lots of places, but that's ok.
- Returning the iterator to medusa means that the application thread becomes available to service other threads that much more quickly. Even if the malloc issues can't be demonstrated, the increase in concurrency *should* be a win here.
This is interesting.  If you're pushing content out the socket earlier, you need to guard against failure much earlier, or be willing to deal with it in different ways.  This is likely less of an issue for sites that have been well-tested and are considered ready for production.

I'm not looking for that; only to avoid constructing the big result string just to hand off to the publisher.
I'll need to review the patch more carefully, especially since I don't know much about the Medusa layer. Does the modified code deal properly with errors encountered late in the transformation of a template?
Transformation is already complete at that point.  The only difference is the type of the result returned (eventually) to the publisher.

BTW, I looked again at where StringIO is used, and it seems that ZPT constructs extra StringIO objects for (at least some) nested blocks.  In that case, we might be able to extend the win by having the "calling" bit use 'list.extend' for its own buflist, instead of calling 'getvalue'.

Tres.

--
===============================================================
Tres Seaver                                tseaver@zope.com
Zope Corporation      "Zope Dealers"       http://www.zope.com
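The 'list.extend' idea can be sketched in isolation. The helper names below are hypothetical (this is not the actual ZPT rendering code), but they show the shape of the change: splicing a nested block's chunks into the caller's chunk list avoids the intermediate string that 'getvalue' would build.

```python
def join_nested(outer_chunks, inner_chunks):
    # Current style: the nested block's buffer is flattened into one
    # intermediate string (the 'getvalue' call), which the caller then
    # appends as a single chunk -- an extra malloc + copy per block.
    outer_chunks.append("".join(inner_chunks))

def extend_nested(outer_chunks, inner_chunks):
    # Proposed style: splice the nested block's chunks directly into
    # the caller's chunk list, skipping the intermediate string.
    outer_chunks.extend(inner_chunks)


a, b = [], []
join_nested(a, ["<b>", "x", "</b>"])
extend_nested(b, ["<b>", "x", "</b>"])
# Both spell the same payload once finally flattened:
assert "".join(a) == "".join(b) == "<b>x</b>"
```

The difference is invisible to whoever consumes the final chunk stream; only the number of intermediate allocations changes.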
On Mon, 27 Sep 2004 14:18:30 -0400, Tres Seaver <tseaver@zope.com> wrote:
Transformation is already complete at that point. The only difference is the type of the result returned (eventually) to the publisher.
Ok, that sounds good.
BTW, I looked again at where StringIO is used, and it seems that ZPT constructs extra StringIO objects for (at least some) nested blocks. In that case, we might be able to extend the win by having the "calling" bit use 'list.extend' for its own buflist, instead of calling 'getvalue'.
That shouldn't be hard to do neatly, and seems an easy win.

  -Fred

--
Fred L. Drake, Jr.    <fdrake at gmail.com>
Zope Corporation
Tres Seaver wrote: ...
- Returning the iterator to medusa means that the application thread becomes available to service other threads that much more quickly. Even if the malloc issues can't be demonstrated, the increase in concurrency *should* be a win here.
I don't think this would affect application threads.  The application thread is freed as soon as the ZPT template completes rendering.  This will be the same whether we dribble data out as we go, or not.

The potential benefit is that we could get the page back to the browser faster and thus reduce the number of open connections that asyncore has to manage.  This benefit will be difficult to obtain unless the publisher supports streaming output via chunks.  (I doubt that it does, but I don't really know.)  Without output chunking, the server can't start sending output until the content length is known, and it can't know that until the template has been fully generated.

Jim

--
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org
Jim Fulton wrote:
Tres Seaver wrote: ...
- Returning the iterator to medusa means that the application thread becomes available to service other threads that much more quickly. Even if the malloc issues can't be demonstrated, the increase in concurrency *should* be a win here.
I don't think this would affect application threads. The application thread is freed as soon as the ZPT template completed rendering. This will be the same whether we dribbled data out as we go, or not.
The appserver thread still has to do the "assembly" (malloc + copy) work to return a Big String response to medusa; I'm trying to avoid that.
The potential benefit is that we could get the page back to the browser faster and thus reduce the number of open connections that asyncore has to manage.  This benefit will be difficult to obtain unless the publisher supports streaming output via chunks.  (I doubt that it does, but I don't really know.)
lib/python/ZPublisher/Iterators.py does that (but it isn't for "lazy" rendering, because, as you point out, it needs to know the content length).
Without output chunking, the server can't start sending output until the content length is known and it can't know that until the template has been fully generated.
My proposal isn't to begin sending content back before the rendering is complete; I just want to avoid copying the content from the list-of-many-small-strings (StringIO's 'buflist') to a single Big String.  The IStreamIterator interface allows me to avoid the malloc+copy part; the appserver thread would be done as soon as the last ZPT opcode completes.

Tres.

--
===============================================================
Tres Seaver                                tseaver@zope.com
Zope Corporation      "Zope Dealers"       http://www.zope.com
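The handoff being described can be sketched as a toy contrast, with a file-like 'wfile' standing in for medusa's output side (an assumption for illustration; the real server drives an IStreamIterator, not a plain loop):

```python
import io

def respond_joined(chunks, wfile):
    # Status quo: the appserver thread joins the buflist into one big
    # string before handing it over -- the malloc + copy step at issue.
    wfile.write("".join(chunks))

def respond_iterated(chunks, wfile):
    # Iterator-style handoff: the server drains the chunks itself, and
    # the appserver thread is free once the last chunk is produced.
    # (Here a plain loop stands in for the server.)
    for chunk in chunks:
        wfile.write(chunk)


out1, out2 = io.StringIO(), io.StringIO()
respond_joined(["a", "bb", "ccc"], out1)
respond_iterated(["a", "bb", "ccc"], out2)
assert out1.getvalue() == out2.getvalue() == "abbccc"
```

The bytes on the wire are identical either way; only where (and whether) the big string gets materialized differs.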
participants (5)
- Dieter Maurer
- Fred Drake
- Jim Fulton
- Niels Mache, struktur AG
- Tres Seaver