TAL page whitespace removal
Due to the structure of TAL, the output of TAL pages often includes a lot of whitespace that is added to make the source readable, but once processed, the lines it was on can disappear completely. It would be nice if the TAL parser had an option to remove all whitespace at the beginning and end of lines, and to remove all blank lines. For example, I downloaded the www.plone.org home page and it was 47704 bytes. I removed all whitespace from the beginning and end of lines, then removed blank lines, and it was down to 35087 bytes - that's a saving of more than 25%, and the output renders exactly the same in web browsers. This saving will improve load times for the end user, both by there being less to download and less for their browser to hold in memory. For the server, the improvements are huge. Obviously less bandwidth is required, but less space will also be needed in the various caches, meaning the caches will expire less often, and/or be equally functional with less RAM or disk space. The ability to switch this feature off and leave whitespace in for debugging may be useful, but not very, because you can always use an XML indenting tool to get readable output from TAL. Robert (Jamie) Munro
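The stripping step described above can be sketched in a few lines of plain Python (just an illustration of the idea, not tied to the TAL parser itself):

```python
def strip_whitespace(html):
    # Strip leading/trailing whitespace on each line, then drop blank lines.
    lines = (line.strip() for line in html.splitlines())
    return '\n'.join(line for line in lines if line)

page = '<html>\n  <body>\n    <p>Hello</p>\n\n  </body>\n</html>\n'
print(strip_whitespace(page))
```

Running this over rendered template output is where the 25%-or-so size reduction mentioned above comes from.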
Robert (Jamie) Munro wrote at 2006-4-10 13:14 +0100:
.... For example, I downloaded the www.plone.org home page and it was 47704 bytes. I removed all whitespace from the beginning and end of lines, and then removed blank lines, and it was down to 35087 bytes - that's a more than 25% saving, and the output renders exactly the same in web browsers.
A much more efficient way is to activate "gzip" compression for your pages. It not only handles whitespace efficiently (and correctly, even for pages with "pre" tags or similar CSS directives), it also compresses other text. You gain not just 25 % but about 70 % (in the size of the responses). You will not save RAM on the server side (of course), but hopefully (and very likely) the scripts are not the major user of RAM on your site. -- Dieter
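You can get a rough feel for the difference with Python's stdlib gzip module (a standalone sketch on made-up repetitive markup, not Zope's actual response compression):

```python
import gzip

# Highly repetitive markup, as template output tends to be.
html = ('  <div class="row">\n    <p>Some repeated markup</p>\n  </div>\n' * 200).encode('utf-8')

compressed = gzip.compress(html)
saving = 1 - len(compressed) / len(html)
print('%d -> %d bytes (%.0f%% saved)' % (len(html), len(compressed), saving * 100))
```

On repetitive HTML like this the saving is far beyond what whitespace stripping alone can reach, which is Dieter's point.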
Dieter Maurer wrote:
Robert (Jamie) Munro wrote at 2006-4-10 13:14 +0100:
.... For example, I downloaded the www.plone.org home page and it was 47704 bytes. I removed all whitespace from the beginning and end of lines, and then removed blank lines, and it was down to 35087 bytes - that's a more than 25% saving, and the output renders exactly the same in web browsers.
A much more efficient way is to activate "gzip" compression for your pages. It not only handles whitespace efficiently (and correctly, even for pages with "pre" tags or similar CSS directives), it also compresses other text. You gain not just 25 % but about 70 % (in the size of the responses).
gzip will add enormous processing overhead to the server. Stripping spaces will add negligible overhead, likely less overhead than it saves.
You will not save RAM on the server side (of course), but hopefully (and very likely) the scripts are not the major user of RAM on your site.
I have written TAL that produces very large dumps of XML data in the past, even whole sites. It's a really nice way to dump data from a database (SQL or the Zope DB), but Zope has to build the whole output in RAM before sending any of it, so it can cause the site to crash. I would hope in this kind of case that the TAL is the major user of RAM on the site, so any saving would be really good, but in all cases (except <pre> tags, which I would never use) it seems like a possibly significant gain. Robert Munro
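The memory problem is easy to see if you compare building the whole document up front with yielding it piecewise (a plain-Python sketch, nothing Zope-specific):

```python
def dump_rows_all_at_once(rows):
    # Builds the entire document in memory before any of it can be sent.
    return '<rows>\n' + ''.join('<row>%s</row>\n' % r for r in rows) + '</rows>'

def dump_rows_streaming(rows):
    # Yields one fragment at a time; peak memory stays around one row.
    yield '<rows>\n'
    for r in rows:
        yield '<row>%s</row>\n' % r
    yield '</rows>'
```

Both produce the same bytes, but the first holds everything in RAM at once, which is what bites on very large dumps.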
Robert (Jamie) Munro wrote:
Dieter Maurer wrote:
Robert (Jamie) Munro wrote at 2006-4-10 13:14 +0100:
.... For example, I downloaded the www.plone.org home page and it was 47704 bytes. I removed all whitespace from the beginning and end of lines, and then removed blank lines, and it was down to 35087 bytes - that's a saving of more than 25%, and the output renders exactly the same in web browsers. A much more efficient way is to activate "gzip" compression for your pages. It not only handles whitespace efficiently (and correctly, even for pages with "pre" tags or similar CSS directives), it also compresses other text. You gain not just 25 % but about 70 % (in the size of the responses).
gzip will add enormous processing overhead to the server. Stripping spaces will add negligible overhead, likely less overhead than it saves.
Doubtful. If high-throughput servers can encrypt and decrypt SSL traffic, gzip isn't going to be a problem. The difference between running gzip versus htmltidy shouldn't even be significantly noticeable on a sufficiently-powered server. I know that super-high throughput sites (e.g. Google) intentionally break some standards to save bandwidth (where omitting '</body></html>' could save gigabytes per day), but if you're in a situation like that, Zope probably isn't for you anyway.
You will not save RAM on the server side (of course), but hopefully (and very likely) the scripts are not the major user of RAM on your site.
I have written TAL that produces very large dumps of XML data in the past, even whole sites. It's a really nice way to dump data from a database (SQL or Zope DB), but Zope has to build the whole output in RAM before sending any of it, so it can cause the site to crash.
One solution I've found is to buffer the writes to REQUEST.RESPONSE by using a Python script which then calls granular page templates rather than a single monolithic template, outputting the results 25k at a time or so; it gives the rest of the server some time to catch up. However, the point you bring up has nothing to do with whether or not the output has significant whitespace in it -- double or triple the amount of data sent, and you're still in the same boat. If you're this concerned about bandwidth, you're probably using the wrong tool for the job.
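A hypothetical sketch of that buffering scheme (the names write, render_section, and the 25k threshold are placeholders; under Zope you'd pass in RESPONSE.write and your own per-section rendering):

```python
CHUNK = 25 * 1024  # flush roughly every 25k, as described above

def stream_sections(write, render_section, section_ids):
    # Render granular templates one by one, flushing whenever the buffer
    # passes the chunk threshold instead of building one huge string.
    buf, size = [], 0
    for sid in section_ids:
        piece = render_section(sid)
        buf.append(piece)
        size += len(piece)
        if size >= CHUNK:
            write(''.join(buf))
            buf, size = [], 0
    if buf:
        write(''.join(buf))
```

The point is only that output leaves the process in bounded pieces; the total bytes sent are unchanged.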
I would hope in this kind of case that the TAL is the major user of RAM on the site, so any saving would be really good, but it all cases (except <pre> tags, which I would never use) it seems like a possibly significant gain.
There's a reason for TAL being rendered and stored before it's sent. Zope is an *Application Server*. If there's an error in rendering, it gives the server an opportunity to handle this case gracefully in an *Application-Specific* way, rather than sending a half-finished page to the browser followed by an error message that will be rendered God-knows-how to the user. If you render & store before sending, you can catch rendering errors and handle them gracefully. -- Floyd May Senior Systems Analyst CTLN - CareerTech Learning Network fmay@okcareertech.org
On Wed, Apr 12, 2006 at 01:56:58PM -0500, Floyd May wrote:
One solution I've found is to buffer the writes to REQUEST.RESPONSE by using a Python script which then calls granular page templates rather than a single monolithic template, outputting the results 25k at a time or so; it gives the rest of the server some time to catch up.
Note that this doesn't buy you any improved responsiveness if you're running behind e.g. apache, because apache has to read the entire response from Zope before it starts sending it back to the client. -- Paul Winkler http://www.slinkp.com
Paul Winkler wrote:
On Wed, Apr 12, 2006 at 01:56:58PM -0500, Floyd May wrote:
One solution I've found is to buffer the writes to REQUEST.RESPONSE by using a Python script which then calls granular page templates rather than a single monolithic template, outputting the results 25k at a time or so; it gives the rest of the server some time to catch up.
Note that this doesn't buy you any improved responsiveness if you're running behind e.g. apache, because apache has to read the entire response from Zope before it starts sending it back to the client.
Wasn't aware of that, but I've tested it from behind Squid, and it works like a charm. -- Floyd May Senior Systems Analyst CTLN - CareerTech Learning Network fmay@okcareertech.org
On Thu, Apr 13, 2006 at 09:04:38AM -0500, Floyd May wrote:
Paul Winkler wrote:
On Wed, Apr 12, 2006 at 01:56:58PM -0500, Floyd May wrote:
One solution I've found is to buffer the writes to REQUEST.RESPONSE by using a Python script which then calls granular page templates rather than a single monolithic template, outputting the results 25k at a time or so; it gives the rest of the server some time to catch up.
Note that this doesn't buy you any improved responsiveness if you're running behind e.g. apache, because apache has to read the entire response from Zope before it starts sending it back to the client.
Wasn't aware of that, but I've tested it from behind Squid, and it works like a charm.
Actually I really should qualify that; it depends what you're trying to do. The only "problem" I have with streaming behind mod_proxy / mod_rewrite is that it does some buffering, and AFAIK there's no way to turn that off on a per-request basis. Even on a global basis it looks like the ProxyReceiveBufferSize can't be set to less than 512 bytes. (Which would probably make performance suck for everything else anyway.)

So if you're trying to do some quick-and-dirty pre-AJAX-style status information, where you're streaming small bits of text to the browser, as I did in the ZSyncer UI, then you're out of luck. It's trivial to verify this with a particular reverse proxy setup by visiting a script something like:

    # Assuming you've made time importable...
    import time
    response = context.REQUEST.RESPONSE
    msgs = '<br/>hello\n' * 20
    response.setHeader('content-type', 'text/html')
    response.setHeader('content-length', str(len(msgs)))
    for line in msgs.split():
        response.write(line + '\n')
        time.sleep(0.5)

If I view this directly at the Zope server, I see each "hello" appear after a short delay. If I view it via apache with mod_rewrite, I see nothing for 10 seconds, then the whole page at once.

(Note you can simply leave out the content-length header to get response.write() to use HTTP 1.1-style chunking, which is convenient if you can't pre-calculate an accurate size. This has the same buffering issue behind Apache, and additionally requires the client to be using HTTP 1.1.)

OTOH, if you're streaming large blobs in chunks of e.g. 64kb, streaming through apache seems to work just fine. This is probably a more common case. -- Paul Winkler http://www.slinkp.com
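For reference, the HTTP 1.1 chunked framing that kicks in when the content-length header is omitted is simple enough to sketch (this shows the wire format itself, not Zope's implementation of it):

```python
def encode_chunk(data):
    # One HTTP/1.1 chunk: hex length, CRLF, payload, CRLF.
    return b'%x\r\n' % len(data) + data + b'\r\n'

TERMINATOR = b'0\r\n\r\n'  # a zero-length chunk ends the body

body = encode_chunk(b'hello\n') + encode_chunk(b'world\n') + TERMINATOR
print(body)
```

Each piece carries its own length, which is why the server never needs to know the total size in advance -- at the cost of requiring an HTTP 1.1 client.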
One more correction for the archives: On Thu, Apr 13, 2006 at 11:45:41AM -0400, Paul Winkler wrote:
On Thu, Apr 13, 2006 at 09:04:38AM -0500, Floyd May wrote:
Paul Winkler wrote:
Note that this doesn't buy you any improved responsiveness if you're running behind e.g. apache, because apache has to read the entire response from Zope before it starts sending it back to the client.
That sentence is flat wrong. It only looks that way if your data size is smaller than apache's buffer size. -- Paul Winkler http://www.slinkp.com
Robert (Jamie) Munro wrote:
gzip will add enormous processing overhead to the server. Stripping spaces will add negligible overhead, likely less overhead than it saves.
I hope you've got a full set of tests that prove these sweeping statements you're making ;-)
I have written TAL that produces very large dumps of XML data in the past, even whole sites. It's a really nice way to dump data from a database (SQL or Zope DB), but Zope has to build the whole output in RAM before sending any of it, so it can cause the site to crash.
Then write your code better. While it's easiest to do this all in memory, it doesn't scale, as you're explaining...
I would hope in this kind of case that the TAL is the major user of RAM on the site,
Actually, unless you're careful, you're likely to be dragging all the zope objects into memory too, and that'll be what's really killing it ;-) _p_deactivate and zodb object cache minimisation are your friends ;-)
so any saving would be really good, but in all cases (except <pre> tags, which I would never use) it seems like a possibly significant gain.
I'm afraid I think you're mistaken. Speed-wise, I'll bet the security architecture will cost you much more than gzip _and_ space stripping combined ;-) For memory, doing the whole thing in one go simply won't scale, so you'll have to re-think... cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
participants (5)
- Chris Withers
- Dieter Maurer
- Floyd May
- Paul Winkler
- Robert (Jamie) Munro