[Zope3-dev] Re: [Bug 98024] Re: Add big files

Sat Apr 21 12:24:08 EDT 2007

On Apr 21, 2007, at 11:47 AM, Christian Theune wrote:

> Hi,
>
> (moving it to zope3-dev for discussion)
>
> On Saturday, 21.04.2007, at 13:50 +0000, Jim Fulton wrote:
>> On Apr 21, 2007, at 9:24 AM, Christian Theune wrote:
>>
>>> *sigh*
>>>
>>> This is a can of worms. It looks like email.FeedParser.FeedParser is
>>> probably what we want to use.
>>
>> Why?
>
> Ah. I was working on the integration of Blobs in Zope 3 and noticed
> that we should try harder at some points to handle large data more
> time-efficiently. Today, as soon as a file is available as a
> NamedTemporaryFile, a Blob can consume it with a rename, which
> works in O(1).

Only if it is on the same file system.  Of course, if not, it is no  
worse than the current situation.
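
To make the same-file-system caveat concrete, here is a minimal sketch of a
rename with a copy fallback. The name consume_file is hypothetical and this
is not the actual Blob code; it only illustrates that os.rename fails with
EXDEV across file systems, in which case we are back to an O(n) copy.

```python
import errno
import os
import shutil
import tempfile


def consume_file(src_path, dst_path):
    """Move src_path to dst_path, preferring a constant-time rename.

    Returns "rename" if the rename worked, or "copy" if the two paths
    are on different file systems and the data had to be copied.
    """
    try:
        os.rename(src_path, dst_path)
        return "rename"
    except OSError as exc:
        # EXDEV means "cross-device link": source and destination are on
        # different file systems. Any other error is a real failure.
        if exc.errno != errno.EXDEV:
            raise
    # Fallback: O(n) copy followed by removing the source, which is no
    # worse than what happens without the rename shortcut.
    shutil.copyfile(src_path, dst_path)
    os.remove(src_path)
    return "copy"
```

In practice this means the temporary-file directory should live on the same
file system as the blob storage if the cheap path is to be taken.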

> The only remaining issue is during the upload of large data via HTTP.
> What happens there is that the upload is first streamed to a temporary
> file in the server and then handed over to the publisher. The
> publisher then parses the MIME data and creates new files.
>
> This takes up the application thread for a task that doesn't require
> application resources, so this could be done before handing the
> request to the publisher. For files of about 50 MB this already takes
> 5-10 seconds, possibly longer, depending on the machine.

You likely don't want to use the select-loop thread (aka reactor
thread) for this either.  Trying to improve this is likely to involve
a tricky thread dance.

> Typically, the upload speed is relatively slow compared to the CPU
> time required to unpack the MIME data. IMHO we can make good use of
> that time by unpacking it as early as possible.

I don't understand this.  While uploads are occurring, select threads  
and application threads can be doing other work.

> To avoid blocking, the server could feed the body into something like
> the feedparser while getting the data from the client. The publisher
> then can build on what was parsed already.

That's a huge change and one I'm dubious of without some additional
thread gymnastics. I think there are simpler alternatives, although
they do require a different thread-management strategy to get what I
think you want.
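
For reference, the feed-parser idea quoted above could look roughly like the
sketch below. It assumes a current Python, where the module is spelled
email.feedparser, and the function name parse_upload_incrementally is
hypothetical. Note that it keeps every payload in memory, so it only shows
the incremental feeding, not a solution for big files, and it says nothing
about which thread should do the feeding.

```python
from email.feedparser import BytesFeedParser


def parse_upload_incrementally(content_type, chunks):
    """Feed a multipart/form-data body to the email parser chunk by chunk.

    content_type is the request's Content-Type header value (including
    the boundary); chunks is any iterable of bytes as they arrive.
    Returns a dict mapping field names to raw payload bytes.
    """
    parser = BytesFeedParser()
    # The email parser expects headers before the body, so synthesise
    # the one header it needs to find the multipart boundary.
    parser.feed(b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n")
    for chunk in chunks:
        # Chunks may end anywhere, even in the middle of a line.
        parser.feed(chunk)
    message = parser.close()
    if not message.is_multipart():
        raise ValueError("body could not be parsed as multipart data")

    fields = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        fields[name] = part.get_payload(decode=True)
    return fields
```

A server would call parser.feed() from wherever it receives data and hand
the finished message to the publisher; the thread-management questions above
are untouched by this sketch.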

> Dieter Maurer earlier pointed out an alternative (that would actually
> work with the current cgi module): just introduce a new thread that
> pre-processes the request before handing it into the publisher.

Right.  I also think we could adjust thread management in other ways  
to provide better application control.  For example,  I suspect that  
something interesting could be done at the publication layer.

There are really two different sets of resources:

1. Threads

2. "application resources" such as database connections.

Historically, we've limited application resources by limiting  
threads.  We could decouple these, allowing more threads, but making  
threads queue up to get application resources.
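
A rough sketch of that decoupling, under the assumption that a simple
semaphore stands in for the pool of application resources. ResourceGate,
handle_request and run_demo are hypothetical names, not existing Zope APIs,
and the upper() call and sleep are placeholders for MIME parsing and for
the publisher's real work.

```python
import threading
import time


class ResourceGate:
    """Limits concurrent use of application resources, not thread count."""

    def __init__(self, max_resources):
        self._slots = threading.BoundedSemaphore(max_resources)
        self._lock = threading.Lock()
        self._in_use = 0
        self.peak_in_use = 0  # highest number of slots held at once

    def __enter__(self):
        self._slots.acquire()  # threads queue up here
        with self._lock:
            self._in_use += 1
            self.peak_in_use = max(self.peak_in_use, self._in_use)
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self._in_use -= 1
        self._slots.release()
        return False


def handle_request(gate, raw_body):
    # Pre-processing (a stand-in for MIME parsing) needs a thread but
    # no application resource, so it runs outside the gate.
    parsed = raw_body.upper()
    with gate:
        # Stand-in for the publisher working with a database connection.
        time.sleep(0.01)
        return parsed


def run_demo(num_threads=8, max_resources=2):
    """Run more threads than resources; return (peak usage, results)."""
    gate = ResourceGate(max_resources)
    results = []
    threads = [
        threading.Thread(
            target=lambda i=i: results.append(handle_request(gate, "req%d" % i))
        )
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gate.peak_in_use, sorted(results)
```

With eight threads and two slots, all eight requests complete, but no more
than two are ever inside the gated section at the same time.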

> That way we can free up the application threads for smaller
> requests to go through in between. However, the user would still see
> the lag after uploading.

It's not clear to me that the user would see any longer delay with  
this strategy, although the strategy would need to be fleshed out  
more to be sure.

>
>>> And the Python guys have been talking
>>> about getting rid of cgi.FieldStorage in its current implementation
>>> since 2005 but nothing has happened. :(
>>>
>>> Some issues that would be nice as a preparation:
>>>
>>> - make a cleaner alternative interface as a replacement for
>>> FieldStorage
>>
>> Could you describe the problems you perceive with FieldStorage?  I'm
>> not necessarily opposed to a change, but you need to present the
>> technical reasons.
>
> I'm trying hard, but it takes a lot of time to understand what's
> happening there. The public interface is badly underdocumented and
> a lot of implicit behaviour is happening all over the place. I find it
> unmaintainable - and as I'd like to adjust it to my needs, I feel like
> rewriting would be better.

I'm not so sure about the implicit part. I agree it's complex.  
Unfortunately, a lot of that complexity is essential so it's  
unattractive to do it over since it works. Of course, better tests  
would help a lot.

...

>
> Hmm. I assume you're pointing out it's hairy but it works so nobody
> really got down to it. I see that when reading through various old
> discussions on the lists too.

Yup

Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org