[Checkins] SVN: zc.async/trunk/src/zc/async/README.txt Add more
discussion. Checkpoint.
Gary Poster
gary at zope.com
Wed Aug 23 23:48:16 EDT 2006
Log message for revision 69753:
Add more discussion. Checkpoint.
Changed:
U zc.async/trunk/src/zc/async/README.txt
-=-
Modified: zc.async/trunk/src/zc/async/README.txt
===================================================================
--- zc.async/trunk/src/zc/async/README.txt 2006-08-24 03:43:52 UTC (rev 69752)
+++ zc.async/trunk/src/zc/async/README.txt 2006-08-24 03:48:15 UTC (rev 69753)
@@ -2,30 +2,274 @@
zc.async
========
-The zc.async package provides a way to make asynchronous application
-calls.
+Goals
+=====
-Calls are handled by worker processes, each of which is typically a
-standard Zope process, identified by a UUID [#uuid]_, that
-may simultaneously perform other tasks (such as handle standard web
-requests). Each worker is responsible for claiming and performing calls
-in its main thread or additional threads. To have multiple workers on
-the same queue of tasks, share the database with ZEO.
+The zc.async package provides a way to make scalable asynchronous application
+calls. Here are some example core use cases.
-Pending calls and worker data objects are stored in a data manager
-object in the ZODB. This is typically stored in the root of the ZODB,
-alongside the application object, with a key of 'zc.async.datamanager',
-but the adapter that obtains the data manager can be replaced to point
-to a different location.
+- You want to let users create PDFs through your application. This can take
+ quite a bit of time, and will use both system resources and one of the
+ precious application threads until it is done. Naively done, six or seven
+ simultaneous PDF requests could make your application unresponsive to any
+ other users. Using zc.async, the job can be sent to other machines; you
+ can write AJAX to poll for job completion, or you might have the end of the
+ job send an email to deliver the PDF.
-Worker data objects have queues representing potential or current tasks
-for the worker, in the main thread or a secondary thread. Each worker
-has a virtual loop, part of the Twisted main loop, for every worker
-process, which is responsible for responding to system calls (like
-pings) and for claiming pending main thread calls by moving them from
-the datamanager async queue to their own. Each worker thread queue also
-represents spots for claiming and performing pending thread calls.
+- You want to let users spider a web site; communicate with a credit card
+ company; query a large, slow LDAP database on another machine; or do
+ some other action that generates network requests from the server.
+ Again, if something goes wrong, several requests could make your
+ application unresponsive. With zc.async, these tasks can be serialized,
+ with objects indicating that a spidering is in progress, and performed
+ outside the main application threads.
+- You have an application job in the ZODB that you discover is taking
+ longer than users can handle, even after you optimize it. You want a
+ quick fix to move the work out-of-band.
+
+A common thread with these core use cases is that end-users need to be
+able to start expensive processes on demand. These are not scheduled
+tasks. You may want to have a bank of workers that can perform the task.
+You may want to be able to quickly and easily convert an application
+server process to a worker process, or vice versa, on the basis of
+traffic and demand.
+
+Another Zope 3 package that approaches somewhat similar use cases to
+these is lovely.remotetask (http://svn.zope.org/lovely.remotetask/).
+There appear to be some notable differences; describing them
+authoritatively and without possible misrepresentation will have to
+wait for another day.
+
+Another set of use cases center around scheduling: we need to retry an
+asynchronous task after a given amount of time; or we want to have a
+requested job happen late at night; or we want to have a job happen
+regularly. The Zope 3 scheduler
+(http://svn.zope.org/Zope3/trunk/src/scheduler/) approaches the last of
+these tasks with more infrastructure than zc.async, as arguably does a
+typical "cron wget" approach. However, both approaches are prone to
+serious problems when the scheduled task takes more time than expected,
+and one instance of a task overlaps the previous one, sometimes causing
+disastrous problems. By using zc.async partials to represent the
+pending result, and even to schedule the next call, this problem can be
+alleviated.
+
+History
+=======
+
+This is a second-generation design. The first generation was `zasync`,
+a mission-critical and successful Zope 2 product in use for a number of
+high-volume Zope 2 installations. It had the following goals:
+
+- be scalable, so that another process or machine could do the asynchronous
+ work;
+
+- support lengthy jobs outside of the ZODB;
+
+- support lengthy jobs inside the ZODB;
+
+- be recoverable, so that crashes would not lose work;
+
+- be discoverable, so that logs and web interfaces give a view into the work
+ being done asynchronously;
+
+- be easily extendible, to do new jobs; and
+
+- support graceful job expiration and cancellation.
+
+It met its goals well in some areas and adequately in others.
+
+Based on experience with the first generation, this second generation
+identifies several areas of improvement from the first design, and adds
+several goals.
+
+- Improvements
+
+ * More carefully delineate the roles of the comprising components.
+
+ The zasync design has three main components, as divided by their
+ roles: persistent deferreds, now called partials; persistent
+ deferred queues (the original zasync's "asynchronous call manager");
+ and asynchronous workers (the original zasync ZEO client). The
+ zasync 1.x design blurred the lines between the three components
+ such that the component parts could only be replaced with
+ difficulty, if at all. A goal for the 2.x design is to clearly
+ define the role for each of three components such that, for
+ instance, a user of a persistent deferred does not need to know
+ about the persistent deferred queue.
+
+ * Improve scalability of asynchronous workers.
+
+ The 1.x line was initially designed for a single asynchronous worker,
+ which could be put on another machine thanks to ZEO. Tarek Ziadé of
+ Nuxeo wrote zasyncdispatcher, which allowed multiple asynchronous workers
+ to accept work, allowing multiple processes and multiple machines to
+ divide and conquer. It worked around the limitations of the original
+ zasync design to provide even more scalability. However, it was forced to
+ divide up work well before a given worker looked at the queue.
+
+ While dividing work earlier allows guesses and heuristics a chance to
+ predict what worker might be more free in the future, a more reliable
+ approach is to let the worker gauge whether it should take a job at the
+ time the job is taken. Perhaps the worker will choose based on the
+ worker's load, or other concurrent jobs in the process, or other details.
+ A goal for the 2.x line is to more directly support this type of
+ scalability.
+
+ * Improve scalability of registering deferreds.
+
+ The 1.x line initially wasn't concerned about very many concurrent
+ asynchronous requests. When this situation was encountered, it caused
+ ConflictErrors between the worker process reading the deferred queue
+ and the code that was adding the deferreds. Thanks to Nuxeo, this
+ problem was addressed in the 1.x line. A goal for the new version
+ is to include and improve upon the 1.x solution.
+
+ * Make it even simpler to provide new jobs.
+
+ In the first version, `plugins` performed jobs. They had a specific
+ API and they had to be configured. A goal for the new version is to
+ require no specific API for jobs, and to not require any configuration.
+
+ * Improve report information, especially through the web.
+
+ The component that the first version of zasync provided to do the
+ asynchronous work, the zasync client, provided very verbose logs of the
+ jobs done, but they were hard to read and also did not have a
+ through-the-web parallel. Two goals for the new version are to improve the
+ usefulness of the filesystem logs and to include more complete
+ through-the-web visibility of the status of the provided asynchronous
+ clients.
+
+ * Make it easier to configure and start, especially for small deployments.
+
+ A significant barrier to experimentation and deployment of the 1.x line
+ was the difficulty in configuration. The 1.x line relied on ZConfig
+ for zasync client configuration, demanding non-extensible
+ similar-yet-subtly-different .conf files like the Zope conf files.
+ The 2.x line plans to provide code that Zope 3 can configure to run in
+ the same process as a standard Zope 3 application. This means that
+ development instances can start a zasync quickly and easily. It also
+ means that processes can be reallocated on the fly during production use,
+ so that a machine being used as a zasync process can quickly be converted
+ to a web server, if needed, and vice versa. It further means that the
+ Zope web server can be used for through-the-web reports of the current
+ zasync process state.
+
+- New goals
+
+ * Support intermediate return calls so that jobs can report back how they
+ are doing.
+
+ A frequent request from users of zasync 1.x was the ability for a long-
+ running asynchronous process to report back progress to the original
+ requester. The 2.x line addresses this with three changes:
+
+ + persistent deferreds are annotatable;
+
+ + persistent deferreds should not be modified in an asynchronous
+ job that does work (though they may be read);
+
+ + jobs can request another deferred in a synchronous process that
+ annotates the deferred with progress status or other information.
+
+ Because of relatively recent changes in ZODB--multi-version concurrency
+ control--this simple pattern should not generate conflict errors.
+
+ * Support time-delayed calls.
+
+ Retries and other use cases make time-delayed deferred calls desirable.
+ The new design supports these sorts of calls.
+
+It's worth noting that zc.async has absolutely no backwards
+compatibility with zasync.
+
+Status
+======
+
+This is alpha software. Tests for the various components range from
+very good to preliminary. All existing tests pass. Even the first
+"final" release is not scheduled to meet all goals identified in the
+history discussion above. See TODO.txt in this package.
+
+Dependencies
+============
+
+zc.async relies on the uuid module as currently found in the Python 2.5
+library (this can be used with earlier versions of Python). See
+http://svn.python.org/view/python/trunk/Lib/uuid.py?view=auto.
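As a quick aside, the uuid module mentioned above is plain standard-library Python; a generic illustration (not zc.async code) of generating the kind of identifier used for a worker instance:

```python
import uuid

# Generate a random (version 4) UUID, the kind of value zc.async uses
# to identify a worker's software instance.
worker_id = uuid.uuid4()

# UUIDs round-trip through their string form, which makes them easy to
# store on disk or in the ZODB and compare later.
assert uuid.UUID(str(worker_id)) == worker_id
```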
+
+It currently relies on the Twisted reactor running, but does not require
+that the Twisted reactor be used for any particular task (such as being
+used as a web server, for instance). That said, when used with Zope,
+zc.async currently requires that a Twisted server be used. Future
+revisions may provide alternate worker engines for Medusa, as used by
+ZServer; though, of course, reactor tasks would either not be supported
+or be reduced to Medusa capabilities.
+
+zc.twist (http://svn.zope.org/zc.twist/) is used for the Twisted
+reactor/ZODB interactions.
+
+zc.set (http://svn.zope.org/zc.set/) and zc.queue
+(http://svn.zope.org/zc.queue/) are used for internal data structures.
+
+Design Overview
+===============
+
+An application typically has a single zc.async datamanager, which is the
+object that client code will obtain to use the primary zc.async
+capabilities. Expected spelling to get the datamanager is something
+like ``dm = zc.async.interfaces.IDataManager(context)``. The
+datamanager is typically stored in the root of the ZODB, alongside the
+application object, with a key of 'zc.async.datamanager', but the
+adapter that obtains the data manager can be replaced to point to a
+different location.
+
+Clients of zc.async need to identify the job they want done. It must be
+a single pickleable callable: a global function, a callable persistent
+object, a method of a persistent object, or a special
+zc.async.partial.Partial object, discussed later. This job should be
+passed to the `put` method of one of two queues on the datamanager: a
+`thread` queue and a `reactor` queue.
+
+- A `thread` job is one that is to be performed in a thread with a
+ dedicated ZODB connection. It's the simplest to use for typical
+ tasks. A thread job may be overkill for jobs that don't need a constant
+ connection, and it is not friendly to Twisted services.
+
+- A `reactor` job is a non-blocking job to be performed in the main thread,
+ in a call scheduled by the Twisted reactor. It has some gotchas (see
+ zc.twist's README), but it can be good for jobs that don't need a
+ constant connection, and for jobs that can leverage Twisted code.
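The queue-and-partial flow described above might be pictured with a toy stand-in. Everything here (the `Toy*` class names, the `pull_and_perform` method) is illustrative only; the real datamanager is persistent and its workers claim jobs across processes:

```python
import queue

class ToyPartial:
    """Stand-in for the object `put` returns: tracks state and result."""
    def __init__(self, callable_):
        self.callable = callable_
        self.state = 'pending'
        self.result = None

class ToyQueue:
    """Stand-in for one of the datamanager's `thread`/`reactor` queues."""
    def __init__(self):
        self._jobs = queue.Queue()

    def put(self, callable_):
        # As in the README: `put` takes a pickleable callable and returns
        # an object representing the requested job.
        partial = ToyPartial(callable_)
        self._jobs.put(partial)
        return partial

    def pull_and_perform(self):
        # In zc.async a worker claims and performs the job; here we run
        # it inline just to show the state transition.
        partial = self._jobs.get()
        partial.result = partial.callable()
        partial.state = 'completed'
        return partial

class ToyDataManager:
    """Mimics the two queues client code sees on the datamanager."""
    def __init__(self):
        self.thread = ToyQueue()
        self.reactor = ToyQueue()
```

With this sketch, the README's ``dm.thread.put(...)`` spelling reads as: client code keeps the returned partial and can poll its state until some worker completes the job.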
+
+The `put` call will return an object that represents the job--both the
+information about the job requested, and the state and result of
+performing the job. An example spelling for this might be
+``self.pending_result = dm.thread.put(self.performSpider)``. The
+returned object can be simply persisted and polled to see when the job
+is complete; or it can be set to do tasks when it completes.
+
+If nothing else is done, the call will be done as soon as possible:
+zc.async client code shouldn't typically have to worry further. The
+client code could have specified that the job should `begin_after` a
+certain datetime; or that the job should `begin_by` a duration after
+`begin_after` or else it will fail; or that the job should be performed
+by a certain process, or not by a certain process.
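The `begin_after`/`begin_by` semantics can be sketched with plain datetimes. The attribute names come from the README; the class and its logic are a toy illustration, not the real zc.async implementation:

```python
from datetime import datetime, timedelta

class ScheduledJob:
    """Toy model of a job with the scheduling attributes described above."""
    def __init__(self, callable_, begin_after=None, begin_by=None):
        self.callable = callable_
        self.begin_after = begin_after  # earliest start datetime, or None
        self.begin_by = begin_by        # max delay after begin_after, or None
        self.state = 'pending'
        self.result = None

    def maybe_perform(self, now):
        if self.begin_after is not None and now < self.begin_after:
            return  # too early: leave the job pending
        if (self.begin_after is not None and self.begin_by is not None
                and now > self.begin_after + self.begin_by):
            self.state = 'failed'  # missed its window
            return
        self.result = self.callable()
        self.state = 'completed'
```

A job with `begin_after` set to tomorrow simply stays pending today; one whose `begin_by` window has passed fails instead of running late.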
+
+While client code can forget about the rest of the design, people
+configuring a production system using zc.async need to know a bit more.
+Calls are handled by workers, each of which is a persistent object
+matched with a software instance identified by a UUID [#uuid]_. A
+worker is responsible for keeping track of current jobs and for
+implementing a policy for selecting jobs from the main datamanager
+queues. A worker's software instance may simultaneously perform other
+tasks (such as handle standard web requests). Each worker is
+responsible for claiming and performing calls in its main thread or
+additional threads. To have multiple workers on the same queue of
+tasks, share the database with ZEO. Workers are driven by `engines`,
+non-persistent objects that are alive only for a single process, and
+that are responsible for providing the heartbeat for a matched worker.
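The worker/engine split might be pictured with a toy loop. This is illustrative only: the names are invented, and real engines hook into the Twisted reactor rather than being called by hand:

```python
class ToyWorker:
    """Persistent-side stand-in: tracks jobs claimed by one instance."""
    def __init__(self, uuid):
        self.uuid = uuid
        self.claimed = []

    def claim(self, job_queue):
        # Policy hook: a real worker decides here, at claim time, whether
        # it has the capacity to take a job from the datamanager queues.
        if job_queue:
            self.claimed.append(job_queue.pop(0))

class ToyEngine:
    """Process-side stand-in: drives the heartbeat for a matched worker."""
    def __init__(self, worker):
        self.worker = worker

    def beat(self, job_queue):
        # Each beat lets the worker claim work, then performs what it holds.
        self.worker.claim(job_queue)
        results = [job() for job in self.worker.claimed]
        self.worker.claimed.clear()
        return results
```

The point of the split is that the `ToyWorker` state could live in the ZODB and survive a crash, while the engine is recreated cheaply whenever its process restarts.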
+
Set Up
======