External methods & persistence and thread safety
I have a Zope that is being used for content development. Much of the content is maintained in the local file system (e.g. CVS, temporary files, and the like). Some of the operations can take a bit of time. And I have a number of folks banging on the system at the same time. Much of the heavy lifting is done by Python external methods and calls to system programs through the os.system() method. It all works swimmingly but for a couple of instances of anomalous behavior--the wrong file getting written, for example. It could be a program error, but I don't think so. I'm wondering if I am running afoul of some persistence or threading problem. (And to make things suitably complex, this is all running on a dual processor machine.) Any thoughts? -d
On Wed, Jun 04, 2003 at 12:51:39AM -0700, Dennis Allison wrote:
I have a Zope that is being used for content development. Much of the content is maintained in the local file system (e.g. CVS, temporary files, and the like). Some of the operations can take a bit of time. And I have a number of folks banging on the system at the same time.
Much of the heavy lifting is done by Python external methods and calls to system programs through the os.system() method. It all works swimmingly but for a couple of instances of anomalous behavior--the wrong file getting written, for example. It could be a program error, but I don't think so. I'm wondering if I am running afoul of some persistence or threading problem.
I'm no thread guru, but that certainly sounds like a threading issue. Hard to say without knowing your external method code, what system programs you run, what arguments you give them... Can you give us some idea of what these external methods do?
(And to make things suitably complex, this is all running on a dual processor machine.)
I dunno if that matters - Python never uses more than one CPU at a time due to the global interpreter lock. OTOH, if you had two Zope instances using ZEO... might be an issue. -- Paul Winkler http://www.slinkp.com Look! Up in the sky! It's POOPER RADIOACTIVE SHOCK OIL BARON! (random hero from isometric.spaceninja.com)
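One qualification to the point about the global interpreter lock, sketched under the assumption that the slow work happens in child processes started from External Methods: the lock serializes Python bytecode inside one process, but it is released while a thread waits for an external command, so commands started from different threads can overlap on one CPU or two. The helper names and the stand-in command below are hypothetical.

```python
import subprocess
import sys
import threading
import time

# Stand-in for a slow external program (hypothetical; the real system
# would run cvs, a validator, a spell checker, and so on).
SLOW_COMMAND = [sys.executable, "-c", "import time; time.sleep(0.5)"]


def run_external(results, index):
    """Run the external command and record its exit status."""
    # The GIL is released while this thread waits for the child process.
    results[index] = subprocess.call(SLOW_COMMAND)


def run_concurrently(n_threads=2):
    """Start n_threads workers and return (exit_codes, elapsed_seconds)."""
    results = [None] * n_threads
    threads = [
        threading.Thread(target=run_external, args=(results, i))
        for i in range(n_threads)
    ]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.time() - start


if __name__ == "__main__":
    codes, elapsed = run_concurrently()
    # If the children overlap, elapsed stays well under n_threads * 0.5 s.
    print(codes, round(elapsed, 2))
```

If the elapsed time is close to that of a single command rather than the sum, the external commands really are running side by side, which is where races on shared files could come from.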
Paul, I'm not doing anything particularly fancy--I have a number of users, each with a sandbox workspace in the local file system, not exposed in Zope. I manage through-the-web editing of the sandbox along with validation of the XML that's input, a bit of spell checking, strange character checking, and the like. Everything works out of the user's private space. Inputs are from CVS, commits are to CVS. Most of the heavy lifting is done in External Methods rather than a Product or by firing off a process and waiting for it to complete. Given what I know about Zope scheduling (one thread at a time, GIL, etc.) I don't think it's threading--but it could be, particularly if there's something strange in the SMP side of things. Any thoughts on how to track down this heisenbug? I've seen the effects twice, but it always could be ascribed to user error. The predecessor system used ParsedXML and ran on a single processor system and occasionally had similar problems which never got tracked down to a root cause.
Dennis Allison wrote at 2003-6-4 00:51 -0700:
I have a Zope that is being used for content development. Much of the content is maintained in the local file system (e.g. CVS, temporary files, and the like). Some of the operations can take a bit of time. And I have a number of folks banging on the system at the same time.
Much of the heavy lifting is done by Python external methods and calls to system programs through the os.system() method. It all works swimmingly but for a couple of instances of anomalous behavior--the wrong file getting written, for example. It could be a program error, but I don't think so. I'm wondering if I am running afoul of some persistence or threading problem. (And to make things suitably complex, this is all running on a dual processor machine.)
Any thoughts?
A question best asked to an oracle... Apparently, you have a deep problem which occurs non-deterministically. Your problem description is very shallow. Only an oracle (or other mythical being) can provide hints... Dieter
Dieter, you and the others on the list are the oracle..... And you are right, the problem description is shallow--but that's because I see only the effect--a file corrupted with the wrong data in the managed CVS. I suspect the problem lies in the persistence mechanism. External methods do inherit from the right classes for persistence, but I'm not sure I have the right song & dance for mutable lists and dictionaries to guarantee persistence. If I don't, that may be the source of my problem. Beyond careful inspection of the code, do you have any suggestions?
On Wed, Jun 04, 2003 at 12:03:12PM -0700, Dennis Allison wrote:
Dieter, you and the others on the list are the oracle.....
And you are right, the problem description is shallow--but that's because I see only the effect--a file corrupted with the wrong data in the managed CVS. I suspect the problem lies in the persistence mechanism. External methods do inherit from the right classes for persistence,
... for storing the External Method object itself in the ZODB.
but I'm not sure I have the right song & dance for mutable lists and dictionaries to guarantee persistence.
ok, now I'm really confused. What does ZODB persistence have to do with checking stuff in & out of CVS? What are these lists & dictionaries, and why do you want them to be persistent? How are you trying to make them persistent?
If I don't, that may be the source of my problem. Beyond careful inspection of the code, do you have any suggestions?
Not enough information. -- Paul Winkler http://www.slinkp.com Look! Up in the sky! It's AGENT AMOEBA! (random hero from isometric.spaceninja.com)
Sorry if I was confusing. Basically we have a through-the-web editor for XML and other things which works with forms. Users work within a pre-defined structure (an outline) and create and edit portions of the larger document. Every user has their own sandbox where they do their document development. Document content is either created by a template or checked out from CVS by the user. The user can also commit their work product to CVS. These transactions are managed by External Methods. The XML may be viewed, validated, spellchecked, and used to drive some additional process, the results of which may be viewed through Zope. All these functions are managed through External Methods. The problem I am trying to track down is subtle. Sometimes (twice in three months) a file has been overwritten by another file--these are files in the local file system. The logs all look right, but data has been corrupted. I'm trying to understand where to look for the problem--in the past such heisenbugs have been tied to problems in the concurrency of the system--so I am looking there first.
On Wed, Jun 04, 2003 at 01:08:06PM -0700, Dennis Allison wrote:
Sorry if I was confusing.
(snip) OK, I have a better general sense of what kind of app you have, but this is still much too general. For example, I still don't know the answers to my previous questions:
What are these lists & dictionaries, and why do you want them to be persistent?
How are you trying to make them persistent?
-- Paul Winkler http://www.slinkp.com Look! Up in the sky! It's PHALLIC SINFUL CONSTABLE! (random hero from isometric.spaceninja.com)
Dennis Allison wrote at 2003-6-4 13:08 -0700:
... Document content is either created by a template or checked out from CVS by the user. The user can also commit their work product to CVS. These transactions are managed by External Methods.
Each user has his own CVS working directory, right? Where do you make the "chdir" to the working directory? In the External Method or in the external process? When you do it in the External Method, you get non-deterministic behaviour: The working directory is a process resource shared by all threads. When the working directory is modified concurrently in different threads, the resulting working directory is undefined. Dieter
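A minimal sketch of the hazard Dieter describes, assuming nothing about the actual External Method code: a chdir made in one thread changes the working directory seen by every other thread in the same process. The function name is hypothetical.

```python
import os
import tempfile
import threading


def cwd_seen_by_other_thread(target_dir):
    """chdir in a worker thread, then report the cwd seen by the caller."""
    original = os.getcwd()
    worker = threading.Thread(target=os.chdir, args=(target_dir,))
    worker.start()
    worker.join()
    try:
        # The calling thread never called chdir, yet its cwd has moved.
        return os.getcwd()
    finally:
        os.chdir(original)  # restore for the rest of the process


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as sandbox:
        seen = cwd_seen_by_other_thread(sandbox)
        # realpath guards against symlinked temp directories.
        print(os.path.realpath(seen) == os.path.realpath(sandbox))
```

With several request threads each doing chdir followed by a relative-path write, the file can land in whichever sandbox was selected last, which would match "the wrong file getting written".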
I was aware of the directory problem, so the necessary directory changes are made inside the external process by passing a compound command to the shell, e.g. os.system('cd some/path; cvs checkout foo'). All file paths are full absolute paths.
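The compound command keeps the directory change inside the child shell, so the Zope process's own working directory is never touched. The same idea can be sketched without a shell, assuming Python's `subprocess` module is available (it was added in Python 2.4, after this thread): the `cwd=` argument sets the directory for the child only. `run_in_sandbox` is a hypothetical helper, and the stand-in command just prints its working directory instead of running cvs.

```python
import os
import subprocess
import sys
import tempfile


def run_in_sandbox(sandbox_dir, argv):
    """Run argv with sandbox_dir as the child's working directory.

    Returns (exit_code, output_text). The parent's cwd is never changed,
    and passing argv as a list means no shell parses the file names.
    """
    proc = subprocess.run(
        argv,
        cwd=sandbox_dir,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Stand-in for something like ["cvs", "checkout", "foo"].
    show_cwd = [sys.executable, "-c", "import os; print(os.getcwd())"]
    with tempfile.TemporaryDirectory() as sandbox:
        before = os.getcwd()
        code, output = run_in_sandbox(sandbox, show_cwd)
        print(code, os.path.realpath(output.strip()) == os.path.realpath(sandbox))
        print(os.getcwd() == before)
```

Capturing the exit code and output also gives something to log per request, which may help when the existing logs "all look right" but a file is wrong.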
Dennis Allison wrote:
Much of the heavy lifting is done by Python external methods and calls to system programs through the os.system() method. It all works swimmingly but for a couple of instances of anomalous behavior--the wrong file getting written, for example.
Maybe your application suffers from ZODB read/write conflicts. Have a look at the error log and search the archive for more info. In short: If an object is currently in a transaction and another transaction is started for this object, a conflict error occurs. Zope/ZODB tries 3 times to get around it... Your external methods with os.system calls should handle this situation or you will encounter "weird" problems from time to time. It's no fun to leave the holy world of ZODB/Zope... Cheers, Maik
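To illustrate why the retries matter: when a request hits a conflict, the ZODB changes are rolled back and the request is run again, but anything an External Method already did outside the ZODB (files written, commands run) is not undone. The following is a self-contained simulation, not Zope code; `ConflictError`, `run_with_retries`, and `make_request` are hypothetical stand-ins.

```python
class ConflictError(Exception):
    """Stand-in for ZODB's conflict error (hypothetical, simulation only)."""


def run_with_retries(request, max_attempts=3):
    """Call request(attempt) until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request(attempt)
        except ConflictError:
            if attempt == max_attempts:
                raise


def make_request(side_effects, conflicts_before_success):
    """Build a request that writes a 'file' and then may conflict."""
    def request(attempt):
        # Side effect outside the transaction: not undone on retry.
        side_effects.append("wrote file on attempt %d" % attempt)
        if attempt <= conflicts_before_success:
            raise ConflictError()
        return "committed on attempt %d" % attempt
    return request


if __name__ == "__main__":
    log = []
    result = run_with_retries(make_request(log, conflicts_before_success=2))
    print(result)
    print(len(log))
```

In the simulation the "file" is written once per attempt even though only the last attempt commits, so an External Method that is not safe to run twice can leave the file system and the ZODB out of step.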
participants (4)
- Dennis Allison
- Dieter Maurer
- Maik Jablonski
- Paul Winkler