Leaking Oracle connections & processes
Hi guys, We are trying to build a site with Zope and Oracle, and we see again and again a situation where Oracle connections and processes are being leaked: the more we work, the more there are, until we reach Oracle's limit for number of processes. At this point, the Connection objects close on their own, and you can't even open SQL*Plus until Zope is closed or restarted. We use Oracle's default limit on the number of processes, which is 50. We have about three users (developers), two Oracle connections in the application (one for SqlSessions and one for other data), and although we use some devious multithreading (some of our transactions call an external method which essentially URLRetrieve()s a URL from our own Zope server in a separate thread), 50 open connections should be more than enough. I have checked a little before the disaster, and it seems that not all those processes actually have connections; the problem may be unrelated to open connections, but just to processes. We do one fishy thing that may also be related: We keep some long raw data; we keep it all in one table, and other tables which need long columns hold pointers (indices) into this table. To put data into this table from Zope, we use an external method SaveLONG, which uses the database connection to access the DB (in order to use the transactionality of the whole thing). We use it in ZSQL methods as follows: #################################################################### INSERT INTO items ( field, long_field_ptr ) VALUES ( <dtml-sqlvar value type=string> <dtml-var "SaveLONG(DB_CONNECTION, long_value)"> ) #################################################################### SaveLONG saves the long value in the long data table and returns its index in the table. The source for SaveLONG is this. It relies on a DB trigger to populate the LONG_ID field from a sequence. #################################################################### long_table_name = 'LONG_DATA' long_column_name = 'LONG_DATA' long_seq_name = 'SEQ_LONG_DATA' index_column_name = 'LONG_ID' save_command = ''' INSERT INTO %s (%s) VALUES (:content) ''' % (long_table_name,long_column_name) get_key_command = 'SELECT %s.currval FROM dual' % long_seq_name # # Put a long raw on the DB # def SaveLONG(self,connection, content): connection = connection._v_database_connection.db # the real DB connection object connection.execute(save_command, content=dbi.dbiRaw(content)) connection.execute(get_key_command) return connection.fetchone()[0] #################################################################### We're using Zope 2.2.4 with ZOracleDA 2.1.0 and DCOracle 1.3.2 with Oracle 8.0.5 on an intel-linux (RH). Can anyone see anything wrong with what we're doing? Is anyone else experiencing this kind of problems? Thanks in advance, Shai
Shai Berger wrote:
I have checked a little before the disaster, and it seems that not all those processes actually have connections; the problem may be unrelated to open connections, but just to processes.
A little progress: We just had it happen again, and checked to see how many live processes and how many zombies were involved. pstree showed 108 Zope-spawned Oracle processes, but ps showed only 55 zombies. I assume, therefore, that the problem is not because of zombies; there were over 50 live processes accessing oracle. The accumulation of zombies is a problem in itself, of course, but I understand it is also a known problem. And from what I see, it is a separate problem. Our impression here -- based on not much more than gut feeling -- is that the problem is related to abandoned transactions, where an exception is raised. Thanks, Dieter, for your earlier response -- we are trying a larger cache now, and we'll see how things go. Have fun, Shai.
Shai Berger writes:
... excess Oracle processes and zombies ... I have checked our installation and can confirm that both problems have gone.
We do not yet know what caused the excess Oracle processes. After an increase of Zope's ZODB cache and an upgrade to the newest DCOracle, the problem simply went away. We do know what caused our zombie processes: a bug in the Oracle Java runtime library: they had added a "NOWAIT" flag to their "wait" calls. This ensured that a "wait" call does not release the Zombie child but keeps it in its state. As a consequence, each process started from Java became a Zombie process. As Oracle reported, the problem was introduced in 8.1.6 and should be fixed in 8.1.7. Dieter
Dieter Maurer wrote:
Shai Berger writes:
... excess Oracle processes and zombies ... I have checked our installation and can confirm that both problems have gone.
We do not yet know what caused the excess Oracle processes. After an increase of Zope's ZODB cache and an upgrade to the newest DCOracle, the problem simply went away.
Do you have a DCOracle version later than 1.3.2? We use 1.3.2; we have increased the cache from 400 (the default) to 1000, to no avail. Our ZODB size, last I checked, was ~40M. I remember you mentioning that yours was less then 10M, I think; how much cache do you use?
We do know what caused our zombie processes: a bug in the Oracle Java runtime library: they had added a "NOWAIT" flag to their "wait" calls. This ensured that a "wait" call does not release the Zombie child but keeps it in its state. As a consequence, each process started from Java became a Zombie process.
Zope starts Oracle processes from Java??? All the zombies we see are Zope-spawned... a Zope zombie-collection problem has been reported independently of Oracle.
As Oracle reported, the problem was introduced in 8.1.6 and should be fixed in 8.1.7.
...which confuses me, because we use 8.0.5. Thanks, Shai.
Shai Berger writes:
Do you have a DCOracle version later than 1.3.2? No, we use DCOracle 1.3.2
We use 1.3.2; we have increased the cache from 400 (the default) to 1000, to no avail. Our ZODB size, last I checked, was ~40M. I remember you mentioning that yours was less then 10M, I think; how much cache do you use? I have to check at work, but I think it was about 10 % of the objects in the ZODB with the time set to 30 minutes.
We do know what caused our zombie processes: a bug in the Oracle Java runtime library: ... Zope starts Oracle processes from Java??? No, it does not. Apparently, your Zombie processes have a different cause then ours. Our Zombies have not been Zope children but Oracle children, spawned by Oracle stored procedures implemented in Java.
Sorry, if I responded to a message in a thread without a clear understanding of the basic problem. Dieter
participants (2)
-
Dieter Maurer -
Shai Berger