[ZODB-Dev] Debugging RelStorage hang-ups

Thu Oct 20 08:41:45 UTC 2011

On 10/20/2011 05:41 AM, Martijn Pieters wrote:
> On a test server with a Plone 4.1 upgrade of a client setup, we are
> experiencing regular lock-ups of the 2 instances we run. Within a few
> hours of a restart, at least one of the instances will be waiting on
> Oracle to roll back:
>
>    File "/srv/test-plone4/eggs/RelStorage-1.5.0-py2.6.egg/relstorage/storage.py", line 1228, in poll_invalidations
>      changes, new_polled_tid = self._restart_load_and_poll()
>    File "/srv/test-plone4/eggs/RelStorage-1.5.0-py2.6.egg/relstorage/storage.py", line 1202, in _restart_load_and_poll
>      self._adapter.poller.poll_invalidations, prev, ignore_tid)
>    File "/srv/test-plone4/eggs/RelStorage-1.5.0-py2.6.egg/relstorage/storage.py", line 254, in _restart_load_and_call
>      self._load_conn, self._load_cursor)
>    File "/srv/test-plone4/eggs/RelStorage-1.5.0-py2.6.egg/relstorage/adapters/oracle.py", line 322, in restart_load
>      conn.rollback()
>
> I am a bit at a loss as to where to start debugging this. Any hints from anyone?

Some ideas:

- Is Oracle running out of space somewhere, such as the undo/redo logs?

- Do rollbacks in Oracle acquire some kind of lock?

- Could RAC be the culprit?  (Synchronous replication always has weird 
edge cases.)
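
One way to look into the lock question might be to ask Oracle what its
sessions are waiting on while an instance is stuck. The following is only a
rough sketch, not anything RelStorage provides: the helper names are made up
for illustration, it assumes a DB-API cursor (for example from cx_Oracle)
opened by a user who is allowed to read V$SESSION, and the query has not
been tried against this setup.

```python
# Hypothetical helpers, not part of RelStorage. They work with any
# DB-API cursor; reading V$SESSION requires the appropriate privilege.

WAITING_SESSIONS_SQL = """
    SELECT sid, serial#, status, event, blocking_session, seconds_in_wait
      FROM v$session
     WHERE username IS NOT NULL
       AND wait_class <> 'Idle'
"""

COLUMNS = ("sid", "serial", "status", "event",
           "blocking_session", "seconds_in_wait")


def waiting_sessions(cursor):
    """Return one dict per user session that is in a non-idle wait."""
    cursor.execute(WAITING_SESSIONS_SQL)
    return [dict(zip(COLUMNS, row)) for row in cursor.fetchall()]


def blocked_sessions(sessions):
    """Keep only the sessions that Oracle reports as blocked by another session."""
    return [s for s in sessions if s["blocking_session"] is not None]
```

If the session doing the rollback shows up with a blocking_session, a lock is
likely involved; if it is waiting on some other event, the event name should
at least hint at whether undo, redo or RAC traffic is the thing to look at.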

Shane