[Zodb-checkins] CVS: Packages/StorageGC - CyclicGC.py:1.6 GCableGraph.py:1.7

tim@digicool.com
Sat, 21 Apr 2001 01:41:02 -0400 (EDT)


Update of /cvs-repository/Packages/StorageGC
In directory korak:/tmp/cvs-serv17853/storagegc

Modified Files:
	CyclicGC.py GCableGraph.py 
Log Message:
Added more docstrings, revised existing comments.
Created internal _TopologyError exception as a cleaner way to give up on
the current phase in the face of inconsistent info coming back from the
storage.
First stab at making .resume()'s 'effort' argument mean *something*.  For
now it means something silly:  the number of "virtual collection machine"
instructions to be executed before returning.  The code was restructured
to execute from a list of "opcodes", driven by a virtual instruction
counter.  I'm afraid that when you have to fake coroutines by hand, that's
what happens:  you end up inventing *some* sort of convoluted "state
machine", and this one is as clean as such things go.
The "instructions" vary now in expense from trivial to potentially
unbounded (for example, building the entire transitive closure of the root
set is one "opcode").
Instead of counting # of instructions, 'effort' should measure the
expense of each opcode.  This is a dynamic thing, and finishing that
requires inventing subsidiary virtual machines for the potentially
unbounded-expense opcodes, so that they too can be suspended and resumed
from mid-stream.
Since that will make the code virtually impossible to understand without
major effort, I'm going to stop this part of the crusade now until we
better understand what's *needed* in practice.  Will learn how to use
unittest next and make a good test suite.
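The restructuring described above can be sketched as a tiny stand-alone
state machine.  This is illustrative only, not the checked-in code:  the
names TinyVM, step_a, and step_b are hypothetical stand-ins for CycleGC's
workflow, but the mechanics (a list of bound-method "opcodes", a virtual
instruction pointer, an 'effort' budget counted in opcodes per resume()
call, and an internal exception to halt the machine) mirror the pattern.

```python
# Illustrative sketch of the "virtual collection machine" pattern:
# resume() executes at most 'effort' opcodes from a workflow list,
# then returns, so the caller can interleave other work.

class _Done(Exception):
    """Raised by the final opcode to halt the machine."""

class TinyVM:
    def __init__(self):
        self.trace = []  # names of opcodes executed so far
        # The workflow:  opcodes executed in order by resume().
        self.workflow = [self.step_a, self.step_b, self.stop]
        self.stage = 0   # the virtual instruction pointer

    def step_a(self):
        self.trace.append("a")

    def step_b(self):
        self.trace.append("b")

    def stop(self):
        raise _Done

    def done(self):
        return self.stage >= len(self.workflow)

    def resume(self, effort=-1):
        # effort <= 0 means "run until the machine halts itself".
        if self.done():
            return
        worktodo = effort if effort > 0 else len(self.workflow)
        try:
            while worktodo > 0:
                worktodo -= 1
                # Bump the instruction pointer, then dispatch the opcode.
                stage = self.stage
                self.stage = stage + 1
                self.workflow[stage]()
        except _Done:
            pass
```

With effort=1 each resume() call executes exactly one opcode, which is
the smallest granularity at which the machine can be suspended.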



--- Updated File CyclicGC.py in package Packages/StorageGC --
--- CyclicGC.py	2001/04/20 07:18:24	1.5
+++ CyclicGC.py	2001/04/21 05:41:01	1.6
@@ -3,6 +3,8 @@
 # named "storage" to reflect its expected use.  storage must implement
 # the GCable interface, and objects must be hashable.
 
+import sys as _sys
+
 ###########################################################################
 # The GCable interface.
 # A storage implementing GCable implements the following methods.  "oid"
@@ -30,9 +32,6 @@
 # means they must not raise an exception when passed an oid that no longer
 # exists.  It's suggested, but not required, that .gcTrash() ignore a non-
 # existent oid, .gcRefcount() return 0, and .gcReferences() an empty list.
-# XXX We're skating on thin ice here, under the assumption that CycleGC
-# XXX *can* be made safe despite never being told about any topology
-# XXX changes.  It remains unclear whether that's actually possible.
 #
 # Alternatively, the storage could refrain from acting on .gcTrash()
 # callbacks (just queuing up the intermediate results) until CycleGC.done()
@@ -80,7 +79,7 @@
 #     return until analysis is complete (i.e., until .done() returns true).
 #     It's OK to call .resume() when .done() returns true, although there's
 #     no point to doing so.
-#     XXX TODO All effort values are treated as -1 right now.
+#     XXX TODO "effort" has a very crude meaning now:  # of internal stages
 #     XXX TODO Define what effort *means*; e.g., perhaps an approximation
 #     XXX      to the maximum number of .gcReferences() calls the storage
 #     XXX      wants to endure before regaining control?  milliseconds
@@ -130,6 +129,32 @@
 
 FALSE, TRUE = range(2)
 
+class _TopologyError(Exception):
+    """Internal exception raised when CycleGC detects changes in topology.
+
+    Since CycleGC runs in stages, the storage may add new references at any
+    time between stages.  But the storage doesn't notify a running CycleGC
+    instance of any changes.  As a result, CycleGC may get inconsistent
+    information across calls to GCable.gcReferences() and
+    GCable.gcRefcount(), and that in turn may cause CycleGC's own notion of
+    current refcounts to "go negative", or cause refcounts to hit 0 that
+    really aren't.
+
+    CycleGC detects just enough of this to "be safe", in the sense that it
+    won't call live things trash, but may miss some trash.
+
+    _TopologyError is raised and handled internally when bad things happen,
+    as a clean way for CycleGC to abort what it's doing and give up.
+    """
+
+    def __init__(self, msg):
+        Exception.__init__(self, msg)
+
+class _VM_is_done(Exception):
+    """Internal exception raised to halt the virtual collection machine."""
+
+    pass
+
 class CycleGC:
 
     def __init__(self, storage):
@@ -147,7 +172,26 @@
         # was still in progress.
         self._pending = []
 
+        # List of "virtual machine instructions" to execute, in order.
+        self._workflow = [self._build_transitive_closure,
+                          self._subtract_internal_refs,
+                          self._propagate_external_refs,
+                          self._extract_garbage,
+                          self._start_next_batch,
+                          self._stop_if_finished,
+                          self._back_to_stage0]
+
+        # The virtual machine's "instruction pointer", an index into
+        # self._workflow.
+        self._stage = 0
+
     def start(self, objs):
+        """Objs is a list of oids that *may* be in trash cycles.
+
+        .start() begins the process of analyzing the graph structure.  When
+        .start() returns, use .resume() repeatedly until .done() is true.
+        """
+
         if self._allobjds:
             # Work in progress -- add to pending instead.
             if self._pending:
@@ -156,34 +200,70 @@
                 self._pending = objs[:]
             return
 
-        # Build descriptors for the passed-in objects.
-        # Note that _get_initial_info() automatically weeds out duplicates.
-        assert len(self._obj2index) == 0
-        for x in objs:
-            self._get_initial_info(x)
+        self._build_descriptors(objs)
+        self._back_to_stage0()
 
     def resume(self, effort=-1):
-        if not self._allobjds:
+        """Resume analyzing the graph reachable from the .start() objects.
+
+        Optional arg 'effort' is an int.  If <= 0 (the default), don't
+        return until analysis is complete.  XXX Else ...
+        """
+
+        # print "Entering", effort, self._stage
+        if self.done():
             return
 
-        self._build_transitive_closure()
-        self._subtract_internal_refs()
-        if self._propagate_external_refs():
-            result = self._extract_garbage()
-            if result:
-                self._storage.gcTrash(result)
+        if effort <= 0:
+            worktodo = _sys.maxint
+        else:
+            worktodo = effort
+
+        try:
+            while worktodo > 0:
+                worktodo -= 1
+                # Bump the virtual program counter and dispatch on the next
+                # "opcode".
+                stage = self._stage
+                self._stage = stage + 1
+                self._workflow[stage]()
+
+        except _VM_is_done:
+            pass
+
+        except _TopologyError, e:
+            # Simply give up.
+            # Alternative:  Save a copy of the initial objs in .start(),
+            # and when we get here arrange to start over again.  I'm
+            # leery of trying that, though, because the storage *may*
+            # end up passing a "popular object" to .start(), to which
+            # new pointers get frequently created.  Then we'd keep
+            # detecting _TopologyErrors, and possibly end up trying to
+            # process that node forever more.
+            # So a better strategy here requires deeper cooperation between
+            # CycleGC and the storage than has been agreed to so far.
+
+            self._start_next_batch()
+            if not self.done():
+                self._back_to_stage0()
+                self.resume(effort <= 0 and -1 or max(1, worktodo))
 
-        # Start next batch of work (if any).
-        self._allobjds = []
-        self._obj2index.clear()
-        if self._pending:
-            temp = self._pending
-            self._pending = []
-            self.start(temp)
+        # print "Leaving", self._stage
 
     def done(self):
+        "Return true when the analysis is complete, false if more to do."
+
         return len(self._allobjds) == 0
 
+    def _build_descriptors(self, objs):
+        "Build descriptors for the passed-in objects."
+
+        assert len(self._allobjds) == 0
+        assert len(self._obj2index) == 0
+        for x in objs:
+            # Note that _get_initial_info() weeds out duplicates.
+            self._get_initial_info(x)
+
     def _get_initial_info(self, obj):
         """Build descriptor for obj.
 
@@ -260,10 +340,8 @@
                 # If an adjusted refcount goes negative, the topology of
                 # the subgraph has changed since we first captured the
                 # refcount info.  Just give up.
-                # XXX possible to do better?
-                return FALSE
-
-        return TRUE
+                raise _TopologyError("Adjusted refcount %d < 0 for OID %s"
+                                     % (adjrc, repr(d[OID])))
 
     def _increment_children(self, obj, mistakes):
         "Increment obj's immediate successors' refcounts by 1 each."
@@ -297,7 +375,7 @@
                 mistakes.append(d[OBJ])
 
     def _extract_garbage(self):
-        """Return list of all trash objects in the transitive closure.
+        """Find trash in the transitive closure and pass on to gcTrash().
 
         This consists of objects in trash cycles, and of objects reachable
         only from trash cycles.
@@ -309,19 +387,40 @@
                 continue
             obj = d[OBJ]
             result.append(obj)
-            if d[RC] != self._get_storage_rc(obj):
+            storagerc = self._get_storage_rc(obj)
+            if d[RC] != storagerc:
                 # Oops!  Abort.  The refcount info changed since we first
                 # captured it.  This may or may not still be trash, but we'd
                 # have to start over to say for sure.
-                # XXX Possible to do better?  We certainly can't assume it's
-                # XXX trash safely, the question is if we assume it's not
-                # XXX trash, and propagate that info to its kids (etc),
-                # XXX whether it's possible to still consider something
-                # XXX *else* to be trash by mistake. This gets hairy fast.
-                # XXXX It's certainly *safe* to give up.
-                result = []
-                break
-        return result
+                # We certainly can't safely assume it's trash, but we
+                # *could* safely assume it's reachable and propagate that
+                # info throughout the graph.  It's safer and easier to just
+                # give up.
+                raise _TopologyError("Cached refcount %d != storage "
+                                     "refcount %d for OID %s" %
+                                     (d[RC], storagerc, repr(obj)))
+
+        if result:
+            self._storage.gcTrash(result)
+
+    def _start_next_batch(self):
+        "Start next batch of work (if any)."
+
+        self._allobjds = []
+        self._obj2index.clear()
+        if self._pending:
+            temp = self._pending
+            self._pending = []
+            self._build_descriptors(temp)
+
+    def _stop_if_finished(self):
+        if self.done():
+            raise _VM_is_done
+
+    def _back_to_stage0(self):
+        "Restart the virtual machine."
+
+        self._stage = 0
 
     def _getkids(self, obj):
         "Return list of obj's immediate successors."

--- Updated File GCableGraph.py in package Packages/StorageGC --
--- GCableGraph.py	2001/04/19 21:21:49	1.6
+++ GCableGraph.py	2001/04/21 05:41:01	1.7
@@ -129,7 +129,7 @@
     g.clearTrash()
     gc.start(s)
     while not gc.done():
-        gc.resume()
+        gc.resume(1)
     trash = g.getTrash()
     print "    trash found:", trash
     alt_trash = g.altTrash(s)
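The test change above (gc.resume(1) instead of gc.resume()) drives the
collector one stage per call.  A hedged sketch of that driver pattern,
where Collector is a stand-in with a countdown of abstract work units,
not the real CycleGC class:

```python
# Sketch of the incremental driver loop:  call resume(1) repeatedly
# until done() reports completion, so each call does a bounded amount
# of work.

class Collector:
    def __init__(self, steps):
        self.remaining = steps  # abstract work units left to do

    def resume(self, effort=-1):
        if self.done():
            return
        # effort <= 0 means "finish everything in one call".
        budget = self.remaining if effort <= 0 else min(effort, self.remaining)
        self.remaining -= budget

    def done(self):
        return self.remaining == 0

gc = Collector(5)
calls = 0
while not gc.done():
    gc.resume(1)
    calls += 1
# With effort=1, finishing 5 work units takes exactly 5 resume() calls.
```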