[ZCM] [ZC] 1401/25 Comment "SystemError in SessionDataManager"

Collector: Zope Bugs, Features, and Patches ... zope-coders-admin at zope.org
Mon Aug 30 18:39:03 EDT 2004


Issue #1401 Update (Comment) "SystemError in SessionDataManager"
 Status Accepted, Zope/bug critical
To followup, visit:
  http://zope.org/Collectors/Zope/1401

==============================================================
= Comment - Entry #25 by tim_one on Aug 30, 2004 6:39 pm

Maybe this is relevant too:
    http://collector.zope.org/Zope/789

It could be useful for someone who's seeing failures to build a Zope from current Zope-2_7-branch.  The ZODB you pick up from there now will raise (the new) ConnectionStateError if someone tries to close a connection with pending modifications.  As explained in the links in Issue 789, horrible problems of all kinds can occur if someone does that, including data "changing by magic" on you(*).

Note too that Transaction.begin() failed to abort the previous transaction if the only changes pending were in a subtransaction.  That bug is also fixed on current Zope-2_7-branch.  But across a connection close/reopen, begin() is still in general an incorrect way to clear a previous transaction.

(*) Especially for ChrisM:  Say some transaction changes a BTree bucket in a subtransaction, and then the connection is closed and reopened without a main transaction commit() or abort().  The subtransaction changes are still in the connection's cache then, but closing the connection destroys the temporary storage they appear in:  the changes are *only* in cache.  Time passes, and sooner or later the cache starts ghostifying objects that haven't been used "recently".  Until that point, the changes made by the abandoned subtxn are still visible.  After that point, if a ghostified object needs to be materialized again, there's no trace of the subtransaction changes anymore, and so the data is fetched from the real storage again.  This can cause a POSKeyError (if the subtransaction created this object), or can fetch old data (as it existed before the subtransaction changed it).

In Zope-2_7-branch this scenario is cut off, because trying to close the connection will fail.
________________________________________
= Comment - Entry #24 by dunny on Aug 30, 2004 6:10 pm

I've been able to recreate many of the problems that have been reported with sessions. 

A lot of the problems seem to be resolved with a patch to Publisher. This patch ensures that all failed transactions are explicitly aborted by the current transaction. (Rather than implicitly from within begin of the next transaction.) See

http://mail.zope.org/pipermail/zope-dev/attachments/20040825/e5d7a8e9/publisher-abort.obj

This is the same that I posted to bug 789 (http://collector.zope.org/Zope/789)

I'm still seeing a few other errors with transience and testing out various fixes for those too.
________________________________________
= Comment - Entry #23 by corvus on Aug 30, 2004 11:59 am

We've been getting these errors recently as well.  Once they started, the errors show up for anyone accessing the site.  Restarting the server or clearing the session data container (by resetting the timeout values) caused the errors to stop.  32 hours later they started again.  It's only one data point, but I think the interval may be significant.

________________________________________
= Comment - Entry #22 by bill_ on Aug 26, 2004 1:00 pm

FYI, I'm still getting this error even after increasing my maximum-number-of-session-objects from 1k to 10k.  I guess I'm going to try the patch mentioned in comment #11 now.
________________________________________
= Comment - Entry #21 by jdoyon on Aug 24, 2004 11:36 am

I may be able to help a little bit ...

I make use of PTS WITHOUT Plone ...

I'm currently testing Zope 2.7.2 CMF 1.4.7 PTS 1.0rc8 (And other products), as an upgrade to my currently running fine Zope 2.7.0 CMF 1.4.2 PTS 1.0fork-rc7 (And other products). (Well I still have that memory leak, but now I just restart Zope every night :( )

Seems to be working fine, though I haven't load tested it.

I did run into problems when I first implemented PTS, which I solved by increasing the value of maximum-number-of-session-objects to 2000 from the 1000 default.  This particular fix was for errors to that seemed to be load driven.  Apparently PTS' use of SESSION was so intense that when the site got busy it would reach the limit.

I'm curious as to whether I will suffer the same bug you guys have. I can always go back to 2.7.0 as well and apply the hotfixes instead.

I'll go do some load testing on it just for fun ...

Thx
________________________________________
= Comment - Entry #20 by morphex on Aug 13, 2004 5:00 am

Here's another traceback related to transcience:

Traceback (most recent call last):
  File "/home/shared/zope270/lib/python/ZPublisher/Publish.py", line 100, in publish
    request, bind=1)
  File "/home/shared/zope270/lib/python/ZPublisher/mapply.py", line 88, in mapply
    if debug is not None: return debug(object,args,context)
  File "/home/shared/zope270/lib/python/ZPublisher/Publish.py", line 40, in call_object
    result=apply(object,args) # Type s<cr> to step into published object.
  File "/home/shared/zope270/instance/Products/CMFCore/PortalContent.py", line 116, in __call__
    return view()
  File "/home/shared/zope270/lib/python/Shared/DC/Scripts/Bindings.py", line 306, in __call__
    return self._bindAndExec(args, kw, None)
  File "/home/shared/zope270/lib/python/Shared/DC/Scripts/Bindings.py", line 343, in _bindAndExec
    return self._exec(bound_data, args, kw)
  File "/home/shared/zope270/instance/Products/CMFCore/FSPageTemplate.py", line 191, in _exec
    result = self.pt_render(extra_context=bound_names)
  File "/home/shared/zope270/instance/Products/CMFCore/FSPageTemplate.py", line 140, in pt_render
    raise RuntimeError, msg
RuntimeError: FS Page Template document_view has errors: exceptions.RuntimeError.<br>&lt;pre&gt;the bucket being iterated changed size&lt;/pre&gt;
________________________________________
= Comment - Entry #19 by philps on Aug 9, 2004 3:18 pm

The patch didn't really work for us... we're running Zope 2.7.1 and Plone 2.0.3 with Python 2.3.3 on Gentoo Linux (2.4.20-gentoo-r7). Now there are AssertionErrors all over the place.
Here's a traceback.
------------------
    *  Module ZPublisher.Publish, line 101, in publish
    * Module ZPublisher.mapply, line 88, in mapply
    * Module ZPublisher.Publish, line 39, in call_object
    * Module Shared.DC.Scripts.Bindings, line 306, in __call__
    * Module Shared.DC.Scripts.Bindings, line 343, in _bindAndExec
    * Module Products.PageTemplates.ZopePageTemplate, line 222, in _exec
    * Module Products.PageTemplates.PageTemplate, line 96, in pt_render
      <ZopePageTemplate at /portal_unesco/master>
    * Module TAL.TALInterpreter, line 189, in __call__
    * Module TAL.TALInterpreter, line 233, in interpret
    * Module TAL.TALInterpreter, line 642, in do_defineMacro
    * Module TAL.TALInterpreter, line 233, in interpret
    * Module TAL.TALInterpreter, line 663, in do_useMacro
    * Module TAL.TALInterpreter, line 233, in interpret
    * Module TAL.TALInterpreter, line 552, in do_insertTranslation
    * Module TAL.TALInterpreter, line 615, in translate
    * Module Products.PageTemplates.TALES, line 264, in translate
    * Module Products.PlacelessTranslationService.PlacelessTranslationService, line 113, in translate
    * Module Products.PlacelessTranslationService.PlacelessTranslationService, line 504, in translate
    * Module Products.PlacelessTranslationService.PlacelessTranslationService, line 416, in getCatalogsForTranslation
    * Module Products.PlacelessTranslationService.PlacelessTranslationService, line 541, in negotiate_language
    * Module Products.PlacelessTranslationService.Negotiator, line 261, in negotiate
    * Module Products.PlacelessTranslationService.Negotiator, line 266, in _negotiate
    * Module Products.PlacelessTranslationService.Negotiator, line 54, in getLangPrefs
    * Module Products.PlacelessTranslationService.Negotiator, line 172, in getAccepted
    * Module ZPublisher.HTTPRequest, line 1218, in __getattr__
    * Module ZPublisher.HTTPRequest, line 1178, in get
    * Module Products.Sessions.SessionDataManager, line 94, in getSessionData
    * Module Products.Sessions.SessionDataManager, line 181, in _getSessionDataObject
    * Module Products.Transience.Transience, line 765, in new_or_existing
    * Module Products.Transience.Transience, line 476, in get
    * Module Products.Transience.Transience, line 280, in _move_item
    * Module Products.Transience.Transience, line 691, in _gc
--------------------

Plus, the KeyErrors are back. We'll downgrade to 2.7.0 as soon as possible.
Did anybody experience these session issues on v2.7.0?
________________________________________
= Comment - Entry #18 by ajung on Aug 9, 2004 2:35 pm

Looks like this bug might justify a 2.7.3 release if the patch should really work.
________________________________________
= Comment - Entry #17 by philps on Aug 9, 2004 2:00 pm

We were facing the same problems described here previously in a v2.7.1 production server.  It has been patched 40 minutes ago, and the SessionDataManager errors have apparently gone away. Anyway, it's a bit too soon to say anything, but we'll keep watching the server for the rest of the day.
________________________________________
= Comment - Entry #16 by mcdonc on Aug 9, 2004 9:42 am

Thanks very much for the report!

This appears to be caused by an improper result coming from a BTrees range search, which is something that I'll need some help with from other Zope folks.

The bit that fails is basically:

to_finalize = list(self._data.keys(start_finalize, max_ts))
for key in finalize:
    assert(start_finalize <= key <= max_ts)

Where "self._data" is an IOBTree and "start_finalize" and "max_ts" are invariably integers.

________________________________________
= Comment - Entry #15 by Fred on Aug 9, 2004 9:34 am

After applying the patch mentioned below my Zope/Plone instance ran OK for a day and then starting taking AssertionErrors on every page access, bottoming out here:

  File "/usr/local/Zope-2.7.2/lib/python/Products/Transience/Transience.py", line 592, in _finalize
    assert(start_finalize <= key <= max_ts)
  AssertionError

This is happening on a production system, so I'm rolling back to Zope 2.7.0 immediately.

________________________________________
= Comment - Entry #14 by Fred on Aug 7, 2004 11:17 am

I started getting this SystemError exception when I upgraded to Zope 2.7.2 and Plone 2.0.3.  I've just now applied the patch from chrism.  Restarting Zserver clears the problem even without the patch, so I'll let this soak for a day or so and report back if I still have trouble.

________________________________________
= Comment - Entry #13 by dhart on Aug 6, 2004 7:24 pm

Sorry.. have applied patch and will report on results.
________________________________________
= Comment - Entry #12 by dhart on Aug 6, 2004 7:14 pm

Zope (2.7.2 compiled from source, python 2.3.4, linux2, ZServer/1.1 Plone/2.0.3) is failing in the same way on my servers (it did not fail with Zope 2.7.1). I cannot reliably reproduce the error, but running 10 lightly used Plone instances from one Zope instance fails every 2-3 days. I expect that repoducing involves complex use of multiple sessions.
________________________________________
= Comment - Entry #11 by mcdonc on Aug 4, 2004 10:49 pm

Sorry, that's http://mail.zope.org/pipermail/zope-checkins/2004-August/028135.html .
________________________________________
= Accept - Entry #10 by mcdonc on Aug 4, 2004 10:21 pm

 Status: Deferred => Accepted

 Supporters added: mcdonc

Dieter provides what might be a clue, and I checked in a change he suggested to the 2.7 branch and HEAD.  See http://mail.zope.org/pipermail/zope-checkins/2004-August/028135.htmlfor the patch.  If the folks who are getting these errors could try to apply this patch to their running Zopes and report if they continue to see these errors, it would be helpful. 
________________________________________
= Comment - Entry #9 by mcdonc on Aug 3, 2004 2:19 pm

> Proposal:
> Don't create a new session unless some is safing data in a session. A
>   operation like session = request.SESSION or request.SESSION.get('foo',
>   None) shouldn't create a new session

That's an interesting idea, although I think it may not quite work for all use cases.  In the meantime, however, there's not a lot of evidence that decreasing the sessioning load in this way would prevent the reported error from happening, so my request for people to bust out gdb when they see it still stands...
________________________________________
= Comment - Entry #8 by Tiran on Aug 3, 2004 1:33 pm

Plone is using PTS and PTS made very heavy use of SESSIONs because the author (me :|) didn't know that even the test for the existence of a session var did create a new session.

Proposal:
Don't create a new session unless some is safing data in a session. A operation like session = request.SESSION or request.SESSION.get('foo', None) shouldn't create a new session
________________________________________
= Comment - Entry #7 by mcdonc on Aug 3, 2004 12:30 pm

Thanks for the report(s).

I have heard from four people, all of whom use Plone, that they have this problem.  I haven't been able to replicate it, but I also haven't used Plone in a while.  FWIW, no non-Plone users have reported this error.

The SystemError itself is indicative of a bad piece of C code somewhere (although it's not clear where).

The way this will get fixed is for someone to outline a repeatable set of configuration and usage steps for the systemerror to be reproduced, and then research the cause of the systemerror (maybe using gdb).  I may be able to do this, but it likely won't be soon due to other committments, so help is appreciated in the meantime.
________________________________________
= Comment - Entry #6 by hrma on Aug 3, 2004 10:58 am

I couldn't attach my event log (Insufficient Privs) so I placed it in my folder.  It can be found here:

http://zope.org/Members/hrma/event.zip


________________________________________
= Comment - Entry #5 by hrma on Aug 3, 2004 10:46 am

Hey folks...  I'm having the same issue after upgrading to Zope 2.7.2.  I'm running Python 2.3.3, Zope 2.7.2, Plone 2.0.3 frontended by Apache2 on a Windows 2000 server.  There are 2 zope instances running on this server.  I seem to be having this problem (so far) with only one of them.  Here's the traceback I'm getting:

Traceback (innermost last):
  Module ZPublisher.Publish, line 101, in publish
  Module ZPublisher.mapply, line 88, in mapply
  Module ZPublisher.Publish, line 39, in call_object
  Module Products.CMFCore.PortalContent, line 116, in __call__
  Module Shared.DC.Scripts.Bindings, line 306, in __call__
  Module Shared.DC.Scripts.Bindings, line 343, in _bindAndExec
  Module Products.CMFCore.FSPageTemplate, line 191, in _exec
  Module Products.CMFCore.FSPageTemplate, line 124, in pt_render
  Module Products.PageTemplates.PageTemplate, line 96, in pt_render
   - <FSPageTemplate at /www_hrma_org/document_view used for /www_hrma_org/index_html>
  Module TAL.TALInterpreter, line 189, in __call__
  Module TAL.TALInterpreter, line 233, in interpret
  Module TAL.TALInterpreter, line 670, in do_useMacro
  Module TAL.TALInterpreter, line 233, in interpret
  Module TAL.TALInterpreter, line 408, in do_optTag_tal
  Module TAL.TALInterpreter, line 393, in do_optTag
  Module TAL.TALInterpreter, line 388, in no_tag
  Module TAL.TALInterpreter, line 233, in interpret
  Module TAL.TALInterpreter, line 636, in do_condition
  Module TAL.TALInterpreter, line 233, in interpret
  Module TAL.TALInterpreter, line 559, in do_insertTranslation
  Module TAL.TALInterpreter, line 622, in translate
  Module Products.PageTemplates.TALES, line 264, in translate
  Module Products.PlacelessTranslationService.PlacelessTranslationService, line 113, in translate
  Module Products.PlacelessTranslationService.PlacelessTranslationService, line 504, in translate
  Module Products.PlacelessTranslationService.PlacelessTranslationService, line 416, in getCatalogsForTranslation
  Module Products.PlacelessTranslationService.PlacelessTranslationService, line 541, in negotiate_language
  Module Products.PlacelessTranslationService.Negotiator, line 261, in negotiate
  Module Products.PlacelessTranslationService.Negotiator, line 266, in _negotiate
  Module Products.PlacelessTranslationService.Negotiator, line 54, in getLangPrefs
  Module Products.PlacelessTranslationService.Negotiator, line 172, in getAccepted
  Module ZPublisher.HTTPRequest, line 1218, in __getattr__
  Module ZPublisher.HTTPRequest, line 1178, in get
  Module Products.Sessions.SessionDataManager, line 94, in getSessionData
  Module Products.Sessions.SessionDataManager, line 183, in _getSessionDataObject
SystemError: error return without exception set



I am also seeing ZODB conflict errors in the event log.  I get KeyErrors all over the place, and when I try to access the Session Data container within /temp_folder I get:

Traceback (innermost last):
  Module ZPublisher.Publish, line 101, in publish
  Module ZPublisher.mapply, line 88, in mapply
  Module ZPublisher.Publish, line 39, in call_object
  Module App.special_dtml, line 62, in __call__
  Module DocumentTemplate.DT_String, line 474, in __call__
  Module Products.Transience.Transience, line 838, in nudge
  Module Products.Transience.Transience, line 694, in _gc
KeyError: 1091222580

I've attached a zipped snapshot of my event log detailing the numerous errors since my last instance startup on the 29th.

I hope all this is A) related and B) helps.  Let me know if I can be of any assistance.

Mick Drevyanko
mick at portofhamptonroads.com
________________________________________
= Defer - Entry #4 by chrisw on Jul 30, 2004 9:46 am

 Status: Pending => Deferred

If this can't be reproduced within a month, it'll get rejected as something specific to your setup.
________________________________________
= Comment - Entry #3 by morphex on Jul 21, 2004 6:19 am

Not sure if it can be reproduced, I'll try to create a test that reproduces the bug this week
________________________________________
= Comment - Entry #2 by ajung on Jul 20, 2004 12:35 am

Can this error be reproduced somehow?
________________________________________
= Request - Entry #1 by morphex on Jul 3, 2004 10:03 am

2004-07-03T09:00:04 ERROR(200) SiteError http://www.blogologue.com/index_html
Traceback (most recent call last):
  File "/home/shared/zope271/lib/python/ZPublisher/Publish.py", line 101, in publish
    request, bind=1)
  File "/home/shared/zope271/lib/python/ZPublisher/mapply.py", line 88, in mapply
    if debug is not None: return debug(object,args,context)
  File "/home/shared/zope271/lib/python/ZPublisher/Publish.py", line 39, in call_object
    result=apply(object,args) # Type s<cr> to step into published object.
  File "/home/shared/zope271/lib/python/Shared/DC/Scripts/Bindings.py", line 306, in __call__
    return self._bindAndExec(args, kw, None)
  File "/home/shared/zope271/lib/python/Shared/DC/Scripts/Bindings.py", line 343, in _bindAndExec
    return self._exec(bound_data, args, kw)
  File "/home/shared/zope271/lib/python/Products/PageTemplates/PageTemplateFile.py", line 106, in _exec
    return self.pt_render(extra_context=bound_names)
  File "/home/shared/zope271/lib/python/Products/PageTemplates/PageTemplate.py", line 96, in pt_render
    tal=not source, strictinsert=0)()
  File "/home/shared/zope271/lib/python/TAL/TALInterpreter.py", line 189, in __call__
    self.interpret(self.program)
  File "/home/shared/zope271/lib/python/TAL/TALInterpreter.py", line 233, in interpret
    handlers[opcode](self, args)
  File "/home/shared/zope271/lib/python/TAL/TALInterpreter.py", line 605, in do_loop_tal
    self.interpret(block)
  File "/home/shared/zope271/lib/python/TAL/TALInterpreter.py", line 233, in interpret
    handlers[opcode](self, args)
  File "/home/shared/zope271/lib/python/TAL/TALInterpreter.py", line 561, in do_insertStructure_tal
    structure = self.engine.evaluateStructure(expr)
  File "/home/shared/zope271/lib/python/Products/PageTemplates/TALES.py", line 221, in evaluate
    return expression(self)
  File "/home/shared/zope271/lib/python/Products/PageTemplates/ZRPythonExpr.py", line 47, in __call__
    return eval(code, g, {})
  File "Python expression "here.render_issue_contents(item['issue'])"", line 1, in <expression>
  File "/home/shared/zope271/instance/Products/IssueDealerWeblog/local_weblog_publisher.py", line 249, in render_issue_contents
    return issue.render_contents_weblog()
  File "/home/shared/zope271/instance/Products/IssueDealer/issue.py", line 485, in render_contents_weblog
    return self.render_contents()
  File "/home/shared/zope271/instance/Products/IssueDealer/issue.py", line 457, in render_contents
    return self.render_text_as_html()
  File "/home/shared/zope271/instance/Products/IssueDealer/issue.py", line 200, in render_text_as_html
    size = self.get_user_preferences().issue_contents_size
  File "/home/shared/zope271/instance/Products/IssueDealer/base.py", line 169, in get_user_preferences
    return self.get_issue_dealer()[self.REQUEST.SESSION['preferences']]
  File "/home/shared/zope271/lib/python/ZPublisher/HTTPRequest.py", line 1218, in __getattr__
    v = self.get(key, default, returnTaints=returnTaints)
  File "/home/shared/zope271/lib/python/ZPublisher/HTTPRequest.py", line 1178, in get
    if callable(v): v = v()
  File "/home/shared/zope271/lib/python/Products/Sessions/SessionDataManager.py", line 94, in getSessionData
    return self._getSessionDataObject(key)
  File "/home/shared/zope271/lib/python/Products/Sessions/SessionDataManager.py", line 183, in _getSessionDataObject
    if getattr(ob, '__of__', None) and getattr(ob, 'aq_parent', None):
SystemError: error return without exception set

==============================================================



More information about the Zope-Collector-Monitor mailing list