[ZODB-Dev] B-Tree Concurrency Issue (_OOBTree.pyd segfaults)
Tim Peters
tim at zope.com
Thu Apr 21 18:08:28 EDT 2005
[Gfeller, Martin]
> ...
> You are correct that we're accessing one connection with multiple
> threads.
That would appear to explain it, then.
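For reference, the pattern ZODB's thread-safety rules are built around is a
shared DB object with one Connection per thread:  DB.open() is thread-safe,
individual Connections are not.  A minimal sketch (the storage name and
compute() are placeholders, not anything from your app):

    import threading
    import transaction
    import ZODB
    from ZODB.FileStorage import FileStorage

    # One storage and one DB for the whole process.
    db = ZODB.DB(FileStorage("Data.fs"))

    def worker():
        conn = db.open()           # a private Connection per thread
        try:
            compute(conn.root())   # app-specific work; placeholder
            transaction.commit()   # the default txn manager is per-thread
        except:
            transaction.abort()
            raise
        finally:
            conn.close()           # returns the Connection to the pool

    threads = [threading.Thread(target=worker) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()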
> The reason is that we have many very long-running calculations which
> also use a large portion of objects in our DBs. The app is most
> efficient when we use almost all memory space as ZODB cache - i.e., ~1.5
> GB. If we multiply that per connection, we'd either go into massive
> thrashing or run out of address space.
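(For what it's worth:  a Connection's cache is bounded by object count, not
bytes.  DB's cache_size argument sets the per-connection target, so with one
connection per thread the total is roughly threads * cache_size objects'
worth of memory.  The number and file name here are arbitrary, just to show
the knob:

    import ZODB
    from ZODB.FileStorage import FileStorage

    # cache_size is a per-connection *object count* target; with N
    # threads, figure roughly N * cache_size objects resident.
    db = ZODB.DB(FileStorage("Data.fs"), cache_size=100000)

Whether an acceptable per-thread budget exists for your data is, of course,
the open question.)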
>
> So I will try (i) to pinpoint the crash more closely (by using a debug
> build) and (ii) try to lock around our accesses (which is difficult, as
> loading a ghostified object is an access as well).
>
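Locking is harder than it sounds here.  Because reading any attribute of a
ghost loads it, and loading mutates the shared Connection's cache, a lock
would have to cover every stretch of code that so much as touches a
persistent object.  A sketch of what that means in practice, using a
hypothetical wrapper (guarded() is invented for illustration):

    import threading

    _conn_lock = threading.RLock()  # guards the one shared Connection

    def guarded(func):
        # Every function touching persistent objects (even purely
        # read-only ones) must be wrapped, since attribute access
        # on a ghost triggers a load into the shared cache.
        def wrapper(*args, **kwds):
            _conn_lock.acquire()
            try:
                return func(*args, **kwds)
            finally:
                _conn_lock.release()
        return wrapper

The catch is visible right there:  long-running calculations over
persistent data hold the lock for their whole duration, serializing
exactly the work the threads were supposed to overlap.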
> In the longer run, I assume you're suggesting we should look for another
> way to access ZODB, or for another DBMS altogether (not to our liking)?
I don't know enough about your app to suggest something concrete here. All
I can tell you, based on what I know, is a good guess about why your current
approach segfaults.
Suggesting something concrete would require knowing a _great_ deal more.
For example, I don't know why you're using multiple threads to begin with
(e.g., if the app is CPU-bound, throwing multiple threads at it is likely
counterproductive, since CPython's global interpreter lock lets only one
thread execute Python bytecode at a time).

Don't know how much data you have (e.g., perhaps it would all fit in RAM,
and you could get away with loading it all immediately after opening the
connection).  Don't know the data access patterns; e.g., perhaps a small
amount of data is mutable, but a lot more doesn't change, in which case
perhaps you could load the unchanging data from one database and the
mutable data from another (a rough sketch of that split appears below).

I don't know what kinds of computations you're doing.  For example, many
large-scale numerical computations in pure Fortran are deliberately
designed to be "cache friendly"; e.g., applying divide-and-conquer
blocking strategies to large matrix multiplications, to minimize mean
working-set size over a sequence of computational stages.  There's often
far more code arranging to "live with" limited fast cache than there is
to do actual number-crunching (matrix multiplication can be written in 4
lines of code; a _useful_ large-scale matrix multiplication routine can
easily consume 4 pages of code).  Don't even know if you _are_ doing
significant number-crunching; if you are, perhaps recoding some of it in
C would be a major win.
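To make the "two databases" idea concrete, here's a rough sketch, with
invented storage names and an invented object layout; whether your data
actually splits this way, only you know.  The stable data can live in a
read-only storage, whose connections then never see cache invalidations,
and it can be pulled into cache up front:

    import transaction
    import ZODB
    from ZODB.FileStorage import FileStorage

    # Invented names: "reference.fs" holds data that never changes,
    # "working.fs" the small mutable part.
    ref_db = ZODB.DB(FileStorage("reference.fs", read_only=True))
    mut_db = ZODB.DB(FileStorage("working.fs"))

    def preload(root):
        # "Load it all immediately after opening the connection":
        # touching an object un-ghosts it.  Assumes (for this sketch
        # only) that root maps names to BTrees of Persistent objects.
        for tree in root.values():
            for obj in tree.values():
                obj._p_activate()

    def worker():
        # Still one Connection per database per thread; the win is
        # that the big read-only cache is never invalidated, while
        # the mutable database's cache can stay small.
        ref_conn = ref_db.open()
        mut_conn = mut_db.open()
        try:
            preload(ref_conn.root())
            compute(ref_conn.root(), mut_conn.root())  # placeholder
            transaction.commit()   # only working.fs has changes
        finally:
            mut_conn.close()
            ref_conn.close()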
And so on. If your app requires truly random access to almost all of an
enormous collection of data, then I have no suggestion that's likely to
help. I'm not sure I've ever seen such an app. But if there are
exploitable properties in your app's algorithm and/or in its data, I don't
know enough about your app to guess what kind they may be. It's possible
(even likely) that there isn't a quick refactoring that would help enough.