[ZODB-Dev] what's the latest on zodb/zeo+memcached?
Claudiu Saftoiu
csaftoiu at gmail.com
Mon Jan 21 15:38:24 UTC 2013
On Sat, Jan 19, 2013 at 10:00 AM, Jim Fulton <jim at zope.com> wrote:
> - ZODB doesn't simply load your database into memory.
> It loads objects when you try to access their state.
> If you're using ZEO (or relstorage, or neo), each load requires a
> round-trip to the server. That's typically a millisecond or two,
> depending on your network setup. (Your database is small, so disk
> access shouldn't be an issue as it is, presumably, in your disk
> cache.)
I understand. My setup seems to be able to unghost about 10,000
catalog-related objects/minute - does that sound about right?
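As a sanity check on that rate: 10,000 loads/minute works out to 6 ms per object, which can be compared against the 1-2 ms round trip Jim mentions. A quick back-of-envelope sketch (the 1.5 ms round-trip figure below is an assumed midpoint, not a measurement):

```python
# Back-of-envelope: what per-object latency does 10,000 unghosts/minute
# imply, and how does it compare to a 1-2 ms ZEO round trip?
loads_per_minute = 10_000
ms_per_load = 60_000 / loads_per_minute  # milliseconds per object load
print(f"{ms_per_load:.1f} ms per load")  # 6.0 ms per load

# Assuming a ~1.5 ms network round trip, the remainder is server-side
# lookup, unpickling, and catalog traversal on the client.
overhead_ms = ms_per_load - 1.5
print(f"~{overhead_ms:.1f} ms/object of non-network overhead")
```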
> - You say it often takes you a couple of minutes to handle requests.
> This is obviously very long. It sounds like there's an issue
> with the way you're using the catalog. It's not that hard to get this
> wrong. I suggest either hiring someone with experience in this
> area to help you or consider using another tool, like solr.
> (You could put more details of your application here, but I doubt
> people will be willing to put in the time to really analyze it and
> tell you how to fix it. I know I can't.)
That's alright, I won't ask for such a time investment. As it is I greatly
appreciate everyone for replying and helping out already - thanks guys!
> - solr is so fast it almost makes me want to cry. At ZC, we're
> increasingly using solr instead of the catalog. As the original
> author of the catalog, this makes me sad, but we just don't have the
> time to put in the effort to equal solr/lucene.
> - A common mistake when using ZODB is to use it like a relational
> database, puting most data in catalog-like data structures and
> querying to get most of your data. The strength of a OODB is that
> you don't have to query to get data from a well-designed object
> model.
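Jim's point about a well-designed object model is that you reach data by containment and attribute access rather than by querying. A minimal sketch of that idea, using plain dicts where ZODB would use PersistentMapping or BTrees.OOBTree (the `Document` class and its attributes are illustrative, not from the poster's app):

```python
# Sketch of "navigate, don't query": documents grouped by date in a
# mapping, so fetching one day's documents is a direct lookup rather
# than a catalog query. Plain dicts stand in for ZODB's persistent
# containers here.
from datetime import date

class Document:
    def __init__(self, doc_id, created, status):
        self.doc_id = doc_id
        self.created = created
        self.status = status

# Root container keyed by date, mirroring how the poster's documents
# are already organized.
by_date = {}

def add_document(doc):
    by_date.setdefault(doc.created, []).append(doc)

add_document(Document(1, date(2013, 1, 19), "open"))
add_document(Document(2, date(2013, 1, 19), "closed"))
add_document(Document(3, date(2013, 1, 20), "open"))

# Direct traversal: no index needed for date-based access.
todays = by_date[date(2013, 1, 19)]
print([d.doc_id for d in todays])  # [1, 2]
```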
My use case is basically this: I have 400,000 'documents' with 17
attributes that I want to search on. One of them is the date of the
document. This index I could easily do away with, as the documents are
organized roughly by date. However, if I want to get a 'document' made at
any date but with a certain attribute in a certain range, I don't have a
good way to do it based on how they are stored now. I could try making my
own indexing scheme but I figured ZCatalog would be well-suited for
this...
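For the "certain attribute in a certain range" case, a hand-rolled range index is not much code. In ZODB the natural structure is BTrees.OOBTree, which supports range iteration via `.items(min, max)`; the sketch below shows the same idea with the stdlib `bisect` module instead, so it runs anywhere (class and method names are my own, not a ZCatalog API):

```python
# A minimal hand-rolled range index: maps an orderable attribute value
# to the set of document ids having that value, and answers range
# queries by slicing the sorted key list.
import bisect

class RangeIndex:
    def __init__(self):
        self._keys = []   # sorted attribute values
        self._docs = {}   # value -> set of doc ids

    def index(self, doc_id, value):
        if value not in self._docs:
            bisect.insort(self._keys, value)
            self._docs[value] = set()
        self._docs[value].add(doc_id)

    def search(self, lo, hi):
        """All doc ids whose indexed value lies in [lo, hi]."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        result = set()
        for key in self._keys[i:j]:
            result |= self._docs[key]
        return result

idx = RangeIndex()
for doc_id, score in [(1, 0.2), (2, 0.5), (3, 0.9), (4, 0.5)]:
    idx.index(doc_id, score)
print(sorted(idx.search(0.4, 0.6)))  # [2, 4]
```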
On Thu, Jan 17, 2013 at 12:31 PM, Claudiu Saftoiu <csaftoiu at gmail.com>
wrote:
> ...
> > One potential thing is this: after a zeopack the index database .fs
> > file is about 400 megabytes, so I figure a cache of 3000 megabytes
> > should more than cover it. Before a zeopack, though - I do one every
> > 3 hours - the file grows to 7.6 gigabytes.
>
> In scanning over this thread while writing my last message, I noticed
> this.
>
> This is a ridiculous amount of churn. There is likely something
> seriously out of whack with your application. Every application is
> different, but we typically see *weekly* packs reduce database size by
> at most 50%.
>
All that database contains is: a catalog with 17 indices of 400,000
objects, the root object, a document map, and an object to hold the
catalog. The document map itself I put as a 'document_map' attribute of
the catalog. Because of the nature of my app I have to add and re-index
those objects quite often (they change a lot). This seems to cause the
index .fs file to grow by a ridiculous amount... is there anything
obviously wrong with the above picture?
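The figures in this thread let us put rough numbers on that churn. The growth rate comes straight from the posted sizes; the per-bucket and per-index byte counts below are hypothetical assumptions, only there to show the shape of the arithmetic (ZODB appends a new copy of every modified BTree bucket on each commit):

```python
# Churn arithmetic from the figures in this thread: the index file
# grows from ~0.4 GB to ~7.6 GB between packs every 3 hours.
growth_gb = 7.6 - 0.4
hours = 3
gb_per_hour = growth_gb / hours
print(f"{gb_per_hour:.1f} GB/hour of new (mostly garbage) records")

# Each reindex of one document touches all 17 indexes, and each index
# write appends fresh copies of a few BTree buckets. Assuming
# (hypothetically) ~3 buckets per index and ~4 KB per bucket copy:
kb_per_reindex = 17 * 3 * 4
print(f"~{kb_per_reindex} KB appended per reindexed document")

# At that rate, the observed growth corresponds to on the order of
reindexes_per_hour = gb_per_hour * 1024 * 1024 / kb_per_reindex
print(f"~{reindexes_per_hour:,.0f} reindex operations/hour")
```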
The main database does not have quite so much churn. Right after a pack
just now it was 5715MB, and it gets to at most 6000MB or so after 3 hours
(often just up to 5800MB). I don't have to run the pack quite so often -
is there a significant downside to packing often?
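On the mechanics: packing is I/O-heavy on the server (it rewrites the .fs file) but, as Jim notes below, it does not itself destroy client caches, so the main cost of packing often is server disk load. The `zeopack` utility ships with ZEO; a nightly cron entry might look like the following (the install path, host/port, and log path are assumptions for illustration):

```shell
# Hypothetical crontab entry: pack the ZEO storage once nightly at
# 03:15 instead of every 3 hours. -h/-p select the ZEO server.
15 3 * * * /opt/app/bin/zeopack -h localhost -p 8100 >> /var/log/zeopack.log 2>&1
```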
> > Shouldn't the relevant objects - the entire set of latest versions of
> > the objects - be the ones in the cache, thus it doesn't matter that
> > the .fs file is 7.6gb as the actual used bits of it are only 400mb or
> > so?
>
> Every object update invalidates cached versions of the object in all
> caches except the writer's. (Even the writer's cached value is
> invalidated if conflict resolution was performed.)
>
> > Another question is, does zeopacking destroy the cache?
>
> No, but lots of writing does.
>
I see. After all the above, it really sounds like if I want fast indexing
I should just drop ZCatalog and go ahead and use solr. It doesn't seem
ZCatalog + ZODB, the way they are now, are really made to handle many
objects with many indices that get updated often...
Thanks for all the help,
- Claudiu