[ZODB-Dev] what's the latest on zodb/zeo+memcached?
Claudiu Saftoiu
csaftoiu at gmail.com
Mon Jan 21 15:38:24 UTC 2013
On Sat, Jan 19, 2013 at 10:00 AM, Jim Fulton <jim at zope.com> wrote:
> - ZODB doesn't simply load your database into memory.
> It loads objects when you try to access their state.
> If you're using ZEO (or relstorage, or neo), each load requires a
> round-trip to the server. That's typically a millisecond or two,
> depending on your network setup. (Your database is small, so disk
> access shouldn't be an issue as it is, presumably, in your disk
> cache.)
I understand. My setup seems to be able to unghost about 10,000
catalog-related objects/minute - does that sound about right?
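As a sanity check on that rate: 10,000 loads/minute works out to 6 ms per object, which can be compared against the 1-2 ms round trip Jim mentions. A quick back-of-envelope sketch (the 1.5 ms round-trip figure below is an assumed midpoint, not a measurement):

```python
# Back-of-envelope: what per-object latency does 10,000 unghosts/minute
# imply, and how does it compare to a 1-2 ms ZEO round trip?
loads_per_minute = 10_000
ms_per_load = 60_000 / loads_per_minute  # milliseconds per object load
print(f"{ms_per_load:.1f} ms per load")  # 6.0 ms per load

# Assuming a ~1.5 ms network round trip, the remainder is server-side
# lookup, unpickling, and catalog traversal on the client.
overhead_ms = ms_per_load - 1.5
print(f"~{overhead_ms:.1f} ms/object of non-network overhead")
```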
> - You say it often takes you a couple of minutes to handle requests.
> This is obviously very long. It sounds like there's an issue
> with the way you're using the catalog. It's not that hard to get this
> wrong. I suggest either hiring someone with experience in this
> area to help you or consider using another tool, like solr.
> (You could put more details of your application here, but I doubt
> people will be willing to put in the time to really analyze it and
> tell you how to fix it. I know I can't.)
That's alright, I won't ask for such a time investment. As it is I greatly
appreciate everyone for replying and helping out already - thanks guys!
> - solr is so fast it almost makes me want to cry. At ZC, we're
> increasingly using solr instead of the catalog. As the original
> author of the catalog, this makes me sad, but we just don't have the
> time to put in the effort to equal solr/lucene.
> - A common mistake when using ZODB is to use it like a relational
> database, puting most data in catalog-like data structures and
> querying to get most of your data. The strength of a OODB is that
> you don't have to query to get data from a well-designed object
> model.
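Jim's point about a well-designed object model is that you reach data by containment and attribute access rather than by querying. A minimal sketch of that idea, using plain dicts where ZODB would use PersistentMapping or BTrees.OOBTree (the `Document` class and its attributes are illustrative, not from the poster's app):

```python
# Sketch of "navigate, don't query": documents grouped by date in a
# mapping, so fetching one day's documents is a direct lookup rather
# than a catalog query. Plain dicts stand in for ZODB's persistent
# containers here.
from datetime import date

class Document:
    def __init__(self, doc_id, created, status):
        self.doc_id = doc_id
        self.created = created
        self.status = status

# Root container keyed by date, mirroring how the poster's documents
# are already organized.
by_date = {}

def add_document(doc):
    by_date.setdefault(doc.created, []).append(doc)

add_document(Document(1, date(2013, 1, 19), "open"))
add_document(Document(2, date(2013, 1, 19), "closed"))
add_document(Document(3, date(2013, 1, 20), "open"))

# Direct traversal: no index needed for date-based access.
todays = by_date[date(2013, 1, 19)]
print([d.doc_id for d in todays])  # [1, 2]
```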
My use case is basically this: I have 400,000 'documents' with 17
attributes that I want to search on. One of them is the date of the
document. This index I could easily do away with, as the documents are
organized roughly by date. However, if I want to get a 'document' made at
any date but with a certain attribute in a certain range, I don't have a
good way to do it based on how they are stored now. I could try making my
own indexing scheme but I figured ZCatalog would be well-suited for
this...
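For the "certain attribute in a certain range" case, a hand-rolled range index is not much code. In ZODB the natural structure is BTrees.OOBTree, which supports range iteration via `.items(min, max)`; the sketch below shows the same idea with the stdlib `bisect` module instead, so it runs anywhere (class and method names are my own, not a ZCatalog API):

```python
# A minimal hand-rolled range index: maps an orderable attribute value
# to the set of document ids having that value, and answers range
# queries by slicing the sorted key list.
import bisect

class RangeIndex:
    def __init__(self):
        self._keys = []   # sorted attribute values
        self._docs = {}   # value -> set of doc ids

    def index(self, doc_id, value):
        if value not in self._docs:
            bisect.insort(self._keys, value)
            self._docs[value] = set()
        self._docs[value].add(doc_id)

    def search(self, lo, hi):
        """All doc ids whose indexed value lies in [lo, hi]."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        result = set()
        for key in self._keys[i:j]:
            result |= self._docs[key]
        return result

idx = RangeIndex()
for doc_id, score in [(1, 0.2), (2, 0.5), (3, 0.9), (4, 0.5)]:
    idx.index(doc_id, score)
print(sorted(idx.search(0.4, 0.6)))  # [2, 4]
```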
On Thu, Jan 17, 2013 at 12:31 PM, Claudiu Saftoiu <csaftoiu at gmail.com>
wrote:
> ...
> > One potential thing is this: after a zeopack the index database .fs
> > file is about 400 megabytes, so I figure a cache of 3000 megabytes
> > should more than cover it. Before a zeopack, though - I do one every
> > 3 hours - the file grows to 7.6 gigabytes.
>
> In scanning over this thread while writing my last message, I noticed
> this.
>
> This is a ridiculous amount of churn. There is likely something
> seriously out of whack with your application. Every application is
> different, but we typically see *weekly* packs reduce database size by
> at most 50%.
>
All that database contains is: a catalog with 17 indices of 400,000
objects, the root object, a document map, and an object to hold the
catalog. The document map itself I put as a 'document_map' attribute of
the catalog. Because of the nature of my app I have to add and re-index
those objects quite often (they change a lot). This seems to cause the
index .fs file to grow by a ridiculous amount... is there anything
obviously wrong with the above picture?
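The figures in this thread let us put rough numbers on that churn. The growth rate comes straight from the posted sizes; the per-bucket and per-index byte counts below are hypothetical assumptions, only there to show the shape of the arithmetic (ZODB appends a new copy of every modified BTree bucket on each commit):

```python
# Churn arithmetic from the figures in this thread: the index file
# grows from ~0.4 GB to ~7.6 GB between packs every 3 hours.
growth_gb = 7.6 - 0.4
hours = 3
gb_per_hour = growth_gb / hours
print(f"{gb_per_hour:.1f} GB/hour of new (mostly garbage) records")

# Each reindex of one document touches all 17 indexes, and each index
# write appends fresh copies of a few BTree buckets. Assuming
# (hypothetically) ~3 buckets per index and ~4 KB per bucket copy:
kb_per_reindex = 17 * 3 * 4
print(f"~{kb_per_reindex} KB appended per reindexed document")

# At that rate, the observed growth corresponds to on the order of
reindexes_per_hour = gb_per_hour * 1024 * 1024 / kb_per_reindex
print(f"~{reindexes_per_hour:,.0f} reindex operations/hour")
```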
The main database does not have quite so much churn. Right after a pack
just now it was 5715MB, and it gets to at most 6000MB or so after 3 hours
(often just up to 5800MB). I don't have to run the pack quite so often -
is there a significant downside to packing often?
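On the mechanics: packing is I/O-heavy on the server (it rewrites the .fs file) but, as Jim notes below, it does not itself destroy client caches, so the main cost of packing often is server disk load. The `zeopack` utility ships with ZEO; a nightly cron entry might look like the following (the install path, host/port, and log path are assumptions for illustration):

```shell
# Hypothetical crontab entry: pack the ZEO storage once nightly at
# 03:15 instead of every 3 hours. -h/-p select the ZEO server.
15 3 * * * /opt/app/bin/zeopack -h localhost -p 8100 >> /var/log/zeopack.log 2>&1
```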
> > Shouldn't the relevant objects - the entire set of latest versions of
> > the objects - be the ones in the cache, thus it doesn't matter that
> > the .fs file is 7.6gb as the actual used bits of it are only 400mb or
> > so?
>
> Every object update invalidates cached versions of the object in all
> caches except the writer's. (Even the writer's cached value is
> invalidated if conflict resolution was performed.)
>
> > Another question is, does zeopacking destroy the cache?
>
> No, but lots of writing does.
>
I see. After all the above, it really sounds like if I want fast indexing
I should just drop ZCatalog and go ahead and use solr. It doesn't seem
ZCatalog + ZODB, the way they are now, are really made to handle many
objects with many indices that get updated often...
Thanks for all the help,
- Claudiu