[ZODB-Dev] Cache warm up time

Sat Mar 9 14:02:38 UTC 2013

On Sat, Mar 9, 2013 at 5:50 AM, Vincent Pelletier <plr.vincent at gmail.com> wrote:
> On Friday 08 March 2013 18:50:09, Laurence Rowe wrote:
>> It would be great if there was a way to advise ZODB in advance that
>> certain objects would be required so it could fetch multiple object
>> states in a single request to the storage server.
>
> +1
>
> I can see this being used to process a large tree, with objects being
> processed as they are loaded (loads being pipelined).
>
> Pseudo-code interface suggestion:
>
> class IPipelinedStorage:
>   def loadMany(oid_list, callback, tid=None, before_tid=None):
>   callback being along the lines of:
>     def callback(oid, data_record, tid, next_tid):
>       if stop_condition:
>         raise ... (StopIteration ? just anything ?)
>       return more_oids_to_queue_for_loading
>   tid and before_tid (mutually exclusive) specify the snapshot to use, to
>   implement equivalent of loadSerial and loadBefore.
>
> class IPipelinedConnection:
>   def walk(ob, callback):
>   callback being along the lines of:
>     def callback(just_loaded_object, referee_list):
>       # do something with just_loaded_object
>       return filtered_referee_list
>   referee_list would expose at least referee's class (name ?), and hold their
>   oid for Connection.walk internal use (only ?).
>   Or maybe just ghosts, but callback would have to take care of not
>   unghostifying them - it would void the purpose of pipelining loads.
>
> Above ZODB (persistent containers with internal persistent objects, like
> BTree):
>   Implement an iterator over subobjects ignoring intermediate internal
>   structure (think BTree.*Bucket classes).
>
> Specific iteration order could probably be specified to be able to implement
> iterkeys and such in BTree for example, but storage may have to implement load
> reordering when they happen in parallel (like NEO, and as could probably be
> implemented for zeoraid and relStorage configured with multiple mirrored
> databases), limiting latency/processing parallelism and possibly leading to
> memory footprint explosion.
> So I think it should also be possible to request no special loading order, to
> get the lowest latency the backend can provide and a roughly constant memory
> footprint.
>
> Any thoughts/comments?

I think this is more complicated than necessary.

I think a simple method on a storage that gives a hint that a set of
object ids will be loaded is enough.  A network storage could then
issue a pipelined request for those oids. The application can then
proceed as usual.  I think I've proposed such an API before, but am
too lazy to look it up. Something like:

    load_hint(*oids)

I'd like to see this functionality, but I don't have time to do it soon.
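
A minimal sketch of how such a hint might behave, under stated
assumptions: FakeRemoteStorage, its load_batch() method, and the
HintingStorage wrapper are all hypothetical names invented for
illustration, not existing ZODB or ZEO APIs. The hint fetches the named
oids in one round trip and parks the records locally; the application
then calls load() as usual.

```python
class FakeRemoteStorage:
    """Hypothetical stand-in for a network storage; counts round trips."""

    def __init__(self, records):
        self._records = dict(records)
        self.round_trips = 0

    def load(self, oid):
        # One request per object: the usual, unhinted path.
        self.round_trips += 1
        return self._records[oid]

    def load_batch(self, oids):
        # One request for many oids (assumed capability, not a real ZODB call).
        self.round_trips += 1
        return {oid: self._records[oid] for oid in oids if oid in self._records}


class HintingStorage:
    """Hypothetical wrapper adding load_hint() in front of a backend."""

    def __init__(self, backend):
        self._backend = backend
        self._prefetched = {}

    def load_hint(self, *oids):
        # Advisory only: fetch whatever is not already held, in one request.
        wanted = [oid for oid in oids if oid not in self._prefetched]
        if wanted:
            self._prefetched.update(self._backend.load_batch(wanted))

    def load(self, oid):
        # Serve hinted records locally; fall back to a normal load otherwise.
        if oid in self._prefetched:
            return self._prefetched.pop(oid)
        return self._backend.load(oid)


backend = FakeRemoteStorage({b"\x01": b"a", b"\x02": b"b", b"\x03": b"c"})
storage = HintingStorage(backend)
storage.load_hint(b"\x01", b"\x02")
states = [storage.load(oid) for oid in (b"\x01", b"\x02", b"\x03")]
```

Here the two hinted oids cost a single request and the unhinted one
costs its own, so a caller that hints badly is no worse off than one
that does not hint at all.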

I must say that I think this API is more likely to be abused
than used effectively.  Prefetching catalog indexes is a sort of
anti-pattern that only makes sense for small catalogs.  It
would likely make more sense to have a dedicated catalog
server that returned oids and possibly object records in
response to queries (or, whimper, use Solr).

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
Jerky is better than bacon! http://zo.pe/Kqm