[ZODB-Dev] Plone in P2P using Zope over DHT
Vincent Pelletier
vincent at nexedi.com
Tue Jan 4 09:40:27 EST 2011
On Tuesday 4 January 2011 at 11:40:34, Aran Dunkley wrote:
> As one of the NEO team, what are your thoughts on the practicality of
> running Plone in a P2P environment with the latencies experienced in
> standard DHT (such as, for example, those based on Kademlia) implementations?
First, I must say that we have not run any benchmarks on NEO outside LAN
conditions yet, because some issues need attention before we can make the test
environment more hostile. To name a few blockers, we need "peaceful" deadlock
resolution/avoidance when the same set of objects gets modified concurrently,
and there is a significant "from scratch" replication performance issue.
Another show-stopper for NEO production-readiness is the lack of backup tools:
NEO currently relies on storage back-end tools (e.g. mysqldump) and on a
replication scheme which is not implemented yet (useful in an
all-nodes-in-one-datacenter setup, not if nodes are to be scattered around the
globe).
So much for the current implementation status; now I'll try to answer from
NEO's design point of view.
NEO was not designed with international network latency in mind, so I doubt it
would compete with Kademlia on this metric.
In NEO, each node knows the entire hash table. When loading an object, one
node known to contain that object is selected and a connection is established
(if one is not already available). The highest latency to fetch any piece of
data is the latency toward the slowest node (plus extra latency if that node
turns out to be offline, as the next valid node is then attempted). Both costs
(latency and the late discovery of absent nodes) can be mitigated by
integrating measured node latency into the node weight that is computed to
select which node to connect to when loading an object. So the more replicas
there are, the lower the worst-case latency becomes. This is not implemented,
but would be a very welcome addition.
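To illustrate the idea, here is a minimal Python sketch of latency-weighted
node selection. It is only an illustration of the principle: pick_node, the
list of (node, latency) pairs and the inverse-latency weighting are my own
assumptions, not NEO's actual API or weighting formula.

    import random

    def pick_node(replicas):
        # "replicas" is a list of (node, avg_latency_seconds) pairs for the
        # nodes holding the object, as given by the hash table. Weight each
        # replica by the inverse of its measured latency, so nearby nodes
        # are chosen more often while distant ones still receive some load.
        weights = [1.0 / max(latency, 1e-3) for _, latency in replicas]
        total = sum(weights)
        r = random.uniform(0, total)
        acc = 0.0
        for (node, _), w in zip(replicas, weights):
            acc += w
            if r <= acc:
                return node
        return replicas[-1][0]

    # Example: one LAN replica and two remote ones.
    replicas = [("node-lan", 0.002), ("node-eu", 0.040), ("node-us", 0.120)]
    print(pick_node(replicas))

With such a weighting, the more replicas an object has, the more likely a
low-latency one is picked, which is the effect described above.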
When writing an object, a client pushes copies to each and every node supposed
to contain that object (known via the hash table) and must wait for all
related acknowledgements, so it always suffers from the worst-case latency.
This is already mitigated by pipelining stores, so that acknowledgements are
only waited for during tpc_vote rather than in proportion to the number of
stored objects. It could be further mitigated by considering multicast
(currently, NEO does everything with unicast: TCP).
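To show what pipelining the stores buys, here is a small self-contained
Python sketch. FakeConnection, send_store and PipelinedClient are hypothetical
names standing in for the client/storage protocol, with network latency
simulated by sleeping threads; this is not NEO's real code.

    import time
    from concurrent.futures import ThreadPoolExecutor

    class FakeConnection:
        """Stand-in for a storage-node connection with a given latency."""
        def __init__(self, name, latency, executor):
            self.name = name
            self.latency = latency
            self.executor = executor

        def send_store(self, oid, data):
            # Send asynchronously; the returned future completes when the
            # node acknowledges the store.
            def _store():
                time.sleep(self.latency)  # simulated network round trip
                return (self.name, oid)
            return self.executor.submit(_store)

    class PipelinedClient:
        def __init__(self, connections):
            self.connections = connections
            self.pending = []

        def store(self, oid, data):
            # Push the object to every node supposed to contain it (per the
            # hash table) without waiting for the acknowledgements.
            for conn in self.connections:
                self.pending.append(conn.send_store(oid, data))

        def tpc_vote(self):
            # Wait for every outstanding acknowledgement here, so the
            # worst-case latency is paid once per transaction rather than
            # once per stored object.
            for future in self.pending:
                future.result()
            self.pending = []

    executor = ThreadPoolExecutor(max_workers=20)
    client = PipelinedClient([FakeConnection("node-eu", 0.04, executor),
                              FakeConnection("node-us", 0.12, executor)])
    start = time.time()
    for oid in range(10):
        client.store(oid, b"data")
    client.tpc_vote()
    # Roughly one worst-case round trip (~0.12s) instead of ~1.2s if each
    # store blocked on its acknowledgements.
    print("10 stores took %.3fs" % (time.time() - start))
    executor.shutdown()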
Although it is not required that all nodes always have the most up-to-date
view of the hash table for reading (aside from the late discovery of absent
nodes described above), an outdated view will cause increasing problems when
writing as nodes go up and down more often.
--
Vincent Pelletier