[ZODB-Dev] Plone in P2P using Zope over DHT
Vincent Pelletier
vincent at nexedi.com
Tue Jan 4 09:40:27 EST 2011
On Tuesday 4 January 2011 at 11:40:34, Aran Dunkley wrote:
> As one of the NEO team, what are your thoughts on the practicality of
> running Plone in a P2P environment with the latencies experienced in
> standard DHT (such as, for example, those based on Kademlia) implementations?
First, I must say that we have not run any benchmarks on NEO outside LAN
conditions yet, because some issues need attention before we can make the test
environment more hostile. To name a few blockers, we need "peaceful" deadlock
resolution/avoidance when the same set of objects gets modified concurrently,
and there is a significant "from scratch" replication performance issue.
Another show-stopper for NEO production-readiness is the lack of backup tools:
NEO currently relies on storage back-end tools (e.g. mysqldump) and on a
replication scheme which is not implemented yet (useful in an
all-nodes-in-one-datacenter setup, not if nodes are to be scattered around the
globe).
So much for the current implementation status; now I'll try to answer from
NEO's design point of view.
NEO was not designed with international network latency in mind, so I doubt it
would compete with Kademlia on this metric.
In NEO, each node knows the entire hash table. When loading an object, one
node known to contain that object is selected and a connection is established
(if one is not already available). The highest latency to fetch any piece of
data is the latency toward the slowest node (plus extra latency if that node
turns out to be offline, as the next valid node is then attempted). Both costs
(latency and the late discovery of absent nodes) can be mitigated by
integrating measured node latency into the node weight that is computed to
select which node to connect to when loading an object. So the more replicas
there are, the lower the worst-case latency becomes. This is not implemented,
but would be a very welcome addition.
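To illustrate the idea, here is a minimal Python sketch of latency-weighted
node selection. It is only an illustration of the principle: pick_node, the
list of (node, latency) pairs and the inverse-latency weighting are my own
assumptions, not NEO's actual API or weighting formula.

    import random

    def pick_node(replicas):
        # "replicas" is a list of (node, avg_latency_seconds) pairs for the
        # nodes holding the object, as given by the hash table. Weight each
        # replica by the inverse of its measured latency, so nearby nodes
        # are chosen more often while distant ones still receive some load.
        weights = [1.0 / max(latency, 1e-3) for _, latency in replicas]
        total = sum(weights)
        r = random.uniform(0, total)
        acc = 0.0
        for (node, _), w in zip(replicas, weights):
            acc += w
            if r <= acc:
                return node
        return replicas[-1][0]

    # Example: one LAN replica and two remote ones.
    replicas = [("node-lan", 0.002), ("node-eu", 0.040), ("node-us", 0.120)]
    print(pick_node(replicas))

With such a weighting, the more replicas an object has, the more likely a
low-latency one is picked, which is the effect described above.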
When writing an object, a client pushes copies to each and every node supposed
to contain that object (known via the hash table) and must wait for all
related acknowledgements, so it always suffers from the worst-case latency.
This is already mitigated by pipelining stores, so that acknowledgements are
only waited for during tpc_vote rather than in proportion to the number of
stored objects. It could be further mitigated by considering multicast
(currently, NEO does everything with unicast: TCP).
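To show what pipelining the stores buys, here is a small self-contained
Python sketch. FakeConnection, send_store and PipelinedClient are hypothetical
names standing in for the client/storage protocol, with network latency
simulated by sleeping threads; this is not NEO's real code.

    import time
    from concurrent.futures import ThreadPoolExecutor

    class FakeConnection:
        """Stand-in for a storage-node connection with a given latency."""
        def __init__(self, name, latency, executor):
            self.name = name
            self.latency = latency
            self.executor = executor

        def send_store(self, oid, data):
            # Send asynchronously; the returned future completes when the
            # node acknowledges the store.
            def _store():
                time.sleep(self.latency)  # simulated network round trip
                return (self.name, oid)
            return self.executor.submit(_store)

    class PipelinedClient:
        def __init__(self, connections):
            self.connections = connections
            self.pending = []

        def store(self, oid, data):
            # Push the object to every node supposed to contain it (per the
            # hash table) without waiting for the acknowledgements.
            for conn in self.connections:
                self.pending.append(conn.send_store(oid, data))

        def tpc_vote(self):
            # Wait for every outstanding acknowledgement here, so the
            # worst-case latency is paid once per transaction rather than
            # once per stored object.
            for future in self.pending:
                future.result()
            self.pending = []

    executor = ThreadPoolExecutor(max_workers=20)
    client = PipelinedClient([FakeConnection("node-eu", 0.04, executor),
                              FakeConnection("node-us", 0.12, executor)])
    start = time.time()
    for oid in range(10):
        client.store(oid, b"data")
    client.tpc_vote()
    # Roughly one worst-case round trip (~0.12s) instead of ~1.2s if each
    # store blocked on its acknowledgements.
    print("10 stores took %.3fs" % (time.time() - start))
    executor.shutdown()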
Although it is not required that all nodes always have the most up-to-date
view of the hash table for reading (aside from the late discovery of absent
nodes described above), an outdated view will cause increasing problems when
writing as nodes go up and down more often.
--
Vincent Pelletier