Apache ZooKeeper is a distributed service configuration repository. Java and C bindings are part of the project; bindings for many other languages are provided by the community.
ZooKeeper servers maintain synchronized in-memory state backed by a persistent journal and snapshots. Clients are given a list of servers and connect to one server at a time, failing over to another server when the connection breaks.
Servers.
Replicated server cluster
Each server stores complete state in memory
Updates are also stored in persistent log
Persistent snapshot done when updates accumulate
Atomic broadcast protocol (Zab)
All updates pass through leader server
Leader collects majority quorum for each update
Leader election triggered when the leader fails or loses contact with a majority of servers
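The majority rule can be illustrated with a little arithmetic. The class `Quorum` below is a hypothetical helper, not part of ZooKeeper: it computes the majority size for an ensemble of `n` servers and how many server failures the ensemble survives, which is why odd ensemble sizes are the usual choice.

```java
// Hypothetical helper: majority-quorum arithmetic, not code from ZooKeeper itself.
final class Quorum {
    private Quorum() {}

    // Smallest number of servers that forms a strict majority of an ensemble of n.
    static int majority(int n) {
        if (n < 1) throw new IllegalArgumentException("ensemble size must be positive");
        return n / 2 + 1;
    }

    // Number of servers that may fail while the rest can still form a majority.
    static int toleratedFailures(int n) {
        return n - majority(n);
    }

    // An update counts as committed once acknowledgements reach a majority.
    static boolean committed(int acks, int n) {
        return acks >= majority(n);
    }
}
```

For example, `Quorum.toleratedFailures(4)` and `Quorum.toleratedFailures(3)` are both 1, so a fourth server adds work to every update without adding fault tolerance.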
Clients.
Provided with a list of servers to use
Connected to a single server at a time
Connection failure handled by switching to another server
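Client-side failover can be sketched with a hypothetical `ServerList` class that parses a connect string in the form accepted by the `ZooKeeper` constructor (comma-separated `host:port` pairs with an optional chroot suffix such as `/app`) and moves to the next server after a connection failure. This is a simplification: the real client also shuffles the server list and re-establishes the session, which the sketch omits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a client's server list with round-robin failover.
final class ServerList {
    private final List<String> servers;
    private final String chroot;
    private int current = 0;

    // Parses e.g. "zk1:2181,zk2:2181/app"; everything from the first '/' is the chroot path.
    ServerList(String connectString) {
        int slash = connectString.indexOf('/');
        String hosts = slash < 0 ? connectString : connectString.substring(0, slash);
        this.chroot = slash < 0 ? "" : connectString.substring(slash);
        this.servers = new ArrayList<>(Arrays.asList(hosts.split(",")));
    }

    List<String> servers() { return List.copyOf(servers); }

    String chroot() { return chroot; }

    // The server the client is currently connected to.
    String current() { return servers.get(current); }

    // On connection failure, switch to the next server, wrapping around the list.
    String failover() {
        current = (current + 1) % servers.size();
        return current();
    }
}
```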
Data.
Tree of named nodes navigated by string paths
Support for unique node naming through sequential nodes
Node data is an array of bytes
Each update increments the node version
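The data model rules above can be mimicked by a small in-memory tree. Everything in this sketch (`ToyTree` and its methods) is hypothetical; it only assumes the documented behaviour that a sequential node gets a zero-padded 10-digit counter appended to its name and that a version of `-1` in a conditional update matches any version. The per-parent counter here is a simplification of how the server picks suffixes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the data tree; it mimics observable rules, not the server code.
final class ToyTree {
    private static final class Node {
        byte[] data;
        int version;   // bumped on every setData
        int sequence;  // next suffix handed out to sequential children
        Node(byte[] data) { this.data = data; }
    }

    private final Map<String, Node> nodes = new TreeMap<>();

    ToyTree() { nodes.put("/", new Node(new byte[0])); }

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash == 0 ? "/" : path.substring(0, slash);
    }

    // Creates a node; a sequential node gets a 10-digit, zero-padded counter appended to its name.
    String create(String path, byte[] data, boolean sequential) {
        Node parent = nodes.get(parentOf(path));
        if (parent == null) throw new IllegalStateException("no parent for " + path);
        String actual = sequential ? path + String.format("%010d", parent.sequence++) : path;
        if (nodes.containsKey(actual)) throw new IllegalStateException("node exists: " + actual);
        nodes.put(actual, new Node(data));
        return actual;
    }

    // Lists the names (not full paths) of the direct children of a node.
    List<String> getChildren(String path) {
        String prefix = path.equals("/") ? "/" : path + "/";
        List<String> children = new ArrayList<>();
        for (String p : nodes.keySet()) {
            if (!p.equals("/") && p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                children.add(p.substring(prefix.length()));
            }
        }
        return children;
    }

    byte[] getData(String path) {
        Node node = nodes.get(path);
        if (node == null) throw new IllegalStateException("no node " + path);
        return node.data;
    }

    // Replaces data if expectedVersion matches (-1 matches any) and returns the new version.
    int setData(String path, byte[] data, int expectedVersion) {
        Node node = nodes.get(path);
        if (node == null) throw new IllegalStateException("no node " + path);
        if (expectedVersion != -1 && expectedVersion != node.version) {
            throw new IllegalStateException("bad version for " + path);
        }
        node.data = data;
        return ++node.version;
    }
}
```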
Some data objects in the interface are generated from a platform-independent specification written in the Jute record language.
module org.apache.zookeeper.data {
    ...
    class Stat {
        long czxid;          // ZXID of transaction that created this node
        long mzxid;          // ZXID of transaction that last modified this node
        long pzxid;          // ZXID of transaction that last modified node children
        long ctime;          // Node creation time
        long mtime;          // Node last modification time
        int version;         // Node version
        int aversion;        // Node ACL version
        int cversion;        // Node child version
        int dataLength;      // Node data length
        int numChildren;     // Node child count
        long ephemeralOwner; // Owner identifier for ephemeral nodes
    }
    ...
}
public class ZooKeeper {
    public ZooKeeper (String connectString, int sessionTimeout, Watcher watcher) { ... }
    public ZooKeeper (String connectString, int sessionTimeout, Watcher watcher, boolean canBeReadOnly) { ... }
    ...
    public String create (String path, byte data [], List<ACL> acl, CreateMode createMode) { ... }
    public void delete (String path, int version) { ... }
    public Stat exists (String path, boolean watch) { ... }
    public Stat exists (String path, Watcher watcher) { ... }
    public byte [] getData (String path, boolean watch, Stat stat) { ... }
    public byte [] getData (String path, Watcher watcher, Stat stat) { ... }
    public Stat setData (String path, byte data [], int version) { ... }
    public List<String> getChildren (String path, boolean watch) { ... }
    public List<String> getChildren (String path, boolean watch, Stat stat) { ... }
    public List<String> getChildren (String path, Watcher watcher) { ... }
    public List<String> getChildren (String path, Watcher watcher, Stat stat) { ... }

    // Make sure the server is current with the leader.
    public void sync (String path, VoidCallback cb, Object ctx) { ... }

    public synchronized void close () { ... }
}
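The `version` arguments of `setData` and `delete` support optimistic concurrency: read the node together with its `Stat`, compute the new value, and write it back only if the version is unchanged, retrying otherwise. The sketch below models a single node with a hypothetical `VersionedCell` class so that it runs without a server; with the real client, the loop would call `getData` with a `Stat`, pass `stat.getVersion()` to `setData`, and retry on `KeeperException.BadVersionException`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a single znode: a value plus a version that grows on every write.
final class VersionedCell {
    // Snapshot of value and version, playing the role of getData() plus its Stat.
    static final class Read {
        final long value;
        final int version;
        Read(long value, int version) { this.value = value; this.version = version; }
    }

    private long value;
    private int version;

    synchronized Read read() { return new Read(value, version); }

    // Conditional write in the spirit of setData(path, data, version); false means the version moved on.
    synchronized boolean write(long newValue, int expectedVersion) {
        if (expectedVersion != version) return false;
        value = newValue;
        version++;
        return true;
    }
}

final class OptimisticCounter {
    // Read-modify-write loop: re-read and retry until the conditional write succeeds.
    static void increment(VersionedCell cell) {
        while (true) {
            VersionedCell.Read r = cell.read();
            if (cell.write(r.value + 1, r.version)) return;
        }
    }

    // Lets several threads increment the same cell concurrently and returns the final value.
    static long run(int threads, int incrementsPerThread) {
        VersionedCell cell = new VersionedCell();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) increment(cell);
            });
            workers.add(w);
            w.start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return cell.read().value;
    }
}
```

No increment is lost even under contention, because a write based on a stale read is rejected and redone.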
public class ZooKeeper {
    ...
    public void create (String path, byte data [], List<ACL> acl, CreateMode createMode, StringCallback cb, Object ctx) { ... }
    public void delete (String path, int version, VoidCallback cb, Object ctx) { ... }
    public void exists (String path, boolean watch, StatCallback cb, Object ctx) { ... }
    public void exists (String path, Watcher watcher, StatCallback cb, Object ctx) { ... }
    public void getData (String path, boolean watch, DataCallback cb, Object ctx) { ... }
    public void getData (String path, Watcher watcher, DataCallback cb, Object ctx) { ... }
    public void setData (String path, byte data [], int version, StatCallback cb, Object ctx) { ... }
    public void getChildren (String path, boolean watch, ChildrenCallback cb, Object ctx) { ... }
    public void getChildren (String path, boolean watch, Children2Callback cb, Object ctx) { ... }
    public void getChildren (String path, Watcher watcher, ChildrenCallback cb, Object ctx) { ... }
    public void getChildren (String path, Watcher watcher, Children2Callback cb, Object ctx) { ... }
    ...
}

public interface StatCallback extends AsyncCallback {
    public void processResult (int rc, String path, Object ctx, Stat stat);
}

public interface DataCallback extends AsyncCallback {
    public void processResult (int rc, String path, Object ctx, byte data [], Stat stat);
}

public interface ChildrenCallback extends AsyncCallback {
    public void processResult (int rc, String path, Object ctx, List<String> children);
}

public interface Children2Callback extends AsyncCallback {
    public void processResult (int rc, String path, Object ctx, List<String> children, Stat stat);
}

...
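The asynchronous variants return immediately and report the outcome later through the callback, handing back the `ctx` object given at call time. The hypothetical `ToyAsyncClient` below imitates that contract over a fixed map of node data, delivering all completions from a single event thread so that they arrive in call order, matching the one-at-a-time, in-order delivery the ZooKeeper documentation describes for callbacks. The result code `-101` is assumed to mirror `KeeperException.Code.NONODE`; `ToyDataCallback` has the shape of `DataCallback` minus the `Stat` argument.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical callback with the same shape as DataCallback, minus the Stat argument.
interface ToyDataCallback {
    void processResult(int rc, String path, Object ctx, byte[] data);
}

// Hypothetical asynchronous client over a fixed map; completions run on one event thread in call order.
final class ToyAsyncClient implements AutoCloseable {
    static final int OK = 0;
    static final int NO_NODE = -101; // same value as KeeperException.Code.NONODE

    private final Map<String, byte[]> data;
    private final ExecutorService eventThread = Executors.newSingleThreadExecutor();

    ToyAsyncClient(Map<String, byte[]> data) { this.data = Map.copyOf(data); }

    // Returns immediately; the result is delivered later through the callback together with ctx.
    void getData(String path, ToyDataCallback cb, Object ctx) {
        eventThread.execute(() -> {
            byte[] bytes = data.get(path);
            cb.processResult(bytes == null ? NO_NODE : OK, path, ctx, bytes);
        });
    }

    // Lets pending completions finish, then stops the event thread.
    @Override
    public void close() {
        eventThread.shutdown();
        try {
            eventThread.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because callbacks share one thread, a slow callback delays every later completion, so callbacks should hand long work off elsewhere.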
public class ZooKeeper {
    ...
    // Execute multiple operations atomically.
    public List<OpResult> multi (Iterable<Op> ops) { ... }
    public void multi (Iterable<Op> ops, MultiCallback cb, Object ctx) { ... }
    ...
}

public abstract class Op {
    private int type;
    private String path;

    private Op (int type, String path) {
        this.type = type;
        this.path = path;
    }

    public static Op create (String path, byte [] data, List<ACL> acl, int flags) {
        return new Create (path, data, acl, flags);
    }

    public static class Create extends Op {
        private byte [] data;
        private List<ACL> acl;
        private int flags;

        private Create (String path, byte [] data, List<ACL> acl, int flags) {
            super (ZooDefs.OpCode.create, path);
            this.data = data;
            this.acl = acl;
            this.flags = flags;
        }
        ...
    }
    ...
}

public abstract class OpResult {
    private int type;

    private OpResult (int type) {
        this.type = type;
    }

    public static class CreateResult extends OpResult {
        private String path;

        public CreateResult (String path) {
            super (ZooDefs.OpCode.create);
            this.path = path;
        }
        ...
    }
    ...
}
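`multi` either applies all of its operations or none of them. The hypothetical `ToyStore` and `ToyOp` types below sketch that all-or-nothing behaviour by running the operations against a scratch copy of the state and committing the copy only if every operation succeeds. The op names echo `Op.create`, `Op.delete` and `Op.check` from the real API, but these types and the path-to-version map are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical operation over a map of path -> version; returns false if the operation must fail.
interface ToyOp {
    boolean apply(Map<String, Integer> state);

    // Succeeds only if the node does not exist yet; new nodes start at version 0.
    static ToyOp create(String path) {
        return s -> s.putIfAbsent(path, 0) == null;
    }

    // Succeeds only if the node exists and its version matches (-1 matches any).
    static ToyOp delete(String path, int version) {
        return s -> {
            Integer v = s.get(path);
            if (v == null || (version != -1 && v != version)) return false;
            s.remove(path);
            return true;
        };
    }

    // Changes nothing, but fails the whole batch if the version does not match.
    static ToyOp check(String path, int version) {
        return s -> {
            Integer v = s.get(path);
            return v != null && (version == -1 || v == version);
        };
    }
}

final class ToyStore {
    private Map<String, Integer> state = new HashMap<>();

    // All-or-nothing: commits the scratch copy only if every op succeeds.
    synchronized boolean multi(List<ToyOp> ops) {
        Map<String, Integer> scratch = new HashMap<>(state);
        for (ToyOp op : ops) {
            if (!op.apply(scratch)) return false; // scratch is discarded, state is untouched
        }
        state = scratch;
        return true;
    }

    synchronized boolean exists(String path) { return state.containsKey(path); }
}
```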
public class ZooKeeper {
    ...
    // Manage watches with an explicit mode.
    public void addWatch (String basePath, AddWatchMode mode) { ... }
    public void removeWatches (String path, Watcher watcher, Watcher.WatcherType watcherType, boolean local) { ... }
    ...
}

public enum AddWatchMode {
    PERSISTENT (0),
    PERSISTENT_RECURSIVE (1);
}

public interface Watcher {
    abstract public void process (WatchedEvent event);

    public interface Event {
        public enum EventType {
            None (-1),
            NodeCreated (1),
            NodeDeleted (2),
            NodeDataChanged (3),
            NodeChildrenChanged (4);
            ...
        }
    }
}

public class WatchedEvent {
    ...
    public KeeperState getState () { ... }
    public EventType getType () { ... }
    public String getPath () { ... }
}
One-shot watches are removed once they deliver an event
Persistent watches stay until removed explicitly
Recursive watches also report events on all descendant nodes
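The three watch lifetimes can be contrasted with a hypothetical registry that ignores event types and only tracks which watches fire and which survive. `ToyWatches` and its `Mode` enum are illustrative names, not ZooKeeper classes; in particular the real `AddWatchMode` has only the two persistent modes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical registry illustrating watch lifetimes; not the client or server implementation.
final class ToyWatches {
    enum Mode { ONE_SHOT, PERSISTENT, PERSISTENT_RECURSIVE }

    private static final class Watch {
        final String path;
        final Mode mode;
        Watch(String path, Mode mode) { this.path = path; this.mode = mode; }
    }

    private final List<Watch> watches = new ArrayList<>();

    void add(String path, Mode mode) { watches.add(new Watch(path, mode)); }

    int count() { return watches.size(); }

    // Delivers an event on eventPath and returns the base paths of the watches that fired.
    List<String> fire(String eventPath) {
        List<String> fired = new ArrayList<>();
        for (Iterator<Watch> it = watches.iterator(); it.hasNext(); ) {
            Watch w = it.next();
            boolean matches = w.path.equals(eventPath)
                || (w.mode == Mode.PERSISTENT_RECURSIVE && isDescendant(eventPath, w.path));
            if (!matches) continue;
            fired.add(w.path);
            if (w.mode == Mode.ONE_SHOT) it.remove(); // one-shot watches are consumed by the event
        }
        return fired;
    }

    private static boolean isDescendant(String path, String base) {
        String prefix = base.equals("/") ? "/" : base + "/";
        return path.startsWith(prefix) && path.length() > prefix.length();
    }
}
```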
Watchers are notified of connection failures, but events that were not delivered before the failure are considered lost.
The atomicity and consistency guarantees of Apache ZooKeeper can be used to implement many high-level coordination recipes. Ready-made implementations are provided by the Apache Curator project.
Agreement.
group membership tracking
leader election with polling interface
leader election with callback interface
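Leader election is commonly built on ephemeral sequential nodes: every candidate creates one under a shared parent, the candidate with the lowest sequence number leads, and every other candidate watches only the node immediately ahead of it, so that a failure wakes up one candidate instead of all of them. This follows the election recipe described in the ZooKeeper documentation. The helper `SequenceElection.predecessor` is hypothetical and performs only the decision step on a list of child names, sorting by the 10-digit sequence suffix rather than by the whole name so that differing name prefixes do not affect the order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for the sequential-node election pattern; the decision step only.
final class SequenceElection {
    // Parses the 10-digit counter appended to sequential node names.
    static long sequenceOf(String name) {
        return Long.parseLong(name.substring(name.length() - 10));
    }

    // Returns null if 'self' has the lowest sequence number (it leads), otherwise the candidate just ahead of it.
    static String predecessor(List<String> children, String self) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(SequenceElection::sequenceOf));
        int index = sorted.indexOf(self);
        if (index < 0) throw new IllegalArgumentException(self + " is not a candidate");
        return index == 0 ? null : sorted.get(index - 1);
    }
}
```

A candidate that gets a non-null result would set a watch on that node and repeat the check when it fires, since the predecessor may have left without the leader changing.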
Synchronization.
barrier with explicit state setting calls
barrier with node count condition
recursive lock
non recursive lock
recursive read write lock
semaphore
wrapper for acquiring multiple locks atomically
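A read write lock can use the same sequential-node pattern with two name prefixes. The hypothetical `SharedLockRule.nodeToWatch` below sketches the decision step of the shared lock recipe in the ZooKeeper documentation, assuming nodes named `read-` or `write-` followed by the 10-digit sequence suffix: a writer holds the lock when no node precedes it, while a reader only has to wait for earlier writers. A `null` result means the lock is held; otherwise the caller would watch the returned node and re-check when it fires.

```java
import java.util.List;

// Hypothetical decision step of a shared (read/write) lock over sequential child names.
final class SharedLockRule {
    static long seq(String name) {
        return Long.parseLong(name.substring(name.length() - 10));
    }

    // Returns null if 'self' holds the lock, otherwise the child name it should watch.
    static String nodeToWatch(List<String> children, String self) {
        long mine = seq(self);
        boolean reader = self.startsWith("read-");
        String watch = null;
        long best = Long.MIN_VALUE;
        for (String c : children) {
            long s = seq(c);
            if (s >= mine) continue;                          // only nodes ahead of us matter
            if (reader && !c.startsWith("write-")) continue;  // readers wait only for earlier writers
            if (s > best) { best = s; watch = c; }            // keep the closest such predecessor
        }
        return watch;
    }
}
```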
Communication.
queue compatible with the queue recipe shipped with ZooKeeper
ordered queue with optional item identities
queue with delayed delivery
queue with priorities
shared integer counter
shared long integer counter
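A priority queue can reuse the sequential-node ordering by putting the priority into the node name. Following the priority queue recipe in the ZooKeeper documentation, as we understand it, a producer creates sequential nodes named `queue-YY` with a two-digit priority `YY` (lower is more urgent), and a consumer takes the smallest name. The helper class `PriorityNames` is hypothetical and covers only the naming and selection steps.

```java
import java.util.List;

// Hypothetical helper for a priority queue encoded in sequential node names.
final class PriorityNames {
    // Name prefix a producer would pass to a sequential create: two-digit priority, lower is more urgent.
    static String prefix(int priority) {
        if (priority < 0 || priority > 99) throw new IllegalArgumentException("priority must be 0..99");
        return String.format("queue-%02d", priority);
    }

    // Picks the element a consumer should try to take next. Plain string order works because
    // the priority digits come first and the sequence suffix has a fixed width.
    static String head(List<String> children) {
        String best = null;
        for (String c : children) {
            if (best == null || c.compareTo(best) < 0) best = c;
        }
        return best;
    }
}
```

A consumer would then try to `delete` the selected node and, if another consumer removed it first, re-read the children and try again.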
Resiliency.
generic local path cache
connection loss resistant node interface
connection loss resistant node interface with keepalive
connection loss resistant watch interface
The Apache ZooKeeper Project Home Page. https://zookeeper.apache.org
The Apache Curator Project Home Page. https://curator.apache.org