2.9. JGroups

JGroups is a middleware for reliable multicast communication in Java. JGroups provides both low level communication primitives, such as message transport and group membership, and high level communication functions, such as synchronous message exchange or distributed mutual exclusion. The architecture of JGroups is configurable to allow tailoring to application requirements.

2.9.1. Channels

The low level functions of the communication mechanism, such as group membership and message transport, are provided by channels.

Channel Class

public class JChannel implements Closeable {

    // Constructors take the protocol stack configuration: default, file name or URL, or stream
    public JChannel ();
    public JChannel (String url);
    public JChannel (InputStream stream);

    // Join a group with a given name
    public void connect (String cluster);
    public String clusterName ();
    public void disconnect ();

    // View is the current list of members
    public View getView ();

    // Send a message to all or one group member.
    public void send (Message msg);
    public void send (Address dst, Object obj);
    public void send (Address dst, byte [] buf);
    public void send (Address dst, byte [] buf, int offset, int length);

    // Asynchronous notification about messages and membership is available
    public void setReceiver (Receiver r);
    public Receiver getReceiver ();

    ...
}

The addresses used in the channel methods are internal identifiers typically assigned by the transport protocol modules.

Receiver Interface

public interface Receiver {

    // Receive individual messages or batches of messages
    default void receive (Message msg) { ... }
    default void receive (MessageBatch batch) { ... }

    // Notification about membership view change
    default void viewAccepted (View new_view) { ... }

    // Notifications to temporarily suspend and then resume sending messages
    default void block () { ... }
    default void unblock () { ... }

    // State transfer: the state provider writes its state, the requesting member reads it
    default void getState (OutputStream output) { ... }
    default void setState (InputStream input) { ... }
}
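The state transfer callbacks only see byte streams, so the application decides how its state is serialized. A minimal sketch, assuming the shared state is an ordered list of strings: `ChatHistory` and `transfer` are hypothetical names, and the two methods mirror the `getState`/`setState` shape above but are declared on a plain class (with `IOException`) so the sketch compiles with the JDK alone.

```java
import java.io.*;
import java.util.*;

// Hypothetical shared state: an ordered list of chat lines. In a JGroups
// application getState/setState would be the Receiver callbacks; here they
// live on a plain class so the sketch needs only the JDK.
final class ChatHistory {
    private final List<String> lines = new ArrayList<>();

    synchronized void add(String line) { lines.add(line); }

    synchronized List<String> lines() { return new ArrayList<>(lines); }

    // On the state provider: write the whole state, then flush. The stream
    // belongs to the caller, so it is not closed here.
    synchronized void getState(OutputStream output) throws IOException {
        DataOutputStream out = new DataOutputStream(output);
        out.writeInt(lines.size());
        for (String line : lines)
            out.writeUTF(line);
        out.flush();
    }

    // On the requesting member: read the state, then replace the local copy.
    void setState(InputStream input) throws IOException {
        DataInputStream in = new DataInputStream(input);
        int count = in.readInt();
        List<String> received = new ArrayList<>(count);
        for (int i = 0; i < count; i++)
            received.add(in.readUTF());
        synchronized (this) {
            lines.clear();
            lines.addAll(received);
        }
    }

    // In-memory simulation of a transfer from one member to another.
    static void transfer(ChatHistory from, ChatHistory to) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            from.getState(buffer);
            to.setState(new ByteArrayInputStream(buffer.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading the whole state before touching the local copy keeps a failed transfer from leaving the requesting member with half-replaced state.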

Message Classes

public interface Message ... {

    short BYTES_MSG   = 0,
        NIO_MSG       = 1,
        EMPTY_MSG     = 2,
        OBJ_MSG       = 3,
        LONG_MSG      = 4,
        COMPOSITE_MSG = 5,
        FRAG_MSG      = 6;

    short getType ();

    Address getDest ();
    Message setDest (Address new_dest);
    Address getSrc ();
    Message setSrc (Address new_src);

    // Headers are internal and interpreted by individual protocol modules
    Message putHeader (short id, Header hdr);
    <T extends Header> T getHeader (short id);
    Map<Short,Header> getHeaders ();

    // Flags are interpreted by individual protocol modules
    // Examples include disabling flow control or reliability
    short getFlags (boolean transient_flags);
    Message setFlag (short flag, boolean transient_flags);

    // Convenience methods on the interface
    // May not make sense for all message classes

    byte [] getArray ();
    int getOffset ();
    int getLength ();
    Message setBuffer (byte [] b);
    Message setArray (byte [] b, int offset, int length);

    <T extends Object> T getObject ();
    Message setObject (Object obj);

    <T extends Object> T getPayload ();
    Message setPayload (Object pl);

    ...
}

public class BytesMessage ... {
    public BytesMessage (Address dest, byte [] array) { ... }
    public BytesMessage (Address dest, byte [] array, int offset, int length) { ... }
    ...
}

public class NioMessage ... {
    // Uses java.nio.ByteBuffer, which can reduce copying overhead
    public NioMessage (Address dest, ByteBuffer buf) { ... }
    public ByteBuffer getBuf () { ... }
    public NioMessage setBuf (ByteBuffer b) { ... }
    ...
}

public class ObjectMessage ... {
    public ObjectMessage (Address dest, Object obj) { ... }
    ...
}

public class CompositeMessage ... implements Iterable<Message> {
    public CompositeMessage (Address dest, Message ... messages) { ... }
    public CompositeMessage add (Message msg) { ... }
    public <T extends Message> T get (int index) { ... }
    public Iterator<Message> iterator () { ... }
    ...
}

2.9.2. Building Blocks

Somewhat inaptly named, building blocks use channels to provide high level functions of the communication mechanism, such as synchronous message exchange or group mutual exclusion.

Message Dispatcher Building Block

public class MessageDispatcher implements ... {

    // Message dispatcher needs channel for communication and request handler for message delivery.
    public MessageDispatcher (JChannel channel) { ... }
    public MessageDispatcher (JChannel channel, RequestHandler req_handler) { ... }

    // Casting sends to the given destinations, or to all members when dests is null.
    public <T> RspList<T> castMessage (final Collection<Address> dests, Message msg, RequestOptions opts) { ... }
    public <T> CompletableFuture<RspList<T>> castMessageWithFuture (final Collection<Address> dests, Message msg, RequestOptions opts) { ... }

    // Sending targets the single destination set in the message.
    public <T> T sendMessage (Message msg, RequestOptions opts) { ... }
    public <T> CompletableFuture<T> sendMessageWithFuture (Message msg, RequestOptions opts) { ... }

    // Request handler interface if none provided externally.
    @Override public Object handle (Message msg) { ... }
    @Override public void handle (Message request, Response response) { ... }

    ...
}

public class RequestOptions {

    // Can wait for none, one or all responses.
    public ResponseMode getMode () { ... }
    public RequestOptions setMode (ResponseMode mode) { ... }

    // Can specify response filter if response expected.
    public RspFilter getRspFilter () { ... }
    public RequestOptions setRspFilter (RspFilter filter) { ... }

    // Can specify response timeout if response expected.
    public long getTimeout () { ... }
    public RequestOptions setTimeout (long timeout) { ... }

    ...
}

public class RspList<T> extends HashMap<Address,Rsp<T>> implements Iterable<Rsp<T>> {

    public int numReceived () { ... }
    public boolean isReceived (Address sender) { ... }

    // Standard get (Address) is inherited from HashMap.
    public T getFirst () { ... }
    public List<T> getResults () { ... }

    // Response is not expected from failed members.
    public int numSuspectedMembers () { ... }
    public List<Address> getSuspectedMembers () { ... }
    public boolean isSuspected (Address sender) { ... }

    ...
}
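How a caller waits for none, one or all responses can be sketched with plain JDK futures, one per destination. This is a toy model under stated assumptions, not the JGroups implementation: `WaitMode`, `Responses.collect`, and the use of strings as destination addresses are hypothetical, and the returned map plays the role of a simplified `RspList`.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical stand-in for the dispatcher's response modes.
enum WaitMode { NONE, FIRST, ALL }

final class Responses {
    // Waits on one future per destination according to the mode and returns
    // every response that had arrived when the wait ended, keyed by destination.
    // Destinations that failed or timed out are simply absent from the result.
    static <T> Map<String, T> collect(Map<String, CompletableFuture<T>> pending,
                                      WaitMode mode, long timeoutMs) {
        if (mode == WaitMode.NONE)
            return Map.of();                                 // do not wait at all
        CompletableFuture<?>[] all = pending.values().toArray(new CompletableFuture<?>[0]);
        CompletableFuture<?> gate = mode == WaitMode.FIRST
                ? CompletableFuture.anyOf(all)
                : CompletableFuture.allOf(all);
        try {
            gate.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (ExecutionException | TimeoutException e) {
            // fall through and report whatever has arrived
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Map<String, T> received = new LinkedHashMap<>();
        pending.forEach((dest, f) -> {
            if (f.isDone() && !f.isCompletedExceptionally())
                received.put(dest, f.join());
        });
        return received;
    }
}
```

With one destination that never answers, the ALL case costs the full timeout before returning the partial result; this is the wait that the timeout in `RequestOptions` bounds.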

Atomic Counter Building Block

public class CounterService {

    // Channel stack must include COUNTER protocol.
    public CounterService (JChannel ch) { ... }
    public SyncCounter getOrCreateSyncCounter (String name, long initial_value) { ... }
    public CompletionStage<AsyncCounter> getOrCreateAsyncCounter (String name, long initial_value) { ... }
    public void deleteCounter (String name) { ... }
    ...
}

public interface SyncCounter extends BaseCounter {

    long get ();
    void set (long new_value);

    long addAndGet (long delta);
    long incrementAndGet ();
    long decrementAndGet ();

    long compareAndSwap (long expect, long update);
    boolean compareAndSet (long expect, long update);

    // Useful for complex updates under high contention.
    <T extends Streamable> T update (CounterFunction<T> updateFunction);
}

public interface AsyncCounter extends BaseCounter {

    CompletionStage<Long> get ();
    CompletionStage<Void> set (long new_value);

    ...
}

The COUNTER protocol maintains the counters as follows:

  • The cluster coordinator stores and updates counter values

  • The cluster coordinator can have backup coordinators

  • Each counter value carries a version number, which is also sent to clients

  • After a coordinator failure, the client value with the latest version is used to recover the counter
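The versioning scheme in the list above can be illustrated with a small in-memory model. It is a sketch under simplifying assumptions, not the COUNTER protocol: `ToyCoordinator`, `Versioned`, and `recover` are hypothetical names, clients are plain maps caching the last reply they saw, and backup coordinators and messaging are left out.

```java
import java.util.*;

// Toy model, not the COUNTER protocol: the coordinator owns the authoritative
// values, every update bumps the counter's version, and each reply carries
// both so clients can cache the latest state they have seen.
record Versioned(long value, long version) {}

final class ToyCoordinator {
    private final Map<String, Versioned> counters = new HashMap<>();

    // Applies an update and returns the new value and version (the reply).
    Versioned addAndGet(String name, long delta) {
        Versioned old = counters.getOrDefault(name, new Versioned(0, 0));
        Versioned updated = new Versioned(old.value() + delta, old.version() + 1);
        counters.put(name, updated);
        return updated;
    }

    Optional<Versioned> get(String name) {
        return Optional.ofNullable(counters.get(name));
    }

    // Recovery after the coordinator fails: a new coordinator gathers the
    // entries cached by surviving clients and keeps, per counter, the one
    // with the highest version.
    static ToyCoordinator recover(List<Map<String, Versioned>> clientCaches) {
        ToyCoordinator next = new ToyCoordinator();
        for (Map<String, Versioned> cache : clientCaches)
            cache.forEach((name, v) -> next.counters.merge(name, v,
                    (a, b) -> a.version() >= b.version() ? a : b));
        return next;
    }
}
```

Recovery from client caches can only return values some surviving client has seen; an update whose reply reached no surviving client would be lost in this model, a gap that backup coordinators can mitigate.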

2.9.3. Protocol Modules

A stack of protocol modules is used to implement various aspects of the reliable multicast communication mechanism.

The transport modules are responsible for transporting messages. The UDP module uses IP multicast to deliver multicast messages and IP unicast to deliver unicast messages. The TCP and TCP_NIO2 modules use a mesh of TCP connections to deliver both multicast and unicast messages, with a thread-per-connection model and an asynchronous single-thread model, respectively. The TUNNEL module tunnels traffic through a specialized router.

Transport Protocol Modules

UDP

uses IP multicast to deliver multicast messages

TCP

uses mesh of TCP connections, thread per connection model

TCP_NIO2

uses mesh of TCP connections, asynchronous single thread model

TUNNEL

tunnels transport to specialized router
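The practical difference between the transports is how a multicast reaches n members. With IP multicast, UDP sends a single datagram; over a TCP mesh, a full mesh needs n(n-1)/2 connections and each multicast becomes n-1 unicast sends, so for n = 4 that is 6 connections and 3 sends. A toy sketch, not JGroups transport code: `MeshMulticast` and its methods are hypothetical, and member addresses are plain strings.

```java
import java.util.*;
import java.util.function.BiConsumer;

// Toy illustration, not transport code: without IP multicast, a "multicast"
// over a TCP mesh is one unicast send per other member.
final class MeshMulticast {
    // Connections needed for a full mesh of n members: n(n-1)/2.
    static long connections(int n) {
        return (long) n * (n - 1) / 2;
    }

    // Sends one copy of the payload to every member except the sender and
    // returns the number of copies sent. Local delivery is not modeled.
    static int multicast(String self, List<String> members, byte[] payload,
                         BiConsumer<String, byte[]> unicast) {
        int sent = 0;
        for (String member : members) {
            if (member.equals(self))
                continue;
            unicast.accept(member, payload);
            sent++;
        }
        return sent;
    }
}
```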

The discovery modules are responsible for locating the group upon initialization. The PING, MPING and BPING modules use IP multicast or IP broadcast over UDP. The TCPPING module attempts to contact members from a given list. The TCPGOSSIP module attempts to contact members using a specialized router. The FILE_PING, JDBC_PING, RACKSPACE_PING, SWIFT_PING and S3_PING modules keep track of members in various places, ranging from shared file systems and shared database tables to cloud storage services. The DNS_PING module relies on A and SRV records in DNS. The PDC module provides a persistent cache of discovered members.

Discovery Protocol Modules

PING

uses IP multicast over existing UDP transport

MPING

uses IP multicast over separate UDP transport

BPING

uses IP broadcast

TCPPING

uses list of member addresses

TCPGOSSIP

uses specialized router

FILE_PING

uses shared directory to keep track of members

JDBC_PING

uses shared database to keep track of members

RACKSPACE_PING

uses Rackspace Cloud Files storage

SWIFT_PING

uses OpenStack Swift object storage

S3_PING

uses Amazon Simple Storage Service

DNS_PING

uses A and SRV records in DNS

PDC

caches discovered members
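The static-list approach of TCPPING can be sketched as: probe every configured host, ask which member is the coordinator, and join it, or start a new cluster when nobody answers. This is a simplified model rather than the module's actual algorithm; `StaticListDiscovery`, `Reply`, and the `probe` function (standing in for a network round trip that yields nothing for an unreachable host) are hypothetical.

```java
import java.util.*;
import java.util.function.Function;

// Simplified discovery from a static host list (not TCPPING's code): probe
// each configured host, tally which coordinator the replies name, and join
// that one; if nobody answers, become the first member of a new cluster.
final class StaticListDiscovery {
    // Reply from a running member: its own address and its current coordinator.
    record Reply(String member, String coordinator) {}

    static String findCoordinator(String self, List<String> initialHosts,
                                  Function<String, Optional<Reply>> probe) {
        Map<String, Integer> votes = new HashMap<>();
        for (String host : initialHosts) {
            if (host.equals(self))
                continue;                          // do not probe ourselves
            probe.apply(host).ifPresent(r -> votes.merge(r.coordinator(), 1, Integer::sum));
        }
        // Pick the coordinator named most often; fall back to ourselves.
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(self);
    }
}
```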

The merge modules are responsible for merging groups during recovery from network partitioning failures. In the MERGE2 module (versions 3.X only), group coordinators periodically multicast their presence and membership information, and distinct subgroups are merged upon discovery. In the MERGE3 module, all members periodically multicast a hash of their membership information; when inconsistent membership information is discovered, it is retrieved and merged.

Merge Protocol Modules

MERGE2

group coordinator multicasts presence and membership view (3.X)

MERGE3

all members multicast presence and membership view
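A toy version of the MERGE3 idea: members announce which view they belong to, and a collector that sees more than one view has found a partition. Here a view identifier string stands in for the membership information hash described above, and `MergeDetector`, `Info`, and the union-based merge are hypothetical simplifications; a real merge also has to install the new view on all members.

```java
import java.util.*;

// Toy version of the MERGE3 idea, not the module's code: each member announces
// the view it belongs to; more than one distinct view means a partition.
final class MergeDetector {
    record Info(String member, String viewId) {}

    // Groups the collected announcements into subgroups by view identifier.
    static Map<String, SortedSet<String>> subgroups(Collection<Info> infos) {
        Map<String, SortedSet<String>> groups = new TreeMap<>();
        for (Info info : infos)
            groups.computeIfAbsent(info.viewId(), id -> new TreeSet<>()).add(info.member());
        return groups;
    }

    // Returns the union of all subgroups if inconsistent views were seen,
    // or empty if everyone reported the same view.
    static Optional<SortedSet<String>> mergeIfNeeded(Collection<Info> infos) {
        Map<String, SortedSet<String>> groups = subgroups(infos);
        if (groups.size() <= 1)
            return Optional.empty();
        SortedSet<String> merged = new TreeSet<>();
        groups.values().forEach(merged::addAll);
        return Optional.of(merged);
    }
}
```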

The failure detection modules are responsible for detecting failed members. The FD module uses a periodic ping with acknowledgment between neighboring members in a logical ring. The FD_ALL and FD_ALL2 modules use a multicast heartbeat among all members in a group. The FD_SOCK module uses a ring of TCP sockets, where an unexpectedly closed socket causes the neighbor to be suspected. The FD_HOST module augments member failure detection with host failure detection through an internal library method (version 4.X only). The VERIFY_SUSPECT module provides additional verification of suspected members.

Failure Detection Modules

FD

uses periodic ping in logical ring

FD_ALL

uses multicast heartbeat

FD_ALL2

uses multicast heartbeat

FD_SOCK

uses TCP socket ring

FD_HOST

uses internal library method to ping hosts (4.X)

VERIFY_SUSPECT

additionally verifies suspected members
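The heartbeat approach of FD_ALL can be sketched with one timestamp per member. This is an illustrative model, not the module's code: `HeartbeatDetector` is hypothetical, time is passed in as milliseconds so no clock or thread is needed, and a real stack would pass suspicions up for verification and membership changes instead of returning them.

```java
import java.util.*;

// Toy heartbeat failure detector in the spirit of FD_ALL: every member
// multicasts heartbeats, and each receiver suspects any member it has not
// heard from within the timeout. Time is passed in explicitly (milliseconds).
final class HeartbeatDetector {
    private final long timeoutMs;
    private final Map<String, Long> lastHeard = new HashMap<>();

    HeartbeatDetector(long timeoutMs) { this.timeoutMs = timeoutMs; }

    // Called when a heartbeat arrives from a member.
    void heartbeat(String member, long nowMs) { lastHeard.put(member, nowMs); }

    // Called periodically; returns members whose heartbeats are overdue.
    SortedSet<String> suspects(long nowMs) {
        SortedSet<String> result = new TreeSet<>();
        lastHeard.forEach((member, t) -> {
            if (nowMs - t > timeoutMs)
                result.add(member);
        });
        return result;
    }
}
```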

The reliable message transmission modules are responsible for providing reliable ordered message delivery.

Reliable Message Transmission Modules

NAKACK

uses negative acknowledgments and sequence numbering, old version (3.X)

NAKACK2

uses negative acknowledgments and sequence numbering, new version

UNICAST

uses positive acknowledgments and sequence numbering, for unicast messages

UNICAST2

uses negative acknowledgments and sequence numbering, for unicast messages (3.X)

UNICAST3

uses both positive and negative acknowledgments and sequence numbering, for unicast messages (4.X)
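The negative acknowledgment scheme of NAKACK and NAKACK2 can be sketched for a single sender: messages carry sequence numbers, the receiver delivers them in order, buffers anything that arrives early, and reports gaps so it can request retransmission. `NakReceiverWindow` is a hypothetical toy; the real modules also handle multiple senders, retransmission timers, and purging of delivered messages.

```java
import java.util.*;

// Toy receiver window for one sender: in-order messages are delivered at once,
// out-of-order ones are buffered, and gaps form the NAK list used to request
// retransmission. Sequence numbers start at 1.
final class NakReceiverWindow {
    private long nextToDeliver = 1;
    private final TreeMap<Long, String> buffered = new TreeMap<>();

    // Accepts a message; returns the messages that can now be delivered in order.
    List<String> receive(long seqno, String payload) {
        if (seqno >= nextToDeliver)
            buffered.putIfAbsent(seqno, payload);     // duplicates are ignored
        List<String> deliverable = new ArrayList<>();
        while (buffered.containsKey(nextToDeliver))
            deliverable.add(buffered.remove(nextToDeliver++));
        return deliverable;
    }

    // Sequence numbers missing below the highest one received: the NAK list.
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        if (buffered.isEmpty())
            return gaps;
        for (long s = nextToDeliver; s < buffered.lastKey(); s++)
            if (!buffered.containsKey(s))
                gaps.add(s);
        return gaps;
    }
}
```

Only missing messages generate traffic here, which is the appeal of negative acknowledgments for multicast: a receiver that sees no gaps sends nothing back.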

Other modules provide functions such as authentication, encryption, compression, fragmentation, flow control, message stability, totally ordered delivery, and bridging between clusters.

Miscellaneous Modules

UFC

credit-based flow control for unicast

MFC

credit-based flow control for multicast

FRAG

message fragmentation

FRAG2

message fragmentation (4.X)

STABLE

message stability, tracks messages received by all members so they can be purged

BARRIER

helper for shared state transfer

SEQUENCER

totally ordered delivery through coordinator

RELAY2

bridge between multiple directly reachable clusters

RELAY3

bridge between multiple clusters with routing rules

AUTH

member authentication

ENCRYPT

message body encryption

COMPRESS

message body compression
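Fragmentation as done by FRAG and FRAG2 can be illustrated by splitting a payload into numbered pieces and reassembling them once all have arrived, in any order. The sketch is hypothetical (`Fragmenter`, `Fragment`, and the size parameter are not JGroups names) and omits the per-sender message identifiers a real implementation needs to keep fragments of concurrent messages apart.

```java
import java.util.*;

// Toy fragmentation: a payload is split into numbered fragments no bigger
// than fragSize, and the receiver reassembles the original once every
// fragment has arrived, regardless of arrival order.
final class Fragmenter {
    record Fragment(int index, int count, byte[] data) {}

    static List<Fragment> fragment(byte[] payload, int fragSize) {
        if (fragSize <= 0)
            throw new IllegalArgumentException("fragSize must be positive");
        int count = Math.max(1, (payload.length + fragSize - 1) / fragSize);
        List<Fragment> fragments = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int from = i * fragSize;
            int to = Math.min(payload.length, from + fragSize);
            fragments.add(new Fragment(i, count, Arrays.copyOfRange(payload, from, to)));
        }
        return fragments;
    }

    // Returns the reassembled payload, or empty while fragments are missing.
    static Optional<byte[]> reassemble(Collection<Fragment> received) {
        if (received.isEmpty())
            return Optional.empty();
        Fragment[] slots = new Fragment[received.iterator().next().count()];
        for (Fragment f : received)
            slots[f.index()] = f;
        int total = 0;
        for (Fragment f : slots) {
            if (f == null)
                return Optional.empty();
            total += f.data().length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (Fragment f : slots) {
            System.arraycopy(f.data(), 0, out, pos, f.data().length);
            pos += f.data().length;
        }
        return Optional.of(out);
    }
}
```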
