2.13. Message Passing Interface (MPI)

MPI (Message Passing Interface) is a standard interface for high performance computing middleware. MPI uses typed messages and provides a variety of message exchange functions: point-to-point communication, collective communication designed for coordinated multiparty message exchange and optionally optimized for particular communication topologies, remote memory access, and coordinated I/O. The standard defines the interface in a language agnostic notation; language bindings are defined for C and Fortran.

2.13.1. Architecture

Initialization assumes either a world model or a session model. The world model assumes a flat application architecture where all processes interact. The session model assumes a modular application architecture where communication is restricted to processes executing individual modules. In the session model, cluster resources are identified by URI names configured by the cluster administrator.

MPI Initialization

World Model. For programs where all processes coordinate together

int MPI_Init (int *argc, char ***argv);
int MPI_Init_thread (int *argc, char ***argv, int required, int *provided);
int MPI_Query_thread (int *provided);
int MPI_Is_thread_main (int *flag);
int MPI_Initialized (int *flag);
int MPI_Finalize (void);

MPI_THREAD_SINGLE

single thread in this process

MPI_THREAD_FUNNELED

multiple threads but only main thread calls MPI

MPI_THREAD_SERIALIZED

multiple threads but only one MPI call at a time

MPI_THREAD_MULTIPLE

multiple threads and multiple MPI calls at a time
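
As a minimal sketch, a world-model program might look as follows; the funneled thread level is an arbitrary choice for illustration:

#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv []) {
    int provided, rank, size;
    MPI_Init_thread (&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED) {
        MPI_Finalize ();
        return (1);
    }
    // every process sees the same flat world and its own rank within it
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);
    printf ("process %d of %d\n", rank, size);
    MPI_Finalize ();
    return (0);
}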

Session Model. For programs where processes coordinate within components

int MPI_Session_init (MPI_Info info, MPI_Errhandler errhandler, MPI_Session *session);
int MPI_Session_get_num_psets (MPI_Session session, MPI_Info info, int *npset_names);
int MPI_Session_get_nth_pset (MPI_Session session, MPI_Info info, int n, int *pset_len, char *pset_name);
int MPI_Session_finalize (MPI_Session *session);

Process sets are implementation-defined groups of processes

  • arbitrary overlap possible

  • intended to express shared resource scopes

  • mpi://SELF and mpi://WORLD always exist
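
As a sketch, a session-model program derives its communicator from a process set; the error handler choice and the string tag are illustrative assumptions:

MPI_Session session;
MPI_Group group;
MPI_Comm comm;

MPI_Session_init (MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);
// turn the always-present mpi://WORLD process set into a group, then a communicator
MPI_Group_from_session_pset (session, "mpi://WORLD", &group);
MPI_Comm_create_from_group (group, "org.example.module", MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm);
// ... communicate within the module ...
MPI_Comm_free (&comm);
MPI_Group_free (&group);
MPI_Session_finalize (&session);

The group and communicator functions used here are introduced under MPI Addressing below.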

Dynamic Process Model. For explicit control over process lifecycle

int MPI_Comm_spawn (
    const char *command, char *argv[], int maxprocs, MPI_Info info,
    int root, MPI_Comm comm, MPI_Comm *intercomm,
    int array_of_errcodes []);

int MPI_Comm_spawn_multiple (
    int count, char *array_of_commands [], char **array_of_argv [],
    const int array_of_maxprocs [], const MPI_Info array_of_info [],
    int root, MPI_Comm comm, MPI_Comm *intercomm,
    int array_of_errcodes []);

int MPI_Comm_get_parent (MPI_Comm *parent);

  • children have separate MPI_COMM_WORLD
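
A sketch of spawning; the worker binary name is a hypothetical placeholder:

MPI_Comm parent, intercomm;

MPI_Comm_get_parent (&parent);
if (parent == MPI_COMM_NULL) {
    // not spawned: act as the parent and start four workers
    MPI_Comm_spawn ("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                    0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
} else {
    // spawned: parent is an inter-communicator to the parent group
}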

Configuration. A set of key-value pairs used to provide additional configuration

int MPI_Info_create (MPI_Info *info);
int MPI_Info_free (MPI_Info *info);
int MPI_Info_dup (MPI_Info info, MPI_Info *newinfo);

int MPI_Info_set (MPI_Info info, const char *key, const char *value);
int MPI_Info_get_nkeys (MPI_Info info, int *nkeys);
int MPI_Info_get_nthkey (MPI_Info info, int n, char *key);
int MPI_Info_get_string (MPI_Info info, const char *key, int *buflen, char *value, int *flag);
int MPI_Info_delete (MPI_Info info, const char *key);
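
A sketch of round-tripping a key through an info object; the key and value are illustrative only:

MPI_Info info;
char value [64];
int buflen = sizeof (value), flag;

MPI_Info_create (&info);
MPI_Info_set (info, "example_key", "example_value");
// buflen is read as the buffer capacity and updated to the value length
MPI_Info_get_string (info, "example_key", &buflen, value, &flag);
MPI_Info_free (&info);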

MPI Addressing

Groups. A group contains processes belonging to an application

MPI_GROUP_EMPTY

int MPI_Group_size (MPI_Group group, int *size);
int MPI_Group_rank (MPI_Group group, int *rank);

int MPI_Group_translate_ranks (
    MPI_Group group1, int n, const int ranks1 [],
    MPI_Group group2, int ranks2 []);

int MPI_Group_compare (MPI_Group group1, MPI_Group group2, int *result);

int MPI_Group_union (MPI_Group group1, MPI_Group group2, MPI_Group *newgroup);
int MPI_Group_difference (MPI_Group group1, MPI_Group group2, MPI_Group *newgroup);
int MPI_Group_intersection (MPI_Group group1, MPI_Group group2, MPI_Group *newgroup);

int MPI_Group_incl (MPI_Group group, int n, const int ranks [], MPI_Group *newgroup);
int MPI_Group_excl (MPI_Group group, int n, const int ranks [], MPI_Group *newgroup);
int MPI_Group_range_incl (MPI_Group group, int n, int ranges [][3], MPI_Group *newgroup);
int MPI_Group_range_excl (MPI_Group group, int n, int ranges [][3], MPI_Group *newgroup);

int MPI_Comm_group (MPI_Comm comm, MPI_Group *group);

int MPI_Group_from_session_pset (MPI_Session session, const char *pset_name, MPI_Group *newgroup);

int MPI_Group_free (MPI_Group *group);

  • created from other groups in the world model

  • created from process sets in the session model

  • processes addressed using rank from 0 to size - 1
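
A sketch that selects the even ranks of the world group; ranks not included observe MPI_UNDEFINED:

MPI_Group world_group, even_group;
int size, even_rank;

MPI_Comm_size (MPI_COMM_WORLD, &size);
MPI_Comm_group (MPI_COMM_WORLD, &world_group);
int ranges [1][3] = {{0, size - 1, 2}};  // first rank, last rank, stride
MPI_Group_range_incl (world_group, 1, ranges, &even_group);
MPI_Group_rank (even_group, &even_rank);  // MPI_UNDEFINED for odd ranks
MPI_Group_free (&even_group);
MPI_Group_free (&world_group);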

Communicators. A communicator represents one or two groups in a communication context

MPI_COMM_SELF
MPI_COMM_WORLD

int MPI_Comm_size (MPI_Comm comm, int *size);
int MPI_Comm_rank (MPI_Comm comm, int *rank);

int MPI_Comm_compare (MPI_Comm comm1, MPI_Comm comm2, int *result);

int MPI_Comm_dup (MPI_Comm comm, MPI_Comm *newcomm);
int MPI_Comm_dup_with_info (MPI_Comm comm, MPI_Info info, MPI_Comm *newcomm);

int MPI_Comm_create (MPI_Comm comm, MPI_Group group, MPI_Comm *newcomm);
int MPI_Comm_create_group (MPI_Comm comm, MPI_Group group, int tag, MPI_Comm *newcomm);
int MPI_Comm_create_from_group (MPI_Group group, const char *stringtag, MPI_Info info, MPI_Errhandler errhandler, MPI_Comm *newcomm);
int MPI_Intercomm_create (
    MPI_Comm local_comm, int local_leader,
    MPI_Comm peer_comm, int remote_leader,
    int tag, MPI_Comm *newintercomm);
int MPI_Intercomm_create_from_groups (
    MPI_Group local_group, int local_leader,
    MPI_Group remote_group, int remote_leader,
    const char *stringtag, MPI_Info info, MPI_Errhandler errhandler, MPI_Comm *newintercomm);

int MPI_Comm_split (MPI_Comm comm, int color, int key, MPI_Comm *newcomm);
int MPI_Comm_split_type (MPI_Comm comm, int split_type, int key, MPI_Info info, MPI_Comm *newcomm);

int MPI_Intercomm_merge (MPI_Comm intercomm, int high, MPI_Comm *newintracomm);

int MPI_Comm_free (MPI_Comm *comm);

int MPI_Comm_set_info (MPI_Comm comm, MPI_Info info);
int MPI_Comm_get_info (MPI_Comm comm, MPI_Info *info_used);

int MPI_Comm_test_inter (MPI_Comm comm, int *flag);
int MPI_Comm_remote_size (MPI_Comm comm, int *size);
int MPI_Comm_remote_group (MPI_Comm comm, MPI_Group *group);

intra-communicator

communication within a single group

inter-communicator

communication between two groups
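
A sketch that splits the world communicator into rows of a hypothetical process grid; the color selects the row, the key orders ranks within it:

int rank, cols = 4;  // assumed row width
MPI_Comm row_comm;

MPI_Comm_rank (MPI_COMM_WORLD, &rank);
MPI_Comm_split (MPI_COMM_WORLD, rank / cols, rank % cols, &row_comm);
// ... communicate within the row ...
MPI_Comm_free (&row_comm);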

2.13.2. Point-To-Point Communication

MPI_Send Function

int MPI_Send (
    const void *buf, int count, MPI_Datatype datatype,
    int dest, int tag, MPI_Comm comm);
int MPI_Send_c (
    const void *buf, MPI_Count count, MPI_Datatype datatype,
    int dest, int tag, MPI_Comm comm);
buf

address of send buffer

count

number of elements in send buffer

datatype

datatype of each send buffer element

dest

rank of destination

tag

message tag

comm

communicator

MPI_Recv Function

int MPI_Recv (
    void *buf, int count, MPI_Datatype datatype,
    int source, int tag, MPI_Comm comm,
    MPI_Status *status);
int MPI_Recv_c (
    void *buf, MPI_Count count, MPI_Datatype datatype,
    int source, int tag, MPI_Comm comm,
    MPI_Status *status);
buf

address of receive buffer

count

maximum number of elements in receive buffer

datatype

datatype of each receive buffer element

source

rank of source or MPI_ANY_SOURCE

tag

message tag or MPI_ANY_TAG

comm

communicator

status

status object

int MPI_Get_count (const MPI_Status *status, MPI_Datatype datatype, int *count);
int MPI_Get_count_c (const MPI_Status *status, MPI_Datatype datatype, MPI_Count *count);
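
A sketch of a matched pair, assuming at least two processes and a rank variable obtained earlier:

int data [16];
if (rank == 0) {
    MPI_Send (data, 16, MPI_INT, 1, 42, MPI_COMM_WORLD);
} else if (rank == 1) {
    MPI_Status status;
    int count;
    MPI_Recv (data, 16, MPI_INT, 0, 42, MPI_COMM_WORLD, &status);
    MPI_Get_count (&status, MPI_INT, &count);  // elements actually received, at most 16
}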

MPI_Sendrecv Function

int MPI_Sendrecv (
    const void *sendbuf, int sendcount, MPI_Datatype sendtype,
    int dest, int sendtag,
    void *recvbuf, int recvcount, MPI_Datatype recvtype,
    int source, int recvtag, MPI_Comm comm,
    MPI_Status *status);
int MPI_Sendrecv_c (
    const void *sendbuf, MPI_Count sendcount, MPI_Datatype sendtype,
    int dest, int sendtag,
    void *recvbuf, MPI_Count recvcount, MPI_Datatype recvtype,
    int source, int recvtag, MPI_Comm comm,
    MPI_Status *status);
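
A sketch of a deadlock-free ring shift, assuming rank and size obtained earlier:

int left = (rank + size - 1) % size, right = (rank + 1) % size;
int outgoing = rank, incoming;

MPI_Sendrecv (&outgoing, 1, MPI_INT, right, 0,
              &incoming, 1, MPI_INT, left, 0,
              MPI_COMM_WORLD, MPI_STATUS_IGNORE);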

Point-To-Point Communication Modes

int MPI_Send (const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm);
int MPI_Bsend (const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm);
int MPI_Ssend (const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm);
int MPI_Rsend (const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm);

int MPI_Buffer_attach (void *buffer, int size);
int MPI_Buffer_attach_c (void *buffer, MPI_Count size);

int MPI_Buffer_detach (void *buffer_addr, int *size);
int MPI_Buffer_detach_c (void *buffer_addr, MPI_Count *size);
Send

standard mode; may block until the message is buffered or received; the send buffer is available on return, delivery proceeds asynchronously

Bsend

buffered mode; does not block, the message is copied into the attached user buffer; the send buffer is available on return, delivery proceeds asynchronously

Ssend

synchronous mode; may block, completes only after the matching receive has started

Rsend

ready mode; may block, the matching receive must already be posted, otherwise the behavior is undefined
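
A sketch of buffered sending; the attached buffer must accommodate the message plus the standard overhead constant:

// requires <stdlib.h> for malloc and free
int msg [1024];
int bufsize = sizeof (msg) + MPI_BSEND_OVERHEAD;
char *buffer = malloc (bufsize);

MPI_Buffer_attach (buffer, bufsize);
MPI_Bsend (msg, 1024, MPI_INT, 1, 0, MPI_COMM_WORLD);
MPI_Buffer_detach (&buffer, &bufsize);  // waits for buffered messages to drain
free (buffer);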

int MPI_Isend (
    const void *buf, int count, MPI_Datatype datatype,
    int dest, int tag, MPI_Comm comm,
    MPI_Request *request);
int MPI_Isend_c (
    const void *buf, MPI_Count count, MPI_Datatype datatype,
    int dest, int tag, MPI_Comm comm,
    MPI_Request *request);

int MPI_Ibsend (...);
int MPI_Issend (...);
int MPI_Irsend (...);

int MPI_Irecv (
    void *buf, int count, MPI_Datatype datatype,
    int source, int tag, MPI_Comm comm,
    MPI_Request *request);
int MPI_Irecv_c (
    void *buf, MPI_Count count, MPI_Datatype datatype,
    int source, int tag, MPI_Comm comm,
    MPI_Request *request);

int MPI_Iprobe (int source, int tag, MPI_Comm comm, int *flag, MPI_Status *status);
int MPI_Improbe (int source, int tag, MPI_Comm comm, int *flag, MPI_Message *message, MPI_Status *status);
int MPI_Imrecv (void *buf, int count, MPI_Datatype datatype, MPI_Message *message, MPI_Request *request);
int MPI_Imrecv_c (void *buf, MPI_Count count, MPI_Datatype datatype, MPI_Message *message, MPI_Request *request);

int MPI_Wait (MPI_Request *request, MPI_Status *status);
int MPI_Waitany (int count, MPI_Request array_of_requests [], int *index, MPI_Status *status);
int MPI_Waitall (int count, MPI_Request array_of_requests [], MPI_Status array_of_statuses []);
int MPI_Waitsome (int incount, MPI_Request array_of_requests [], int *outcount, int array_of_indices [], MPI_Status array_of_statuses []);

int MPI_Test (MPI_Request *request, int *flag, MPI_Status *status);
int MPI_Testany (int count, MPI_Request array_of_requests [], int *index, int *flag, MPI_Status *status);
int MPI_Testall (int count, MPI_Request array_of_requests [], int *flag, MPI_Status array_of_statuses []);
int MPI_Testsome (int incount, MPI_Request array_of_requests [], int *outcount, int array_of_indices [], MPI_Status array_of_statuses []);

int MPI_Request_free (MPI_Request *request);

int MPI_Request_get_status (MPI_Request request, int *flag, MPI_Status *status);

int MPI_Cancel (MPI_Request *request);
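
A sketch of overlapping communication with computation; left and right are neighbor ranks as in the ring example above:

MPI_Request requests [2];
int outgoing = rank, incoming;

MPI_Isend (&outgoing, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &requests [0]);
MPI_Irecv (&incoming, 1, MPI_INT, left, 0, MPI_COMM_WORLD, &requests [1]);
// ... useful work overlapping the transfers ...
MPI_Waitall (2, requests, MPI_STATUSES_IGNORE);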

Partitioned Communication. Requests are initialized once and may be started repeatedly, with individual partitions marked ready as their data becomes available

int MPI_Psend_init (
    const void *buf, int partitions, MPI_Count count, MPI_Datatype datatype,
    int dest, int tag, MPI_Comm comm, MPI_Info info,
    MPI_Request *request);

int MPI_Precv_init (
    void *buf, int partitions, MPI_Count count, MPI_Datatype datatype,
    int source, int tag, MPI_Comm comm, MPI_Info info,
    MPI_Request *request);

int MPI_Start (MPI_Request *request);

int MPI_Pready (int partition, MPI_Request request);
int MPI_Pready_range (int partition_low, int partition_high, MPI_Request request);
int MPI_Pready_list (int length, const int array_of_partitions [], MPI_Request request);

int MPI_Wait (...);
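
A sketch of partitioned sending, assuming a matching MPI_Precv_init on the destination; fill_partition is a hypothetical producer of partition data:

enum { PARTITIONS = 4, PARTLEN = 1024 };
double buf [PARTITIONS * PARTLEN];
MPI_Request request;

MPI_Psend_init (buf, PARTITIONS, PARTLEN, MPI_DOUBLE,
                1, 0, MPI_COMM_WORLD, MPI_INFO_NULL, &request);
MPI_Start (&request);
for (int p = 0; p < PARTITIONS; p++) {
    fill_partition (buf + p * PARTLEN, PARTLEN);  // hypothetical worker function
    MPI_Pready (p, request);                      // partition p may be transmitted from now on
}
MPI_Wait (&request, MPI_STATUS_IGNORE);
MPI_Request_free (&request);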

Data Types

Selected Basic Types. 

MPI_SHORT, MPI_INT, MPI_LONG, MPI_LONG_LONG

signed integer types

MPI_UNSIGNED_SHORT, MPI_UNSIGNED, MPI_UNSIGNED_LONG, MPI_UNSIGNED_LONG_LONG

unsigned integer types

MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE

floating point types

MPI_CHAR, MPI_SIGNED_CHAR, MPI_UNSIGNED_CHAR, MPI_WCHAR

character data types

MPI_INT8_T, MPI_INT16_T, MPI_INT32_T, MPI_INT64_T

exact size signed integer types

MPI_UINT8_T, MPI_UINT16_T, MPI_UINT32_T, MPI_UINT64_T

exact size unsigned integer types

MPI_BYTE

buffer with raw data

MPI_PACKED

buffer with packed data

Local Derived Types. 

int MPI_Type_commit (MPI_Datatype *datatype);
int MPI_Type_free (MPI_Datatype *datatype);
int MPI_Type_dup (MPI_Datatype oldtype, MPI_Datatype *newtype);

int MPI_Type_contiguous (int count, MPI_Datatype oldtype, MPI_Datatype *newtype);
int MPI_Type_vector (int count, int blocklength, int stride, MPI_Datatype oldtype, MPI_Datatype *newtype);

int MPI_Type_indexed (
    int count,
    const int array_of_blocklengths [],
    const int array_of_displacements [],
    MPI_Datatype oldtype,
    MPI_Datatype *newtype);

int MPI_Type_create_struct (
    int count,
    const int array_of_blocklengths [],
    const MPI_Aint array_of_displacements [],
    const MPI_Datatype array_of_types [],
    MPI_Datatype *newtype);

int MPI_Type_create_subarray (
    int ndims,
    const int array_of_sizes [], const int array_of_subsizes [],
    const int array_of_starts [],
    int order,
    MPI_Datatype oldtype,
    MPI_Datatype *newtype);
  • elements of basic types

  • offset for each element

  • also versions with byte offsets

  • also functions for introspecting derived types

  • also functions for data import and export in canonical format

MPI_ORDER_C

row major order

MPI_ORDER_FORTRAN

column major order
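
A sketch that sends one column of a row-major matrix using a vector type:

double matrix [10][10];
MPI_Datatype column;

MPI_Type_vector (10, 1, 10, MPI_DOUBLE, &column);  // count, blocklength, stride
MPI_Type_commit (&column);
MPI_Send (&matrix [0][2], 1, column, 1, 0, MPI_COMM_WORLD);  // third column
MPI_Type_free (&column);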

Distributed Derived Types. 

int MPI_Type_create_darray (
    int size, int rank,
    int ndims,
    const int array_of_gsizes [],
    const int array_of_distribs [], const int array_of_dargs [],
    const int array_of_psizes [],
    int order,
    MPI_Datatype oldtype,
    MPI_Datatype *newtype);
MPI_DISTRIBUTE_BLOCK

sequential block distribution (AAABBBCCC...)

MPI_DISTRIBUTE_CYCLIC

cyclic element distribution (ABC...ABC...ABC...)
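
A sketch that describes this process's block of a 100 by 100 matrix distributed over a 2 by 2 process grid, assuming four processes and a rank obtained earlier:

int gsizes [2] = {100, 100};
int distribs [2] = {MPI_DISTRIBUTE_BLOCK, MPI_DISTRIBUTE_BLOCK};
int dargs [2] = {MPI_DISTRIBUTE_DFLT_DARG, MPI_DISTRIBUTE_DFLT_DARG};
int psizes [2] = {2, 2};
MPI_Datatype block;

MPI_Type_create_darray (4, rank, 2, gsizes, distribs, dargs, psizes,
                        MPI_ORDER_C, MPI_DOUBLE, &block);
MPI_Type_commit (&block);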

Conversions. 

  • MPI types provided by communication parties must be the same

  • representation conversion for portability is performed as necessary

  • representation of derived types with byte offsets may not be portable

2.13.3. Collective Communication

Collective Communication Primitives

int MPI_Gather (
    const void *sendbuf, int sendcount, MPI_Datatype sendtype,
    void *recvbuf, int recvcount, MPI_Datatype recvtype,
    int root, MPI_Comm comm);
int MPI_Gather_c (
    const void *sendbuf, MPI_Count sendcount, MPI_Datatype sendtype,
    void *recvbuf, MPI_Count recvcount, MPI_Datatype recvtype,
    int root, MPI_Comm comm);

int MPI_Gatherv (
    const void *sendbuf, int sendcount, MPI_Datatype sendtype,
    void *recvbuf, const int recvcounts [], const int displs [], MPI_Datatype recvtype,
    int root, MPI_Comm comm);
int MPI_Gatherv_c (...);

int MPI_Igather (
    const void *sendbuf, int sendcount, MPI_Datatype sendtype,
    void *recvbuf, int recvcount, MPI_Datatype recvtype,
    int root, MPI_Comm comm,
    MPI_Request *request);
int MPI_Igather_c (...);

int MPI_Igatherv (
    const void *sendbuf, int sendcount, MPI_Datatype sendtype,
    void *recvbuf, const int recvcounts [], const int displs [],
    MPI_Datatype recvtype, int root, MPI_Comm comm,
    MPI_Request *request);
int MPI_Igatherv_c (...);

int MPI_Gather_init (
    const void *sendbuf, int sendcount, MPI_Datatype sendtype,
    void *recvbuf, int recvcount, MPI_Datatype recvtype,
    int root, MPI_Comm comm, MPI_Info info,
    MPI_Request *request);
int MPI_Gather_init_c (...);

int MPI_Gatherv_init (
    const void *sendbuf, int sendcount, MPI_Datatype sendtype,
    void *recvbuf, const int recvcounts [], const int displs [], MPI_Datatype recvtype,
    int root, MPI_Comm comm, MPI_Info info,
    MPI_Request *request);
int MPI_Gatherv_init_c (...);
Bcast

sender A, receivers A, A, A

Gather

senders A, B, C, receiver ABC

Scatter

sender ABC, receivers A, B, C

Allgather

senders A, B, C, receivers ABC, ABC, ABC

Alltoall

senders ABC, DEF, GHI, receivers ADG, BEH, CFI

Reduce

senders A, B, C, receiver A+B+C

Allreduce

senders A, B, C, receivers A+B+C, A+B+C, A+B+C

Reduce_scatter

senders ABC, DEF, GHI, receivers A+D+G, B+E+H, C+F+I

Scan

senders A, B, C, receivers A, A+B, A+B+C

Exscan

senders A, B, C, receivers N/A, A, A+B

Barrier

rendezvous

  • semantics differ for intra and inter communication

  • intra communication can pass MPI_IN_PLACE as a special argument to use a single buffer, as the sketch below shows
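
A sketch combining a broadcast with an in-place reduction, assuming a rank obtained earlier:

int parameter = 0;
double value;

if (rank == 0) parameter = 64;
MPI_Bcast (&parameter, 1, MPI_INT, 0, MPI_COMM_WORLD);
value = (double) (rank * parameter);  // stand-in for local computation
// single-buffer form: every process both contributes and receives in value
MPI_Allreduce (MPI_IN_PLACE, &value, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);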

Reduction Operations

MPI_SUM, MPI_PROD
MPI_MIN, MPI_MINLOC
MPI_MAX, MPI_MAXLOC
MPI_LAND, MPI_LOR, MPI_LXOR
MPI_BAND, MPI_BOR, MPI_BXOR

int MPI_Op_create (MPI_User_function *user_fn, int commute, MPI_Op *op);
int MPI_Op_free (MPI_Op *op);

typedef void MPI_User_function (void *invec, void *inoutvec, int *len, MPI_Datatype *datatype);
  • must be associative

  • may be commutative
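
A sketch of a user-defined elementwise maximum; local holds the process contribution:

void elementwise_max (void *invec, void *inoutvec, int *len, MPI_Datatype *datatype) {
    double *in = invec, *inout = inoutvec;
    for (int i = 0; i < *len; i++)
        if (in [i] > inout [i]) inout [i] = in [i];
}

double local = (double) rank, best;
MPI_Op op;

MPI_Op_create (elementwise_max, 1, &op);  // 1 marks the operation commutative
MPI_Allreduce (&local, &best, 1, MPI_DOUBLE, op, MPI_COMM_WORLD);
MPI_Op_free (&op);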

2.13.4. Virtual Process Topologies

Virtual Topology Creation

int MPI_Cart_create (
    MPI_Comm comm_old,
    int ndims, const int dims [], const int periods [],
    int reorder, MPI_Comm *comm_cart);

int MPI_Cart_sub (MPI_Comm comm, const int remain_dims [], MPI_Comm *newcomm);

int MPI_Graph_create (
    MPI_Comm comm_old,
    int nnodes, const int index [], const int edges [],
    int reorder, MPI_Comm *comm_graph);

int MPI_Dist_graph_create_adjacent (
    MPI_Comm comm_old,
    int indegree, const int sources [], const int sourceweights [],
    int outdegree, const int destinations [], const int destweights [],
    MPI_Info info, int reorder, MPI_Comm *comm_dist_graph);

int MPI_Dist_graph_create (
    MPI_Comm comm_old,
    int n, const int sources [], const int degrees [], const int destinations [], const int weights [],
    MPI_Info info, int reorder, MPI_Comm *comm_dist_graph);
cartesian

Cartesian grid with optionally periodic dimensions

centralized graph

centralized graph definition with node degrees and flattened edge list

adjacent distributed graph

distributed graph definition which specifies incoming and outgoing edges at each node

general distributed graph

distributed graph definition which specifies arbitrary subset of edges at each node

Virtual Topology Addressing

int MPI_Cart_rank (MPI_Comm comm, const int coords [], int *rank);
int MPI_Cart_coords (MPI_Comm comm, int rank, int maxdims, int coords []);

int MPI_Cart_shift (MPI_Comm comm, int direction, int disp, int *rank_source, int *rank_dest);

int MPI_Graph_neighbors (MPI_Comm comm, int rank, int maxneighbors, int neighbors []);

int MPI_Dist_graph_neighbors (
    MPI_Comm comm,
    int maxindegree, int sources [], int sourceweights [],
    int maxoutdegree, int destinations [], int destweights []);
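
A sketch of a periodic grid with a shift along the first dimension, assuming rank and size obtained earlier; MPI_Dims_create is a standard helper not listed above:

int dims [2] = {0, 0}, periods [2] = {1, 1};
int up, down, mine = rank, theirs;
MPI_Comm grid;

MPI_Dims_create (size, 2, dims);  // pick a balanced factorization of size
MPI_Cart_create (MPI_COMM_WORLD, 2, dims, periods, 1, &grid);
MPI_Cart_shift (grid, 0, 1, &up, &down);
MPI_Sendrecv (&mine, 1, MPI_INT, down, 0,
              &theirs, 1, MPI_INT, up, 0,
              grid, MPI_STATUS_IGNORE);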

Virtual Topology Communication

int MPI_Neighbor_allgather (
    const void *sendbuf, int sendcount, MPI_Datatype sendtype,
    void *recvbuf, int recvcount, MPI_Datatype recvtype,
    MPI_Comm comm);
int MPI_Neighbor_allgatherv (
    const void *sendbuf, int sendcount, MPI_Datatype sendtype,
    void *recvbuf, const int recvcounts [], const int displs [], MPI_Datatype recvtype,
    MPI_Comm comm);

int MPI_Neighbor_alltoall (
    const void *sendbuf, int sendcount, MPI_Datatype sendtype,
    void *recvbuf, int recvcount, MPI_Datatype recvtype,
    MPI_Comm comm);
int MPI_Neighbor_alltoallv (
    const void *sendbuf, const int sendcounts [], const int sdispls [], MPI_Datatype sendtype,
    void *recvbuf, const int recvcounts [], const int rdispls [], MPI_Datatype recvtype,
    MPI_Comm comm);

int MPI_Neighbor_alltoallw (
    const void *sendbuf, const int sendcounts [], const MPI_Aint sdispls [], const MPI_Datatype sendtypes [],
    void *recvbuf, const int recvcounts [], const MPI_Aint rdispls [], const MPI_Datatype recvtypes [],
    MPI_Comm comm);
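
A sketch of a neighborhood gather on the two-dimensional Cartesian communicator from the previous example; each process receives one value per neighbor, ordered by dimension:

int mine = rank, neighbors [4];  // two neighbors per dimension

MPI_Neighbor_allgather (&mine, 1, MPI_INT, neighbors, 1, MPI_INT, grid);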

2.13.5. Remote Memory Access

Memory Window Initialization

int MPI_Win_create (
    void *base, MPI_Aint size, int disp_unit,
    MPI_Info info, MPI_Comm comm, MPI_Win *win);
int MPI_Win_create_c (
    void *base, MPI_Aint size, MPI_Aint disp_unit,
    MPI_Info info, MPI_Comm comm, MPI_Win *win);

int MPI_Win_allocate (
    MPI_Aint size, int disp_unit,
    MPI_Info info, MPI_Comm comm, void *baseptr, MPI_Win *win);
int MPI_Win_allocate_c (
    MPI_Aint size, MPI_Aint disp_unit,
    MPI_Info info, MPI_Comm comm, void *baseptr, MPI_Win *win);

int MPI_Win_allocate_shared (
    MPI_Aint size, int disp_unit,
    MPI_Info info, MPI_Comm comm, void *baseptr, MPI_Win *win);
int MPI_Win_allocate_shared_c (
    MPI_Aint size, MPI_Aint disp_unit,
    MPI_Info info, MPI_Comm comm, void *baseptr, MPI_Win *win);

int MPI_Win_create_dynamic (MPI_Info info, MPI_Comm comm, MPI_Win *win);
int MPI_Win_attach (MPI_Win win, void *base, MPI_Aint size);
int MPI_Win_detach (MPI_Win win, const void *base);

int MPI_Win_free (MPI_Win *win);
disp_unit

unit size used in scaling remote offsets

  • shared memory windows are only possible when the hardware architecture permits it

  • dynamic windows permit attaching and detaching memory regions of a given size at run time

Remote memory access requires synchronization. Passive target synchronization assumes the target does not explicitly participate in synchronization; the origin process delimits access epochs with calls to MPI_Win_lock and MPI_Win_unlock, and may complete pending operations within an epoch using MPI_Win_flush. Active target synchronization assumes the target makes data available by explicitly delimiting exposure epochs with calls to MPI_Win_fence or MPI_Win_post and MPI_Win_wait, and the origin process explicitly delimits access epochs with calls to MPI_Win_fence or MPI_Win_start and MPI_Win_complete.

Accessing Remote Memory

int MPI_Put (
    const void *origin_addr, int origin_count, MPI_Datatype origin_datatype,
    int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype,
    MPI_Win win);

int MPI_Get (
    void *origin_addr, int origin_count, MPI_Datatype origin_datatype,
    int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype,
    MPI_Win win);

int MPI_Accumulate (
    const void *origin_addr, int origin_count, MPI_Datatype origin_datatype,
    int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype,
    MPI_Op op,
    MPI_Win win);

int MPI_Get_accumulate (
    const void *origin_addr, int origin_count, MPI_Datatype origin_datatype,
    void *result_addr, int result_count, MPI_Datatype result_datatype,
    int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype,
    MPI_Op op,
    MPI_Win win);

int MPI_Fetch_and_op (
    const void *origin_addr, void *result_addr, MPI_Datatype datatype,
    int target_rank, MPI_Aint target_disp,
    MPI_Op op,
    MPI_Win win);

int MPI_Compare_and_swap (
    const void *origin_addr, const void *compare_addr, void *result_addr, MPI_Datatype datatype,
    int target_rank, MPI_Aint target_disp,
    MPI_Win win);
  • local access requires explicit synchronization

Passive Target Synchronization. 

int MPI_Win_flush (int rank, MPI_Win win);
int MPI_Win_flush_all (MPI_Win win);
int MPI_Win_flush_local (int rank, MPI_Win win);
int MPI_Win_flush_local_all (MPI_Win win);

int MPI_Win_lock (int lock_type, int rank, int assert, MPI_Win win);
int MPI_Win_lock_all (int assert, MPI_Win win);
int MPI_Win_unlock (int rank, MPI_Win win);
int MPI_Win_unlock_all (MPI_Win win);
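
A sketch of a passive-target read, assuming a window win exposing a long on rank 0:

long counter;

MPI_Win_lock (MPI_LOCK_SHARED, 0, 0, win);
MPI_Get (&counter, 1, MPI_LONG, 0, 0, 1, MPI_LONG, win);
MPI_Win_unlock (0, win);  // the transfer is complete only after the unlock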

Active Target Synchronization. 

int MPI_Win_fence (int assert, MPI_Win win);

int MPI_Win_start (MPI_Group group, int assert, MPI_Win win);
int MPI_Win_complete (MPI_Win win);

int MPI_Win_post (MPI_Group group, int assert, MPI_Win win);
int MPI_Win_wait (MPI_Win win);
  • assert parameter specifies application hints for optimization

  • starting and completing delineates epoch of remote window access

  • posting and waiting delineates epoch of exposure to remote window access
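
A sketch of a fence-delimited exchange through an allocated window, assuming rank and size obtained earlier:

double *base, remote;
MPI_Win win;

MPI_Win_allocate (sizeof (double), sizeof (double),
                  MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);
*base = (double) rank;
MPI_Win_fence (0, win);  // opens the epoch, separates local writes from remote access
MPI_Get (&remote, 1, MPI_DOUBLE, (rank + 1) % size, 0, 1, MPI_DOUBLE, win);
MPI_Win_fence (0, win);  // closes the epoch, remote is valid afterwards
MPI_Win_free (&win);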

2.13.6. References

  1. The MPI Forum Home Page. https://www.mpi-forum.org

  2. The MPICH Project Home Page. https://www.mpich.org

  3. The OpenMPI Project Home Page. https://www.open-mpi.org