MHPCC

SP Parallel Programming Workshop
Message Passing Interface (MPI)


© Copyright Statement

Table of Contents
  1. Prerequisites

  2. What Is MPI?

  3. MPI Message Passing Definitions

  4. Environment Management Routines

  5. Point to Point Communication Routines
    1. MPI Message Passing Subroutine Arguments
    2. Blocking Message Passing Routines
    3. Non-Blocking Message Passing Routines

  6. Collective Communication Routines

  7. Group and Communicator Management Routines

  8. Derived Data Types

  9. Virtual Topologies

  10. MPE Multi Processing Environment
    1. MPE Logging Routines
    2. MPE Graphics Routines

  11. Case Studies

  12. MPI At The MHPCC

  13. MPI Subroutine Man Pages

  14. References, Acknowledgements, WWW Resources

  15. Exercises

Prerequisites


What Is MPI?



MPI Message Passing Definitions


Rank

Every process has its own unique, integer identifier assigned by the system when the process initializes. A rank is sometimes also called a "process ID". Ranks are contiguous and begin at zero.

Used by the programmer to specify the source and destination of messages. Often used conditionally by the application to control program execution (if rank=0 do this / if rank=1 do that).

Group

A group is an ordered set of N processes. Each process in a group is associated with a unique integer rank. Ranks begin at 0 and range in sequence to N - 1.

A group possesses its own unique indentifier (handle) assigned by the system and unknown to the user. That is, a group is represented by an opaque object - the user does not know the internal structure and must call an inquiry routine to determine its attributes.

A group is always associated with a communicator. Initially, all processes are members of the group given by the predefined communicator MPI_COMM_WORLD.

Communicator

A communicator defines the collection of processes (group) which may communicate with each other (context). MPI uses this combination of group and context to guarantee safe communications and avoid potential problems with message confusion when a user application calls a library routine that performs message passing. That is, the library routine may be using a message identical to a user's. The user has no knowledge of this if the library was developed independently from the user.

A communicator posseses its own unique indentifier (handle) assigned by the system and unknown to the user. That is, a communicator is represented by an opaque object - the user does not know the internal structure and must call an inquiry routine to determine its attributes.

Most MPI subroutines require you to specify the communicator as an argument. MPI_COMM_WORLD is the predefined communicator which includes all processes in the MPI application.

Application buffer

The address space that holds the data that you want to send or receive. For example, your program uses a variable called, "inmsg". The application buffer for inmsg is the program memory location where the value of inmsg resides.

System buffer

System space for storing messages. Depending upon the type of send/ receive operation, data in the application buffer may be required to be copied to/from system buffer space. Allows communication to be asynchronous.

Blocking Communication

A communication routine is blocking if the completion of the call is dependent on certain "events". For sends, the data must be successfully sent or safely copied to system buffer space so that the applicaton buffer that contained the data is available for reuse. For receives, the data must be safely stored in the receive buffer so that it is ready for use.

Non-blocking Communication

A communication routine is nonblocking if the call returns without waiting for any communications events to complete (such as copying of message from user memory to system memory or arrival of message).

It is not safe to modify or use the application buffer after completion of a non-blocking send. It is the programmer's responsibility to insure that the application buffer is free for reuse.

Non-blocking communications are primarily used to overlap computation with communication to effect performance gains.

Standard Send

The basic send operation used to transmit data from one process to another. May be either blocking or non-blocking.

Synchronous Send

Blocks until the corresponding receive has actually occurred on the destination process. May be either blocking or non-blocking.

Buffered Send

The programmer allocates a buffer for the data to be placed until it can be sent. Designed to help insure that the necessary buffer space is available if there is uncertainty about the availability of system buffer space. May be either blocking or non-blocking.

Ready Send

A type of send that may be used if the programmer is certain that the matching receive has already been posted. Implementation dependent on how this send may be implemented - could be a standard send or an optimized protocol if such exists. It is a programmer error to use a ready send if the matching receive has not already been posted. May be either blocking or non-blocking.

Standard Receive

The basic receive operation used to accept data sent from another process. May be either blocking or non-blocking. May be used to receive data from any of the 8 possible send operations.

Return Code

An integer value returned by the system upon the completion of a subroutine. Generally used as an error code. MPI Fortran subroutines usually include the return code as an extra argument in the subroutine parameter list, whereas the C subroutines do not.

Environment Management Routines


Several of the more commonly used MPI environment management routines are described below.

MPI_Init

Initializes the MPI execution environment. This function must be called in every MPI program, must be called before any other MPI functions and must be called only once in an MPI program. For C programs, MPI_Init may be used to pass the command line arguments to all processes, although this is not required by the standard and is implementation dependent.
 
    MPI_Init (argc,argv) 
    MPI_INIT (ierror)
    
MPI_Finalize

Terminates the MPI execution environment. This function should be the last MPI routine called in every MPI program - no other MPI routines may be called after it.
 
    MPI_Finalize ()
    MPI_FINALIZE (ierror)
    
MPI_Comm_size

Determines the number of processes in the group associated with a communicator. Generally used within the communicator MPI_COMM_WORLD to determine the number of processes being used by your application.
 
    MPI_Comm_size (comm,*size) 
    MPI_COMM_SIZE (comm,size,ierror)
    
MPI_Comm_rank

Determines the rank of the calling process within the communicator. Initially, each process will be assigned a unique integer rank between 0 and number of processors - 1 within the communicator MPI_COMM_WORLD. This rank is often referred to as a task ID. If a process becomes associated with other communicators, it will have a unique rank within each of these as well.

    MPI_Comm_rank (comm,*rank) 
    MPI_COMM_RANK (comm,rank,ierror) 
    
MPI_Abort

Terminates all MPI processes associated with the communicator. In most MPI implementations it terminates ALL processes regardless of the communicator specified.

    MPI_Abort (comm,errorcode)
    MPI_ABORT (comm,errorcode,ierror) 
    
MPI_Get_processor_name

Gets the name of the processor on which the command is executed. Also returns the length of the name. The buffer for "name" must be at least MPI_MAX_PROCESSOR_NAME characters in size. What is returned into "name" is implementation dependent - may not be the same as the output of the "hostname" or "host" shell commands.

    MPI_Get_processor_name (*name,*resultlength) 
    MPI_GET_PROCESSOR_NAME (name,resultlength,ierror)
    
MPI_Initialized

Indicates whether MPI_Init has been called - returns flag as either logical true (1) or false(0). MPI requires that MPI_Init be called once and only once by each process. This may pose a problem for modules that want to use MPI and are prepared to call MPI_Init if necessary. MPI_Initialized solves this problem.

    MPI_Initialized (*flag) 
    MPI_INITIALIZED (flag,ierror)
    
MPI_Wtime

Returns an elapsed wall clock time in seconds (double precision) on the calling processor.

    MPI_Wtime ()
    MPI_WTIME ()
    
MPI_Wtick

Returns the resolution in seconds (double precision) of MPI_Wtime.

    MPI_Wtick ()
    MPI_WTICK ()
    

Environment Management Routines Example - C Language


   #include "mpi.h"
   #include <stdio.h>

   int main(argc,argv)
   int argc;
   char *argv[]; {
   int    numtasks, rank, rc; 

   rc = MPI_Init(&argc,&argv);
   if (rc != 0) {
     printf ("Error starting MPI program. Terminating.\n");
     MPI_Abort(MPI_COMM_WORLD, rc);
     }

   MPI_Comm_size(MPI_COMM_WORLD,&numtasks);
   MPI_Comm_rank(MPI_COMM_WORLD,&rank);
   printf ("Number of tasks= %d My rank= %d\n", numtasks,rank);

   /*******  do some work *******/

   MPI_Finalize();
   }

Environment Management Routines Example - Fortran


     program simple
     include 'mpif.h'

     integer numtasks, rank, ierr, rc

     call MPI_INIT(ierr)
     if (ierr .ne. 0) then
        print *,'Error starting MPI program. Terminating.'
        call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
     end if

     call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
     call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
     print *, 'Number of tasks=',numtasks,' My rank=',rank

C    ****** do some work ******

     call MPI_FINALIZE(ierr)

     end

Point to Point Communication Routines
MPI Message Passing Subroutine Arguments


Buffer

Program (application) address space which references the data that is to be sent or or received. In most cases, this is simply the variable name that is be sent/received. For C programs, this argument is passed by reference and usually must be prepended with an ampersand: &var1

Data Count

Indicates the number of data elements of a particular type to be sent.

Data Type

For reasons of portability, MPI predefines its data types. Programmers may also create their own data types (derived types). Note that the MPI types MPI_BYTE and MPI_PACKED do not correspond to standard C or Fortran types.

MPI C data types


MPI_CHAR:             signed char
MPI_SHORT:            signed short int
MPI_INT:              signed int
MPI_LONG:             signed long int
MPI_UNSIGNED_CHAR:    unsigned char
MPI_UNSIGNED_SHORT:   unsigned short int
MPI_UNSIGNED:         unsigned int
MPI_UNSIGNED_LONG:    unsigned long int
MPI_FLOAT:            float
MPI_DOUBLE:           double
MPI_LONG_DOUBLE:      long double
MPI_BYTE:             8 binary digits; can be 
                      used to turn off data 
                      format conversion 
MPI_PACKED:           data packed/unpacked with 
                      MPI_Pack()/MPI_Unpack
    

MPI Fortran data types


MPI_CHARACTER:        character(1)
MPI_INTEGER:          integer
MPI_REAL:             real
MPI_DOUBLE_PRECISION: double precision
MPI_COMPLEX:          complex
MPI_LOGICAL:          logical
MPI_BYTE:             8 binary digits; can be 
                      used to turn off data 
                      format conversion 
MPI_PACKED:           data packed/unpacked with 
                      MPI_Pack()/MPI_Unpack
    

Destination

An argument to send routines which indicates the process where a message should be delivered. Specified as the rank of the receiving process.

Source

An argument to receive routines which indicates the originating process of the message. Specified as the rank of the sending process. This may be set to the wild card MPI_ANY_SOURCE to receive a message from any task.

Tag

Arbitrary integer assigned by the programmer to uniquely identify a message. Send and receive operations should match message tags. For a receive operations, the wild card MPI_ANY_TAG can be used to receive any message regardless of its tag. MPI guarantees that integers 0-32767 can be used as tags, but most implementations allow much larger values.

Communicator

Indicates the communication context, or set of processes for which the source or destination fields are valid. Unless the programmer is explicitly creating new communicators, the predefined communicator MPI_COMM_WORLD is usually used.

Status

For a receive operation, indicates the source of the message and the tag of the message. In C, this argument is a pointer to a predefined structure MPI_Status (ex. stat.MPI_SOURCE stat.MPI_TAG). In Fortran, it is an integer array of size MPI_STATUS_SIZE (ex. stat(MPI_SOURCE) stat(MPI_TAG)). Additionally, the actual number of bytes received are obtainable from Status via the MPI_Get_count routine.

Request

Used by non-blocking send and receive operations. Since non-blocking operations may return before the requested system buffer space is obtained, the system issues a unique "request number". The programmer uses this system assigned "handle" later (in a WAIT type routine) to determine completion of the non-blocking operation. In C, this argument is a pointer to a predefined structure MPI_Request. In Fortran, it is an integer.

Point to Point Communication Routines
Blocking Message Passing Routines


The more commonly used MPI blocking message passing routines are described below.

MPI_Send

Basic blocking send operation. Routine returns only after the application buffer in the sending task is free for reuse. Note that this routine may be implemented differently on different systems. The MPI standard permits the use of a system buffer but does not require it. Some implementations may actually use a synchronous send (discussed below) to implement the basic blocking send.

    MPI_Send (*buf,count,datatype,dest,tag,comm) 
    MPI_SEND (buf,count,datatype,dest,tag,comm,ierror)
    
MPI_Recv

Receive a message and block until the requested data is available in the application buffer in the receiving task.

    MPI_Recv (*buf,count,datatype,source,tag,comm,*status) 
    MPI_RECV (buf,count,datatype,source,tag,comm,status,ierror) 
    
MPI_Ssend

Synchronous blocking send: Send a message and block until the application buffer in the sending task is free for reuse and the destination process has received the message.

    MPI_Ssend (*buf,count,datatype,dest,tag,comm,ierror) 
    MPI_SSEND (buf,count,datatype,dest,tag,comm,ierror)
    
MPI_Bsend

Buffered blocking send: permits the programmer to allocate the required amount of buffer space into which data can be copied until it is delivered. Insulates against the problems associated with insufficient system buffer space. Routine returns after the data has been copied from application buffer space to the allocated send buffer. Must be used with the MPI_Buffer_attach routine.

    MPI_Bsend (*buf,count,datatype,dest,tag,comm) 
    MPI_BSEND (buf,count,datatype,dest,tag,comm,ierror)
    
MPI_Buffer_attach
MPI_Buffer_detach

Used by programmer to allocate/deallocate message buffer space to be used by the MPI_Bsend routine. The size argument is specified in actual data bytes - not a count of data elements.

    MPI_Buffer_attach (*buffer,size) 
    MPI_Buffer_detach (*buffer,size) 
    MPI_BUFFER_ATTACH (buffer,size,ierror) 
    MPI_BUFFER_DETACH (buffer,size,ierror)
    
MPI_Rsend

Blocking ready send. Should only be used if the programmer is certain that the matching receive has already been posted.

    MPI_Rsend (*buf,count,datatype,dest,tag,comm) 
    MPI_RSEND (buf,count,datatype,dest,tag,comm,ierror)
    
MPI_Sendrecv

Send a message and post a receive before blocking. Will block until the sending application buffer is free for reuse and until the receiving application buffer contains the received message.

    MPI_Sendrecv (*sendbuf,sendcount,sendtype,dest,sendtag,
                 *recvbuf,recvcount,recvtype,source,recvtag, 
                 comm,*status) 
    MPI_SENDRECV (sendbuf,sendcount,sendtype,dest,sendtag, 
                 recvbuf,recvcount,recvtype,source,recvtag,
                 comm,status,ierror)
    
MPI_Probe

Performs a blocking test for a message. The "wildcards" MPI_ANY_SOURCE and MPI_ANY_TAG may be used to test for a message from any source or with any tag. For the C routine, the actual source and tag will be returned in the status structure as status.MPI_SOURCE and status.MPI_TAG. For the Fortran routine, they will be returned in the integer array status(MPI_SOURCE) and status(MPI_TAG).

    MPI_Probe (source,tag,comm,*status)
    MPI_PROBE (source,tag,comm,status,ierr)
    

Blocking Message Passing Routines Example


Fortran version available here.

Task 0 - pings task 1 and awaits return ping


   #include "mpi.h"
   #include <stdio.h>

   int main(argc,argv) 
   int argc;
   char *argv[];  {
   int numtasks, rank, dest, source, rc, tag=1;  
   char inmsg, outmsg='x';
   MPI_Status Stat;

   MPI_Init(&argc,&argv);
   MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   if (rank == 0) {
     dest = 1;
     source = 1;
     rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
     rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
     } 

   else if (rank == 1) {
     dest = 0;
     source = 0;
     rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
     rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
     }

   MPI_Finalize();
   }

Point to Point Communication Routines
Non-Blocking Message Passing Routines


The more commonly used MPI non-blocking message passing routines are described below.

MPI_Isend

Identifies an area in memory to serve as a send buffer. Processing continues immediately without waiting for the message to be copied out from the application buffer. A communication request handle is returned for handling the pending message status. The program should not modify the application buffer until subsequent calls to MPI_Wait or MPI_Test indicates that the non-blocking send has completed.

    MPI_Isend (*buf,count,datatype,dest,tag,comm,*request) 
    MPI_ISEND (buf,count,datatype,dest,tag,comm,request,ierror)
    
MPI_Irecv

Identifies an area in memory to serve as a receive buffer. Processing continues immediately without actually waiting for the message to be received and copied into the the application buffer. A communication request handle is returned for handling the pending message status. The program must use calls to MPI_Wait or MPI_Test to determine when the non-blocking receive operation completes and the requested message is available in the application buffer.

    MPI_Irecv (*buf,count,datatype,source,tag,comm,*request) 
    MPI_IRECV (buf,count,datatype,source,tag,comm,request,ierror)
    
MPI_Issend

Non-blocking synchronous send. Similar to MPI_Isend(), except MPI_Wait() or MPI_Test() indicates when the destination process has received the message.

    MPI_Issend (*buf,count,datatype,dest,tag,comm,*request) 
    MPI_ISSEND (buf,count,datatype,dest,tag,comm,request,ierror)
    
MPI_Ibsend

Non-blocking buffered send. Similar to MPI_Bsend() except MPI_Wait() or MPI_Test() indicates when the destination process has received the message. Must be used with the MPI_Buffer_attach routine.

    MPI_Ibsend (*buf,count,datatype,dest,tag,comm,*request) 
    MPI_IBSEND (buf,count,datatype,dest,tag,comm,request,ierror)
    
MPI_Irsend

Non-blocking ready send. Similar to MPI_Rsend() except MPI_Wait() or MPI_Test() indicates when the destination process has received the message. Should only be used if the programmer is certain that the matching receive has already been posted.

    MPI_Irsend (*buf,count,datatype,dest,tag,comm,*request) 
    MPI_IRSEND (buf,count,datatype,dest,tag,comm,request,ierror)
    
MPI_Wait
MPI_Waitany
MPI_Waitall
MPI_Waitsome

MPI_Wait blocks until a specified non-blocking send or receive operation has completed. For multiple non-blocking operations, the programmer can specify any, all or some completions.

    MPI_Wait     (*request,*status)
    MPI_Waitany  (count,*array_of_requests,*index,*status) 
    MPI_Waitall  (count,*array_of_requests,*array_of_statuses)
    MPI_Waitsome (incount,*array_of_requests,*outcount,
                 *array_of_offsets, *array_of_statuses) 
    MPI_WAIT     (request,status,ierror)
    MPI_WAITANY  (count,array_of_requests,index,status,ierror)
    MPI_WAITALL  (count,array_of_requests,array_of_statuses,
                 ierror)
    MPI_WAITSOME (incount,array_of_requests,outcount,
                 array_of_offsets, array_of_statuses,ierror)
    
MPI_Test
MPI_Testany
MPI_Testall
MPI_Testsome

MPI_Test checks the status of a specified non-blocking send or receive operation. The "flag" parameter is returned logical true (1) if the operation has completed, and logical false (0) if not. For multiple non-blocking operations, the programmer can specify any, all or some completions.

    MPI_Test     (*request,*flag,*status) 
    MPI_Testany  (count,*array_of_requests,*index,*flag,*status)
    MPI_Testall  (count,*array_of_requests,*flag,*array_of_statuses)
    MPI_Testsome (incount,*array_of_requests,*outcount,
                 *array_of_offsets, *array_of_statuses)
    MPI_TEST     (request,flag,status,ierror)
    MPI_TESTANY  (count,array_of_requests,index,flag,status,ierror)
    MPI_TESTALL  (count,array_of_requests,flag,array_of_statuses,ierror)
    MPI_TESTSOME (incount,array_of_requests,outcount,
                 array_of_offsets, array_of_statuses,ierror)
    
MPI_Iprobe

Performs a non-blocking test for a message. The "wildcards" MPI_ANY_SOURCE and MPI_ANY_TAG may be used to test for a message from any source or with any tag. The integer "flag" parameter is returned logical true (1) if a message has arrived, and logical false (0) if not. For the C routine, the actual source and tag will be returned in the status structure as status.MPI_SOURCE and status.MPI_TAG. For the Fortran routine, they will be returned in the integer array status(MPI_SOURCE) and status(MPI_TAG).

    MPI_Iprobe (source,tag,comm,*flag,*status)
    MPI_IPROBE (source,tag,comm,flag,status,ierr)
    

Non-Blocking Message Passing Routines Example


Fortran version available here.

Nearest neighbor exchange in ring topology


   #include "mpi.h"
   #include <stdio.h>

   int main(argc,argv)
   int argc;
   char *argv[];  {
   int numtasks, rank, next, prev, buf[2], tag1=1, tag2=2;
   MPI_Request reqs[4];
   MPI_Status stats[4];

   MPI_Init(&argc,&argv);
   MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   prev = rank-1;
   next = rank+1;
   if (rank == 0)  prev = numtasks - 1;
   if (rank == (numtasks - 1))  next = 0;

   MPI_Irecv(&buf[0], 1, MPI_INT, prev, tag1, MPI_COMM_WORLD, &reqs[0]);
   MPI_Irecv(&buf[1], 1, MPI_INT, next, tag2, MPI_COMM_WORLD, &reqs[1]);

   MPI_Isend(&rank, 1, MPI_INT, prev, tag2, MPI_COMM_WORLD, &reqs[2]);
   MPI_Isend(&rank, 1, MPI_INT, next, tag1, MPI_COMM_WORLD, &reqs[3]);
 
   MPI_Waitall(4, reqs, stats);

   MPI_Finalize();
   }

Collective Communication Routines



Collective Communication Routines


MPI_Barrier

Creates a barrier synchronization in a group. Each task, when reaching the MPI_Barrier call, blocks until all tasks in the group reach the same MPI_Barrier call.

    MPI_Barrier (comm) 
MPI_BARRIER (comm,ierror)
MPI_Bcast

Broadcasts (sends) a message from the process with rank "root" to all other processes in the group. Diagram here.

    MPI_Bcast (*buffer,count,datatype,root,comm) 
MPI_BCAST (buffer,count,datatype,root,comm,ierror)
MPI_Scatter

Distributes distinct messages from a single source task to each task in the group. Diagram here.

    MPI_Scatter (*sendbuf,sendcnt,sendtype,*recvbuf, 
                recvcnt,recvtype,root,comm)
    MPI_SCATTER (sendbuf,sendcnt,sendtype,recvbuf, 
                recvcnt,recvtype,root,comm,ierror)
    
MPI_Gather

Gathers distinct messages from each task in the group to a single destination task. This routine is the reverse operation of MPI_Gather. Diagram here.

    MPI_Gather (*sendbuf,sendcnt,sendtype,*recvbuf,
               recvcount,recvtype,root,comm)
    MPI_GATHER (sendbuf,sendcnt,sendtype,recvbuf,
               recvcount,recvtype,root,comm,ierror)

    
MPI_Allgather

Concatenation of data to all tasks in a group. Each task in the group, in effect, performs a one-to-all broadcasting operation within the group. Diagram here.

    MPI_Allgather (*sendbuf,sendcount,sendtype,*recvbuf, 
                  recvcount,recvtype,comm)
    MPI_ALLGATHER (sendbuf,sendcount,sendtype,recvbuf, 
                  recvcount,recvtype,comm,info)
    
MPI_Reduce

Applies a reduction operation on all tasks in the group and places the result in one task. Diagram here.

    MPI_Reduce (*sendbuf,*recvbuf,count,datatype,op,
               root,comm)
    MPI_REDUCE (sendbuf,recvbuf,count,datatype,op,
               root,comm,ierror)
    
The predefined MPI reduction operations appear below. Users can also define their own reduction functions.

-----------------------------------------------------------------
MPI Reduction Operation   C data types       FORTRAN data types
-----------------------------------------------------------------
MPI_MAX    maximum       integer, float      integer, real, complex  
MPI_MIN    minimum       integer, float      integer, real, complex  
MPI_SUM    sum          integer, float      integer, real, complex  
MPI_PROD    product      integer, float      integer, real, complex  
MPI_LAND    logical and   integer           logical                 
MPI_BAND    bit-wise and  integer, MPI_BYTE   integer, MPI_BYTE      
MPI_LOR    logical or    integer            logical                 
MPI_BOR    bit-wise or   integer, MPI_BYTE   integer, MPI_BYTE       
MPI_LXOR    logical xor   integer           logical                 
MPI_BXOR    bit-wise xor  integer, MPI_BYTE   integer, MPI_BYTE      
MPI_MAXLOC  max value    combination of int  combination of integer     
          and location  float, double and   real, complex,     
                        long double         double precision        
MPI_MINLOC  min value    combination of int  combination of integer     
          and location  float, double and   real, complex,     
                        long double         double precision        
-----------------------------------------------------------------
MPI_Allreduce

Applies a reduction operation and places the result in all tasks in the group. This is equivalent to an MPI_Reduce followed by an MPI_Bcast. Diagram here.

    MPI_Allreduce (*sendbuf,*recvbuf,count,datatype,op,comm)
    MPI_ALLREDUCE (sendbuf,recvbuf,count,datatype,op,comm,ierror)
    
MPI_Alltoall

Each task in a group performs a scatter operation, sending a distinct message to all the tasks in the group in order by index. Diagram here.

    MPI_Alltoall (*sendbuf,sendcount,sendtype,*recvbuf,
                 recvcnt,recvtype,comm)
    MPI_ALLTOALL (sendbuf,sendcount,sendtype,recvbuf,
                 recvcnt,recvtype,comm,ierror)
    
MPI_Scan

Performs a scan operation with respect to a reduction operation across a task group. Diagram here.

    MPI_Scan (*sendbuf,*recvbuf,count,datatype,op,comm)
    MPI_SCAN (sendbuf,recvbuf,count,datatype,op,comm,ierror)
    

Collective Communications Example


Fortran version available here.

Perform a scatter operation on the rows of an array


   #include "mpi.h"
   #include <stdio.h>
   #define SIZE 4

   int main(argc,argv)
   int argc;
   char *argv[];  {
   int numtasks, rank, sendcount, recvcount, source;
   float sendbuf[SIZE][SIZE] = {
     {1.0, 2.0, 3.0, 4.0},
     {5.0, 6.0, 7.0, 8.0},
     {9.0, 10.0, 11.0, 12.0},
     {13.0, 14.0, 15.0, 16.0}  };
   float recvbuf[SIZE];

   MPI_Init(&argc,&argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

   if (numtasks == SIZE) {
     source = 1;
     sendcount = SIZE;
     recvcount = SIZE;
     MPI_Scatter(sendbuf,sendcount,MPI_FLOAT,recvbuf,recvcount,
                MPI_FLOAT,source,MPI_COMM_WORLD);

     printf("rank= %d  Results: %f %f %f %f\n",rank,recvbuf[0],
            recvbuf[1],recvbuf[2],recvbuf[3]);
     }
   else
     printf("Must specify %d processors. Terminating.\n",SIZE);

   MPI_Finalize();
   }
Sample program output:

   rank= 0  Results: 1.000000 2.000000 3.000000 4.000000
   rank= 1  Results: 5.000000 6.000000 7.000000 8.000000
   rank= 2  Results: 9.000000 10.000000 11.000000 12.000000
   rank= 3  Results: 13.000000 14.000000 15.000000 16.000000

Group and Communicator Management Routines



Group and Communicator Management Routines


MPI_Group_difference

Creates a group from the difference of two groups.

    MPI_Group_difference (group1,group2,*newgroup)  
    MPI_GROUP_DIFFERENCE (group1,group2,newgroup,ierror)
    
MPI_Group_excl

Produces a group by reordering an existing group and taking only unlisted members.

    MPI_Group_excl (group,n,*ranks,*newgroup) 
    MPI_GROUP_EXCL (group,n,ranks,newgroup,ierror)
    
MPI_Group_incl

Produces a group by reordering an existing group and taking only listed members.

    MPI_Group_incl (group,n,*ranks,*newgroup) 
    MPI_GROUP_INCL (group,n,ranks,newgroup,ierror) 
    
MPI_Group_intersection

Produces a group as the intersection of two existing groups.

    MPI_Group_intersection (group1,group2,*newgroup) 
    MPI_GROUP_INTERSECTION (group1,group2,newgroup,ierror)
    
MPI_Group_union

Produces a group by combining two groups.

    MPI_Group_union (group1,group2,*newgroup) 
    MPI_GROUP_UNION (group1,group2,newgroup,ierror)
    
MPI_Group_compare

Compares two groups and returns an integer result which is MPI_IDENT if the order and members of the two groups are the same, MPI_SIMILAR if only the members are the same, and MPI_UNEQUAL otherwise.

    MPI_Group_compare (group1,group2,*result) 
    MPI_GROUP_COMPARE (group1,group2,result,ierror)
    
MPI_Group_rank

Returns the rank of this process in the given group or MPI_UNDEFINED if the process is not a member.

    MPI_Group_rank (group,*rank) 
    MPI_GROUP_RANK (group,rank,ierror)
    
MPI_Group_size

Returns the size of a group - number of processes in the group.

    MPI_Group_size (group,*size) 
    MPI_GROUP_SIZE (group,size,ierror)
    
MPI_Group_free

Frees a group

    MPI_Group_free (group) 
    MPI_GROUP_FREE (group,ierror)
    
MPI_Comm_group

Determines the group associated with the given communicator.

    MPI_Comm_group (comm,*group) 
    MPI_COMM_GROUP (comm,group,ierror)
    
MPI_Comm_create

Creates a new communicator from the old communicator and the new group.

    MPI_Comm_create (comm,group,*newcomm) 
    MPI_COMM_CREATE (comm,group,newcomm,ierror)
    
MPI_Comm_dup

Duplicates an existing communicator with all its associated information.

    MPI_Comm_dup (comm,*newcomm) 
    MPI_COMM_DUP (comm,newcomm,ierror)
    
MPI_Comm_compare

Compares two communicators and returns integer result which is MPI_IDENT if the contexts and groups are the same, MPI_CONGRUENT if different contexts but identical groups, MPI_SIMILAR if different contexts but similar groups, and MPI_UNEQUAL otherwise.

    MPI_Comm_compare (comm1,comm2,*result)  
    MPI_COMM_COMPARE (comm1,comm2,result,ierror)
    
MPI_Comm_free

Marks the communicator object for deallocation.

    MPI_Comm_free (*comm)
    MPI_COMM_FREE (comm,ierror)
    

Group and Communicator Routines Example


Fortran version available here.

Create two different groups for separate collective communications exchange. Requires creating new communicators also.


   #include "mpi.h"
   #include <stdio.h>
   #define NPROCS 8
   #define MASTER 0
   #define MSGSIZE 7

   int main(argc,argv)
   int argc;
   char *argv[];  {
   int       rank, new_rank,
            ranks1[4]={0,1,2,3}, 
            ranks2[4]={4,5,6,7};
   char      *msg;
   MPI_Group  orig_group, new_group;
   MPI_Comm   new_comm;

   MPI_Init(&argc,&argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   /* Extract the original group handle */
   MPI_Comm_group(MPI_COMM_WORLD, &orig_group);

   /* Divide tasks into two distinct groups. First  */
   /* create new group and then a new communicator. */
   /* Find new rank in new group and setup for the  */
   /* collective communication broadcast if MASTER. */
   if (rank < NPROCS/2) {
     MPI_Group_incl(orig_group, NPROCS/2, ranks1, &new_group);
     MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);
     MPI_Group_rank (new_group, &new_rank);
     if (new_rank == MASTER) msg="Group 1";
     }
   else {
     MPI_Group_incl(orig_group, NPROCS/2, ranks2, &new_group);
     MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);
     MPI_Group_rank (new_group, &new_rank);
     if (new_rank == MASTER) msg="Group 2";
     }

   MPI_Bcast(&msg,MSGSIZE,MPI_CHAR,MASTER,new_comm);

   printf("rank= %d newrank= %d msg= %s\n",rank,new_rank,msg);

   MPI_Finalize();
   }
Sample program output:

   rank= 0 newrank= 0 msg= Group 1
   rank= 1 newrank= 1 msg= Group 1
   rank= 2 newrank= 2 msg= Group 1
   rank= 3 newrank= 3 msg= Group 1
   rank= 4 newrank= 0 msg= Group 2
   rank= 5 newrank= 1 msg= Group 2
   rank= 6 newrank= 2 msg= Group 2
   rank= 7 newrank= 3 msg= Group 2

Derived Data Types



Derived Data Type Routines


MPI_Type_contiguous

The simplest constructor. Produces a new data type by making count copies of an existing data type.

    MPI_Type_contiguous (count,oldtype,*newtype) 
    MPI_TYPE_CONTIGUOUS (count,oldtype,newtype,ierr)
    
MPI_Type_vector
MPI_Type_hvector

Similar to contiguous, but allows for regular gaps (stride) in the displacements. MPI_Type_hvector is identical to MPI_Type_vector except that stride is specified in bytes.

    MPI_Type_vector (count,blocklength,stride,oldtype,*newtype) 
    MPI_TYPE_VECTOR (count,blocklength,stride,oldtype,newtype,ierr)
    
MPI_Type_indexed
MPI_Type_hindexed

An array of displacements of the input data type is provided as the map for the new data type. MPI_Type_hindexed is identical to MPI_Type_indexed except that offsets are specified in bytes.

    MPI_Type_indexed (count,blocklens[],offsets[],old_type,*newtype)
    MPI_TYPE_INDEXED (count,blocklens(),offsets(),old_type,newtype,ierr)
    
MPI_Type_struct

The new data type is formed according to completely defined map of the component data types.

    MPI_Type_struct (count,blocklens[],offsets[],old_types,*newtype)
    MPI_TYPE_STRUCT (count,blocklens(),offsets(),old_types,newtype,ierr)
    
MPI_Type_extent

Returns the size in bytes of the specified data type. Useful for the MPI subroutines which require specification of offsets in bytes.

    MPI_Type_extent (datatype,*extent)
    MPI_TYPE_EXTENT (datatype,extent,ierr)
    

Contiguous Derived Data Type Example


Fortran version available here.

Create a data type representing a row of an array and distribute a different row to all processes. Diagram here.


   #include "mpi.h"
   #include <stdio.h>
   #define SIZE 4

   int main(argc,argv)
   int argc;
   char *argv[];  {
   int numtasks, rank, source=0, dest, tag=1, i;
   float a[SIZE][SIZE] =
     {1.0, 2.0, 3.0, 4.0,
      5.0, 6.0, 7.0, 8.0,
      9.0, 10.0, 11.0, 12.0,
      13.0, 14.0, 15.0, 16.0};
   float b[SIZE];

   MPI_Status stat;
   MPI_Datatype rowtype;

   MPI_Init(&argc,&argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

   MPI_Type_contiguous(SIZE, MPI_FLOAT, &rowtype);
   MPI_Type_commit(&rowtype);

   if (numtasks == SIZE) {
     if (rank == 0) {
        for (i=0; i<numtasks; i++)
          MPI_Send(&a[i][0], 1, rowtype, i, tag, MPI_COMM_WORLD);
        }

     MPI_Recv(b, SIZE, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &stat);
     printf("rank= %d  b= %3.1f %3.1f %3.1f %3.1f\n",
            rank,b[0],b[1],b[2],b[3]);
     }
   else
     printf("Must specify %d processors. Terminating.\n",SIZE);

   MPI_Finalize();
   }
Sample program output:

   rank= 0  b= 1.0 2.0 3.0 4.0
   rank= 1  b= 5.0 6.0 7.0 8.0
   rank= 2  b= 9.0 10.0 11.0 12.0
   rank= 3  b= 13.0 14.0 15.0 16.0

Vector Derived Data Type Example


Fortran version available here.

Create a data type representing a column of an array and distribute different columns to all processes. Diagram here.


   #include "mpi.h"
   #include <stdio.h>>
   #define SIZE 4

   int main(argc,argv)
   int argc;
   char *argv[];  {
   int numtasks, rank, source=0, dest, tag=1, i;
   float a[SIZE][SIZE] = 
     {1.0, 2.0, 3.0, 4.0,  
      5.0, 6.0, 7.0, 8.0, 
      9.0, 10.0, 11.0, 12.0,
     13.0, 14.0, 15.0, 16.0};
   float b[SIZE]; 

   MPI_Status stat;
   MPI_Datatype columntype;

   MPI_Init(&argc,&argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
   
   MPI_Type_vector(SIZE, 1, SIZE, MPI_FLOAT, &columntype);
   MPI_Type_commit(&columntype);

   if (numtasks == SIZE) {
     if (rank == 0) {
        for (i=0; i<numtasks; i++) 
          MPI_Send(&a[0][i], 1, columntype, i, tag, MPI_COMM_WORLD);
        }
 
     MPI_Recv(b, SIZE, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &stat);
     printf("rank= %d  b= %3.1f %3.1f %3.1f %3.1f\n",
           rank,b[0],b[1],b[2],b[3]);
     }
   else
     printf("Must specify %d processors. Terminating.\n",SIZE);
   
   MPI_Finalize();
   }
Sample program output:

   rank= 0  b= 1.0 5.0 9.0 13.0
   rank= 1  b= 2.0 6.0 10.0 14.0
   rank= 2  b= 3.0 7.0 11.0 15.0
   rank= 3  b= 4.0 8.0 12.0 16.0

Indexed Derived Data Type Example


Fortran version available here.

Create a datatype by extracting variable portions of an array and distribute to all tasks. Diagram here.


   #include "mpi.h"
   #include <stdio.h>
   #define NELEMENTS 6

   int main(argc,argv)
   int argc;
   char *argv[];  {
   int numtasks, rank, source=0, dest, tag=1, i;
   int blocklengths[2], displacements[2];
   float a[16] = 
     {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 
      9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
   float b[NELEMENTS]; 

   MPI_Status stat;
   MPI_Datatype indextype;

   MPI_Init(&argc,&argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

   blocklengths[0] = 4;
   blocklengths[1] = 2;
   displacements[0] = 5;
   displacements[1] = 12;
   
   MPI_Type_indexed(2, blocklengths, displacements, MPI_FLOAT, &indextype);
   MPI_Type_commit(&indextype);

   if (rank == 0) {
     for (i=0; i<numtasks; i++) 
        MPI_Send(a, 1, indextype, i, tag, MPI_COMM_WORLD);
     }
 
   MPI_Recv(b, NELEMENTS, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &stat);
   printf("rank= %d  b= %3.1f %3.1f %3.1f %3.1f %3.1f %3.1f\n",
        rank,b[0],b[1],b[2],b[3],b[4],b[5]);
   
   MPI_Finalize();
   }
Sample program output:

   rank= 0  b= 6.0 7.0 8.0 9.0 13.0 14.0
   rank= 1  b= 6.0 7.0 8.0 9.0 13.0 14.0
   rank= 2  b= 6.0 7.0 8.0 9.0 13.0 14.0
   rank= 3  b= 6.0 7.0 8.0 9.0 13.0 14.0

Struct Derived Data Type Example


Fortran version available here.

Create a data type which represents a particle and distribute an array of such particles to all processes. Diagram here.


   #include "mpi.h"
   #include <stdio.h>
   #define NELEM 25

   int main(argc,argv)
   int argc;
   char *argv[];  {
   int numtasks, rank, source=0, dest, tag=1, i;

   typedef struct {
     float x, y, z;
     float velocity;
     int  n, type;
     }          Particle;
   Particle     p[NELEM], particles[NELEM];
   MPI_Datatype particletype, oldtypes[2]; 
   int          blockcounts[2];

   /* MPI_Aint type used to be consistent with syntax of */
   /* MPI_Type_extent routine */
   MPI_Aint    offsets[2], extent;

   MPI_Status stat;

   MPI_Init(&argc,&argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
 
   /* Setup description of the 4 MPI_FLOAT fields x, y, z, velocity */
   offsets[0] = 0;
   oldtypes[0] = MPI_FLOAT;
   blockcounts[0] = 4;

   /* Setup description of the 2 MPI_INT fields n, type */
   /* Need to first figure offset by getting size of MPI_FLOAT */
   MPI_Type_extent(MPI_FLOAT, &extent);
   offsets[1] = 4 * extent;
   oldtypes[1] = MPI_INT;
   blockcounts[1] = 2;

   /* Now define structured type and commit it */
   MPI_Type_struct(2, blockcounts, offsets, oldtypes, &particletype);
   MPI_Type_commit(&particletype);

   /* Initialize the particle array and then send it to each task */
   if (rank == 0) {
     for (i=0; i<NELEM; i++) {
        particles[i].x = i * 1.0;
        particles[i].y = i * -1.0;
        particles[i].z = i * 1.0; 
        particles[i].velocity = 0.25;
        particles[i].n = i;
        particles[i].type = i % 2; 
        }
     for (i=0; i<numtasks; i++) 
        MPI_Send(particles, NELEM, particletype, i, tag, MPI_COMM_WORLD);
     }
 
   MPI_Recv(p, NELEM, particletype, source, tag, MPI_COMM_WORLD, &stat);

   /* Print a sample of what was received */
   printf("rank= %d   %3.2f %3.2f %3.2f %3.2f %d %d\n", rank,p[3].x,
        p[3].y,p[3].z,p[3].velocity,p[3].n,p[3].type);
   
   MPI_Finalize();
   }
Sample program output:

   rank= 0   3.00 -3.00 3.00 0.25 3 1
   rank= 2   3.00 -3.00 3.00 0.25 3 1
   rank= 1   3.00 -3.00 3.00 0.25 3 1
   rank= 3   3.00 -3.00 3.00 0.25 3 1

Virtual Topologies



Virtual Topology Routines


MPI_Cart_coords

Determines process coordinates in Cartesian topology given rank in group.

    MPI_Cart_coords (comm,rank,maxdims,*coords[])
    MPI_CART_COORDS (comm,rank,maxdims,coords(),ierr)
    
MPI_Cart_create

Creates a new communicator to which Cartesian topology information has been attached.

    MPI_Cart_create (comm_old,ndims,*dims[],*periods,reorder,*comm_cart)
    MPI_CART_CREATE (comm_old,ndims,dims(),periods,reorder,comm_cart,ierr)
    
MPI_Cart_get

Retrieves the number of dimensions, coordinates and periodicity setting for the calling process in a Cartesian topology.

    MPI_Cart_get (comm,maxdims,*dims,*periods,*coords[])
    MPI_CART_GET (comm,maxdims,dims,periods,coords(),ierr)
    
MPI_Cart_map

Maps process to Cartesian topology information

    MPI_Cart_map (comm_old,ndims,*dims[],*periods[],*newrank)
    MPI_CART_MAP (comm_old,ndims,dims(),periods(),newrank,ierr)
    
MPI_Cart_rank

Determines process rank in communicator given the Cartesian coordinate location.

    MPI_Cart_rank (comm,*coords[],*rank)
    MPI_CART_RANK (comm,coords(),rank,ierr)
    
MPI_Cart_shift

Returns the shifted source and destination ranks for the calling process in a Cartesian topology. Calling process specifies the shift direction and amount.

    MPI_Cart_shift (comm,direction,displ,*source,*dest)
    MPI_CART_SHIFT (comm,direction,displ,source,dest,ierr)
    
MPI_Cart_sub

Partitions a communicator into subgroups which form lower-dimensional Cartesian subgrids

    MPI_Cart_sub (comm,*remain_dims[],*comm_new)
    MPI_CART_SUB (comm,remain_dims(),comm_new,ierr)
    
MPI_Cartdim_get

Retrieves the number of dimensions associated with a Cartesian topology communicator.

    MPI_Cartdim_get (comm,*ndims)
    MPI_CARTDIM_GET (comm,ndims,ierr)
    
MPI_Dims_create

Creates a division of processors in a Cartesian grid.

    MPI_Dims_create (nnodes,ndims,*dims[])
    MPI_DIMS_CREATE (nnodes,ndims,dims(),ierr)
    
MPI_Graph_create

Makes a new communicator to which topology information has been attached.

    MPI_Graph_create (comm_old,nnodes,*index[],*edges[],
                     reorder,*comm_graph)
    MPI_GRAPH_CREATE (comm_old,nnodes,index(),edges(),
                     reorder,comm_graph,ierr)
    
MPI_Graph_get

Retrieves graph topology information associated with a communicator.

    MPI_Graph_get (comm,maxindex,maxedges,*index[],*edges[])
    MPI_GRAPH_GET (comm,maxindex,maxedges,index(),edges(),ierr)
    
MPI_Graph_map

Maps process to graph topology information.

    MPI_Graph_map (comm_old,nnodes,*index[],*edges[],*newrank)
    MPI_GRAPH_MAP (comm_old,nnodes,index(),edges(),newrank,ierr)
    
MPI_Graph_neighbors

Returns the neighbors of a node associated with a graph topology.

    MPI_Graph_neighbors (comm,rank,maxneighbors,*neighbors[])
    MPI_GRAPH_NEIGHBORS (comm,rank,maxneighbors,neighbors(),ierr)
    
MPI_Graphdims_get

Retrieves graph topology information (number of nodes and number of edges) associated with a communicator

    MPI_Graphdims_get (comm,*nnodes,*nedges)
    MPI_GRAPHDIMS_GET (comm,nnodes,nedges,ierr)
    
MPI_Topo_test

Determines the type of topology (if any) associated with a communicator.

     MPI_Topo_test (comm,*top_type)
     MPI_TOPO_TEST (comm,top_type,ierr)
    

Cartesian Virtual Topology Example


Fortran version available here

Create a 4 x 4 Cartesian topology from 16 processors and have each process exchange its rank with four neighbors.


   #include "mpi.h"
   #include <stdio.h>
   #define SIZE 16
   #define UP    0
   #define DOWN  1
   #define LEFT  2
   #define RIGHT 3

   int main(argc,argv)
   int argc;
   char *argv[];  {
   int numtasks, rank, source, dest, outbuf, i, tag=1, 
      inbuf[4]={MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,}, 
      nbrs[4], dims[2]={4,4}, 
      periods[2]={0,0}, reorder=0, coords[2];

   MPI_Request reqs[8];
   MPI_Status stats[8];
   MPI_Comm cartcomm;

   MPI_Init(&argc,&argv);
   MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

   if (numtasks == SIZE) {
     MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, reorder, &cartcomm);
     MPI_Comm_rank(cartcomm, &rank);
     MPI_Cart_coords(cartcomm, rank, 2, coords);
     MPI_Cart_shift(cartcomm, 0, 1, &nbrs[UP], &nbrs[DOWN]);
     MPI_Cart_shift(cartcomm, 1, 1, &nbrs[LEFT], &nbrs[RIGHT]);

     outbuf = rank;

     for (i=0; i<4; i++) {
        dest = nbrs[i];
        source = nbrs[i];
        MPI_Isend(&outbuf, 1, MPI_INT, dest, tag, MPI_COMM_WORLD, &reqs[i]);
        MPI_Irecv(&inbuf[i], 1, MPI_INT, source, tag, MPI_COMM_WORLD, &reqs[i+4]);
        }

     MPI_Waitall(8, reqs, stats);
   
     printf("rank= %d coords= %d %d  neighbors(u,d,l,r)= %d %d %d %d\n",
           rank,coords[0],coords[1],nbrs[UP],nbrs[DOWN],nbrs[LEFT],
           nbrs[RIGHT]);
     printf("rank= %d                 inbuf(u,d,l,r)= %d %d %d %d\n",
           rank,inbuf[UP],inbuf[DOWN],inbuf[LEFT],inbuf[RIGHT]);
     }
   else
     printf("Must specify %d processors. Terminating.\n",SIZE);
   
   MPI_Finalize();
   }
Sample program output (partial):

   rank= 0 coords= 0 0  neighbors(u,d,l,r)= -3 4 -3 1
   rank= 0                inbuf(u,d,l,r)= -3 4 -3 1
   rank= 1 coords= 0 1  neighbors(u,d,l,r)= -3 5 0 2
   rank= 1                inbuf(u,d,l,r)= -3 5 0 2
   rank= 2 coords= 0 2  neighbors(u,d,l,r)= -3 6 1 3
   rank= 2                inbuf(u,d,l,r)= -3 6 1 3
           . . . . .

   rank= 14 coords= 3 2  neighbors(u,d,l,r)= 10 -3 13 15
   rank= 14                inbuf(u,d,l,r)= 10 -3 13 15
   rank= 15 coords= 3 3  neighbors(u,d,l,r)= 11 -3 14 -3
   rank= 15                inbuf(u,d,l,r)= 11 -3 14 -3

MPE Multi Processing Environment



MPE Multi Processing Environment
MPE Logging Routines


MPE_Initlog

Must be called by all MPI processes to initialize MPE logging data structures.

    MPE_Init_log ()
    MPE_INIT_LOG ()
    
MPE_Finish_log

Must be called by each process like MPE_Initlog. Causes local event logs to be merged in parallel into a final buffer, which is sorted by event timestamp by the process with rank 0 in the communicator MPI_WORLD_COMM. Process 0 then writes out the file with the specified filename.

Postprocessing performed by MPE_Finish_log includes alignment and stretching of event time stamps to compensate for processor clock differences. Synchronizations are made with respect to the time stamps of MPE_Initlog and MPE_Finish_log.


    MPE_Finish_log (*filename)
    MPE_FINISH_LOG (filename)
    
MPE_Start_log

Begin logging of events. Used in association with MPE_Stop_log to dynamically turn event logging on and off. Should be called after MPE_Initlog.

    MPE_Start_log ()
    MPE_START_LOG ()
    
MPE_Stop_log

Stop logging events. Used in association with MPE_Start_log to dynamically turn event logging on and off. Should be called after MPE_Initlog and MPE_Start_log. If logging is turned off during program execution, the timer continues to tick.

    MPE_Stop_log ()
    MPE_STOP_LOG ()
    
MPE_Describe_event

Create a log record which describes an event type. Event types are arbitrarily assigned by the programmer as non-negative integers. Events have no duration. Programmers must use event pairs with MPE_Describe_state to define duration.

    MPE_Describe_event (event,*name)
    MPE_DESCRIBE_EVENT (event,name)
    
MPE_Describe_state

Creates a log record which describes a state. A state consists of a starting event, and ending event, a name and a color. Used by the programmer to define event durations.

    MPE_Describe_state (start,end,*name,*color)
    MPE_DESCRIBE_STATE (start,end,name,color)
    
MPE_Log_event

Logs a single event. The event type is defined by the user. In addition to the event type, one integer and one string of data are logged, both also defined by the user.

    MPE_Log_event (event,intdata,*string)
    MPE_LOG_EVENT (event,intdata,*string,ierr)
    

MPE Logging Routines Example


Fortran version available here

Defines three event states, one each for send, receive and wait communication events. Logs event records both "before" and "after" each communication event to the logfile, "logfile".


   #include "mpi.h"
   #include <stdio.h>

   int main(argc,argv)
   int argc;
   char *argv[];  {

   int numtasks, rank, next, prev, buf, tag=1;

   MPI_Request req;
   MPI_Status stat;

   MPI_Init(&argc,&argv);
   MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   printf("rank= %d starting\n",rank);

   MPE_Init_log();   

   if (rank == 0)  {
     MPE_Describe_state(1,2,"SSend","green:gray3");
     MPE_Describe_state(3,4,"IRecv","blue:light_gray");
     MPE_Describe_state(5,6,"Wait","yellow:gray");
     }

   prev = rank -1;
   next = rank + 1;
   if (rank == (numtasks - 1))  
     next = 0;
   if (rank == 0) 
     prev = numtasks-1;

   MPE_Log_event(3,prev,"recv");
   MPI_Irecv(&buf,1,MPI_INT,prev,tag,MPI_COMM_WORLD,&req);
   MPE_Log_event(4,prev,"recvd");
   
   MPE_Log_event(1,next,"send");
   MPI_Ssend(&rank,1,MPI_INT,next,tag,MPI_COMM_WORLD);
   MPE_Log_event(2,next,"sent");

   MPE_Log_event(5,rank,"waitstart");
   MPI_Wait(&req, &stat);
   MPE_Log_event(6,rank,"waitdone");

   MPE_Finish_log ("logfile");
   MPI_Finalize();
   }
Sample program output:

MPE Multi Processing Environment
MPE Graphics Routines


MPE_Open_graphics

Collectively opens an X Windows display for writing by all processes. This routine should be called by all tasks that need to write to the X Windows display.

     MPE_Open_graphics (*handle,comm,display,x,y,w,h,is_collective)
     MPE_OPEN_GRAPHICS (handle,comm,display,x,y,,w,h,is_collective)
    
MPE_Close_graphics

Closes a X11 graphics device. Should be called by all tasks which called MPE_Open_graphics when they are finished writing to the X Windows display.

    MPE_Close_graphics (*handle)
    MPE_CLOSE_GRAPHICS (handle,ierr)
    
MPE_Draw_point

Draws a point on an X Windows display

    MPE_Draw_point (handle,x,y,color)
    MPE_DRAW_POINT (handle,x,y,color,ierr)
    
MPE_Draw_points


    MPE_Draw_points (handle,*pointlist,npoints)
    MPE_DRAW_POINTS (handle,pointlist,npoints,ierr)
    
MPE_Draw_line

Draws a line on an X11 display

     MPE_Draw_line (handle,x1,y1,x2,y2,color)
     MPE_DRAW_LINE (handle,x1,y1,x2,y2,color,ierr)
    
MPE_Line_thickness

Sets the thickness of lines

    MPE_Line_thickness (graph,thickness)
    MPE_LINE_THICKNESS (graph,thickness,ierr)
    
MPE_Draw_circle

Draws a circle on an X11 display

    MPE_Draw_circle (graph,centerx,centery,radius,color)
    MPE_DRAW_CIRCLE (graph,centerx,centery,radius,color,ierr)
    
MPE_Fill_circle

Fills a circle on an X11 display

    MPE_Fill_circle (graph,centerx,centery,radius,color)
    MPE_FILL_CIRCLE (graph,centerx,centery,radius,color,ierr)
    
MPE_Fill_rectangle

Draws a filled rectangle on an X11 display

    MPE_Fill_rectangle (handle,x,y,w,h,color)
    MPE_FILL_RECTANGLE (handle,x,y,w,h,color,ierr)
    
MPE_Add_RGB_color

Adds a color to the colormap given its RGB values. By default, the following colors are predefined: MPE_WHITE, MPE_BLACK, MPE_RED, MPE_YELLOW, MPE_GREEN, MPE_CYAN, MPE_BLUE, MPE_MAGENTA, MPE_AQUAMARINE, MPE_FORESTGREEN, MPE_ORANGE, MPE_VIOLET, MPE_BROWN, MPE_PINK, MPE_CORAL and MPE_GRAY.

    MPE_Add_RGB_color (graph,red,green,blue,mapping)
    MPE_ADD_RGB_COLOR (graph,red,green,blue,mapping,ierr)
    
MPE_Make_color_array

Makes an array of color indices. The new colors for a uniform distribution in hue space and replace the existing colors except for MPE_WHITE and MPE_BLACK

    MPE_Make_color_array (handle,ncolors,array)
    MPE_MAKE_COLOR_ARRAY (handle,ncolors,array,ierr)
    
MPE_Num_colors

Gets the number of available colors. MPE_Num_colors

    MPE_Num_colors (handle,*nc)
    MPE_NUM_COLORS (handle,nc,ierr)
    
MPE_Update

Updates an X11 display. Only after an MPE_Update can you count on seeing the results of MPE drawing routines. This is caused by the buffering of graphics requests for improved performance.

    MPE_Update (handle)
    MPE_UPDATE (handle,ierr)
    
MPE_Get_mouse_press

Waits for mouse button press, blocking until the mouse button is pressed inside this MPE window. When pressed, returns the coordinate relative to the upper right of this MPE window and the button that was pressed.

    MPE_Get_mouse_press (graph,*x,*y,*button)
    MPE_GET_MOUSE_PRESS (graph,x,y,button,ierr)
    

MPE Graphics Routines Example



Case Studies


{ under development }

MPI At The MHPCC



MPI Subroutine Man Pages


Environment Management Routines

MPI_Abort
MPI_Errhandler_create
MPI_Errhandler_free
MPI_Errhandler_get
MPI_Errhandler_set
MPI_Error_class
MPI_Error_string
MPI_Finalize
MPI_Get_processor_name
MPI_Init
MPI_Initialized
MPI_Wtick
MPI_Wtime

Collective Communication Routines

MPI_Allgather
MPI_Allgatherv
MPI_Allreduce
MPI_Alltoall
MPI_Alltoallv
MPI_Barrier
MPI_Bcast
MPI_Gather
MPI_Gatherv
MPI_Op_create
MPI_Op_free
MPI_Reduce
MPI_Reduce_scatter
MPI_Scan
MPI_Scatter
MPI_Scatterv

Point-to-Point Communication Routines

MPI_Bsend
MPI_Bsend_init
MPI_Buffer_attach
MPI_Buffer_detach
MPI_Cancel
MPI_Get_count
MPI_Get_elements
MPI_Ibsend
MPI_Iprobe
MPI_Irecv
MPI_Irsend
MPI_Isend
MPI_Issend
MPI_Probe
MPI_Recv
MPI_Recv_init
MPI_Request_free
MPI_Rsend
MPI_Rsend_init
MPI_Send
MPI_Send_init
MPI_Sendrecv
MPI_Sendrecv_replace
MPI_Ssend
MPI_Ssend_init
MPI_Start
MPI_Startall
MPI_Test
MPI_Test_cancelled
MPI_Testall
MPI_Testany
MPI_Testsome
MPI_Wait
MPI_Waitall
MPI_Waitany
MPI_Waitsome

Process Group Routines

MPI_Group_compare
MPI_Group_difference
MPI_Group_excl
MPI_Group_free
MPI_Group_incl
MPI_Group_intersection
MPI_Group_range_excl
MPI_Group_range_incl
MPI_Group_rank
MPI_Group_size
MPI_Group_translate_ranks
MPI_Group_union

Communicators Routines

MPI_Comm_compare
MPI_Comm_create
MPI_Comm_dup
MPI_Comm_free
MPI_Comm_group
MPI_Comm_rank
MPI_Comm_remote_group
MPI_Comm_remote_size
MPI_Comm_size
MPI_Comm_split
MPI_Comm_test_inter
MPI_Intercomm_create
MPI_Intercomm_merge

Derived Types Routines

MPI_Type_commit
MPI_Type_contiguous
MPI_Type_count
MPI_Type_extent
MPI_Type_free
MPI_Type_hindexed
MPI_Type_hvector
MPI_Type_indexed
MPI_Type_lb
MPI_Type_size
MPI_Type_struct
MPI_Type_ub
MPI_Type_vector

Virtual Topology Routines

MPI_Cart_coords
MPI_Cart_create
MPI_Cart_get
MPI_Cart_map
MPI_Cart_rank
MPI_Cart_shift
MPI_Cart_sub
MPI_Cartdim_get
MPI_Dims_create
MPI_Graph_create
MPI_Graph_get
MPI_Graph_map
MPI_Graph_neighbors
MPI_Graph_neighbors_count
MPI_Graphdims_get
MPI_Topo_test

Miscellaneous Routines

MPI_Address
MPI_Attr_delete
MPI_Attr_get
MPI_Attr_put
MPI_DUP_FN
MPI_Keyval_create
MPI_Keyval_free
MPI_NULL_COPY_FN
MPI_NULL_DELETE_FN
MPI_Pack
MPI_Pack_size
MPI_Pcontrol
MPI_Unpack

References, Acknowledgements, WWW Resources


Additional Information on the WWW

References and Acknowledgements


© Copyright 1995 Maui High Performance Computing Center. All rights reserved.

Documents located on the Maui High Performance Computing Center's WWW server are copyrighted by the MHPCC. Educational institutions are encouraged to reproduce and distribute these materials for educational use as long as credit and notification are provided. Please retain this copyright notice and include this statement with any copies that you make. Also, the MHPCC requests that you send notification of their use to help@@mail.mhpcc.edu.

Commercial use of these materials is prohibited without prior written permission.

18 December 1996 blaise@@mhpcc.edu