.. role:: raw-latex(raw)
   :format: latex
..

Developer Guide
***************

Introduction
============

MPI is the standard programming interface of the HPC community for
exploiting clusters. Several implementations exist (OpenMPI, BullxMPI,
MPT, IntelMPI, MPC, ...), and each one exposes its own ABI
(Application Binary Interface), so an application compiled against one
MPI implementation is tied to it. With WI4MPI, an application compiled with
an alpha MPI implementation can be run under a beta MPI implementation
without recompiling and without any concern about the ABI (Preload
version). WI4MPI can also be seen as a dedicated MPI implementation:
this time, the application is compiled against the WI4MPI library
(libmpi.so) with the dedicated wrappers (mpicc, mpif90, ...) meant for that
purpose, and can then be run under any other MPI implementation (Interface
version).

Library
=======

How it works
------------

Before performing any translation, we need to distinguish the application
side from the runtime side. To do that, every MPI object from the
application side is prefixed by :code:`A_` and every MPI object from the
runtime side is prefixed by :code:`R_`. To perform a translation, all original
MPI calls from the application are intercepted by WI4MPI and replaced by the
same call prefixed by :code:`A_`. The figure below shows an example with an
OpenMPI to IntelMPI conversion.

.. graphviz:: developer_guide/how_it_works.dot
    :caption: How it works
    :name: how_it_works

..
    .. code-block::
    
        Application: MPI_Init (OpenMPI)                         MPI_Init (IntelMPI)
                                      \                         ^
                              Phase 1  \                       /  Phase 3
                                        v                     /
                                        |--------------------|
                                        | WI4MPI: A_MPI_Init |
                                        |--------------------|
                                                  ^
                                                  | Phase 2 
                                                  v
                                              Translation

Implementation
--------------

Library settings
~~~~~~~~~~~~~~~~

The library is set up at load time, when the program starts. All
the runtime MPI routines are saved to function pointers using :code:`dlsym`
calls so that they can be called later on, all the tables are created and
filled with the MPI constant objects, and the spinlocks are initialized.
This is done in a constructor, using the following syntax:

.. code-block:: c++

    void __attribute__((constructor)) wrapper_init(void) {
      void *lib_handle = dlopen(getenv("WI4MPI_RUN_MPI_C_LIB"), RTLD_NOW);
      LOCAL_MPI_Function = dlsym(lib_handle, "PMPI_Function");
      ....
    }

The library contains three constructors:

-  :code:`wrapper_init` in :code:`test_generation_wrapper.c` (API C) (preload and
   interface)
-  :code:`wrapper_init_f` in :code:`wrapper.c` (API Fortran) (preload and interface)
-  :code:`wrapper_init_c2ff2c` in :code:`c2f_f2c.c` (API c2f/f2c)

Symbol overload
~~~~~~~~~~~~~~~

The MPI calls are intercepted thanks to the following rerouting:

-  :code:`#define A_MPI_Send PMPI_Send`
-  :code:`#pragma weak MPI_Send=PMPI_Send`

(See :code:`src/interface/gen/interface_test.c` and
:code:`src/interface/gen/interface_fort.c`.) This syntax is also present, but
hidden inside an ASM code chooser, in :code:`test_generation_wrapper.c`
(:code:`src/{preload,interface}/gen/`), for the reason explained below.

The MPI-IO implementation (ROMIO) shipped with most MPI implementations
itself calls the MPI user interface, which WI4MPI has overloaded with the
symbol mechanism above. As a consequence, while the runtime library is
executing (phase 3 and beyond), some of its internal MPI calls are intercepted
again by WI4MPI, which crashes the application: WI4MPI applies the
application-to-runtime conversion to arguments that are already in their
runtime form.
An :ref:`example <symbol_overload_example>` is illustrated below.

.. graphviz:: developer_guide/symbol_overload.dot
    :caption: Example: symbol overload
    :name: symbol_overload_example

..
    Example:
    
    .. code-block::
    
        Application:MPI_File_open (OpenMPI)                           MPI_File_open(IntelMPI)
                                     \                               ^                      \
                      Phase 1         \                             /         Phase 3        \
                                       v                           /                          v
                                        |-------------------------|                      |----------------------------------------------------|
                                        | WI4MPI: A_MPI_File_open |                      | WI4MPI: A_MPI_Allreduce but with runtime arguments |
                                        |-------------------------|                      | instead of application arguments (R_ instead of A_)|
                                                    ^                                    |----------------------------------------------------|
                                                    |   Phase 2                                                     |
                                                    v                                                               v
                                               Translation                                                     Translation ----> Crash

To overcome this issue, we used an assembly code router.

Code chooser assembly
~~~~~~~~~~~~~~~~~~~~~~~

The ASM code chooser does the following:

If we are already in the wrapper:

-  The arguments are passed without any translation to the
   underlying MPI runtime call (:code:`LOCAL_MPI_Function`)

Otherwise:

-  The arguments are translated and passed to the underlying MPI runtime
   call (:code:`LOCAL_MPI_Function`)

To know which state the process is in, we check the value of the :code:`in_w` variable:

-  :code:`in_w=1` : in the wrapper
-  :code:`in_w=0` : in the application
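
The decision described above can be sketched in plain C. This is only an
illustration under assumptions: integer handles stand in for MPI objects, the
"translation" is a toy offset, the :code:`depth` parameter exists only to
simulate a runtime that calls back into MPI, and the place where :code:`in_w`
is raised and lowered is assumed rather than taken from the WI4MPI sources.

.. code-block:: c

    /* Thread-local state flag: 1 = in the wrapper, 0 = in the application. */
    __thread int in_w = 0;

    /* Illustration only: counts how many a2r translations were performed. */
    int translation_count = 0;

    int PMPI_Function(int handle, int depth);

    /* Stand-in for the runtime MPI routine. Like ROMIO, it may call back
     * into the public MPI entry point, this time with a runtime handle. */
    int LOCAL_MPI_Function(int r_handle, int depth) {
        if (depth > 0)
            return PMPI_Function(r_handle, depth - 1);
        return r_handle;
    }

    /* Application side: translate, then call the runtime.
     * Assumption: in_w is raised for the duration of the runtime call. */
    int A_MPI_Function(int a_handle, int depth) {
        int r_handle = a_handle - 1000; /* toy a2r translation */
        translation_count++;
        in_w = 1;
        int ret = LOCAL_MPI_Function(r_handle, depth);
        in_w = 0;
        return ret;
    }

    /* Runtime side: the arguments are already runtime objects. */
    int R_MPI_Function(int r_handle, int depth) {
        return LOCAL_MPI_Function(r_handle, depth);
    }

    /* C equivalent of the chooser: dispatch on in_w. */
    int PMPI_Function(int handle, int depth) {
        return in_w ? R_MPI_Function(handle, depth)
                    : A_MPI_Function(handle, depth);
    }

In this sketch, a call such as :code:`PMPI_Function(1042, 1)` issued by the
application translates the handle once; the nested call issued by the stand-in
runtime takes the :code:`R_` path and is not translated a second time.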

Since the representation of MPI objects depends on the MPI implementation,
their size may vary. To make sure that there is no side effect, the code
chooser is written in assembly and handles the argument registers and the
stack itself.

ASM Code chooser implementation (generated for each function):

.. code-block:: asm

    .global PMPI_Function                   # Define global PMPI_Function symbol
    .weak MPI_Function                      # Define a weak MPI_Function symbol
    .set MPI_Function,PMPI_Function         # Make MPI_Function an alias of PMPI_Function
    .extern in_w
    .extern A_MPI_Function
    .extern R_MPI_Function
    .type PMPI_Function,@function           # Set PMPI_Function type to function
    .text
    PMPI_Function:
    push %rbp
    mov %rsp, %rbp
    # ------------- Put arguments on stack for safekeeping
    sub $0x20, %rsp
    mov %rdi, -0x8(%rbp)
    mov %rsi, -0x10(%rbp)
    mov %rdx, -0x18(%rbp)
    mov %rcx, -0x20(%rbp)
    # ------------- Access thread-local variable in_w
    .byte 0x66
    leaq in_w@tlsgd(%rip), %rdi             # Load the TLS descriptor of in_w into %rdi
    .value 0x6666
    rex64
    call __tls_get_addr@PLT                 # Get the address of in_w for this thread into %rax
    # ------------- Put arguments back where we found them
    mov -0x8(%rbp), %rdi
    mov -0x10(%rbp), %rsi
    mov -0x18(%rbp), %rdx
    mov -0x20(%rbp), %rcx
    leave                                   # Set %rsp to %rbp, then pop top of stack into %rbp
    # ------------ Jump to the target function
    cmpl $0x0, 0x0(%rax)
    jne inwrap_MPI_Function
    jmp *A_MPI_Function@GOTPCREL(%rip)      # If not in wrapper, jump to the application method
    inwrap_MPI_Function:
    jmp *R_MPI_Function@GOTPCREL(%rip)      # If in wrapper, jump to the run method
    # ------------ Calculate symbol size
    .size PMPI_Function,.-PMPI_Function     # Declares symbol size to be the size of the above

.. graphviz:: developer_guide/asm_code_chooser.dot
    :caption: ASM Code chooser
    :name: asm_code_chooser

..
    .. code-block::
    
        Application:MPI_File_open (OpenMPI)                           MPI_File_open(IntelMPI)
                                     \                               ^                      \
                      Phase 1         \                             /         Phase 3        \
                                       v                           /                          v
                                        |-------------------------|                     |-------------------------|
                                        | WI4MPI: PMPI_File_open  |                     | WI4MPI: PMPI_Allreduce  |
                                        | Testing in_w: in_w=0    |                     | Testing in_w: in_w=1    |
                                        |-------------------------|                     | ------------------------|
                                                   ^    Phase 2                                      |
                                                   |                                                 |
                                                   v                                                 v
                                        A_MPI_File_open:Translation                     R_MPI_Allreduce:No Translation

:code:`A_MPI_Function`
~~~~~~~~~~~~~~~~~~~~~~~~~~

All translations are performed by mappers defined in
:code:`mappers.h`, which rely on an underlying hash table mechanism named
:code:`uthash` (https://troydhanson.github.io/uthash/). The mappers (see the
example below) always have the same syntax:

.. code-block:: c++

    mapper_name_a2r(&buf, &buf_tmp);
    mapper_name_r2a(&buf, &buf_tmp);

In an a2r translation, :code:`buf_tmp` holds the translation of
:code:`buf`, and vice versa in an r2a translation.

Example:

.. code-block:: c++

    int A_MPI_Send(void *buf, int count, A_MPI_Datatype datatype, int dest, int tag,
                   A_MPI_Comm comm) {
      void *buf_tmp;
      const_buffer_conv_a2r(&buf, &buf_tmp); // mapper
      R_MPI_Datatype datatype_tmp;
      datatype_conv_a2r(&datatype, &datatype_tmp); // mapper
      int dest_tmp;
      dest_conv_a2r(&dest, &dest_tmp); // mapper
      int tag_tmp;
      tag_conv_a2r(&tag, &tag_tmp); // mapper
      R_MPI_Comm comm_tmp;
      comm_conv_a2r(&comm, &comm_tmp); // mapper
      int ret_tmp = LOCAL_MPI_Send(buf_tmp, count, datatype_tmp, dest_tmp, tag_tmp,
                                   comm_tmp); // Runtime MPI_Send call
      return error_code_conv_r2a(ret_tmp);
    }

:code:`R_MPI_Function`
~~~~~~~~~~~~~~~~~~~~~~~

In :code:`R_MPI_Function`, the arguments are passed directly to the MPI runtime
call:

.. code-block:: c++

    int R_MPI_Send(void *buf, int count, R_MPI_Datatype datatype, int dest, int tag,
                   R_MPI_Comm comm) {

      int ret_tmp = LOCAL_MPI_Send(buf, count, datatype, dest, tag, comm);

      return ret_tmp;
    }

Hash table
~~~~~~~~~~

The underlying hash table mechanism presented earlier is contained in
:code:`engine.*`, :code:`engine_fn.*` and :code:`uthash.h`. For each
type of MPI object, two tables are created: one for the predefined constants
and one for the objects of that type created by the application.

The different types are:

- :code:`MPI_Comm`
- :code:`MPI_Datatype`
- :code:`MPI_Errhandler`
- :code:`MPI_Group`
- :code:`MPI_Op`
- :code:`MPI_Request` (split into two tables, in order to separate blocking
  requests from asynchronous requests)
- :code:`MPI_File`
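
The two-table scheme can be illustrated with a minimal sketch for a single
object type. Everything here is hypothetical: the handle types, the function
names other than the :code:`a2r`/:code:`r2a` naming pattern, and the handle
numbering are chosen for the example, and a small linear array replaces
:code:`uthash` to keep the code self-contained.

.. code-block:: c

    #include <stddef.h>

    typedef int   A_Comm; /* hypothetical application-side handle */
    typedef void *R_Comm; /* hypothetical runtime-side handle */

    struct comm_entry { A_Comm a; R_Comm r; };
    struct comm_table { struct comm_entry e[16]; size_t n; };

    struct comm_table comm_constants;  /* predefined objects */
    struct comm_table comm_created;    /* objects created by the application */
    A_Comm next_created_handle = 100;  /* arbitrary starting value */

    static int table_add(struct comm_table *t, A_Comm a, R_Comm r) {
        if (t->n == sizeof t->e / sizeof t->e[0])
            return -1; /* table full */
        t->e[t->n].a = a;
        t->e[t->n].r = r;
        t->n++;
        return 0;
    }

    int comm_register_constant(A_Comm a, R_Comm r) {
        return table_add(&comm_constants, a, r);
    }

    /* Returns the new application handle, or -1 if the table is full. */
    A_Comm comm_register_created(R_Comm r) {
        A_Comm a = next_created_handle;
        if (table_add(&comm_created, a, r) != 0)
            return -1;
        next_created_handle++;
        return a;
    }

    /* a2r: look in the constants first, then in the created objects. */
    int comm_conv_a2r(const A_Comm *a, R_Comm *r) {
        const struct comm_table *tabs[2] = { &comm_constants, &comm_created };
        for (int t = 0; t < 2; t++)
            for (size_t i = 0; i < tabs[t]->n; i++)
                if (tabs[t]->e[i].a == *a) { *r = tabs[t]->e[i].r; return 0; }
        return -1; /* unknown handle */
    }

    /* r2a: same lookup in the other direction. */
    int comm_conv_r2a(const R_Comm *r, A_Comm *a) {
        const struct comm_table *tabs[2] = { &comm_constants, &comm_created };
        for (int t = 0; t < 2; t++)
            for (size_t i = 0; i < tabs[t]->n; i++)
                if (tabs[t]->e[i].r == *r) { *a = tabs[t]->e[i].a; return 0; }
        return -1; /* unknown handle */
    }

Keeping the constants in their own table means they can be filled once at
load time, while the second table grows and shrinks with the objects the
application creates and frees.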

The tables within :code:`engine_fn.*` handle the translation of the following function types:

- :code:`MPI_Handler_function`
- :code:`MPI_Comm_copy_attr_function`
- :code:`MPI_Comm_delete_function`
- :code:`MPI_Type_delete_function`
- :code:`MPI_Comm_errhandler_function`
- :code:`MPI_File_errhandler_function`

Thread safety
~~~~~~~~~~~~~

To make WI4MPI usable in a multithreaded environment, the :code:`in_w` variable (see above) is stored in thread-local storage (TLS):

-  :code:`__thread int in_w=0;` (:code:`test_generation_wrapper.c:118`)
-  :code:`extern __thread int in_w;` (:code:`wrapper.c:7`)
-  :code:`extern __thread int in_w;` (:code:`c2f_f2c.c:6 || c2f_f2c.c:1149`)

The tables are protected by spinlocks (cf. :code:`thread_safety.h`):

-  :code:`#define lock_dest(a) pthread_spin_destroy(a)`
-  :code:`#define lock_init(a) pthread_spin_init(a, PTHREAD_PROCESS_PRIVATE)`
-  :code:`#define lock(a) pthread_spin_lock(a)`
-  :code:`#define unlock(a) pthread_spin_unlock(a)`
-  :code:`typedef pthread_spinlock_t *table_lock_t`

Interface
=========

The interface version of WI4MPI offers the same promise as the preload
version (one compilation, several runs over different MPI
implementations), but this time WI4MPI has to be seen as a full MPI
library. All the previous sections are still relevant for the
interface; the only change is the new level named INTERFACE
(see the :ref:`schema below <developer_guide_interface>`). This level has to be considered as a "libmpi.so"
which is linked to the user application.

.. todo:: Update with relevant changes of the jinja generator

.. graphviz:: developer_guide/interface.dot
    :caption: Interface
    :name: developer_guide_interface

..
    .. code-block::
    
                  dlopen|----------|  dlopen       |---------|
                       /| Lib_OMPI | ----------- > | OpenMPI |
                      / |----------|               |---------|
       |-----------| /
       |           |/
       | INTERFACE |
       | libmpi.so |\
       |-----------| \
                      \
                       \|----------|  dlopen       |----------|
                        | Lib_IMPI | ----------- > | IntelMPI |
                  dlopen|----------|               |----------|


The files :code:`interface_test.c` and :code:`interface_fort.c` handle the
symbol overload mechanism seen earlier for the C and Fortran APIs
respectively. Then, depending on the conversion, a :code:`dlopen` is made on
the appropriate library (:code:`WI4MPI_WRAPPER_LIB`) responsible for the
translation (ASM code chooser + :code:`A_MPI_Function` + :code:`R_MPI_Function`).

:code:`MPI_Init` example:

.. code-block:: c++

    int MPI_Init(int *argc, char ***argv);
    #define MPI_Init PMPI_Init
    #pragma weak MPI_Init = PMPI_Init
    int (*INTERFACE_LOCAL_MPI_Init)(int *, char ***);

    int PMPI_Init(int *argc, char ***argv) {
      int ret_tmp = INTERFACE_LOCAL_MPI_Init(argc, argv);
      return ret_tmp;
    }
    __attribute__((constructor)) void wrapper_interface(void) {
      void *interface_handle =
          dlopen(getenv("WI4MPI_WRAPPER_LIB"), RTLD_NOW | RTLD_GLOBAL);
      if (!interface_handle) {
        printf("no true IC lib defined\nerror :%s\n", dlerror());
        exit(1);
      }
      INTERFACE_LOCAL_MPI_Init = dlsym(interface_handle, "CCMPI_MPI_Init");
    }

Static mode
-----------

The static mode builds an executable that embeds the translation for every
target. To avoid conflicts, symbols are renamed as follows:
:code:`INTERF2_{TARGET}_{Symbol_name}`. No :code:`dlopen` is needed anymore (cf.
Interface); the function pointers are selected by two variables:
:code:`WI4MPI_STATIC_TARGET_TYPE_F` and :code:`WI4MPI_STATIC_TARGET_TYPE`. Static
sections are controlled by the directives :code:`#if(n)def WI4MPI_STATIC` / :code:`#endif`.
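
As a rough sketch of how a function pointer could be selected among statically
linked targets: the function bodies below are stand-ins, the target names
:code:`"OMPI"` and :code:`"INTEL"` are assumed values, and reading
:code:`WI4MPI_STATIC_TARGET_TYPE` from the environment is an assumption as
well. The helper names :code:`select_mpi_init` and
:code:`select_mpi_init_from_env` are hypothetical.

.. code-block:: c

    #include <stdlib.h>
    #include <string.h>

    typedef int (*mpi_init_fn)(int *, char ***);

    /* Stand-ins for the renamed per-target symbols; the return values only
     * identify which target was selected. */
    int INTERF2_OMPI_MPI_Init(int *argc, char ***argv) {
        (void)argc; (void)argv;
        return 1;
    }
    int INTERF2_INTEL_MPI_Init(int *argc, char ***argv) {
        (void)argc; (void)argv;
        return 2;
    }

    /* Pick the entry point matching the requested target, or NULL if the
     * target is unknown. The accepted names are assumptions. */
    mpi_init_fn select_mpi_init(const char *target) {
        if (target == NULL)
            return NULL;
        if (strcmp(target, "OMPI") == 0)
            return INTERF2_OMPI_MPI_Init;
        if (strcmp(target, "INTEL") == 0)
            return INTERF2_INTEL_MPI_Init;
        return NULL;
    }

    /* Assumption: the selection variable is read from the environment. */
    mpi_init_fn select_mpi_init_from_env(void) {
        return select_mpi_init(getenv("WI4MPI_STATIC_TARGET_TYPE"));
    }

Because every target is linked into the executable, the prefix keeps the
otherwise identical symbol names apart, and the selection reduces to choosing
a pointer once at startup.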

Files common to both versions of WI4MPI:

- :code:`func_char_fort.*`:

    Contains all Fortran MPI functions that take character arguments.
    A Fortran character argument (e.g. :code:`character(len=*) :: dark_side`) is always passed along with its length, and the type of this length argument depends on the compiler used (Intel and GNU < 8 on one side, GNU >= 8 on the other),
    so WI4MPI has to implement both variants.

    Example:

.. code-block:: c++

           #ifdef IFORT_CALL
                  void  A_f_MPI_Get_processor_name(char * name,int * resultlen,int * ret,int namelen) // The character length is of type int
           #elif GFORT_CALL
                  void  A_f_MPI_Get_processor_name(char * name,int * resultlen,int * ret,size_t namelen) // The character length is of type size_t
           #endif

- :code:`manual_wrapper.h`: Contains some manual mappers for Fortran translation
- :code:`mappers.h`: Contains the a2r/r2a mappers for C translation
- :code:`engine.*, engine_fn.*, uthash.h`: Contain all table routines
- :code:`thread_safety.h`: Contains the spinlock protection
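
To illustrate what a Fortran character argument implies on the C side, here is
a minimal sketch of a routine that fills a Fortran :code:`CHARACTER` buffer.
Fortran strings are blank-padded and not NUL-terminated, so the length
argument is the only way to know the buffer size. The names
:code:`c_to_fortran_string` and :code:`example_get_name_` are hypothetical,
and the wrapper uses the :code:`size_t` length convention shown above for
GNU >= 8.

.. code-block:: c

    #include <stddef.h>
    #include <string.h>

    /* Copy a NUL-terminated C string into a Fortran CHARACTER buffer of
     * flen bytes: truncate if needed, pad with blanks, no NUL terminator.
     * Returns the number of characters copied from src. */
    size_t c_to_fortran_string(const char *src, char *fdst, size_t flen) {
        size_t n = strlen(src);
        if (n > flen)
            n = flen;
        memcpy(fdst, src, n);
        memset(fdst + n, ' ', flen - n);
        return n;
    }

    /* Hypothetical Fortran-callable routine: the character length arrives
     * as a trailing argument after the declared ones. */
    void example_get_name_(char *name, int *resultlen, int *ret,
                           size_t namelen) {
        *resultlen = (int)c_to_fortran_string("node042", name, namelen);
        *ret = 0;
    }

With the other convention only the type of :code:`namelen` changes to
:code:`int`, which is why both variants have to be compiled.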

Preload files:

-  bin/{wi4mpi,mpirun}: see User\_Guide
-  etc/wi4mpi.cfg: see User\_Guide
-  gen:

   -  :code:`c2f_f2c.c`:
   -  :code:`lib_empty.c`: Empty file used to create the empty libraries that
      replace the ones from the MPI implementation used for the compilation
   -  :code:`test_generation_wrapper.c`: contains all the C MPI functions within
      WI4MPI which deal with the translation
   -  :code:`wrapper.c`: contains all the Fortran MPI functions within WI4MPI
      which deal with the translation

-  header:

   -  :code:`INTEL_INTEL`: :code:`app_mpi.h app_mpio.h run_mpi.h run_mpio.h wrapper_f.h`
   -  :code:`INTEL_OMPI`: :code:`app_mpi.h app_mpio.h run_mpi.h wrapper_f.h`
   -  :code:`OMPI_INTEL`: :code:`app_mpi.h run_mpi.h run_mpio.h wrapper_f.h`
   -  :code:`OMPI_OMPI`: :code:`app_mpi.h run_mpi.h wrapper_f.h`

Interface files:

-  gen:

   -  :code:`c2f_f2c.c`:
   -  :code:`test_generation_wrapper.c`: Same as the preload version
   -  :code:`wrapper.c`: Same as the preload version
   -  :code:`interface_fort.c`: Contains the symbol overload mechanism for
      Fortran MPI functions
   -  :code:`interface_test.c`: Contains the symbol overload mechanism for C MPI
      functions and the rerouting to the code chooser

-  header:

   -  :code:`OMPI_INTEL`: :code:`app_mpi.h run_mpi.h run_mpio.h wrapper_f.h`
   -  :code:`OMPI_OMPI`: :code:`app_mpi.h run_mpi.h wrapper_f.h`

-  :code:`interface_utils`:

   -  :code:`bin`: Contains all MPI wrappers for compilation
   -  :code:`include`: Contains all include files exposed to users

-  manual:

   -  :code:`dlsym_global.c` : Get runtime MPI constants

-  module: Contains all elements needed to create a decent module

Get involved in WI4MPI
======================

Reading the Generator Guide is a prerequisite for this part.

Expand MPI cover of WI4MPI
--------------------------

On the generator side
~~~~~~~~~~~~~~~~~~~~~

-  Add the function name to the :code:`func_list_....txt` files
-  Add the function description to the dictionary :code:`functions.json`
-  Add the new mappers (if needed) to convert the arguments in the
   dictionary :code:`mappers.json`
-  Get involved in the generator code if some special cases have to be
   handled
-  Generate the new Fortran headers for both the interface and preload
   versions

On the library side
~~~~~~~~~~~~~~~~~~~

-  Code the new mappers in :code:`mappers.h, engine*`
-  Update :code:`app_mpi.h app_mpio.h run_mpi.h run_mpio.h` for all
   conversions of both versions
-  Update the headers within :code:`src/interface/interface_utils/include`
-  Make sure to comply with the MPI standard

Expand WI4MPI conversion capability
-----------------------------------

-  In :code:`mappers.h`, make sure that the status mapper translates
   :code:`MPI_Status.count` correctly, since its implementation differs
   between MPI implementations.
-  Generate the :code:`app_mpi.h` and :code:`run_mpi.h` associated with the new conversion
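
To show why the status mapper needs care, the sketch below converts between
two status layouts that store the count differently. Both layouts and both
function names are hypothetical and deliberately simplified; the real field
names and encodings have to be taken from the headers of the MPI
implementations involved.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical application-side layout: a single 64-bit count. */
    typedef struct {
        int MPI_SOURCE, MPI_TAG, MPI_ERROR;
        uint64_t count;
    } A_Status;

    /* Hypothetical runtime-side layout: the count is split in two halves
     * and the public fields sit at different offsets. */
    typedef struct {
        uint32_t count_lo, count_hi;
        int MPI_SOURCE, MPI_TAG, MPI_ERROR;
    } R_Status;

    /* Runtime to application: copy field by field, rebuild the count. */
    void status_conv_r2a(const R_Status *r, A_Status *a) {
        a->MPI_SOURCE = r->MPI_SOURCE;
        a->MPI_TAG = r->MPI_TAG;
        a->MPI_ERROR = r->MPI_ERROR;
        a->count = ((uint64_t)r->count_hi << 32) | r->count_lo;
    }

    /* Application to runtime: copy field by field, split the count. */
    void status_conv_a2r(const A_Status *a, R_Status *r) {
        r->MPI_SOURCE = a->MPI_SOURCE;
        r->MPI_TAG = a->MPI_TAG;
        r->MPI_ERROR = a->MPI_ERROR;
        r->count_lo = (uint32_t)(a->count & 0xFFFFFFFFu);
        r->count_hi = (uint32_t)(a->count >> 32);
    }

A plain :code:`memcpy` between the two structures would silently scramble the
fields, which is why the conversion is written field by field.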