Hello, I have been playing around with a simple program within the MPI framework. The idea is to construct a row-wise partitioned matrix and then call MPI_ALLGATHERV to collect the complete matrix on all CPUs (the matrix is not particularly large, but evaluating the individual elements is independent and fairly expensive). One way to collect the data would be to iterate over the columns of the matrix and call MPI_ALLGATHERV on each column independently. However, I tried to do it in a more MPI-like fashion: I defined a custom MPI type using MPI_TYPE_VECTOR (as shown in the minimalistic example below) so that a single call to MPI_ALLGATHERV suffices.
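For comparison, the per-column variant I mention would look roughly like this (a sketch only, reusing the names psi, psi_local, number_of_points_per_proc, gather_displ_points, etc. from the full example below; I have not benchmarked this against the single-call version):

```fortran
! Gather each column of the row-partitioned matrix separately:
! psi_local holds rows points_start_index:points_end_index of every column,
! psi receives the complete column on all ranks.
DO j = 1, number_of_states
  CALL MPI_ALLGATHERV(psi_local(points_start_index, j),            &
                      number_of_points_per_proc(my_id), MPI_REAL8, &
                      psi(1, j), number_of_points_per_proc,        &
                      gather_displ_points, MPI_REAL8,              &
                      MPI_COMM_WORLD, ierr)
END DO
```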
This program works (or seems to work) correctly when compiled in a straightforward fashion:
mpiifort -o gather.lp64 gather.f90
mpirun -n 2 ./gather.lp64
Moreover, the results are independent of the optimization level.
However, for certain reasons, I would need to use the ILP64 interface. Following the instructions from the MPI manual, I compiled and executed the program like this:
mpiifort -f90=ifort -fc=ifort -c -warn all -O1 -i8 -I$MKLROOT/include/intel64/ilp64 -I${MKLROOT}/include -I${I_MPI_ROOT}/include64 -o gather.o gather.f90
mpiifort -f90=ifort -fc=ifort -ilp64 -warn all -i8 -o gather.ilp64 gather.o ${MKLROOT}/lib/intel64/libmkl_blas95_ilp64.a -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm
mpirun -ilp64 -n 2 ./gather.ilp64
Now, the strange thing is that this produces the expected results with -O0, -O2, and -O3, but a segmentation fault shows up with -O1. Strangely enough, the segmentation fault disappears when ${MKLROOT}/lib/intel64/libmkl_blas95_ilp64.a is removed from the link line; however, I need this library for certain BLAS ILP64 calls (not used in the minimalistic example below).
I am using ifort Version 13.0.1.117 Build 20121010 and Intel MPI library v. 4.0.3.008.
Any ideas what might be wrong? Perhaps some arguments of the MPI calls are still supposed to be INTEGER(KIND=4) even with -i8? In the case of MKL, for example, the manual says to check the header files to find out the correct kinds, but the mpif.h header (recommended for ILP64) didn't give me any additional insight...
PROGRAM gather
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  !
  INTEGER, PARAMETER :: dp = KIND(1D0)
  INTEGER, PARAMETER :: number_of_states = 2
  INTEGER, PARAMETER :: number_of_points = 7
  !
  INTEGER :: i
  INTEGER :: nproc, my_id, ierr
  INTEGER(KIND = MPI_ADDRESS_KIND) :: lb, extent
  INTEGER :: ROW_TYPE, ROW_TYPE_RESIZED
  !
  REAL(dp), DIMENSION(:, :), ALLOCATABLE :: psi, psi_local, psi_local_tr
  INTEGER, ALLOCATABLE :: number_of_points_per_proc(:), gather_displ_points(:)
  INTEGER :: points_start_index, points_end_index
  !
  CALL MPI_INIT(ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, my_id, ierr)
  !
  ALLOCATE(gather_displ_points(0:nproc-1), number_of_points_per_proc(0:nproc-1))
  !
  IF(nproc > 1) THEN
    number_of_points_per_proc(0:nproc-2) = number_of_points / nproc
    number_of_points_per_proc(nproc-1) = number_of_points - SUM(number_of_points_per_proc(0:nproc-2))
  ELSE
    number_of_points_per_proc(0) = number_of_points
  END IF
  gather_displ_points(0) = 0
  DO i = 0, nproc - 2
    gather_displ_points(i + 1) = gather_displ_points(i) + number_of_points_per_proc(i)
  END DO
  !
  points_start_index = gather_displ_points(my_id) + 1
  points_end_index = points_start_index + number_of_points_per_proc(my_id) - 1
  !
  ALLOCATE(psi_local(points_start_index:points_end_index, number_of_states))
  ALLOCATE(psi_local_tr(number_of_states, points_start_index:points_end_index))
  ALLOCATE(psi(number_of_points, number_of_states))
  !
  CALL MPI_TYPE_VECTOR(number_of_states, 1, number_of_points, MPI_REAL8, ROW_TYPE, ierr)
  CALL MPI_TYPE_COMMIT(ROW_TYPE, ierr)
  CALL MPI_TYPE_GET_EXTENT(MPI_REAL8, lb, extent, ierr)
  CALL MPI_TYPE_CREATE_RESIZED(ROW_TYPE, lb, extent, ROW_TYPE_RESIZED, ierr)
  CALL MPI_TYPE_COMMIT(ROW_TYPE_RESIZED, ierr)
  !
  psi = 0
  psi_local = 8 + my_id
  psi_local_tr = TRANSPOSE(psi_local)
  !
  IF(my_id .EQ. 0) THEN
    WRITE(*, *) "calling ALLGATHER"
    WRITE(*, *) number_of_points_per_proc
    WRITE(*, *) gather_displ_points
  END IF
  !
  CALL MPI_ALLGATHERV( &
    psi_local_tr, number_of_points_per_proc(my_id)*number_of_states, MPI_REAL8, &
    psi, number_of_points_per_proc, gather_displ_points, ROW_TYPE_RESIZED, MPI_COMM_WORLD, ierr)
  !
  IF(my_id .EQ. 0) THEN
    DO i = 1, number_of_points
      WRITE(*, *) psi(i, :)
    END DO
  END IF
  !
  DEALLOCATE(psi, psi_local, psi_local_tr)
  CALL MPI_FINALIZE(ierr)
END PROGRAM