I am testing on an AVX machine. The code looks roughly like this:
    include "Sub_Prog_1.f90"
    include "Sub_Prog_2.f90"

    program MyCode
    use Sub_Prog_1_Mod
    use Sub_Prog_2_Mod
    implicit none
    integer, parameter :: dp = selected_real_kind(15,307), dp2 = selected_real_kind(15,307)
    integer :: array_size, i, j, k, l
    integer, dimension(:), allocatable :: idx
    real(kind = dp2) :: time1, time2, omp_get_wtime
    real(kind = dp), dimension(:), allocatable :: a, b, c

    array_size = 100000000 !assuming I read from an input file, here I just wrote like this
    allocate ( idx(array_size), a(array_size), b(array_size), c(array_size) )

    ! Initialization
    do i = 1, array_size
      a(i) = dble(i) ; b(i) = dble(i * 2) ; idx(i) = array_size - i + 1
    end do

    time1 = omp_get_wtime()
    !$omp parallel
    do i = 1, 10
      call Sub_Prog_1 ( array_size, idx, a, b )
      call Sub_Prog_2 ( array_size, a, b, c )
    end do
    !$omp end parallel
    time2 = omp_get_wtime()

    print *, c(8000000)
    print *, 'Results =', time2 - time1
    end program MyCode
    !==================================================================
    subroutine Sub_Prog_1 ( array_size, idx, a, b )
    implicit none
    integer, parameter :: dp = selected_real_kind(15,307), dp2 = selected_real_kind(15,307)
    integer :: array_size, i, j, k, l
    integer, dimension(:), allocatable :: idx
    real(kind = dp), dimension(:), allocatable :: a, b, c

    !$omp do private(i) schedule(runtime)
    !dir$ vector aligned
    !$omp simd simdlen(4)
    do i = 1, array_size
      a(i) = a(idx(i)) + dble(i)
      if (a(i) <= 3000.0d+0) then
        a(i) = dble(idx(i)) / 200.0d+0
      end if
      b(i) = sqrt(b(i)) + dble(i * 2)
    end do
    !$omp end simd
    !$omp end do
    end subroutine Sub_Prog_1
    !==================================================================
    subroutine Sub_Prog_2 ( array_size, a, b, c )
    implicit none
    integer, parameter :: dp = selected_real_kind(15,307), dp2 = selected_real_kind(15,307)
    integer :: array_size, i, j, k, l
    real(kind = dp), dimension(:), allocatable :: a, b, c

    !$omp do private(i) schedule(runtime)
    !dir$ vector aligned
    !$omp simd simdlen(4)
    do i = 1, array_size
      c(i) = a(i) + sqrt(b(i)) / 3.67d+0
      if (c(i) <= 350.0d+0) then
        c(i) = a(i) + sqrt(b(i)) / 8.67d+0
      end if
    end do
    !$omp end simd
    !$omp end do
    end subroutine Sub_Prog_2
I want to take advantage of the Intel Compiler 19's support for aligned data access to get efficient vectorization, so I compiled with "ifort -O3 -qopt-report5 -qopenmp -align array32byte -xAVX -o MyCode.exe Main.f90". I have two questions.
- Why can't I combine !dir$ vector aligned and !$omp simd simdlen(...) as written above? The compiler always aborts with a message like this:
    Sub_Prog_1.f90(17): catastrophic error: **Internal compiler error: internal abort** Please report this error along with the circumstances in which it occurred in a Software Problem Report. Note: File and line given may not be explicit cause of this error.
    compilation aborted for Main.f90 (code 1)
- Since I prefer OpenMP directives to the Intel-specific ones, I had previously used "!$omp simd simdlen(4) aligned(a,b,idx :32)" and "!$omp simd simdlen(4) aligned(a,b,c :32)" in the first and second subroutines, respectively. However, the vectorization reports still showed unaligned access for the arrays. The only way I found to get both aligned access and vectorization was to use "!dir$ simd vectorlength(4)" instead of "!$omp simd simdlen(4)".
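To rule out the arrays themselves being misaligned, their base addresses can be checked at run time. The following is only a minimal sketch, not part of my actual code: the program name `check_align`, the helper `offset32`, and the small array size `n` are made up for illustration, and the arrays need the `target` attribute so that `c_loc` can be applied to them.

```fortran
program check_align
   use, intrinsic :: iso_c_binding, only: c_loc, c_ptr, c_intptr_t
   implicit none
   integer, parameter :: dp = selected_real_kind(15,307)
   integer, parameter :: n = 1000   ! illustrative size, not the real array_size
   integer, dimension(:), allocatable, target :: idx
   real(kind = dp), dimension(:), allocatable, target :: a, b, c

   allocate ( idx(n), a(n), b(n), c(n) )

   ! A remainder of 0 means the first element lies on a 32-byte boundary.
   print *, 'idx: address mod 32 =', offset32(c_loc(idx(1)))
   print *, 'a  : address mod 32 =', offset32(c_loc(a(1)))
   print *, 'b  : address mod 32 =', offset32(c_loc(b(1)))
   print *, 'c  : address mod 32 =', offset32(c_loc(c(1)))

   deallocate ( idx, a, b, c )

contains

   ! Hypothetical helper: reinterpret the C address as an integer
   ! and return its remainder modulo 32.
   function offset32(p) result(r)
      type(c_ptr), intent(in) :: p
      integer(c_intptr_t) :: r
      r = mod(transfer(p, 0_c_intptr_t), 32_c_intptr_t)
   end function offset32

end program check_align
```

If every remainder is 0 when this is built with the same "-align array32byte" flag, the allocations are aligned, and the unaligned accesses in the report would then come from how the compiler treats the directives rather than from the data layout.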
Could someone please explain this matter?
Many thanks.
Best wishes,