Channel: Intel® Fortran Compiler for Linux* and macOS*

Data alignment for supporting more efficient vectorization


I am testing on an AVX machine. The code looks roughly like this:

 include "Sub_Prog_1.f90"
 include "Sub_Prog_2.f90"

 program MyCode

 use Sub_Prog_1_Mod
 use Sub_Prog_2_Mod
 
 implicit none
                                                    
 integer, parameter :: dp = selected_real_kind(15,307), dp2 =selected_real_kind(15,307)    
 integer  :: array_size, i, j, k, l
 integer, dimension(:), allocatable :: idx  
 real(kind = dp2) :: time1, time2, omp_get_wtime
 real(kind = dp), dimension(:), allocatable :: a, b, c 

 array_size = 100000000  ! normally read from an input file; hard-coded here for the example

 allocate ( idx(array_size), a(array_size), b(array_size), c(array_size) ) 

   ! Initialization
       do i = 1, array_size
          a(i) = dble(i)   ;   b(i) = dble(i * 2)   ;   idx(i) = array_size - i + 1
       end do

   time1 = omp_get_wtime()

      !$omp parallel
       do i = 1, 10

          call Sub_Prog_1 ( array_size, idx, a, b )
          call Sub_Prog_2 ( array_size, a, b, c ) 

       end do
      !$omp end parallel

   time2 = omp_get_wtime()

   print *, c(8000000)
   print *, 'Results =', time2 - time1

 end program MyCode

!==================================================================

 subroutine Sub_Prog_1 ( array_size, idx, a, b )

 implicit none

 integer, parameter :: dp = selected_real_kind(15,307), dp2 = selected_real_kind(15,307)
 integer :: array_size, i, j, k, l
 integer, dimension(:), allocatable :: idx    
 real(kind = dp), dimension(:), allocatable :: a, b, c    
      
       !$omp do private(i) schedule(runtime)
       !dir$ vector aligned 
       !$omp simd simdlen(4)
        do i = 1, array_size   
            a(i) = a(idx(i)) + dble(i)
                if (a(i) <= 3000.0d+0) then
                     a(i) = dble(idx(i)) / 200.0d+0
                end if 
            b(i) = sqrt(b(i)) + dble(i * 2)
        end do
       !$omp end simd
       !$omp end do

    end subroutine Sub_Prog_1

!==================================================================

 subroutine Sub_Prog_2 ( array_size, a, b, c )

 implicit none

 integer, parameter :: dp = selected_real_kind(15,307), dp2 = selected_real_kind(15,307)  
 integer  :: array_size, i, j, k, l
 real(kind = dp), dimension(:), allocatable :: a, b, c    

       !$omp do private(i) schedule(runtime)
       !dir$ vector aligned
       !$omp simd simdlen(4)
        do i = 1, array_size 
            c(i) = a(i) + sqrt(b(i)) / 3.67d+0
               if (c(i) <= 350.0d+0) then
                     c(i) = a(i) + sqrt(b(i)) / 8.67d+0
               end if 
        end do
       !$omp end simd
       !$omp end do       

 end subroutine Sub_Prog_2

I want to take advantage of the aligned data access support in Intel Compiler 19 to get more efficient vectorization, so I compiled with "ifort -O3 -qopt-report5 -qopenmp -align array32byte -xAVX -o MyCode.exe Main.f90". I have two questions.

  1. Why can't I combine !dir$ vector aligned with !$omp simd simdlen(...) as written above? The compiler always aborts with this message:
    Sub_Prog_1.f90(17): catastrophic error: **Internal compiler error: internal abort** Please report this error along with the circumstances in which it occurred in a Software Problem Report.  Note: File and line given may not be explicit cause of this error.
    compilation aborted for Main.f90 (code 1)
  2. Since I prefer OpenMP directives to the Intel-specific ones, I previously used "!$omp simd simdlen(4) aligned(a,b,idx :32)" and "!$omp simd simdlen(4) aligned(a,b,c :32)" in the first and second subroutines, respectively. However, the vectorization reports showed that the arrays were still accessed unaligned. The only way I found to get both aligned access and vectorization was to use "!dir$ simd vectorlength(4)" instead of "!$omp simd simdlen(4)".
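
To separate the data layout from the directive behavior, a runtime check along the following lines could confirm whether the allocated arrays really start on 32-byte boundaries. This is only a minimal sketch and not part of my actual code: the program name check_alignment and the small size n are made up for illustration, and the arrays get the target attribute only so that c_loc can be applied to them.

```fortran
program check_alignment

  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t

  implicit none

  integer, parameter :: dp = selected_real_kind(15,307)
  integer, parameter :: n = 1000            ! hypothetical size, small on purpose
  integer(kind = c_intptr_t) :: addr
  ! target is an assumption added here so that c_loc may be used
  real(kind = dp), dimension(:), allocatable, target :: a, b, c

  allocate ( a(n), b(n), c(n) )

  ! Convert the address of the first element to an integer and
  ! print its remainder modulo 32; 0 means 32-byte aligned.
  addr = transfer(c_loc(a(1)), addr)
  print *, 'a: address mod 32 =', modulo(addr, 32_c_intptr_t)

  addr = transfer(c_loc(b(1)), addr)
  print *, 'b: address mod 32 =', modulo(addr, 32_c_intptr_t)

  addr = transfer(c_loc(c(1)), addr)
  print *, 'c: address mod 32 =', modulo(addr, 32_c_intptr_t)

  deallocate ( a, b, c )

end program check_alignment
```

If all three remainders are 0 when built with -align array32byte, the arrays themselves are aligned, and the unaligned accesses in the report would come from how the compiler treats the directives rather than from the allocation.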

Could someone please explain this matter?

Many thanks.

Best wishes,

 

 

 

 

