Hi,
I encountered a weird problem when I was trying to use SIMD directives and AVX instructions to vectorize my loops.
I wrote the following small program to illustrate this problem:
Program AVX_test

REAL(kind = 8), Allocatable :: time_g(:), timeOld_g(:), dt_g(:)
INTEGER :: nCells

nCells = 1000000

if (.not.allocated(time_g) .eqv. .true.) allocate(time_g(nCells))
if (.not.allocated(timeOld_g) .eqv. .true.) allocate(timeOld_g(nCells))
if (.not.allocated(dt_g) .eqv. .true.) allocate(dt_g(nCells))

!DIR$ SIMD
do i = 1, nCells
    time_g(i) = timeOld_g(i) + dt_g(i)
end do

End Program AVX_test
Case 1: The vector report looks good if I use only the -O2 option:
login2$ ifort -O2 -vec-report6 avxtest.f90 -o avxtest
avxtest.f90(14): (col. 5) remark: vectorization support: streaming store was generated for avx_test.
avxtest.f90(14): (col. 5) remark: vectorization support: streaming store was generated for avx_test.
avxtest.f90(14): (col. 32) remark: SIMD LOOP WAS VECTORIZED.
I found an addpd instruction in the assembly file for line 14, so I think the loop has been vectorized with SSE2 vector instructions.
Case 2: If I also add -xHost to use AVX instructions, the vector report complains:
login2$ ifort -O2 -xHost -vec-report6 avxtest.f90 -o avxtest
avxtest.f90(14): (col. 5) remark: vectorization support: streaming store was generated for avx_test.
avxtest.f90(14): (col. 32) remark: SIMD LOOP WAS VECTORIZED.
avxtest.f90(14): (col. 32) remark: loop was not vectorized: unsupported data type.
avxtest.f90(14): (col. 32) warning #13379: loop was not vectorized with "simd"
This is confusing to me because the report says both yes (SIMD LOOP WAS VECTORIZED) and no (unsupported data type). I checked the
assembly file and found a vaddpd instruction, but I am not sure whether this loop was actually vectorized in the end. The message
"unsupported data type" seems odd to me, since the arrays are plain REAL(kind = 8). The ifort version is 13.1.0.
Another quick question: in the assembly files I also found an addsd instruction in Case 1 and a vaddsd instruction in Case 2 for
line 14. These should be scalar instructions, right? If the loop has been vectorized, why are there scalar instructions? Is it because
of the remainder iterations left over after loop unrolling?
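To make the remainder hypothesis concrete, here is a hand-written sketch of how a loop can be split into a vector-width main body plus a scalar cleanup loop. It is only an illustration, not the code ifort actually generates: the program name, the array names a, b, c, the vector length of 4 (four doubles per 256-bit AVX register) and the array size are all assumptions chosen for the example.

```fortran
program remainder_sketch
    implicit none
    integer, parameter :: vlen = 4        ! assumed: 4 doubles per 256-bit AVX register
    integer, parameter :: n = 1000003     ! deliberately not a multiple of vlen
    real(kind=8), allocatable :: a(:), b(:), c(:)
    integer :: i, nvec

    allocate(a(n), b(n), c(n))
    b = 1.0d0
    c = 2.0d0

    ! Largest multiple of vlen that does not exceed n
    nvec = n - mod(n, vlen)

    ! Main body: each iteration handles vlen elements at once
    ! (the part where packed instructions such as vaddpd would apply)
    do i = 1, nvec, vlen
        a(i:i+vlen-1) = b(i:i+vlen-1) + c(i:i+vlen-1)
    end do

    ! Remainder: leftover elements are handled one at a time
    ! (the part where scalar instructions such as vaddsd would apply)
    do i = nvec + 1, n
        a(i) = b(i) + c(i)
    end do

    ! Sanity check: every element should equal 1.0 + 2.0
    if (any(a /= 3.0d0)) stop "mismatch"
    print *, "vector part:", nvec, " remainder:", n - nvec
end program remainder_sketch
```

With these assumed sizes, three elements fall into the scalar loop. If the hypothesis holds, seeing both packed and scalar add instructions for the same source line would be consistent with a vectorized loop. A compiler may also emit a scalar peel loop before the main body to reach an aligned address, which would look similar in the assembly.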
I would truly appreciate your help and reply.
Best regards,
Wentao