I have a large code base with OpenMP 4.0 directives (target, simd, ...).
In one module, the compiler emits lots of warnings about "loops not vectorized with simd", although those loops should vectorize.
I cut the code down to the bare minimum that still produces this behaviour:
SUBROUTINE simdTest
   IMPLICIT NONE
   INTEGER :: i, j, k, sr, tn, nzb, nzt, nxl, nxr, nys, nyn
   REAL :: s1, s2, s3, s4
   REAL, DIMENSION(:,:,:), ALLOCATABLE :: u, v, pt, rmask, sums_l
   REAL, DIMENSION(:,:), ALLOCATABLE :: usws, vsws, shf

   !$omp parallel do schedule(runtime) private(s1,s2,s3)
   DO k = nzb, nzt+1
      !$omp simd collapse( 2 ) reduction( +: s1, s2, s3 )
      DO i = nxl, nxr
         DO j = nys, nyn
            s1 = s1 + u(k,j,i) * rmask(j,i,sr)
            s2 = s2 + v(k,j,i) * rmask(j,i,sr)
            s3 = s3 + pt(k,j,i) * rmask(j,i,sr)
         ENDDO
      ENDDO
      sums_l(k,1,tn) = s1
      sums_l(k,2,tn) = s2
      sums_l(k,4,tn) = s3
   ENDDO

   !$omp parallel do reduction( +: s1, s2, s3, s4) schedule(runtime)
   DO i = nxl, nxr
      DO j = nys, nyn
         s1 = s1 + usws(j,i) * rmask(j,i,sr)
         s2 = s2 + vsws(j,i) * rmask(j,i,sr)
         s3 = s3 + shf(j,i) * rmask(j,i,sr)
         s4 = s4 + 0.0
      ENDDO
   ENDDO
   sums_l(nzb,12,tn) = s1
   sums_l(nzb,14,tn) = s2
   sums_l(nzb,16,tn) = s3
END SUBROUTINE
If you compile this with "ifort -openmp -O2", it warns about the first loop. If you remove literally anything (even something from the second loop), the first loop vectorizes.
The message from vec-report is "subscript too complex".
Could someone explain this? IMO, not vectorizing the first loop would lead to a significant performance loss.
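For comparison, here is a sketch of the first loop nest written without collapse( 2 ), with the simd directive on the inner j loop only and the accumulators reset for every level k (the snippet above never initializes s1, s2, s3). My unverified guess is that collapsing the two loops makes the compiler derive i and j from one linearized counter, which might be what it reports as a too-complex subscript. The module and subroutine names (level_sums_sketch, masked_level_sums) and the explicit argument list with lower bounds are hypothetical, chosen only to make the sketch self-contained. I have not checked whether ifort actually vectorizes this version.

```fortran
! Hypothetical sketch: loop 1 of simdTest without collapse(2).
! Not a confirmed fix for the ifort "subscript too complex" message.
module level_sums_sketch
   implicit none
contains
   ! For every level k, sums u, v and pt over the (j,i) plane weighted by
   ! rmask(:,:,sr) and stores the results in sums_l(k,1|2|4,tn).
   subroutine masked_level_sums(u, v, pt, rmask, sums_l, &
                                nzb, nzt, nxl, nxr, nys, nyn, sr, tn)
      integer, intent(in) :: nzb, nzt, nxl, nxr, nys, nyn, sr, tn
      real, intent(in)    :: u(nzb:, nys:, nxl:), v(nzb:, nys:, nxl:), pt(nzb:, nys:, nxl:)
      real, intent(in)    :: rmask(nys:, nxl:, :)
      real, intent(inout) :: sums_l(nzb:, :, :)
      integer :: i, j, k
      real :: s1, s2, s3

      !$omp parallel do schedule(runtime) private(i, j, s1, s2, s3)
      do k = nzb, nzt + 1
         ! Reset the per-level accumulators for every k.
         s1 = 0.0
         s2 = 0.0
         s3 = 0.0
         do i = nxl, nxr
            ! Only the inner j loop is marked for SIMD; no collapse clause.
            !$omp simd reduction(+: s1, s2, s3)
            do j = nys, nyn
               s1 = s1 + u(k, j, i) * rmask(j, i, sr)
               s2 = s2 + v(k, j, i) * rmask(j, i, sr)
               s3 = s3 + pt(k, j, i) * rmask(j, i, sr)
            end do
         end do
         sums_l(k, 1, tn) = s1
         sums_l(k, 2, tn) = s2
         sums_l(k, 4, tn) = s3
      end do
      !$omp end parallel do
   end subroutine masked_level_sums
end module level_sums_sketch
```

Comparing the vectorization report for this version with the one for the original loop nest should at least show whether the collapse clause is involved.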