I have reduced a larger program to a small testcase in which the compiler fails to vectorize a loop because of an "assumed dependency", even though the loop carries the simd directive:
    program fortTest
      integer, allocatable :: vals(:)
      integer, allocatable :: vals2(:)
      integer, allocatable :: send(:)
      integer i,j,ct,tmp,tmp2,tmp3

      ct=10000
      allocate(vals(ct*ct))
      allocate(vals2(ct*ct))
      allocate(send(ct))

      do i=1, ct
        send(i)=i
        do j=1, ct
          vals2(i*ct+j)=i+1+j
        end do
      end do

      !$omp parallel do simd private(tmp,tmp2,tmp3) schedule(runtime)
      do i=1, ct
        tmp = vals2(i) * 2
        tmp2 = vals2(i+ct) * 2
        tmp3 = vals2(i+2*ct) * 2
        vals(send(i)) = tmp+tmp2
        vals(send(i)+i*ct) = tmp+tmp3
      end do
    end
Please note that once you remove the "schedule(runtime)" clause, the loop gets vectorized and my (large) program gets a 4x speedup.
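For comparison, here is a minimal sketch of the variant without the schedule clause, which is the form that vectorizes for me. The program name, the smaller ct, the `implicit none`, the zero-initialization of vals, the `(i-1)*ct` index offset (which keeps every access inside the allocated arrays) and the final print are assumptions made for this sketch and are not part of the testcase above. Whether the loop actually vectorizes will likely depend on the compiler and version, so the optimization report should be checked.

```fortran
program fortTestNoSchedule
  implicit none
  integer, allocatable :: vals(:), vals2(:), send(:)
  integer :: i, j, ct, tmp, tmp2, tmp3

  ! Assumption: smaller problem size than the testcase, for a quick run.
  ct = 1000
  allocate(vals(ct*ct), vals2(ct*ct), send(ct))
  vals = 0

  do i = 1, ct
    send(i) = i
    do j = 1, ct
      ! Assumption: (i-1)*ct keeps the index within 1..ct*ct.
      vals2((i-1)*ct + j) = i + 1 + j
    end do
  end do

  ! Same loop body as the testcase, but without schedule(runtime).
  !$omp parallel do simd private(tmp,tmp2,tmp3)
  do i = 1, ct
    tmp  = vals2(i) * 2
    tmp2 = vals2(i+ct) * 2
    tmp3 = vals2(i+2*ct) * 2
    vals(send(i)) = tmp + tmp2
    vals(send(i) + (i-1)*ct) = tmp + tmp3
  end do
  !$omp end parallel do simd

  ! Use the result so the loop is not optimized away.
  print *, vals(1), vals(ct)
end program fortTestNoSchedule
```

Building this with OpenMP enabled and the compiler's vectorization report turned on should show whether the "assumed dependency" message still appears for the indirect stores through send(i).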