I realize that F90 gives us array operations, but I'm just trying to figure this out. Old-school thinking says to loop over the last array index in the outermost loop so that memory is accessed consecutively (Fortran stores arrays in column-major order, so the first index varies fastest).
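A minimal free-form sketch of the column-major layout that rule relies on. The program name and the small 2x3 array are illustrative, not from the post. Each element stores a value encoding its indices, and reshape then lists the elements in storage order:

```fortran
! Illustrative sketch (free-form source, e.g. column_major.f90):
! Fortran stores arrays column-major, so the FIRST index varies fastest.
program column_major
  implicit none
  integer :: a(2,3), flat(6), i, j

  ! Encode each element's indices in its value: a(i,j) = 10*i + j
  do j = 1, 3          ! last index in the outer loop
    do i = 1, 2        ! first index in the inner loop (stride-1 access)
      a(i,j) = 10*i + j
    end do
  end do

  ! reshape walks the array in array element order, which for this
  ! local explicit-shape array is also its storage order
  flat = reshape(a, (/ 6 /))
  print '(6I4)', flat  ! prints:   11  21  12  22  13  23
end program column_major
```

The output shows a(1,1), a(2,1), a(1,2), ... in sequence: with i in the innermost loop, successive iterations touch adjacent memory locations.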
The results I'm getting are not what I expected. At the default optimization level I compiled with -opt-report, and for the "slow" code the report shows the compiler interchanging the loops. For the "fast" code (where the last index is in the outer loop) it does not, and that version runs *slower*. What is going on? With -O0 I get the expected result: the code below, with j in the outer loop, runs faster.
Source files attached.
What should I take away from this? Should we not try to be smart about index order in loops? Thanks for any insight.
      program loopindex
      implicit none
      integer ndimi, ndimj, ntimes
      parameter (ndimi=2000, ndimj=3000, ntimes=1000)
      integer x(ndimi,ndimj), y(ndimi,ndimj), i, j, k
      integer timesec1, timesec2
!     k repeats the whole update ntimes; j (last index) is the outer
!     array loop and i (first index) the inner one, so x and y are
!     accessed with stride 1
      call system_clock(timesec1)
      print *, 'time: ', timesec1
      do k = 1, ntimes
        do j = 1, ndimj
          do i = 1, ndimi
            x(i,j) = 5
            y(i,j) = 6
            x(i,j) = x(i,j) * y(i,j)
          end do
        end do
      end do
      call system_clock(timesec2)
      print *, 'time: ', timesec2
      print *, 'diff: ', timesec2 - timesec1
      end program loopindex
Compiler: ifort (IFORT) 12.1.6 20130222

"Slow" version (loopindex_slow.f):
    ifort -mcmodel=medium -shared-intel -opt-report loopindex_slow.f >& report_slow.txt
    ./a.out
    time: 2033097649
    time: 2033115630
    diff: 17981

"Fast" version (loopindex.f, the code above):
    ifort -mcmodel=medium -shared-intel -opt-report loopindex.f >& report.txt
    ./a.out
    time: 2033245879
    time: 2033338024
    diff: 92145
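The diff values above are raw system_clock counts whose unit depends on the clock rate, so they are not seconds (their ratio, 92145 / 17981, about 5.1, is still meaningful). Below is a sketch that also requests count_rate and prints seconds. The 64-bit integer kind, the program name and the placeholder harmonic-sum loop are choices made for this example, not taken from the post. It assumes a compiler that accepts non-default integer kinds for system_clock arguments, as Fortran 2003 permits:

```fortran
! Illustrative sketch (free-form): report elapsed time in seconds
! instead of raw clock ticks. The loop is only a placeholder workload.
program clock_seconds
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
  integer(i8) :: t1, t2, rate   ! 64-bit counts make wraparound far less likely
  integer :: i
  double precision :: total

  call system_clock(t1, rate)   ! rate = clock ticks per second
  if (rate == 0) stop 'no system clock available'

  total = 0.0d0
  do i = 1, 10000000
    total = total + 1.0d0 / i
  end do

  call system_clock(t2)
  print *, 'ticks per second: ', rate
  print *, 'elapsed seconds:  ', real(t2 - t1) / real(rate)
  print *, 'placeholder result:', total   ! printed so the loop result is used
end program clock_seconds
```

With default 32-bit counts, as in the posted program, the counter can wrap around at count_max; the posted readings (about 2.03e9) are already close to the largest default integer, 2147483647.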
report_slow.txt contains:
    <loopindex_slow.f;10:10;hlo_linear_trans;MAIN__;0>
    LOOP INTERCHANGE in loops at line: 10 12 13
    Loopnest permutation ( 1 2 3 ) --> ( 3 1 2 )
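A sketch of the loop order the report appears to describe. It assumes loopindex_slow.f (not reproduced here) is the posted program with the i and j loops swapped, so the loops at lines 10, 12 and 13 are k, i and j, outer to inner. Reading "( 3 1 2 )" as the original loops listed outermost-first gives the order j, k, i. The program name is hypothetical:

```fortran
! Illustrative sketch (free-form) of the nest after the reported interchange.
! ASSUMPTION: the original slow nest is k, i, j (outer to inner).
program interchanged_sketch
  implicit none
  integer, parameter :: ndimi = 2000, ndimj = 3000, ntimes = 1000
  integer :: x(ndimi,ndimj), y(ndimi,ndimj), i, j, k

  do j = 1, ndimj            ! original loop 3 (last index), now outermost
    do k = 1, ntimes         ! original loop 1 (repeat count), now in the middle
      do i = 1, ndimi        ! original loop 2 (first index), innermost, stride 1
        x(i,j) = 5
        y(i,j) = 6
        x(i,j) = x(i,j) * y(i,j)
      end do
    end do
  end do
  print *, 'x(ndimi,ndimj) = ', x(ndimi,ndimj)   ! sanity check: expect 30
end program interchanged_sketch
```

In this order the k loop repeats its work on a single column of x and y (2 x 2000 default 4-byte integers = 16,000 bytes) before moving on, so that column can stay in cache for all 1000 repetitions. In the posted "fast" order every k pass streams through both full arrays (2 x 2000 x 3000 x 4 bytes = 48,000,000 bytes), far more than typical CPU caches hold. That difference in data reuse, rather than the inner-loop stride, is a likely reason the interchanged build ran faster.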