All,
I have some standard code that I OMPized (and MPIized, and GPUized and MICized and...) and figured I should try DO CONCURRENT as well. Now, my first naive attempt was to replace:
!$omp parallel do default(private) &
!$omp shared(m,np,ict,icb,nb,overcast) &
...
!$omp shared(caib, caif)
RUN_LOOP: do i=1,m
with:
RUN_LOOP: do concurrent (i=1:m)
Now, in doing so, the code does run, but it isn't parallel at all. I can setenv OMP_NUM_THREADS to 4 or 28 and see no difference in speed.
This was compiled with -qopenmp. Hoping to get some effect, I then tried -qopenmp -parallel. This definitely spawned threads, but it did so in a bad way: the run took ~5 seconds with OMP_NUM_THREADS=1 and ~12 seconds with OMP_NUM_THREADS=4.
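To make the pattern concrete, here is a stripped-down stand-in for the kind of change I made. This is only a sketch, not my actual code: the program name, the arrays a and b, the size m, and the loop body are all made up for illustration.

```fortran
program do_concurrent_sketch
   implicit none
   integer, parameter :: m = 1000000   ! hypothetical problem size
   real, allocatable :: a(:), b(:)     ! hypothetical arrays, not from the real code
   integer :: i

   allocate(a(m), b(m))
   a = 1.0

   ! OpenMP form: threading and data sharing are spelled out in the directive.
   !$omp parallel do shared(a, b) private(i)
   do i = 1, m
      b(i) = 2.0 * a(i)
   end do
   !$omp end parallel do

   ! DO CONCURRENT form: this only asserts that the iterations are
   ! independent; whether it is actually threaded is left to the compiler
   ! and the flags it is given.
   do concurrent (i = 1:m)
      b(i) = 2.0 * a(i) + 1.0
   end do

   print *, b(1), b(m)
end program do_concurrent_sketch
```

The OpenMP directive carries the sharing information explicitly, while the plain DO CONCURRENT header carries none, which I suspect is part of what I am missing in the conversion.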
So, is there a nice standard treatise/tutorial on how to take a code that works with OpenMP and convert it to use DO CONCURRENT?