Part of my code has been vectorized using !$omp simd. Whenever vectorization is enabled, I get an error saying "array index out of bounds". The line it points to seems quite random: when I comment out that line, the error persists and simply refers to another line.
In my loop I have an if clause that calls function A when true and function B when false. (Both functions also contain a function call of their own.) All of these functions have been inlined and declared simd. The point I want to make is that if I comment out one of these function calls (the branch which I KNOW the code won't execute at run time because of my flag settings), the segmentation fault is delayed. If I comment out the other function (B, the one that is actually being called), then one of the following two scenarios happens:
1) If I also comment out function A, EVEN THOUGH it is not being called, my program runs!
2) If I DON'T comment out function A (EVEN THOUGH IT IS NOT BEING CALLED), my program complains about an "array index out of bounds".
I did have -traceback enabled, but it was of no help.
I even wrote a check so that if the index gets larger than the array size, that loop iteration is skipped (CYCLE). However, I am 100% sure that my array index does not go out of bounds, unless the vectorization is doing something I am not aware of.
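To make the structure concrete, here is a minimal sketch of the kind of loop described above. It is not the actual code: the names kernels, func_a, func_b, use_a, idx, u and v are all hypothetical, the function bodies are placeholders, and this small version is not claimed to reproduce the error.

```fortran
module kernels
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical stand-in for "function A" (the branch that is never taken)
  pure function func_a(x) result(y)
    !$omp declare simd(func_a)
    real(dp), intent(in) :: x
    real(dp) :: y
    y = 2.0_dp * x
  end function func_a

  ! Hypothetical stand-in for "function B" (the branch that is taken)
  pure function func_b(x) result(y)
    !$omp declare simd(func_b)
    real(dp), intent(in) :: x
    real(dp) :: y
    y = x + 1.0_dp
  end function func_b
end module kernels

program simd_sketch
  use kernels
  implicit none
  integer, parameter :: n = 16
  real(dp) :: u(n), v(n)
  integer  :: idx(n)
  logical  :: use_a
  integer  :: i

  use_a = .false.            ! run-time flag: only the func_b branch executes
  do i = 1, n
    idx(i) = i               ! indirect index into u
    u(i)   = real(i, dp)
  end do
  v = 0.0_dp

  !$omp simd
  do i = 1, n
    ! Guard: skip the iteration if the index is outside the array
    if (idx(i) < 1 .or. idx(i) > n) cycle
    if (use_a) then
      v(i) = func_a(u(idx(i)))
    else
      v(i) = func_b(u(idx(i)))
    end if
  end do

  print *, v
end program simd_sketch
```

The sketch only mirrors the shape of the real loop: an index guard with CYCLE, followed by a flag-controlled branch between two simd-declared functions.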
I don't know if this is useful, but when running with Valgrind I get numerous error messages which are nearly identical (only in the case where the program actually fails).
I first get this error:
==2883== Invalid read of size 8
==2883==    at 0x44C200: lpt_particles_mp_displu_ (in lpt.x)
==2883==    by 0x43AC6E: lpt_marching_mp_unsteady_spray_steady_flow_ (in lpt.x)
==2883==    by 0x41ED25: MAIN__ (in lpt.x)
==2883==    by 0x403761: main (in lpt.x)
==2883==  Address 0xc001077a90872154 is not stack'd, malloc'd or (recently) free'd
Then I get the following errors (which are probably due to the fact that memory had not been deallocated when the program crashed; correct me if I am wrong):
==31838== 262,144 bytes in 1 blocks are still reachable in loss record 137 of 137
==31838==    at 0x4C2C1E0: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31838==    by 0xDDF8C9A: ???
==31838==    by 0xDDF707B: ???
==31838==    by 0xDDEF425: ???
==31838==    by 0xDDEF797: ???
==31838==    by 0x5421EDB: fi_endpoint (fi_endpoint.h:156)
==31838==    by 0x5421EDB: ??? (ofi_init.h:1733)
==31838==    by 0x5429F08: MPIDI_NM_mpi_init_hook (ofi_init.h:1117)
==31838==    by 0x5429F08: MPID_Init (ch4_init.h:855)
==31838==    by 0x5429F08: MPIR_Init_thread (initthread.c:647)
==31838==    by 0x541DD1B: PMPI_Init (init.c:284)
==31838==    by 0xC611CFA: MPI_INIT (initf.c:275)
==31838==    by 0x4481EF: lpt_parallel_mp_parallel_init_ (in lpt.x)
==31838==    by 0x41ED0D: MAIN__ (in lpt.x)
==31838==    by 0x403761: main (in lpt.x)
Then I get several of these:
==31838== 28,517,032 bytes in 1 blocks are possibly lost in loss record 137 of 137
==31838==    at 0x4C2A0B0: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31838==    by 0x53465F: _mm_malloc (in lpt.x)
==31838==    by 0x4B9680: for_alloc_allocatable (in lpt.x)
==31838==    by 0x429CCF: lpt_geom_mp_tri_normals_ (in lpt.x)
==31838==    by 0x436FF6: lpt_init_mp_spray_init_ (in lpt.x)
==31838==    by 0x466EFC: lpt_preprocessor_mp_preproc_ (in lpt.x)
==31838==    by 0x41ED17: MAIN__ (in lpt.x)
==31838==    by 0x403761: main (in pt.x)
I know that not showing the code makes this much more difficult, but the code is very large and it would be unreasonable to post all of it here. My attempts to reduce the problem to a smaller example that still reproduces the bug have not been successful so far. That is hard to do without any idea of where the error actually is.
One guess: could it be that I am running out of vector registers?
I have tried to compile with -xcore-AVX2 -align array32byte -qopt-zmm-usage=high and AVX512 -align array64byte -qopt-zmm-usage=high.
I would really appreciate it if somebody who has experienced a similar issue could indicate potential reasons for this error.
Please note again: I have run this in full debug mode, and also fully optimised (-O3) but without vectorisation. In neither case did the compiler complain, nor did Valgrind report anything. Everything just seemed fine.