I am working with a numerical model application that, sometime in the past few months, started to segfault when built with ifort on Linux. The application normally runs under MPI and links to several libraries (hdf5, netcdf, parmetis), although the behavior I'm writing about occurs with one processor and doesn't require mpiexec to reproduce.
When I use -traceback, the segmentation fault is attributed to more or less the same subroutine every time. The line number varies, though, with the options and the system I am on (our own cluster and SDSC Comet, for instance). On my system I am working with ifort 14.0.1, a fairly close match to the one on Comet. I always set ulimit -s unlimited, and my base compile options for getting a trace are: -O2 -debug extended -traceback. Many of the arrays are allocated explicitly on the heap.
As to what I have tried ...
-check bounds: does not produce a warning, but eliminates the segfault on our cluster and on Comet (same for -check uninit)
-mcmodel=medium: eliminates the segfault on the head node of our cluster but not on Comet. I did this with just my code, not the libraries -- that might not be kosher?
random print statements: often eliminate the segfault
-heap-arrays: no effect
-O3: eliminates the segfault on our cluster
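
For concreteness, the variants above amount to roughly the following. This is only a dry-run sketch that prints the compile lines: the source file model.f90 and the output name model are placeholders rather than the real build, the library link flags are left out, and each extra option is simply appended to the base set.

```shell
#!/bin/sh
# Dry run: prints the compile lines only; nothing is compiled.
# model.f90 and the output name "model" are placeholders, not the real build.
BASE="-O2 -debug extended -traceback"

build_cmd() {
    # $1: extra flags appended to the base set (may be empty).
    # tr -s squeezes the double space left when $1 is empty.
    echo "ifort $BASE $1 -o model model.f90" | tr -s ' '
}

for extra in "" "-check bounds" "-check uninit" "-mcmodel=medium" "-heap-arrays" "-O3"; do
    build_cmd "$extra"
done
```

In every case ulimit -s unlimited is set in the shell before the executable is run.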
None of the diagnostics I've tried has produced a complaint I could make sense of. Can anyone think of a way to get more info? Constructing a minimal example is onerous -- it is a big production code, and every time I change one line it seems to affect reproducibility. I've tried Intel Inspector XE 2013, but it just hangs at startup, which I suppose is material for a different post. Is valgrind appropriate? We could upgrade the compiler, of course, but there is a lot of startup cost to that and much of the work is destined for Comet.
Thanks,
Eli