Hello all,
This post concerns an issue with compiling the routine that performs
the Jacobian calculation (chem_spack_jacdchemdc.f90) in a numerical weather prediction model named BRAMS (http://brams.cptec.inpe.br/).
BRAMS is based on the Regional Atmospheric Modeling System (RAMS), originally
developed at CSU/USA. BRAMS is released under a free license (CC-GPL).
This routine is one of the hotspots of the chemistry module in BRAMS, and we are
trying to improve its performance. The routine was decoupled from
BRAMS so that we could work on it without running the full forecast model;
the standalone version is called 'chem_spack_jacdchemdc_offline.f90'.
We prepared two versions of 'chem_spack_jacdchemdc_offline.f90': 'main' and 'function'.
In the 'main' version, the large loop is in the main program itself.
In the 'function' version, this loop is in a routine called
by the main program, as it is done in BRAMS.
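As a minimal sketch of the two layouts (the loop body, array sizes, and the name jacdchemdc below are placeholders, not the actual BRAMS interfaces), the only difference is where the large loop lives:

```fortran
! 'function' version: the large loop moved into a routine called
! from the main program, as it is organized inside BRAMS.
! In the 'main' version, the same do-loop sits directly in the
! main program instead.
program jac_offline
  implicit none
  integer, parameter :: n = 1000
  real :: jac(n)
  call jacdchemdc(n, jac)      ! loop now lives in the callee
  print *, sum(jac)
contains
  subroutine jacdchemdc(n, jac)
    integer, intent(in)  :: n
    real,    intent(out) :: jac(n)
    integer :: i
    do i = 1, n                ! placeholder for the ~1000-line
       jac(i) = real(i)        ! Jacobian loop body
    end do
  end subroutine jacdchemdc
end program jac_offline
```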
The two versions were compiled with Intel (2016 and 2017), gcc 5.3,
and pgi 16.5; the timings obtained are in the attached worksheet.
Only with the Intel compilers (2016 and 2017) was the executable generated
with -O3 for the 'function' version not optimized as well as the 'main' version.
The source codes are available at
http://www.lncc.br/~rpsouto/brams/chem_spack_jacdchemdc_offline.tar.
This is a case with a chemistry scheme (RELACS_TUV) containing 47 species.
We suspect it may be related to the size of the main loop
in this routine. We found an Intel article about this issue:
https://software.intel.com/en-us/ARTICLES/INTERNAL-THRESHOLD-WAS-EXCEEDED
For example, the attached code 'chem_spack_jacdchemdc.f90', which
calculates the Jacobian for 72 chemical species (RACM_TUV scheme)
and has a loop of more than 2000 lines, produces the following
message when compiled:
$ ifort -O3 -c chem_spack_jacdchemdc.f90
Space exceeded in Data Dependence Test in jacdchemdc_
Subdivide routine into smaller ones to avoid optimization loss
Although this message does not appear with RELACS_TUV (a loop of about
1000 lines), it may be part of the explanation.
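One workaround we are considering, following the compiler's suggestion to subdivide the routine, is to split the single large loop into several smaller routines, each computing a block of the Jacobian entries. A sketch (the names jacdchemdc_split, jac_part1, and jac_part2 and the loop bodies are hypothetical):

```fortran
! Sketch of the "subdivide routine into smaller ones" workaround
! from the Intel diagnostic; names and loop bodies are illustrative.
program jac_split_demo
  implicit none
  integer, parameter :: n = 1000
  real :: jac(n)
  call jacdchemdc_split(n, jac)
  print *, sum(jac)
contains
  subroutine jacdchemdc_split(n, jac)
    integer, intent(in)  :: n
    real,    intent(out) :: jac(n)
    call jac_part1(n, jac)   ! first block of the former big loop
    call jac_part2(n, jac)   ! second block
  end subroutine jacdchemdc_split

  subroutine jac_part1(n, jac)
    integer, intent(in)    :: n
    real,    intent(inout) :: jac(n)
    integer :: i
    do i = 1, n/2
       jac(i) = real(i)      ! placeholder for the first half
    end do
  end subroutine jac_part1

  subroutine jac_part2(n, jac)
    integer, intent(in)    :: n
    real,    intent(inout) :: jac(n)
    integer :: i
    do i = n/2 + 1, n
       jac(i) = real(i)      ! placeholder for the second half
    end do
  end subroutine jac_part2
end program jac_split_demo
```

Each smaller loop should then stay under the dependence-test threshold mentioned in the Intel article.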
Thanks in advance,
Roberto Pinto Souto
HPC analyst at National Laboratory for Scientific Computing (LNCC/Brazil)