I have a working Fortran 77 program that I'm trying to auto-parallelize on a cluster. The changes I made to the (abbreviated) makefiles are shown in curly brackets below.
Build library:
FOR=ifort -c -O3 {-parallel}
LINK=ifort
PROG_DIR=/export/home/mydir/
HJS=$(PROG_DIR)/hjs
.f.o:
	$(FOR) $<
rm libprog.a
$(FOR) $(HJS)/*.f
ar -rv $(PROG_DIR)/libprog.a *.o
rm *.o
Compile executable:
FOR=ifort -c -O3 {-parallel}
LINK=ifort {-openmp}
PROG_DIR=/export/home/mydir/
LIBPROG=-L$(PROG_DIR) -lprog
.f.o:
	$(FOR) $(OPT) $<
prog_hjs.o: prog_hjs.f
	$(FOR) prog_hjs.f
prog_hjs: prog_hjs.o
	$(LINK) -o prog_hjs prog_hjs.o $(LIBPROG)
The program runs at the same speed regardless of whether it is given 4, 8 or 16 CPUs, and it appears that it can only use one node (i.e. up to 8 CPUs). What is missing? Is anything more required in the compilation above?
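For anyone trying to reproduce this, here is a variant of the makefile settings that might help narrow the problem down. It is only a sketch, not a confirmed fix. It assumes the installed ifort accepts -par-report3 (older releases do; newer ones use the -qopt-report options instead), and it uses -parallel on the link line purely to keep the compile and link flags consistent.

```makefile
# Sketch only, same variable names as the makefiles above.
# -parallel        : ask ifort to auto-parallelize loops it can prove safe
# -par-report3     : ASSUMED to be supported by this ifort version; reports
#                    which loops were or were not parallelized, and why
FOR=ifort -c -O3 -parallel -par-report3
# Link with the same flag so the threading runtime is pulled in.
LINK=ifort -parallel
```

If the report shows that no significant loops were parallelized, the flags alone cannot give a speedup. Note also that code auto-parallelized this way runs as shared-memory threads, so it stays within a single node, and the thread count is normally taken from the OMP_NUM_THREADS environment variable.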
Auto-parallelized F77 code runs but no speedup