Channel: Intel® Fortran Compiler for Linux* and macOS*

ifort 13.0, 14.0 coarray extremely slow read/write between nodes

This is my test code:

$ cat ca_check.f90
program z
implicit none
integer :: x(10)[*], img, nimgs, i
real :: time1, time2
img = this_image()
nimgs = num_images()
x = img
if (img .eq. 1) then
  do i=1,nimgs
    call cpu_time(time1)
    x = x(:)[i]
    call cpu_time(time2)
    write (*,"(a,f)") "Remote read took, s : ", time2-time1
    call cpu_time(time1)
    x(:)[i] = x
    call cpu_time(time2)
    write (*,"(a,f)") "Remote write took, s : ", time2-time1
    write (*,"(99999(i0,tr1))") x
  end do
end if
sync all
write (*,"(a,i0,a,i0,a)") "Image: ", img, " out of ", nimgs, "completed ok"
end program z
$

Compiled with:

ifort -o ca_check.xcack ca_check.f90 -coarray=distributed -coarray-config-file=ca.conf -debug full -warn all

$ cat ca.conf
-envall -n 64 ./ca_check.xcack
$

$ cat zpbs
#!/bin/sh
#PBS -l walltime=00:01:00,nodes=4:ppn=16
#PBS -j oe
#PBS -m abe
cd $HOME/nobackup/cgpack/branches/coarray/tests
echo "LD_LIBRARY_PATH: " $LD_LIBRARY_PATH > zzz
echo "which mpirun: " `which mpirun` >> zzz
export I_MPI_DAPL_PROVIDER=ofa-v2-ib0
mpdboot --rsh=ssh --file=$PBS_NODEFILE -n 4
mpdtrace -l >> zzz
cm-launcher ./ca_check.xcack >> zzz
mpdallexit
$

$ cat zzz
LD_LIBRARY_PATH:  /cm/shared/apps/torque/4.2.4.1/lib:/cm/shared/apps/moab/7.2.2/lib:/cm/shared/tools/subversion-1.8.4/lib:/cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib/intel64:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib/intel64
which mpirun:  /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/bin/mpirun
node32-035_47536 (10.131.0.179)
node33-002_50475 (10.131.0.98)
node33-003_55287 (10.131.0.99)
node34-006_42324 (10.131.0.54)
Remote read took, s : 0.0010000
Remote write took, s : 0.0000000
1 1 1 1 1 1 1 1 1 1
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
0 0 0 0 0 0 0 0 0 0
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
3 3 3 3 3 3 3 3 3 3
Remote read took, s : 0.0000000
Remote write took, s : 0.0010000
0 0 0 0 0 0 0 0 0 0
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
5 5 5 5 5 5 5 5 5 5
Remote read took, s : 0.0000000
Remote write took, s : 0.0010000
0 0 0 0 0 0 0 0 0 0
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
7 7 7 7 7 7 7 7 7 7
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
0 0 0 0 0 0 0 0 0 0
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
9 9 9 9 9 9 9 9 9 9
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
10 10 10 10 10 10 10 10 10 10
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
11 11 11 11 11 11 11 11 11 11
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
12 12 12 12 12 12 12 12 12 12
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
13 13 13 13 13 13 13 13 13 13
Remote read took, s : 0.0009990
Remote write took, s : 0.0000000
14 14 14 14 14 14 14 14 14 14
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
15 15 15 15 15 15 15 15 15 15
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
16 16 16 16 16 16 16 16 16 16
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
17 17 17 17 17 17 17 17 17 17
Remote read took, s : 13.3259735
Remote write took, s : 12.9360342
18 18 18 18 18 18 18 18 18 18
Remote read took, s : 13.8728924
Remote write took, s : 12.5950813
19 19 19 19 19 19 19 19 19 19
Remote read took, s : 14.5117950
Remote write took, s : 12.9060364
20 20 20 20 20 20 20 20 20 20
$

Note that:
- The values read from processors 2, 4, 6 and 8 are just wrong. They are all zero, but must be equal to the processor number.
- There are 16 cores in a node. Reads/writes to/from the first 16 processors are very fast, < 1 us. Read/write to/from processor 17, which is probably the first processor on another node, is still fast, but every processor beyond that takes over 10 seconds per read or write.

I've checked with both 13.0 and 14.0. I'm happy to provide further details of the MPI setup.

Thanks
Anton
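A side note on the measurement itself (an observation about the test, not part of the original report): cpu_time reports processor time consumed by the calling image, so time an image spends blocked in remote communication may not be counted at all, which could account for the many 0.0000000 readings above. A minimal sketch of wall-clock timing with the standard system_clock intrinsic, where the remote coarray access is only indicated by a comment:

```fortran
program wall_timing
implicit none
integer :: count0, count1, count_rate
real :: elapsed
! system_clock measures wall-clock time, unlike cpu_time, which
! reports processor time and can read near zero for an image that
! is blocked waiting on remote communication.
call system_clock(count0, count_rate)
! ... remote coarray read/write under test would go here ...
call system_clock(count1)
elapsed = real(count1 - count0) / real(count_rate)
write (*,"(a,f12.6)") "Elapsed wall time, s: ", elapsed
end program wall_timing
```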

