I have a strange problem that looks like a "Memory Cache Leak" (not a memory leak).
Let me set the stage first. On a cluster, I have reproducibly noticed (using ganglia to monitor) that the cached memory increases, relatively slowly. When it becomes large, something like 2/3 of the total memory (Intel Gold with 32 cores & 192 GB), a program runs slower by a factor of ~1.5. If I clear the cache and sync the disk (I have not tested which of the two matters) with "sync ; echo 3 > /proc/sys/vm/drop_caches", the program recovers its original speed (~1.5 times faster).
The issue seems to be associated with I/O -- the relevant code uses MPI, and only the core that does the I/O shows the cache growth. The program does a fair amount of I/O, but not massive amounts (10-40 MB). I compile using ifort with -assume buffered_io. My suspicion is that this may leave some file data cached at the end, effectively a "cache leak".
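If that is the cause, I imagine the cure is to tell the kernel to drop a file's pages once the program is done with them. Below is a minimal C sketch of the idea only -- my actual code is Fortran/MPI, so it would have to go through ISO_C_BINDING or a small helper, and "output.dat" is just a placeholder name:

#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* "output.dat" stands in for whatever file the I/O core writes. */
    int fd = open("output.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* ... the actual writes would happen here ... */

    /* Flush dirty pages to disk first: POSIX_FADV_DONTNEED only evicts clean pages. */
    if (fsync(fd) != 0)
        perror("fsync");

    /* Hint the kernel to drop this file's cached pages (offset 0, length 0 = whole file). */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

    close(fd);
    return 0;
}

I have not tried this yet, so I don't know whether it actually keeps the cached memory from growing in my case.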
Has anyone seen anything like this? I don't believe the amount of cached memory is supposed to matter on Linux -- but it does!
Are there any calls/flags/tricks that might remove this?
Any other ideas about how to probe it?
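The only probe I have come up with so far is to check how much of a given output file is resident in the page cache via mincore() (essentially what the vmtouch tool does). A rough C sketch, taking the file name as a command-line argument:

#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }
    if (st.st_size == 0) { printf("%s: empty file\n", argv[1]); return 0; }

    /* Map the file (no read happens yet) so mincore() can inspect its pages. */
    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    long page = sysconf(_SC_PAGESIZE);
    size_t npages = (st.st_size + page - 1) / page;
    unsigned char *vec = malloc(npages);
    if (!vec) { perror("malloc"); return 1; }

    /* mincore() sets bit 0 of vec[i] iff page i of the mapping is resident in memory. */
    if (mincore(map, st.st_size, vec) != 0) { perror("mincore"); return 1; }

    size_t resident = 0;
    for (size_t i = 0; i < npages; i++)
        if (vec[i] & 1) resident++;

    printf("%s: %zu of %zu pages cached (%.1f%%)\n",
           argv[1], resident, npages, 100.0 * resident / npages);

    free(vec);
    munmap(map, st.st_size);
    close(fd);
    return 0;
}

Running this on the files written by the I/O core, before and after dropping the caches, should at least tell me whether it is those files that are piling up -- but I would welcome better ideas.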