# mmap and RSS resource limit



## ta0kira (Aug 2, 2013)

I have a C/C++ program that mmaps two files that are about 7.1GB each, running on FreeBSD 9.1-RELEASE-p3. The program reads about 25k pages per second at random from one mmap and reads/writes about 2.5k pages per second to the other mmap. I've noticed that the process image grows to fill its RSS limit very quickly, and then stops growing. This happens on both ZFS (with L2ARC) and UFS2, and valgrind indicates that it isn't a memory leak. If I comment out the lines that access the mmaped data (100% floating-point arithmetic on array elements), or if I operate on much smaller files, this doesn't happen. I therefore assume that the process caches mmaped pages in its own process image, within the memory limits imposed by the OS. Does that sound about right?
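A minimal sketch of the behavior described above, not the original program: it maps a file read-only, reads one byte from each page, and lets the caller compare the peak RSS before and after. The helper names `peak_rss_kb` and `touch_mapped_pages` are made up for illustration, and the sketch assumes `ru_maxrss` is reported in kilobytes, as it is on FreeBSD and Linux.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <unistd.h>

/* Peak resident set size of this process (hypothetical helper).
 * Assumes ru_maxrss is in kilobytes. Returns -1 on error. */
static long peak_rss_kb(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}

/* Map `path` read-only and read one byte from every page
 * (hypothetical helper). Returns the number of pages touched,
 * or -1 on error. */
static long touch_mapped_pages(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    long touched = 0;
    volatile unsigned char sink = 0; /* keeps the reads from being optimized out */
    for (size_t off = 0; off < len; off += (size_t)page) {
        sink ^= p[off]; /* first access faults the page in, making it resident */
        touched++;
    }
    (void)sink;

    munmap((void *)p, len);
    return touched;
}
```

Printing `peak_rss_kb()` before and after `touch_mapped_pages("some_large_file")` (the path is a placeholder) should show the peak RSS rising by roughly the amount of file data touched, up to whatever limit the OS applies.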

Is this something that should be documented? Intuitively, you mmap a file to save memory, but if doing so actually causes the process to take up whatever remaining memory is available, that can be a problem. Before this I thought, "I can set my RSS limit to 8GB because I'll know when I run a program that needs that much memory," and I'm sure other people out there think something similar. I admit, though, that it was lazy of me to not set soft limits in my .bashrc.

Thanks!

Kevin Barry


----------



## Uniballer (Aug 2, 2013)

This sounds exactly how I would expect it to work.  You said "map this file into my virtual address space", then you started reading from the mapped address space.  Naturally, at least part of the mapped file must become resident or your process could not operate on it.  You have exactly the same situation with the file you have mapped and are writing.

If you wanted to save memory you should open the files, seek to the appropriate spot, and read(2) or write(2) to them.  However, the performance of mapping the files and operating on them in your resident set of pages is likely to be much higher assuming that you have some locality or other repetition of reference (i.e. as long as every reference does not result in a page fault).  You are spending memory to save on disk I/O.
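As a rough sketch of the read(2) approach, here is one way to fetch a fixed-size record into a caller-owned buffer. It uses `pread`, which combines the seek and the read in one call; that is a substitution for the separate seek and read described above, and the `read_record` helper and its parameters are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read record number `index` (each `record_size` bytes long) from `fd`
 * into `buf`. Hypothetical helper for illustration.
 * Returns 0 on success, -1 on error or if the file ends early. */
static int read_record(int fd, size_t index, void *buf, size_t record_size)
{
    assert(buf != NULL && record_size > 0);

    char *dst = buf;
    off_t pos = (off_t)index * (off_t)record_size;
    size_t done = 0;

    /* pread may return fewer bytes than requested, so loop until done. */
    while (done < record_size) {
        ssize_t n = pread(fd, dst + done, record_size - done,
                          pos + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal, retry */
            return -1;
        }
        if (n == 0)
            return -1; /* unexpected end of file */
        done += (size_t)n;
    }
    return 0;
}
```

With this approach the process only holds the buffers it allocates itself, so its resident set stays small, but every record access costs a system call instead of (at most) a page fault.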


----------



## ta0kira (Aug 2, 2013)

Uniballer said:

> This sounds exactly how I would expect it to work.  You said "map this file into my virtual address space", then you started reading from the mapped address space.  Naturally, at least part of the mapped file must become resident or your process could not operate on it.  You have exactly the same situation with the file you have mapped and are writing.


I'm not just talking about address space. I expect the address space to reflect the sizes of the mapped files. I'm talking about the physical RAM allocated to the process.

For some reason I thought that most of the caching (other than a few pages) would be done by the filesystem driver (or in shared memory), mainly because pages overwritten by another process need to be invalidated, and the kernel is the only source of that information. Also, because mmap is a system call, I thought it wouldn't involve an uncontrollable cache size in userland.

Uniballer said:

> If you wanted to save memory you should open the files, seek to the appropriate spot, and read(2) or write(2) to them.  However, the performance of mapping the files and operating on them in your resident set of pages is likely to be much higher assuming that you have some locality or other repetition of reference (i.e. as long as every reference does not result in a page fault).  You are spending memory to save on disk I/O.


I limit the memory usage by lowering the RSS soft limit. The ZFS L2ARC helps keep the processing speed up when I do that, since it can fully cache both files.
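The same limit can also be lowered from inside the program rather than from the shell. This is only a sketch under assumptions: `lower_rss_soft_limit` is a hypothetical helper, and how strictly the kernel honors `RLIMIT_RSS` depends on the operating system and version.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Lower this process's RSS soft limit to `bytes`, clamped to the hard
 * limit. Hypothetical helper for illustration.
 * Returns 0 on success, -1 on error. */
static int lower_rss_soft_limit(rlim_t bytes)
{
    assert(bytes > 0);

    struct rlimit rl;
    if (getrlimit(RLIMIT_RSS, &rl) != 0)
        return -1;

    /* The soft limit may never exceed the hard limit. */
    rl.rlim_cur = (bytes > rl.rlim_max) ? rl.rlim_max : bytes;
    return setrlimit(RLIMIT_RSS, &rl);
}
```

For example, `lower_rss_soft_limit((rlim_t)4 << 30)` asks for a 4 GB soft limit. Only the soft limit is changed, so the process can still raise it again up to the hard limit.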

While trying to nail down the cause of the expanding process image, I made the mistake of calling msync with MS_INVALIDATE on the read mmap. That made ZFS evict the cached pages from both ARC and L2ARC, which slowed the process down to about 1/100th the speed until the caches warmed back up again. That's a separate issue (that any user can instantly cause ZFS to evict a huge number of pages), but I thought it was an interesting anecdote to add to the thread.

Kevin Barry


----------



## Uniballer (Aug 2, 2013)

ta0kira said:

> For some reason I thought that most of the caching (other than a few pages) would be done by the filesystem driver (or in shared memory), mainly because pages overwritten by another process need to be invalidated, and the kernel is the only source of that information. Also, because mmap is a system call, I thought it wouldn't involve an uncontrollable cache size in userland.
>
> I limit the memory usage by lowering the RSS soft limit.



My understanding is that the mapped file is treated as an additional segment with its own kernel data structures to represent it (see "Design Overview of 4.4 BSD: 2.5 Memory Management" for more; the full design document has more details, but I don't know where to find it online). I believe that multiple processes that map the same file would all reference this segment, and each would be charged for the resident part in its resident set size.



> The ZFS L2ARC helps keep the processing speed up when I do that, since it can fully cache both files.


Good to know.



> While trying to nail down the cause of the expanding process image, I made the mistake of calling msync with MS_INVALIDATE on the read mmap. That made ZFS evict the cached pages from both ARC and L2ARC, which slowed the process down to about 1/100th the speed until the caches warmed back up again. That's a separate issue (that any user can instantly cause ZFS to evict a huge number of pages), but I thought it was an interesting anecdote to add to the thread.



I think so too.  And it does not surprise me.


----------



## ta0kira (Aug 7, 2013)

ta0kira said:

> The ZFS L2ARC helps keep the processing speed up when I do that, since it can fully cache both files.


After I posted this I became paranoid that I was overstating the benefit of the L2ARC, so I did a comparison. I ran the program once with a cold ARC and a cold L2ARC; it took 8.1 hours. I ran it again with a cold ARC and a _disabled_ L2ARC; it took 17.2 hours. That's 112% longer! This was the same process I described in the original post, with RSS limited to 4 GB. When I run this sort of process, the reads are generally 92% ARC, 7% L2ARC, and 1% pool.
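For reference, the 112% figure follows from the two run times: (17.2 - 8.1) / 8.1 is about 1.12. A tiny sketch of that arithmetic, with `percent_longer` as a made-up helper name:

```c
#include <assert.h>

/* Percentage by which `slow` exceeds `fast`, e.g. run times in hours.
 * Hypothetical helper for illustration. */
static double percent_longer(double fast, double slow)
{
    assert(fast > 0.0);
    return (slow - fast) / fast * 100.0;
}
```

Calling `percent_longer(8.1, 17.2)` gives a value a little over 112.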

Kevin Barry


----------

