I am looking at porting an application to the cloud, more speficially I am looking at Amazon EC2 or Google GCE.
My app heavily uses Linux's mmap to memory map large read-only files and I I would like to understand how mmap would actually work when a file is on the EBS volume.
I would specifically like to know what happens when I call mmap as EBS appears to be a black-box. Also, are the benefits negated?
I can speak for GCE Persistent Disks. It behaves pretty much in the same way a physical disk would. At a high level, pages are faulted in from disk as mapped memory is accessed. Depending on your access pattern these pages might be loaded one by one, or in a larger quantity when readahead kicks in. As the file system cache fills up, old pages are discarded to give space to new pages, writing out dirty pages if needed.
One thing to keep in mind with Persistent Disk is that performance is proportional to disk size. So you'd need to estimate your throughput and IOPS requirements to ensure you get a disk with enough performance for your application. You can find more details here: Persistent disk performance.
Is there any aspect of mmap that you're worried about? I would recommend to write a small app that simulates your workload and test it before deciding to migrate your application.
~ Fabricio.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With