My Java program spends most of its time reading some files, and I want to optimize it, e.g., by using concurrency, prefetching, memory-mapped files, or whatever.
Optimizing without benchmarking is nonsense, so I benchmark. However, during the benchmark the whole file content gets cached in RAM, unlike in the real run. The benchmark run-times are therefore much smaller and most probably unrelated to reality.
I'd need to somehow tell the OS (Linux) not to cache the file content, or better, to wipe the cache before each benchmark run. Or maybe consume most of the available RAM (32 GB), so that only a tiny fraction of the file content fits in it. How can I do that?
I'm using caliper for benchmarking, but in this case I don't think it's necessary (it's by no means a microbenchmark) and I'm not sure it's a good idea.
Clear the Linux file cache
# run as root: sync flushes dirty pages, then writing 1 drops the page cache
sync && echo 1 > /proc/sys/vm/drop_caches
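If you want a cold cache before every iteration rather than a one-off manual step, the same command can be invoked from the benchmark setup. A minimal sketch (the class and method names are mine, and it assumes the JVM runs as root so the write to /proc/sys/vm/drop_caches is permitted):

    import java.io.IOException;

    // Drops the Linux page cache before a benchmark run.
    // Assumes the JVM has root privileges; otherwise wrap the command in sudo.
    public class PageCacheDropper {

        public static void dropCaches() throws IOException, InterruptedException {
            // sync flushes dirty pages first, then writing 1 evicts the page cache
            ProcessBuilder pb = new ProcessBuilder(
                    "sh", "-c", "sync && echo 1 > /proc/sys/vm/drop_caches");
            pb.inheritIO();
            int exit = pb.start().waitFor();
            if (exit != 0) {
                throw new IOException("drop_caches failed with exit code " + exit);
            }
        }

        public static void main(String[] args) throws Exception {
            dropCaches(); // the cache is cold from here on
            // ... run one timed pass over the real input files ...
        }
    }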
Create a large file that uses all your RAM
# choose count so that bs * count is at least your RAM size (32 GB here)
dd if=/dev/zero of=dummyfile bs=1024 count=LARGE_NUMBER
(don't forget to remove dummyfile when done).
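If you prefer to keep everything inside the Java harness, the dd trick can be reproduced with a plain file write. A rough sketch, not taken from the answer: the names are made up, and 32 GB is simply the RAM size mentioned in the question:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    // Writes a zero-filled file about the size of RAM so it displaces the
    // benchmark files from the page cache, then deletes it again.
    public class CacheEvictor {

        public static void evict(Path dummy, long bytes) throws IOException {
            ByteBuffer zeros = ByteBuffer.allocate(1 << 20); // 1 MiB of zeros per write
            try (FileChannel ch = FileChannel.open(dummy,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                    StandardOpenOption.TRUNCATE_EXISTING)) {
                long written = 0;
                while (written < bytes) {
                    zeros.clear();
                    written += ch.write(zeros);
                }
                ch.force(false); // push the dirty pages to disk
            } finally {
                Files.deleteIfExists(dummy); // the "remove dummyfile when done" part
            }
        }

        public static void main(String[] args) throws IOException {
            evict(Paths.get("dummyfile"), 32L << 30); // ~32 GB, the RAM size from the question
        }
    }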
You can create a very large file and then delete it. Writing it pushes the existing data out of the page cache, which effectively clears it.
Another way to test the performance is to read a file (or files) larger than your main memory.
Either way, what you are testing is the performance of your hardware. To improve it you need to improve your hardware; there is only so much you can do in software, e.g. multiple threads won't make your disks spin faster. ;)
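For the "read more data than fits in RAM" variant, the measurement itself can be as simple as a sequential read loop with a large buffer; something along these lines (illustrative only, none of these names come from the question):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    // Sequentially reads the files given on the command line and reports the
    // throughput; pass files whose total size exceeds RAM so the later reads
    // cannot come from the page cache.
    public class ColdReadBenchmark {

        public static void main(String[] args) throws IOException {
            long start = System.nanoTime();
            long bytes = 0;
            ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MiB read buffer
            for (String name : args) {
                try (FileChannel ch = FileChannel.open(Paths.get(name), StandardOpenOption.READ)) {
                    int n;
                    while ((n = ch.read(buf)) != -1) {
                        bytes += n;
                        buf.clear();
                    }
                }
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d bytes in %.1f s (%.1f MB/s)%n",
                    bytes, seconds, bytes / seconds / 1e6);
        }
    }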
From Windows NT (http://research.microsoft.com/pubs/68479/seqio.doc):
When doing sequential scans, NT makes 64KB prefetch requests
From Linux (http://www.ece.eng.wayne.edu/~sjiang/Tsinghua-2010/linux-readahead.pdf):
Sequential prefetching, also known as readahead in Linux, is a widely deployed technique to bridge the huge gap between the characteristics of storage devices and their inefficient ways of usage by applications
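Readahead only kicks in for sequential access patterns, and on Linux the same machinery also serves page faults on a memory-mapped file that is scanned in order, which is relevant to the memory-mapping idea in the question. A sketch of such a scan (class and method names are mine; the 1 GiB window is just a convenient size below the 2 GiB MappedByteBuffer limit):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    // Scans a file through a memory mapping. Because the access is sequential,
    // the kernel's readahead prefetches ahead of the faulting position instead
    // of serving one page fault at a time.
    public class MappedSequentialScan {

        public static long checksum(String file) throws IOException {
            try (FileChannel ch = FileChannel.open(Paths.get(file), StandardOpenOption.READ)) {
                long sum = 0;
                long size = ch.size();
                long pos = 0;
                while (pos < size) {
                    long window = Math.min(1L << 30, size - pos); // map in 1 GiB windows
                    MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, pos, window);
                    while (map.hasRemaining()) {
                        sum += map.get(); // touch every byte in order
                    }
                    pos += window;
                }
                return sum;
            }
        }

        public static void main(String[] args) throws IOException {
            System.out.println(checksum(args[0]));
        }
    }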