
Performance / stability of a Memory Mapped file - Native or MappedByteBuffer - vs. plain ol' FileOutputStream

I support a legacy Java application that uses flat files (plain text) for persistence. Due to the nature of the application, these files can reach hundreds of MB per day, and file IO is often the limiting factor in application performance. Currently, the application uses a plain ol' java.io.FileOutputStream to write data to disk.

Recently, we've had several developers assert that using memory-mapped files, implemented in native code (C/C++) and accessed via JNI, would provide greater performance. However, FileOutputStream already uses native methods for its core operations (e.g. write(byte[])), so this seems a tenuous assumption without hard data or at least anecdotal evidence.

I have several questions on this:

  1. Is this assertion really true? Will memory mapped files always provide faster IO compared to Java's FileOutputStream?

  2. Does the class MappedByteBuffer accessed from a FileChannel provide the same functionality as a native memory mapped file library accessed via JNI? What is MappedByteBuffer lacking that might lead you to use a JNI solution?

  3. What are the risks of using memory-mapped files for disk IO in a production application? That is, applications that have continuous uptime with minimal reboots (once a month, max). Real-life anecdotes from production applications (Java or otherwise) preferred.

Question #3 is important - I could answer this question myself partially by writing a "toy" application that perf tests IO using the various options described above, but by posting to SO I'm hoping for real-world anecdotes / data to chew on.

[EDIT] Clarification - each day of operation, the application creates multiple files that range in size from 100MB to 1 gig. In total, the application might be writing out multiple gigs of data per day.

noahlz asked Feb 11 '09 15:02



2 Answers

Memory mapped I/O will not make your disks run faster(!). For linear access it seems a bit pointless.

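An NIO mapped buffer is the real thing: on any reasonable implementation, FileChannel.map is backed by the operating system's own memory-mapping facility, the same one a native library would use.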

As with other NIO direct-allocated buffers, mapped buffers are not normal heap memory and won't get garbage collected as efficiently. If you create many of them, you may find that you run out of memory or address space without running out of Java heap. This is obviously a worry with long-running processes.
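For reference, here is a minimal sketch of what the NIO route looks like, assuming the data fits in one mapped region (a single FileChannel.map call cannot map more than Integer.MAX_VALUE bytes). The class and method names are made up for illustration. The sketch also relies on the common JDK behaviour that a READ_WRITE mapping past the current end of the file grows the file, which is worth verifying on your own platform.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class, not part of the application discussed above.
class MappedWriteSketch {

    /** Maps the first data.length bytes of the file, copies data in, and returns the file size. */
    static long writeMapped(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Assumption: mapping READ_WRITE beyond the current end grows the file.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            buffer.put(data);   // a plain memory copy, no write() system call per record
            buffer.force();     // ask the OS to flush the mapped pages to the storage device
            return channel.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-sketch", ".txt");
        file.toFile().deleteOnExit();
        byte[] data = "one line of flat-file data\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("file size after mapped write: " + writeMapped(file, data));
    }
}
```

Note that closing the channel does not release the mapping: it stays valid until the buffer itself is garbage collected, which is exactly the long-running-process worry described above.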

Tom Hawtin - tackline answered Oct 17 '22 05:10


You might be able to speed things up a bit by examining how your data is being buffered during writes. This tends to be application specific as you would need an idea of the expected data writing patterns. If data consistency is important, there will be tradeoffs here.

If you are just writing out new data to disk from your application, memory mapped I/O probably won't help much. I don't see any reason you would want to invest time in some custom coded native solution. It just seems like too much complexity for your application, from what you have provided so far.

If you are sure you really need better I/O performance (or just O performance, in your case), I would look into a hardware solution such as a tuned disk array. Throwing more hardware at the problem is often more cost-effective from a business point of view than spending time optimizing software. It is also usually quicker to implement and more reliable.

In general, there are a lot of pitfalls in over-optimizing software. You will introduce new types of problems to your application. You might run into memory issues or GC thrashing, which would lead to more maintenance and tuning. The worst part is that many of these issues will be hard to test for before going into production.

If it were my app, I would probably stick with the FileOutputStream, possibly with some tuned buffering. After that I'd use the time-honored solution of throwing more hardware at it.
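As a rough illustration of what "tuned buffering" could look like, here is a minimal sketch that wraps the FileOutputStream in a BufferedOutputStream with an explicit buffer size. The class name, the record format, and the 64 KB figure are assumptions for the example, not measurements; the right size depends on your write pattern and should be benchmarked.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of the application discussed above.
class BufferedWriteSketch {

    /** Writes lineCount short text records through a buffer of bufferSize bytes; returns bytes written. */
    static long writeLines(File file, int lineCount, int bufferSize) throws IOException {
        long bytes = 0;
        // The buffer collects many small writes into fewer, larger native write calls.
        try (OutputStream out =
                     new BufferedOutputStream(new FileOutputStream(file), bufferSize)) {
            for (int i = 0; i < lineCount; i++) {
                byte[] line = ("record " + i + "\n").getBytes(StandardCharsets.UTF_8);
                out.write(line);
                bytes += line.length;
            }
        } // close() flushes whatever is still sitting in the buffer
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("buffered-sketch", ".txt");
        file.deleteOnExit();
        long written = writeLines(file, 100_000, 64 * 1024); // 64 KB is an arbitrary starting point
        System.out.println("bytes written: " + written);
    }
}
```

The tradeoff is the consistency one mentioned earlier: anything still in the buffer is lost if the process dies before a flush, so a larger buffer means more data at risk.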

Gary answered Oct 17 '22 05:10