I use Java 1.5 on an embedded Linux device and want to read a binary file containing 2 MB of int values (currently 4 bytes each, big-endian, but I can decide the format).
Using a DataInputStream via a BufferedInputStream and calling dis.readInt(), those 500,000 calls need 17 s to read the file, whereas reading the whole file into one big byte buffer takes 5 s.
How can I read that file faster into one huge int[]? The reading process should not use more than an additional 512 KB of memory.
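For reference, the slow baseline looks roughly like this (a sketch, assuming the file is named "filename" and holds exactly 500,000 big-endian ints; readIntsBaseline is a hypothetical helper name):

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// minimal sketch of the java.io baseline (Java 1.5 compatible)
static int[] readIntsBaseline(String name, int numInts) throws IOException {
    DataInputStream dis = new DataInputStream(
            new BufferedInputStream(new FileInputStream(name)));
    try {
        int[] result = new int[numInts];
        for (int i = 0; i < numInts; i++) {
            // one readInt() call per value: 4 bytes, big-endian
            result[i] = dis.readInt();
        }
        return result;
    } finally {
        dis.close();
    }
}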
The code below, using NIO, is not faster than the readInt() approach from java.io.
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// assume I already know that there are 500,000 ints to read:
int numInts = 500000;
// here is where I want the result
int[] result = new int[numInts];
int cnt = 0;
RandomAccessFile aFile = new RandomAccessFile("filename", "r");
FileChannel inChannel = aFile.getChannel();
ByteBuffer buf = ByteBuffer.allocate(512 * 1024);
int bytesRead = inChannel.read(buf); // read into buffer
while (bytesRead != -1) {
    buf.flip(); // make buffer ready for get()
    while (buf.hasRemaining() && cnt < numInts) {
        // probably slow here since called 500,000 times
        result[cnt] = buf.getInt();
        cnt++;
    }
    buf.clear(); // make buffer ready for writing
    bytesRead = inChannel.read(buf);
}
inChannel.close();
aFile.close();
Update: evaluation of the answers:
On a PC, the memory map with the IntBuffer approach was the fastest in my setup.
On the embedded device, without a JIT, java.io's DataInputStream.readInt() was a bit faster (17 s vs. 20 s for the memory map with IntBuffer).
Final conclusion: a significant speed-up is easier to achieve via an algorithmic change (e.g. a smaller file for initialization).
I don't know if this will be any faster than what Alexander provided, but you could try memory-mapping the file.
import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;

// note: try-with-resources requires Java 7+
try (FileInputStream stream = new FileInputStream(filename)) {
    FileChannel inChannel = stream.getChannel();
    // map the whole file into memory, read-only
    ByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
    int[] result = new int[500000];
    buffer.order(ByteOrder.BIG_ENDIAN);
    IntBuffer intBuffer = buffer.asIntBuffer();
    intBuffer.get(result);
}
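Since the question targets Java 1.5, where try-with-resources is not available, roughly the same thing can be written with try/finally (a sketch, assuming the same filename variable):

import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

// Java 1.5 compatible variant of the mapping approach above
FileInputStream stream = new FileInputStream(filename);
try {
    FileChannel inChannel = stream.getChannel();
    // map the whole file read-only; the OS pages it in on demand
    ByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
    buffer.order(ByteOrder.BIG_ENDIAN);
    int[] result = new int[500000];
    // bulk-copy all 500,000 ints out of the mapped buffer
    buffer.asIntBuffer().get(result);
} finally {
    stream.close();
}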
You can use IntBuffer from the NIO package: http://docs.oracle.com/javase/6/docs/api/java/nio/IntBuffer.html
int[] intArray = new int[500000];
IntBuffer intBuffer = IntBuffer.wrap(intArray);
...
Fill the buffer by making calls to inChannel.read(intBuffer). Once the buffer is full, your intArray will contain 500,000 integers.
EDIT: After realizing that channels only support ByteBuffer, here is a corrected version:
import java.io.EOFException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// assume I already know that there are 500,000 ints to read:
int numInts = 500000;
// here is where I want the result
int[] result = new int[numInts];
// 4 bytes per int, direct buffer
ByteBuffer buf = ByteBuffer.allocateDirect(numInts * 4);
// BIG_ENDIAN byte order
buf.order(ByteOrder.BIG_ENDIAN);
// fill in the buffer (inChannel is a FileChannel opened as in the question)
while (buf.hasRemaining()) {
    // per EJP's suggestion, check the EOF condition
    if (inChannel.read(buf) == -1) {
        // hit EOF
        throw new EOFException();
    }
}
buf.flip();
// create an IntBuffer view
IntBuffer intBuffer = buf.asIntBuffer();
// result will now contain all ints read from the file
intBuffer.get(result);
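Note that the 2 MB direct buffer above exceeds the 512 KB limit from the question. A variant that reuses a single 512 KB buffer and copies each chunk out through an IntBuffer view might look like this (a sketch; readInts and its parameters are hypothetical names, and the chunk size is a multiple of 4 so each full chunk holds whole ints):

import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;

static int[] readInts(String name, int numInts) throws IOException {
    int[] result = new int[numInts];
    FileInputStream stream = new FileInputStream(name);
    try {
        FileChannel inChannel = stream.getChannel();
        // reusable 512 KB direct buffer
        ByteBuffer chunk = ByteBuffer.allocateDirect(512 * 1024);
        chunk.order(ByteOrder.BIG_ENDIAN);
        int cnt = 0;
        while (cnt < numInts) {
            chunk.clear();
            // keep filling until the chunk is full or EOF
            while (chunk.hasRemaining()) {
                if (inChannel.read(chunk) == -1) {
                    break; // EOF: process whatever was read
                }
            }
            chunk.flip();
            if (!chunk.hasRemaining()) {
                throw new EOFException("file shorter than expected");
            }
            // view the bytes read so far as ints and bulk-copy them out
            IntBuffer ints = chunk.asIntBuffer();
            int n = Math.min(ints.remaining(), numInts - cnt);
            ints.get(result, cnt, n);
            cnt += n;
        }
    } finally {
        stream.close();
    }
    return result;
}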