
Fastest way to read a huge number of ints from a binary file

Tags: java, io, nio

I use Java 1.5 on an embedded Linux device and want to read a binary file containing 2 MB of int values (currently 4-byte big-endian, but I am free to choose the format).

Reading with a DataInputStream over a BufferedInputStream, the 500 000 calls to dis.readInt() take 17 s, whereas reading the whole file into one big byte buffer takes 5 s.
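For reference, the two baselines described above could look roughly like this. It is only a sketch: the class and method names are made up for illustration, and the 64 KB stream buffer size is an assumption, not something stated in the question. Note that the second variant holds the whole file in a byte[] and therefore does not respect a tight memory limit.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical class name, used only for this illustration.
class IntFileBaselines {

    // Baseline 1: one readInt() call per value through a buffered stream.
    // DataInputStream.readInt() always decodes big-endian.
    static int[] readWithDataInputStream(String path, int numInts) throws IOException {
        DataInputStream dis = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path), 64 * 1024));
        try {
            int[] result = new int[numInts];
            for (int i = 0; i < numInts; i++) {
                result[i] = dis.readInt();
            }
            return result;
        } finally {
            dis.close();
        }
    }

    // Baseline 2: read the whole file into one byte[] and decode it afterwards.
    static int[] readFullyThenDecode(String path, int numInts) throws IOException {
        byte[] bytes = new byte[numInts * 4];
        DataInputStream dis = new DataInputStream(new FileInputStream(path));
        try {
            dis.readFully(bytes); // throws EOFException if the file is too short
        } finally {
            dis.close();
        }
        return decodeBigEndian(bytes, numInts);
    }

    // Decodes numInts big-endian 4-byte values from the start of bytes.
    static int[] decodeBigEndian(byte[] bytes, int numInts) {
        int[] result = new int[numInts];
        for (int i = 0, p = 0; i < numInts; i++, p += 4) {
            result[i] = ((bytes[p] & 0xFF) << 24)
                      | ((bytes[p + 1] & 0xFF) << 16)
                      | ((bytes[p + 2] & 0xFF) << 8)
                      |  (bytes[p + 3] & 0xFF);
        }
        return result;
    }
}
```

Both methods should return the same array for the same file, so they can be timed against each other on the target device.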

How can I read that file faster into one huge int[]?

The reading process should not use more than an additional 512 KB of memory.

The code below, which uses nio, is not faster than the readInt() approach from java.io.

    // assume I already know that there are 500 000 ints to read
    int numInts = 500000;
    // the result goes here
    int[] result = new int[numInts];
    int cnt = 0;

    RandomAccessFile aFile = new RandomAccessFile("filename", "r");
    FileChannel inChannel = aFile.getChannel();

    ByteBuffer buf = ByteBuffer.allocate(512 * 1024);

    int bytesRead = inChannel.read(buf); // read into buffer

    while (bytesRead != -1 && cnt < numInts) {

        buf.flip(); // make buffer ready for get()

        // require a whole int (4 bytes) so that getInt() cannot underflow
        while (buf.remaining() >= 4 && cnt < numInts) {
            // probably slow here since this is called 500 000 times
            result[cnt] = buf.getInt();
            cnt++;
        }

        buf.compact(); // keep any partial int and make buffer ready for writing
        bytesRead = inChannel.read(buf);
    }

    // close the channel before the file that owns it
    inChannel.close();
    aFile.close();

Update: Evaluation of the answers:

On a PC, the memory map with IntBuffer approach was the fastest in my setup.
On the embedded device, without JIT, the java.io DataInputStream.readInt() was a bit faster (17 s, vs. 20 s for the memory map with IntBuffer).

Final conclusion: a significant speed-up is easier to achieve via an algorithmic change (a smaller file for init).

AlexWien asked Apr 15 '13 18:04




2 Answers

I don't know if this will be any faster than what Alexander provided, but you could try mapping the file.

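    // note: try-with-resources requires Java 7 or later
    try (FileInputStream stream = new FileInputStream(filename)) {
        FileChannel inChannel = stream.getChannel();

        // map the whole file read-only
        ByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
        int[] result = new int[500000];

        // set the byte order before creating the int view
        buffer.order( ByteOrder.BIG_ENDIAN );
        IntBuffer intBuffer = buffer.asIntBuffer( );
        // bulk-copy all ints into the array
        intBuffer.get(result);
    }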
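Since the question says the file format is still open, here is a minimal self-contained sketch of the mapping idea with the byte order as a parameter, so that big-endian and little-endian files can be compared on the target device. The class name, the method names and the `writeInts` helper are hypothetical and exist only to make the example complete; it uses try/finally instead of try-with-resources so that it does not depend on Java 7 syntax.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

// Hypothetical class name, used only for this illustration.
class MappedIntReader {

    // Maps the first numInts * 4 bytes read-only and bulk-copies them into an int[].
    static int[] readMapped(File file, int numInts, ByteOrder order) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            FileChannel ch = raf.getChannel();
            ByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, numInts * 4L);
            mapped.order(order); // must be set before asIntBuffer()
            int[] result = new int[numInts];
            mapped.asIntBuffer().get(result); // one bulk transfer, no per-int call in user code
            return result;
        } finally {
            raf.close(); // also closes the channel
        }
    }

    // Demo helper (assumption): writes the values as 4-byte ints in the given byte order.
    static void writeInts(File file, int[] values, ByteOrder order) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 4).order(order);
        buf.asIntBuffer().put(values); // the view has its own position; buf stays at 0
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        try {
            FileChannel ch = raf.getChannel();
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        } finally {
            raf.close();
        }
    }
}
```

A call such as `MappedIntReader.readMapped(new File("filename"), 500000, ByteOrder.BIG_ENDIAN)` would correspond to the snippet above. Whether a different byte order actually helps on a given device is something only a measurement can tell.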
Michael Krussel answered Oct 22 '22 08:10



You can use IntBuffer from the nio package: http://docs.oracle.com/javase/6/docs/api/java/nio/IntBuffer.html

    int[] intArray = new int[ 500000 ];

    IntBuffer intBuffer = IntBuffer.wrap( intArray );

    ...

Fill the buffer by making calls to inChannel.read(intBuffer).

Once the buffer is full, your intArray will contain 500000 integers.

EDIT

Channels only support ByteBuffer, so the data has to be read into a ByteBuffer and then viewed as an IntBuffer:

    // assume I already know that there are 500 000 ints to read
    int numInts = 500000;
    // the result goes here
    int[] result = new int[numInts];

    // 4 bytes per int, direct buffer
    ByteBuffer buf = ByteBuffer.allocateDirect( numInts * 4 );

    // BIG_ENDIAN byte order
    buf.order( ByteOrder.BIG_ENDIAN );

    // Fill the buffer
    while ( buf.hasRemaining( ) )
    {
       // Per EJP's suggestion, check the EOF condition
       if( inChannel.read( buf ) == -1 )
       {
           // Hit EOF
           throw new EOFException( );
       }
    }

    buf.flip( );

    // Create IntBuffer view
    IntBuffer intBuffer = buf.asIntBuffer( );

    // result will now contain all ints read from the file
    intBuffer.get( result );
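The buffer above is as large as the whole file. If the 512 KB limit from the question has to hold for the buffer as well, the same idea can be applied chunk by chunk. The following is a sketch under that assumption, not a measured solution: the class name, the method name and the `chunkBytes` parameter are hypothetical, and the data is assumed to be big-endian as in the question.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.ReadableByteChannel;

// Hypothetical class name, used only for this illustration.
class ChunkedIntReader {

    // Reads numInts big-endian ints using a reusable buffer of at most chunkBytes bytes.
    static int[] readInts(ReadableByteChannel in, int numInts, int chunkBytes) throws IOException {
        if (chunkBytes < 4) {
            throw new IllegalArgumentException("chunkBytes must be at least 4");
        }
        int[] result = new int[numInts];
        // Round the capacity down to a multiple of 4 so a chunk never splits an int.
        ByteBuffer buf = ByteBuffer.allocateDirect(chunkBytes - chunkBytes % 4);
        buf.order(ByteOrder.BIG_ENDIAN);

        int done = 0;
        while (done < numInts) {
            int want = Math.min(numInts - done, buf.capacity() / 4);
            buf.clear();
            buf.limit(want * 4); // read exactly the bytes for `want` ints

            while (buf.hasRemaining()) {
                if (in.read(buf) == -1) {
                    throw new EOFException("file ended after " + done + " ints");
                }
            }

            buf.flip();
            // Bulk-copy this chunk into its slot of the result array.
            buf.asIntBuffer().get(result, done, want);
            done += want;
        }
        return result;
    }
}
```

With the question's variables this would be called as `ChunkedIntReader.readInts(inChannel, 500000, 512 * 1024)`, which needs four chunk reads and four bulk copies instead of 500 000 getInt() calls.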
Alexander Pogrebnyak answered Oct 22 '22 09:10
