Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java: InputStream too slow to read huge files

I have to read a 53 MB file character by character. When I do it in C++ using ifstream, it is completed in milliseconds but using Java InputStream it takes several minutes. Is it normal for Java to be this slow or am I missing something?

Also, I need to complete the program in Java (it uses servlets from which I have to call the functions which process these characters). I was thinking maybe writing the file processing part in C or C++ and then using Java Native Interface to interface these functions with my Java programs... How is this idea?

Can anyone give me any other tip... I seriously need to read the file faster. I tried using buffered input, but still it is not giving performance even close to C++.

Edited: My code spans several files and it is very dirty so I am giving the synopsis

import java.io.*;

public class tmp {
    public static void main(String args[]) {
        try{
        InputStream file = new BufferedInputStream(new FileInputStream("1.2.fasta"));
        char ch;        
        while(file.available()!=0) {
            ch = (char)file.read();
                    /* Do processing */
            }
        System.out.println("DONE");
        file.close();
        }catch(Exception e){}
    }
}
like image 915
pflz Avatar asked May 06 '12 20:05

pflz


People also ask

What is the difference between InputStream and FileInputStream?

There is no real difference. FileInputStream extends InputStream , and so you can assign an InputStream object to be a FileInputStream object. In the end, it's the same object, so the same operations will happen. This behavior is called Polymorphism and is very important in Object-Oriented Programming.

Why is BufferedInputStream faster?

With a BufferedInputStream , the method delegates to an overloaded read() method that reads 8192 amount of bytes and buffers them until they are needed. It still returns only the single byte (but keeps the others in reserve). This way the BufferedInputStream makes less native calls to the OS to read from the file.

How do you read all bytes from InputStream?

Since Java 9, we can use the readAllBytes() method from InputStream class to read all bytes into a byte array. This method reads all bytes from an InputStream object at once and blocks until all remaining bytes have read and end of a stream is detected, or an exception is thrown.


2 Answers

I ran this code with a 183 MB file. It printed "Elapsed 250 ms".

final InputStream in = new BufferedInputStream(new FileInputStream("file.txt"));
final long start = System.currentTimeMillis();
int cnt = 0;
final byte[] buf = new byte[1000];
while (in.read(buf) != -1) cnt++;
in.close();
System.out.println("Elapsed " + (System.currentTimeMillis() - start) + " ms");
like image 77
Marko Topolnik Avatar answered Sep 17 '22 11:09

Marko Topolnik


I would try this

// create the file so we have something to read.
final String fileName = "1.2.fasta";
FileOutputStream fos = new FileOutputStream(fileName);
fos.write(new byte[54 * 1024 * 1024]);
fos.close();

// read the file in one hit.
long start = System.nanoTime();
FileChannel fc = new FileInputStream(fileName).getChannel();
ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
while (bb.remaining() > 0)
    bb.getLong();
long time = System.nanoTime() - start;
System.out.printf("Took %.3f seconds to read %.1f MB%n", time / 1e9, fc.size() / 1e6);
fc.close();
((DirectBuffer) bb).cleaner().clean();

prints

Took 0.016 seconds to read 56.6 MB
like image 21
Peter Lawrey Avatar answered Sep 17 '22 11:09

Peter Lawrey