
Read a large amount of data from a file in Java

I've got a text file that contains 1 000 002 numbers in the following format:

123 456
1 2 3 4 5 6 .... 999999 100000

Now I need to read that data and assign the first two numbers to int variables and the remaining 1 000 000 numbers to an int[] array.

It's not a hard task, but it's horribly slow.

My first attempt was java.util.Scanner:

 Scanner stdin = new Scanner(new File("./path"));
 int n = stdin.nextInt();
 int t = stdin.nextInt();
 int array[] = new int[n];

 for (int i = 0; i < n; i++) {
     array[i] = stdin.nextInt();
 }

It works as expected, but it takes about 7500 ms to execute. I need to read that data in a few hundred milliseconds at most.

Then I tried java.io.BufferedReader:

Using BufferedReader.readLine() and String.split() I got the same results in about 1700 ms, but that's still too slow.
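For reference, the readLine()/split() approach could be sketched roughly as below. This is not the exact code that was timed; the class name SplitReadSketch and the Data holder are made up for illustration, and the sketch assumes the two-line layout shown above, with at least n numbers on the second line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper class; the names here are illustrative only.
class SplitReadSketch {
    /** Simple holder whose fields mirror the question's n, t and array. */
    static final class Data {
        final int n;
        final int t;
        final int[] array;

        Data(int n, int t, int[] array) {
            this.n = n;
            this.t = t;
            this.array = array;
        }
    }

    /** Reads "n t" from the first line, then n integers from the second line. */
    static Data read(Reader source) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            // First line: the two header numbers.
            String[] header = in.readLine().trim().split("\\s+");
            int n = Integer.parseInt(header[0]);
            int t = Integer.parseInt(header[1]);

            // Second line: all remaining numbers, separated by whitespace.
            String[] tokens = in.readLine().trim().split("\\s+");
            int[] array = new int[n];
            for (int i = 0; i < n; i++) {
                array[i] = Integer.parseInt(tokens[i]);
            }
            return new Data(n, t, array);
        }
    }
}
```

It would be called as SplitReadSketch.read(new FileReader("./path")). Note that split() materialises one String per number before any parsing happens, which is a plausible source of the remaining overhead.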

How can I read that amount of data in less than 1 second? The final result should be equal to:

int n = 123;
int t = 456;
int array[] = { 1, 2, 3, 4, ..., 999999, 100000 };

Following trashgod's answer:

The StreamTokenizer solution is faster (about 1400 ms), but it's still too slow:

StreamTokenizer st = new StreamTokenizer(new FileReader("./test_grz"));
st.nextToken();
int n = (int) st.nval;

st.nextToken();
int t = (int) st.nval;

int array[] = new int[n];

for (int i = 0; st.nextToken() != StreamTokenizer.TT_EOF; i++) {
    array[i] = (int) st.nval;
}

PS. There is no need for validation. I'm 100% sure that the data in the ./test_grz file is correct.

Crozin asked Apr 22 '10 17:04



2 Answers

Thanks for all the answers, but I've already found a method that meets my criteria:

BufferedInputStream bis = new BufferedInputStream(new FileInputStream("./path"));
int n = readInt(bis);
int t = readInt(bis);
int array[] = new int[n];
for (int i = 0; i < n; i++) {
    array[i] = readInt(bis);
}

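// Reads the next run of ASCII digits as a non-negative int.
// Skips any non-digit bytes before the number; returns 0 if the stream ends first.
private static int readInt(InputStream in) throws IOException {
    int ret = 0;
    boolean dig = false; // true once at least one digit has been consumed

    for (int c = 0; (c = in.read()) != -1; ) {
        if (c >= '0' && c <= '9') {
            dig = true;
            ret = ret * 10 + c - '0';
        } else if (dig) break; // first non-digit after the number ends it
    }

    return ret;
}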

It takes only about 300 ms to read 1 million integers!
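Put together as one compilable unit, the same idea might look like the sketch below. The class name FastIntReader and the readInts helper are made up for illustration. As with the snippet above, it assumes non-negative ASCII decimal numbers that fit in an int; it handles neither minus signs nor overflow.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical wrapper around the digit-by-digit parser.
 * Intended use (the stream should be buffered, or every read() hits the file):
 *
 *   try (InputStream in = new BufferedInputStream(new FileInputStream("./path"))) {
 *       int n = FastIntReader.readInt(in);
 *       int t = FastIntReader.readInt(in);
 *       int[] array = FastIntReader.readInts(in, n);
 *   }
 */
class FastIntReader {
    /** Parses the next run of ASCII digits; returns 0 if the stream ends before any digit. */
    static int readInt(InputStream in) throws IOException {
        int ret = 0;
        boolean seenDigit = false;
        for (int c; (c = in.read()) != -1; ) {
            if (c >= '0' && c <= '9') {
                seenDigit = true;
                ret = ret * 10 + (c - '0');
            } else if (seenDigit) {
                break; // the number ends at the first non-digit after it
            }
        }
        return ret;
    }

    /** Reads count numbers into a new array using readInt. */
    static int[] readInts(InputStream in, int count) throws IOException {
        int[] array = new int[count];
        for (int i = 0; i < count; i++) {
            array[i] = readInt(in);
        }
        return array;
    }
}
```

The speed-up presumably comes from skipping String creation and general-purpose tokenising altogether: each byte is examined once and folded straight into the running int.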

Crozin answered Sep 23 '22 02:09


StreamTokenizer may be faster, as suggested here.
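A minimal sketch of how that might look, with the reader wrapped in a BufferedReader so the tokenizer does not pull from the file unbuffered. The class name TokenizerSketch and its two helpers are made up for illustration. It relies on StreamTokenizer's default number parsing, which delivers each value as a double in nval.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;

// Hypothetical helper class; the names here are illustrative only.
class TokenizerSketch {
    /** Wraps the source in a BufferedReader before handing it to the tokenizer. */
    static StreamTokenizer open(Reader source) {
        return new StreamTokenizer(new BufferedReader(source));
    }

    /** Reads the next token and returns it as an int; assumes it is a number. */
    static int nextInt(StreamTokenizer st) throws IOException {
        st.nextToken();
        return (int) st.nval;
    }

    /** Reads up to max numbers, stopping early at end of input or at a non-numeric token. */
    static int[] readInts(StreamTokenizer st, int max) throws IOException {
        int[] array = new int[max];
        int i = 0;
        while (i < max && st.nextToken() == StreamTokenizer.TT_NUMBER) {
            array[i++] = (int) st.nval;
        }
        return array;
    }
}
```

It would be used as StreamTokenizer st = TokenizerSketch.open(new FileReader("./path")), followed by two nextInt(st) calls for n and t and then readInts(st, n).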

trashgod answered Sep 24 '22 02:09