
Fastest way to load a huge text file into an int array

I have a big text file (over 100 MB) with one integer per line, about 10 million numbers in total. Of course, the size and the count may change, so I don't know them in advance.

I want to load the file into an int[], making the process as fast as possible. First I came up with this solution:

public int[] fileToArray(String fileName) throws IOException
{
    List<String> list = Files.readAllLines(Paths.get(fileName));
    int[] res = new int[list.size()];
    int pos = 0;
    for (String line: list)
    {
        res[pos++] = Integer.parseInt(line);
    }
    return res;
}

It was pretty fast: 5.5 seconds, of which 5.1s goes to the readAllLines call and 0.4s to the loop.

But then I decided to try a BufferedReader, and came up with this different solution:

public int[] fileToArray(String fileName) throws IOException
{
    BufferedReader bufferedReader = new BufferedReader(new FileReader(new File(fileName)));
    ArrayList<Integer> ints = new ArrayList<Integer>();
    String line;
    while ((line = bufferedReader.readLine()) != null)
    {
        ints.add(Integer.parseInt(line));
    }
    bufferedReader.close();

    int[] res = new int[ints.size()];
    int pos = 0;
    for (Integer i: ints)
    {
        res[pos++] = i.intValue();
    }
    return res;
}

This was even faster! 3.1 seconds: about 3s for the while loop and less than 0.1s for the for loop.

I know there is not much room for optimization here, at least in time, but using an ArrayList and then an int[] seems like too much memory to me.

Any ideas on how to make this faster, or avoid using the middle ArrayList?

Just for comparison, I do this same task with FreePascal in 1.9 seconds [see edit], using the TStringList class and the StrToInt function.

EDIT: Since the Java method turned out to be so fast, I had to improve the FreePascal one. It now takes 330~360ms.

mclopez asked Mar 12 '23 09:03


1 Answer

If you're using Java 8 or later, you can eliminate the intermediate ArrayList by calling lines() on the reader, mapping each line to an int, and then collecting the values into an array.

You should also be using try-with-resources for proper exception handling and auto-closing.

try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
    return br.lines()
             .mapToInt(Integer::parseInt)
             .toArray();
}
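
For anyone who wants to try the snippet above as is, here is a minimal self-contained sketch. The class name IntFileLoader and the temp-file demo in main are my own placeholders, not part of the original question; the method keeps the fileToArray signature from the question.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical wrapper class, named only for this example.
class IntFileLoader {

    // Reads one integer per line straight into an int[].
    // mapToInt yields an IntStream, so no List<Integer> is built along the way.
    static int[] fileToArray(String fileName) throws IOException {
        // try-with-resources closes the reader even if parseInt throws
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            return br.lines()
                     .mapToInt(Integer::parseInt)
                     .toArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Small demo input written to a temporary file (an assumption for the demo).
        Path tmp = Files.createTempFile("ints", ".txt");
        try {
            Files.write(tmp, Arrays.asList("1", "-2", "300"));
            int[] values = fileToArray(tmp.toString());
            System.out.println(Arrays.toString(values)); // prints [1, -2, 300]
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Note that a malformed line still raises NumberFormatException, and an I/O error during streaming surfaces as UncheckedIOException rather than IOException.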

I'm not sure if this is faster, but it is certainly much easier to maintain.

Edit: It is apparently MUCH faster.
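
If streams are not an option (pre-Java 8), one possible way to avoid the ArrayList<Integer> is to grow a plain int[] by hand. This is a different technique from the one above, and only a rough sketch: the class name GrowableIntLoader and the starting capacity of 1024 are arbitrary choices of mine, and I have not benchmarked it against the other versions.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical class, named only for this example.
class GrowableIntLoader {

    static int[] fileToArray(String fileName) throws IOException {
        int[] buf = new int[1024]; // initial capacity is an arbitrary guess
        int size = 0;              // number of slots actually filled
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (size == buf.length) {
                    // Buffer is full: double it, copying the existing values over.
                    buf = Arrays.copyOf(buf, buf.length * 2);
                }
                buf[size++] = Integer.parseInt(line);
            }
        }
        // Trim the unused tail so the result has exactly one slot per line.
        return Arrays.copyOf(buf, size);
    }

    public static void main(String[] args) throws IOException {
        // Small demo input written to a temporary file (an assumption for the demo).
        Path tmp = Files.createTempFile("ints", ".txt");
        try {
            Files.write(tmp, Arrays.asList("7", "8", "9"));
            System.out.println(Arrays.toString(fileToArray(tmp.toString()))); // prints [7, 8, 9]
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

This keeps everything as primitive ints, so no Integer objects are allocated. During each resize and the final trim, two arrays briefly exist at the same time.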

4castle answered Mar 20 '23 21:03