My Java program that reads a large text file is running out of memory. Can anyone help explain why?

Tags:

java

I have a large text file with 20 million lines of text. When I read the file using the following program, it works just fine, and in fact I can read much larger files with no memory problems.

public static void main(String[] args) throws IOException {
    File tempFile = new File("temp.dat");
    String tempLine = null;
    BufferedReader br = null;
    int lineCount = 0;
    try {
        br = new BufferedReader(new FileReader(tempFile));
        while ((tempLine = br.readLine()) != null) {
            lineCount += 1;
        }
    } catch (Exception e) {
        System.out.println("br error: " +e.getMessage());
    } finally {
        br.close();
        System.out.println(lineCount + " lines read from file");
    }
}

However, if I need to append some records to this file before reading it, the BufferedReader consumes a huge amount of memory (I have just used Windows Task Manager to monitor this; not very scientific, I know, but it demonstrates the problem). The amended program is below. It is the same as the first one, except that I append a single record to the file first.

public static void main(String[] args) throws IOException {
    File tempFile = new File("temp.dat");
    PrintWriter pw = null;
    try {
        pw = new PrintWriter(new BufferedWriter(new FileWriter(tempFile, true)));
        pw.println(" ");
    } catch (Exception e) {
        System.out.println("pw error: " + e.getMessage());
    } finally {
        pw.close();
    }

    String tempLine = null;
    BufferedReader br = null;
    int lineCount = 0;
    try {
        br = new BufferedReader(new FileReader(tempFile));
        while ((tempLine = br.readLine()) != null) {
            lineCount += 1;
        }
    } catch (Exception e) {
        System.out.println("br error: " +e.getMessage());
    } finally {
        br.close();
        System.out.println(lineCount + " lines read from file");
    }
}
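
For what it's worth, a slightly less crude check than Task Manager would be to log the JVM's own view of heap usage just after the read loop. A minimal sketch (not something I actually ran as part of these tests):

    Runtime rt = Runtime.getRuntime();
    // Approximate heap currently in use, as seen by the JVM itself
    long usedBytes = rt.totalMemory() - rt.freeMemory();
    System.out.println("Approx. heap in use: " + (usedBytes / (1024 * 1024)) + " MB");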

Below is a screenshot of Windows Task Manager; the large bump in the line shows the memory consumption when I run the second version of the program.

[Windows Task Manager screenshot]

So I was able to read this file without running out of memory. But I have much larger files, with more than 50 million records, and those hit an out-of-memory exception when I run this second program against them. Can someone explain why the first version of the program works fine on files of any size, while the second version behaves so differently and ends in failure? I am running on Windows 7 with:

java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b05)
Java HotSpot(TM) Client VM (build 23.1-b03, mixed mode, sharing)

asked Aug 30 '12 by Wee Shetland


1 Answer

You can start the Java VM with the VM option

-XX:+HeapDumpOnOutOfMemoryError

This will write a heap dump to a file when an OutOfMemoryError occurs; the dump can then be analysed to find leak suspects.

Use a '+' to enable a boolean option and a '-' to disable it.
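
For example, enabling the option and also choosing where the dump file is written might look like this (the jar name and dump path below are just placeholders for your own program and directory):

    java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=C:\dumps -jar myapp.jar

Conversely, -XX:-HeapDumpOnOutOfMemoryError would turn the behaviour off again.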

If you are using Eclipse, the Memory Analyzer (MAT) plugin can open these heap dumps, and can also capture dumps from running VMs, with some nice analyses such as the Leak Suspects report.
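
For a VM that is already running, you can also take a dump from the command line and then open the resulting file in MAT (a sketch; the process id 1234 is a placeholder, use jps to list the real ids):

    jps
    jmap -dump:live,format=b,file=heap.hprof 1234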

answered Oct 21 '22 by jethroo