Hadoop FileSystem closed exception when doing BufferedReader.close()

From within the Reducer's setup method, I am trying to close a BufferedReader object and I am getting a FileSystem closed exception. It does not happen every time. This is the piece of code I used to create the BufferedReader.

    String fileName = "<some HDFS file path>";
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path hdfsPath = new Path(fileName);
    FSDataInputStream in = fs.open(hdfsPath);
    InputStreamReader inputStreamReader = new InputStreamReader(in);
    BufferedReader bufferedReader = new BufferedReader(inputStreamReader);

I read contents from the bufferedReader and once all the reading is done, I close it.

This is the piece of code that reads it:

    String line;
    while ((line = bufferedReader.readLine()) != null) {
        // Do something
    }

This is the piece of code that closes the reader:

    if (bufferedReader != null) {
        bufferedReader.close();
    }

This is the stack trace for the exception that is thrown when I call bufferedReader.close() (the lines are from the logs of reducer attempt attempt_201310111840_142285_r_000009_0):

    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565)
    at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:522)
    at java.io.FilterInputStream.close(FilterInputStream.java:155)
    at sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:358)
    at sun.nio.cs.StreamDecoder.close(StreamDecoder.java:173)
    at java.io.InputStreamReader.close(InputStreamReader.java:182)
    at java.io.BufferedReader.close(BufferedReader.java:497)

I am not sure why this exception is happening. The code is not multithreaded, so I do not expect a race condition of any sort. Can you please help me understand?

Thanks,

Venk

asked Nov 18 '13 by Venk K


2 Answers

There is a little-known gotcha in the Hadoop FileSystem API: FileSystem.get returns the same cached object for every invocation that refers to the same filesystem. So if that object is closed anywhere, it is closed for every caller. You could debate the merits of this design, but that's the way it is.

So, when you close your BufferedReader, the close call propagates down to the underlying HDFS stream, and if that stream belongs to a FileSystem that has already been closed, you'll get this error. Check your code for any other places where you close a FileSystem object, and look for race conditions. Also, I believe Hadoop itself will close the FileSystem at some point, so to be safe, you should probably only access it from within the Reducer's setup, reduce, or cleanup methods (or configure, reduce, and close, depending on which API you're using).
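To make the sharing concrete, here is a minimal sketch of the failure mode (the class name and file path are made up for illustration; the caching behavior itself is what FileSystem.get does by default):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SharedFileSystemDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Both calls go through Hadoop's internal FileSystem cache,
            // so fs1 and fs2 are references to the very same object.
            FileSystem fs1 = FileSystem.get(conf);
            FileSystem fs2 = FileSystem.get(conf);
            System.out.println(fs1 == fs2); // prints: true

            // "Closing fs1" really closes the single shared instance...
            fs1.close();

            // ...so this later call through fs2 throws
            // "java.io.IOException: Filesystem closed".
            fs2.open(new Path("/some/hdfs/file"));
        }
    }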

answered by Joe K


You have to use FileSystem.newInstance to avoid using a shared connection (as described by Joe K). It will give you a unique, non-shared instance.
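As a rough sketch, the reader setup from the question could look like this with newInstance (assuming Java 7+ for try-with-resources; the file-path placeholder is kept from the question):

    // Inside the Reducer's setup method:
    Configuration conf = new Configuration();
    String fileName = "<some HDFS file path>"; // placeholder from the question

    // newInstance bypasses Hadoop's shared FileSystem cache, so closing
    // this instance cannot affect the one Hadoop itself is using.
    try (FileSystem fs = FileSystem.newInstance(conf);
         BufferedReader bufferedReader = new BufferedReader(
                 new InputStreamReader(fs.open(new Path(fileName))))) {
        String line;
        while ((line = bufferedReader.readLine()) != null) {
            // Do something
        }
    }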

answered by Marius Soutier