Why does checking whether a file exists in hadoop cause a NullPointerException?

Tags: java, hadoop
I'm trying to create or open a file in HDFS to store some output, but I get a NullPointerException when I call the exists method on the second-to-last line of the snippet below:

DistributedFileSystem dfs = new DistributedFileSystem();
Path path = new Path("/user/hadoop-user/bar.txt");
if (!dfs.exists(path)) dfs.createNewFile(path);
FSDataOutputStream dos = dfs.create(path);

Here is the stack trace:

java.lang.NullPointerException
        at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
        at ClickViewSessions$ClickViewSessionsMapper.map(ClickViewSessions.java:80)
        at ClickViewSessions$ClickViewSessionsMapper.map(ClickViewSessions.java:65)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

What could the problem be?

jonderry asked Jan 18 '11 18:01


2 Answers

I think the preferred way of doing this is:

Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://mynamenodehost:9000");
FileSystem fs = FileSystem.get(conf);
Path path = ...

That way you don't tie your code to a particular implementation of FileSystem; plus you don't have to worry about how each implementation of FileSystem is initialized.
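
To round out the snippet above, here is a minimal sketch of the whole create-and-write flow through FileSystem.get. The class name HdfsOutputExample and the helper openForWrite are hypothetical names chosen for illustration; the namenode host and file path are the ones used in this thread. It assumes the Hadoop client libraries are on the classpath. Newer Hadoop releases call the configuration key fs.defaultFS, but the key from the answer is kept here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsOutputExample {

    /** Hypothetical helper: opens the path for writing on the configured file system. */
    static FSDataOutputStream openForWrite(Configuration conf, Path path) throws IOException {
        // FileSystem.get hands back an already initialized instance for the
        // configured default file system, so no manual initialize() call is needed.
        FileSystem fs = FileSystem.get(conf);
        // create(path) creates the file and overwrites an existing one by default,
        // so a separate exists()/createNewFile() step is not required here.
        return fs.create(path);
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Host and port taken from the answer; adjust for your cluster.
        conf.set("fs.default.name", "hdfs://mynamenodehost:9000");
        Path path = new Path("/user/hadoop-user/bar.txt");

        try (FSDataOutputStream out = openForWrite(conf, path)) {
            out.write("some output\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Inside a map task you would normally reuse the job's own configuration rather than building a fresh one, so the code picks up the cluster settings the job was submitted with.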

bajafresh4life answered Nov 02 '22 23:11


The default constructor DistributedFileSystem() does not perform initialization; you need to call dfs.initialize() explicitly, passing it the file system URI and a Configuration.

You are getting a NullPointerException because DistributedFileSystem internally uses an instance of DFSClient. Since you did not call initialize(), that DFSClient instance is still null. getFileStatus() calls dfsClient.getFileInfo(getPathName(f)), which throws a NullPointerException because dfsClient is null.

See https://trac.declarativity.net/browser/src/hdfs/org/apache/hadoop/dfs/DistributedFileSystem.java?rev=3593
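
The failure mode described above can be reproduced without Hadoop. The sketch below uses hypothetical stand-in classes (LazyFileSystem and Client are not Hadoop's classes, only simplified models of DistributedFileSystem and DFSClient) to show a constructor that leaves a field unset until initialize() is called.

```java
public class UninitializedClientDemo {

    /** Hypothetical stand-in for DFSClient. */
    static class Client {
        boolean fileExists(String path) {
            return false; // the stand-in knows about no files
        }
    }

    /** Hypothetical stand-in for DistributedFileSystem. */
    static class LazyFileSystem {
        private Client client; // stays null until initialize() runs

        void initialize() {
            client = new Client();
        }

        boolean exists(String path) {
            // Dereferences client; throws NullPointerException if initialize() was skipped.
            return client.fileExists(path);
        }
    }

    /** Returns true if calling exists() on the given instance throws a NullPointerException. */
    static boolean throwsNpe(LazyFileSystem fs) {
        try {
            fs.exists("/user/hadoop-user/bar.txt");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        LazyFileSystem fs = new LazyFileSystem();
        System.out.println("NPE before initialize: " + throwsNpe(fs)); // prints true
        fs.initialize();
        System.out.println("NPE after initialize: " + throwsNpe(fs));  // prints false
    }
}
```

The real class follows the same pattern: the object is usable only after initialize() has populated its client, which is why obtaining the instance through a factory that performs that step avoids the problem.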

Oleg Ryaboy answered Nov 02 '22 22:11