Is the following code for Mappers, reading a text file from HDFS, right? And if it is: is wrapping the stream in an InputStreamReader the right approach? If so, how do I do it without closing the filesystem? My code is:
Path pt = new Path("hdfs://pathTofile");
FileSystem fs = FileSystem.get(context.getConfiguration());
BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)));
String line;
line = br.readLine();
while (line != null) {
    System.out.println(line);
This will work, with some amendments - I assume the code you've pasted is just truncated:
Path pt = new Path("hdfs://pathTofile");
FileSystem fs = FileSystem.get(context.getConfiguration());
BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)));
try {
    String line;
    line = br.readLine();
    while (line != null) {
        System.out.println(line);
        // be sure to read the next line otherwise you'll get an infinite loop
        line = br.readLine();
    }
} finally {
    // you should close out the BufferedReader
    br.close();
}
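As an aside, on Java 7+ the same loop can be written with try-with-resources, which closes the reader even if an exception is thrown mid-read. Here is a minimal, self-contained sketch of that pattern using plain java.io against a local file (the filename `sample.txt` is just an illustration, not from the question); the readLine() loop is identical to the HDFS version above:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLines {
    // Same readLine() loop as above, but try-with-resources guarantees
    // the BufferedReader is closed when the block exits.
    static List<String> readAll(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String line = br.readLine();
            while (line != null) {
                lines.add(line);
                line = br.readLine(); // advance, otherwise infinite loop
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file so the example is self-contained
        try (FileWriter fw = new FileWriter("sample.txt")) {
            fw.write("first line\nsecond line\n");
        }
        for (String line : readAll("sample.txt")) {
            System.out.println(line);
        }
    }
}
```

With an FSDataInputStream from fs.open(pt), the same try-with-resources form works because the stream and reader both implement Closeable.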
You can have more than one mapper reading the same file, but at some point it makes more sense to use the distributed cache: it not only reduces the load on the data nodes that host the file's blocks, it is also more efficient if your job has a larger number of tasks than task nodes.
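To sketch what the distributed cache route looks like (this is an outline only, not runnable standalone - it needs hadoop-client on the classpath, and the class and key/value types are placeholders I've assumed, not from the question): you register the file when submitting the job, and the framework copies it to each task node's local disk once, so mappers read it locally instead of all hitting HDFS.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

// At job-submission time: register the HDFS file with the distributed cache.
//   Job job = Job.getInstance(conf, "my job");          // hypothetical job name
//   job.addCacheFile(new URI("hdfs://pathTofile"));     // path from the question

public class CacheMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException {
        // The files registered via addCacheFile, already localized to this node
        URI[] cached = context.getCacheFiles();
        if (cached != null && cached.length > 0) {
            // The localized copy is linked into the task's working directory
            // under the file's own name, so it can be read with plain java.io
            String localName = new org.apache.hadoop.fs.Path(cached[0].getPath()).getName();
            try (BufferedReader br = new BufferedReader(new FileReader(localName))) {
                String line = br.readLine();
                while (line != null) {
                    // use the line (e.g. build a lookup table for the map phase)
                    line = br.readLine();
                }
            }
        }
    }
}
```

The design point is that localization happens once per node rather than once per task, which is where the savings over direct HDFS reads come from.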