I have the following test program to read a file from HDFS.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

public class FileReader {
    public static final String NAMENODE_IP = "172.32.17.209";
    public static final String FILE_PATH = "/notice.html";

    public static void main(String[] args) throws MalformedURLException,
            IOException {
        String url = "hdfs://" + NAMENODE_IP + FILE_PATH;
        InputStream is = new URL(url).openStream();
        InputStreamReader isr = new InputStreamReader(is);
        BufferedReader br = new BufferedReader(isr);
        String line = br.readLine();
        while (line != null) {
            System.out.println(line);
            line = br.readLine();
        }
        br.close();
    }
}
It throws a java.net.MalformedURLException:
Exception in thread "main" java.net.MalformedURLException: unknown protocol: hdfs
at java.net.URL.<init>(URL.java:592)
at java.net.URL.<init>(URL.java:482)
at java.net.URL.<init>(URL.java:431)
at in.ksharma.hdfs.FileReader.main(FileReader.java:29)
Let us first see how a read is carried out internally in Hadoop HDFS, i.e., how data flows between the client, the NameNode, and the DataNodes during a file read. Step 1: the client opens the file it wishes to read by calling open() on the FileSystem object, which for HDFS is an instance of DistributedFileSystem. Step 2: DistributedFileSystem calls the NameNode, using remote procedure calls (RPCs), to determine the locations of the first few blocks of the file.
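As a minimal sketch of that flow in code, the same file can be read through the FileSystem API directly, which sidesteps java.net.URL entirely (the class name here is illustrative; the NameNode address and path are the ones from the question):
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://172.32.17.209");
// FileSystem.get() returns a DistributedFileSystem for hdfs:// URIs.
FileSystem fs = FileSystem.get(conf);
// open() triggers the RPC to the NameNode for block locations.
try (FSDataInputStream in = fs.open(new Path("/notice.html"))) {
    IOUtils.copyBytes(in, System.out, 4096, false);
}
This needs org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.{FileSystem, FSDataInputStream, Path}, and org.apache.hadoop.io.IOUtils on the classpath.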
A MalformedURLException is thrown when the URL string you pass cannot be parsed, or does not use a protocol the JVM knows about. The stock JDK only registers handlers for schemes such as http, https, file, ftp, and jar, so constructing a URL with any other scheme fails, as the following sketch shows.
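A standalone repro (class name and address are illustrative):
import java.net.MalformedURLException;
import java.net.URL;

public class UnknownProtocolDemo {
    public static void main(String[] args) {
        try {
            // No URLStreamHandler is registered for "hdfs", so this throws
            // before any network connection is even attempted.
            new URL("hdfs://172.32.17.209/notice.html");
        } catch (MalformedURLException e) {
            System.out.println(e.getMessage()); // unknown protocol: hdfs
        }
    }
}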
HDFS follows a write-once, read-many philosophy. So we cannot edit files already stored in HDFS, but we can append new data to them by re-opening them. To read files stored in HDFS, the HDFS client interacts with the NameNode and the DataNodes.
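For completeness, a minimal sketch of such an append, assuming the cluster allows appends (this is version- and configuration-dependent) and reusing the illustrative NameNode address from above:
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://172.32.17.209");
FileSystem fs = FileSystem.get(conf);
// append() re-opens an existing file for writing at its end; it fails
// if the file does not exist or the cluster disallows appends.
try (FSDataOutputStream out = fs.append(new Path("/notice.html"))) {
    out.writeBytes("appended line\n");
}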
Register Hadoop's URL stream handler. The standard URL handler doesn't know how to handle the hdfs:// scheme.
Try this:
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;

public static void main(String[] args) throws MalformedURLException,
        IOException {
    // Register Hadoop's handler for hdfs:// (and other Hadoop) URLs.
    // Note: setURLStreamHandlerFactory may only be called once per JVM.
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    String url = "hdfs://" + NAMENODE_IP + FILE_PATH;
    InputStream is = new URL(url).openStream();
    InputStreamReader isr = new InputStreamReader(is);
    BufferedReader br = new BufferedReader(isr);
    String line = br.readLine();
    while (line != null) {
        System.out.println(line);
        line = br.readLine();
    }
    br.close();
}
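This works because java.net.URL delegates scheme resolution to the installed URLStreamHandlerFactory, and FsUrlStreamHandlerFactory (shipped in hadoop-common) knows how to build a stream handler for hdfs:// URLs.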
I got the same issue while writing a Java application that reads from HDFS on Hadoop 2.6. My solution was to add
hadoop-2.X/share/hadoop/hdfs/hadoop-hdfs-2.X.jar to the classpath.
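If the project uses a build tool instead of a raw classpath, the equivalent dependency coordinates are org.apache.hadoop:hadoop-hdfs, matching your Hadoop version. Note that FsUrlStreamHandlerFactory itself lives in hadoop-common, so that artifact may be needed as well, which would also explain the side note at the end of the next answer.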
In our case we had to combine it with another answer:
https://stackoverflow.com/a/21118824/1549135
So first, in our HDFS setup class (Scala code):
val hadoopConfig: Configuration = new Configuration()
// Bind the schemes explicitly; shaded/assembly jars can lose the
// META-INF/services mappings that normally register these.
hadoopConfig.set("fs.hdfs.impl", classOf[DistributedFileSystem].getName)
hadoopConfig.set("fs.file.impl", classOf[LocalFileSystem].getName)
And later, as in the accepted answer:
https://stackoverflow.com/a/25971334/1549135
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory)
Try(new URL(path))
Side note:
We already had:
"org.apache.hadoop" % "hadoop-hdfs" % "2.8.0"
in our dependencies and it did not help.