I want to use Spark Streaming to retrieve data from Kafka, and then save that data to a remote HDFS. I know that I have to use the function saveAsTextFile. However, I don't know precisely how to specify the path.
Is it correct if I write this:
myDStream.foreachRDD(frm -> {
    frm.saveAsTextFile("hdfs://ip_addr:9000//home/hadoop/datanode/myNewFolder");
});
where ip_addr is the IP address of my remote HDFS server, /home/hadoop/datanode/ is the DataNode directory created when I installed Hadoop (I don't know whether I have to specify this directory), and myNewFolder is the folder where I want to save my data.
Thanks in advance.
Yassir
The path has to be a directory in HDFS. For example, if you want to save the files inside a folder named myNewFolder under the root / path in HDFS, the path to use would be hdfs://namenode_ip:port/myNewFolder/. Note that the URL points at the NameNode, not at a DataNode. The directory myNewFolder will be created when the Spark job executes.
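As a minimal sketch, the corrected write could look like this (assuming the NameNode listens on port 9000 at ip_addr, and that myDStream is the JavaDStream<String> from the question; the per-batch suffix is an addition here, because saveAsTextFile fails if the target directory already exists):

myDStream.foreachRDD((rdd, time) -> {
    // Skip empty micro-batches so empty directories are not created in HDFS.
    if (!rdd.isEmpty()) {
        // One output directory per batch: saveAsTextFile refuses to write
        // into an existing directory, so the batch time keeps each path unique.
        rdd.saveAsTextFile("hdfs://ip_addr:9000/myNewFolder/batch-" + time.milliseconds());
    }
});

You can then confirm the part-* output files with hdfs dfs -ls /myNewFolder.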
The datanode data directory, set via dfs.datanode.data.dir in hdfs-site.xml, is where the DataNode stores the blocks of the files you put in HDFS on its local filesystem. It should not be referenced as an HDFS directory path.
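For reference, that property looks something like this in hdfs-site.xml (the value below is illustrative, matching the directory from the question):

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hadoop/datanode</value>
</property>

It names a local-disk location on each DataNode, which is why a path like /home/hadoop/datanode must not appear inside the hdfs:// URL.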