How to save data in HDFS with Spark?

I want to use Spark Streaming to retrieve data from Kafka, and then save that data to a remote HDFS. I know that I have to use the function saveAsTextFile. However, I don't know precisely how to specify the path.

Is that correct if I write this:

myDStream.foreachRDD(frm->{
    frm.saveAsTextFile("hdfs://ip_addr:9000//home/hadoop/datanode/myNewFolder");
});

where ip_addr is the IP address of my remote HDFS server, /home/hadoop/datanode/ is the DataNode directory created when I installed Hadoop (I don't know if I have to specify this directory), and myNewFolder is the folder where I want to save my data.

Thanks in advance.

Yassir

asked Sep 15 '25 by Yassir S

1 Answer

The path has to be a directory in HDFS.

For example, if you want to save the files inside a folder named myNewFolder under the root path / in HDFS, the path to use would be hdfs://namenode_ip:port/myNewFolder/.

On execution of the Spark job, the directory myNewFolder will be created.
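A minimal sketch of how such an output path could be assembled, assuming a NameNode reachable at a placeholder host namenode_ip on port 9000 (substitute the fs.defaultFS value from your cluster's core-site.xml):

```java
public class HdfsOutputPath {
    // Builds an HDFS output path from the NameNode address and a folder
    // *inside* HDFS. "namenode_ip" and port 9000 in main() below are
    // placeholders, not real values.
    static String outputPath(String namenodeHost, int port, String folder) {
        return "hdfs://" + namenodeHost + ":" + port + "/" + folder + "/";
    }

    public static void main(String[] args) {
        String path = outputPath("namenode_ip", 9000, "myNewFolder");
        System.out.println(path);
        // In the streaming job, this string would be the argument to
        // rdd.saveAsTextFile(path) inside foreachRDD.
    }
}
```

Note that the path after the host:port part is a location within the HDFS namespace, not a local filesystem path on any node.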

The datanode data directory, which is set via dfs.datanode.data.dir in hdfs-site.xml, is where the DataNode stores the blocks of the files you put into HDFS; it should not be referenced as an HDFS directory path.
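For reference, this is roughly what that setting looks like in hdfs-site.xml (the local path shown is an example matching the asker's /home/hadoop/datanode layout):

```xml
<!-- hdfs-site.xml: local filesystem path where the DataNode keeps
     block data. This is a LOCAL path on the DataNode machine, not a
     path inside HDFS, so it never appears in saveAsTextFile paths. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hadoop/datanode</value>
</property>
```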

answered Sep 18 '25 by franklinsijo