 

How to save Spark RDD to local filesystem

Can I save a file to the local filesystem with saveAsTextFile? This is how I'm calling it to save a file: insert_df.rdd.saveAsTextFile("<local path>")

When I try this I get a "no permissions" error, even though I have full permissions on that local path. It looks like Spark is treating the path as an HDFS path.

asked Oct 24 '16 by roh

People also ask

How do I save an RDD to a file?

You can save an RDD using the saveAsObjectFile and saveAsTextFile methods, and read it back using the textFile and sequenceFile functions on SparkContext.
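As a rough sketch (the paths and variable names are illustrative, and this uses the PySpark API that the question appears to use; saveAsObjectFile belongs to the Scala/Java RDD API, with saveAsPickleFile as the Python counterpart), saving an RDD as text and reading it back looks roughly like this:

    from pyspark import SparkContext

    sc = SparkContext(appName="rdd-save-read")

    rdd = sc.parallelize(["alpha", "beta", "gamma"])

    # Write the RDD as plain text; the path is created as a directory of part files.
    rdd.saveAsTextFile("/tmp/rdd_text_output")

    # Read the text output back into an RDD of strings via SparkContext.
    lines = sc.textFile("/tmp/rdd_text_output")
    print(lines.collect())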

Can Spark write to the local file system?

Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, Amazon S3, etc. Spark supports text files, SequenceFiles, and any other Hadoop InputFormat.
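For illustration only (the paths, the bucket name, and the availability of the s3a connector are assumptions), the same textFile call reads from whichever Hadoop-supported store the URI scheme points at:

    from pyspark import SparkContext

    sc = SparkContext(appName="read-sources")

    local_lines = sc.textFile("file:///tmp/local_input.txt")  # local filesystem
    hdfs_lines = sc.textFile("hdfs:///data/input.txt")        # HDFS
    s3_lines = sc.textFile("s3a://some-bucket/input.txt")     # S3, if the s3a connector is on the classpath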

How do I save my Spark output?

Saving text files: Spark provides a function called saveAsTextFile(), which takes a path and writes the content of the RDD there. The path is treated as a directory, and multiple output files are produced inside it, which is how Spark can write output from multiple tasks in parallel.
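A small sketch of that directory layout, with an illustrative output path; each partition is written as its own part file:

    from pyspark import SparkContext

    sc = SparkContext(appName="save-output-layout")

    rdd = sc.parallelize(range(100), numSlices=4)

    # "/tmp/spark_out" is created as a directory containing _SUCCESS plus
    # part-00000 ... part-00003, one file per partition.
    rdd.map(str).saveAsTextFile("/tmp/spark_out")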

Where are RDDs stored?

Physically, an RDD is stored as an object in the JVM driver and refers to data kept either in permanent storage (HDFS, Cassandra, HBase, etc.), in a cache (memory, memory+disk, disk only, etc.), or to another RDD. An RDD stores the following metadata: partitions, the set of data splits associated with that RDD.
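On the caching side, a minimal sketch (the input path is hypothetical) of choosing where the data behind an RDD lives via its storage level:

    from pyspark import SparkContext, StorageLevel

    sc = SparkContext(appName="rdd-storage-levels")

    rdd = sc.textFile("/tmp/spark_out")

    # Keep partitions in memory and spill to disk when they do not fit.
    rdd.persist(StorageLevel.MEMORY_AND_DISK)

    rdd.count()      # the first action materialises and caches the partitions
    rdd.unpersist()  # release the cached data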


1 Answer

I think you should try "file:///local path" instead of "/local path". Without a scheme, Spark resolves the path against Hadoop's default filesystem, which on a cluster is usually HDFS; that would explain the permission error even though you can write to the local path.
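Applied to the snippet from the question, a minimal sketch (the DataFrame contents and the local path are stand-ins for the asker's insert_df and "<local path>"):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("save-to-local-fs").getOrCreate()

    # Stand-in for the asker's insert_df.
    insert_df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    # The file:// scheme pins the write to the local filesystem instead of the
    # cluster's default filesystem (fs.defaultFS), which is usually HDFS.
    insert_df.rdd.saveAsTextFile("file:///tmp/insert_df_output")

Note that on a multi-node cluster each executor writes its partitions to its own local disk, so writing to file:// paths is most useful in local or single-node setups.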

answered Oct 02 '22 by Simon Schiff