 

Can I write a plain text HDFS (or local) file from a Spark program, not from an RDD?

I have a Spark program (in Scala) and a SparkContext. I am writing some files with RDD's saveAsTextFile. On my local machine I can use a local file path and it works with the local file system. On my cluster it works with HDFS.
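
For example, a minimal sketch of what works for me today (the path is a placeholder):

// Writing an RDD as text already works; local path on my machine, HDFS path on the cluster
val rdd = sparkContext.parallelize(Seq("line one", "line two"))
rdd.saveAsTextFile("out/result")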

I also want to write other arbitrary files as the result of processing. I'm writing them as regular files on my local machine, but want them to go into HDFS on the cluster.

SparkContext seems to have a few file-related methods, but they all appear to handle inputs, not outputs.

How do I do this?

asked Oct 05 '15 by Joe

People also ask

Can Spark write to local file system?

Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, Amazon S3, etc. Spark supports text files, SequenceFiles, and any other Hadoop InputFormat. Text file RDDs can be created using SparkContext's textFile method.
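
For example, a minimal sketch assuming an existing SparkContext named sc (the paths are placeholders):

// Read text files as RDD[String] from either file system
val localLines = sc.textFile("file:///tmp/input.txt")    // local file system
val hdfsLines = sc.textFile("hdfs:///user/me/input.txt") // HDFS
println(localLines.count())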

How do I write to HDFS in Spark?

Write & Read CSV & TSV file from HDFS read. csv("path") , replace the path to HDFS. And Write a CSV file to HDFS using below syntax. Use the write() method of the Spark DataFrameWriter object to write Spark DataFrame to a CSV file.

How do I create a text file in Spark?

text("file_name") to read a file or directory of text files into a Spark DataFrame, and dataframe. write(). text("path") to write to a text file. When reading a text file, each line becomes each row that has string “value” column by default.

Can Spark connect to HDFS?

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat.


1 Answer

Thanks to marios and kostya, but there are a few steps to writing a text file into HDFS from Spark:

import java.io.BufferedOutputStream
import org.apache.hadoop.fs.{FileSystem, Path}

// The Hadoop configuration is accessible from the SparkContext
val fs = FileSystem.get(sparkContext.hadoopConfiguration)

// An output file can be created from the file system
val output = fs.create(new Path(filename))

// Wrap the stream in a BufferedOutputStream to write an actual text file
val os = new BufferedOutputStream(output)

os.write("Hello World".getBytes("UTF-8"))

os.close()

Note that FSDataOutputStream, which has been suggested, extends Java's DataOutputStream for writing binary primitives, not a text output stream. Its writeUTF method appears to write plain text, but it actually writes a modified UTF-8 format with a leading two-byte length prefix, so the file ends up with extra bytes.
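
If you prefer writer semantics over raw bytes, a sketch (assuming the same fs as above and a placeholder path) wraps the HDFS stream in a java.io.PrintWriter:

import java.io.{OutputStreamWriter, PrintWriter}
import java.nio.charset.StandardCharsets

val out = fs.create(new Path("/user/me/hello.txt")) // placeholder path
val writer = new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))
try {
  writer.println("Hello World") // plain text, no length prefix
} finally {
  writer.close() // also closes the underlying HDFS stream
}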

answered Sep 28 '22 by Joe