By default, newer versions of Spark use compression when saving text files. For example:
val txt = sc.parallelize(List("Hello", "world", "!"))
txt.saveAsTextFile("/path/to/output")
will create files in .deflate format. It is easy to change the compression algorithm, e.g. to gzip:
import org.apache.hadoop.io.compress._
val txt = sc.parallelize(List("Hello", "world", "!"))
txt.saveAsTextFile("/path/to/output", classOf[GzipCodec])
But is there a way to save an RDD as plain text files, i.e. without any compression?
To read a CSV or TSV file from HDFS, call read.csv("path") with an HDFS path. To write a CSV file to HDFS, use the write() method of the Spark DataFrameWriter object to write the DataFrame out as CSV.
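A minimal sketch of that round trip, assuming Spark 2.x or later (where read.csv and the DataFrameWriter API are available) and a reachable HDFS namenode; the hostname, port, and paths below are placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("CsvOnHdfs").master("local").getOrCreate()

// Read a CSV file from HDFS into a DataFrame.
val df = spark.read.option("header", "true").csv("hdfs://namenode:8020/data/input.csv")

// Write the DataFrame back to HDFS as CSV via the DataFrameWriter.
df.write.option("header", "true").csv("hdfs://namenode:8020/data/output")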
You can save an RDD using the saveAsObjectFile and saveAsTextFile methods, and read it back using the textFile and sequenceFile functions on SparkContext.
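A short sketch of both round trips, assuming an existing SparkContext sc; the /tmp paths are placeholders. Note that output written with saveAsObjectFile is most directly read back with objectFile, which deserializes the SequenceFile it produces:

// Save as plain text; textFile reads the elements back as Strings.
val nums = sc.parallelize(1 to 5)
nums.saveAsTextFile("/tmp/nums-text")
val asText = sc.textFile("/tmp/nums-text")

// Save as serialized objects; objectFile reads the underlying
// SequenceFile back into typed elements.
nums.saveAsObjectFile("/tmp/nums-objects")
val asInts = sc.objectFile[Int]("/tmp/nums-objects")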
Saving text files: Spark provides a function called saveAsTextFile(), which takes a path and writes the contents of the RDD to files under that path. The path is treated as a directory, and multiple output files (one per partition) are produced in that directory; this is how Spark writes output from multiple nodes in parallel, as the sketch below shows.
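For example, an RDD with three partitions yields three part files under the output directory (a sketch assuming an existing SparkContext sc; the path is a placeholder):

// Three partitions produce three part files.
val words = sc.parallelize(List("Hello", "world", "!"), 3)
words.saveAsTextFile("/tmp/words-out")
// /tmp/words-out now contains part-00000, part-00001, part-00002 and a _SUCCESS marker.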
With the code below, I can see the text files in HDFS without any compression.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local").setAppName("App name")
val sc = new SparkContext(conf)
// Disable output compression for Hadoop-backed output formats.
sc.hadoopConfiguration.set("mapred.output.compress", "false")
val txt = sc.parallelize(List("Hello", "world", "!"))
txt.saveAsTextFile("hdfs/path/to/save/file")
You can set any Hadoop-related property on sc.hadoopConfiguration.
Verified this code with Spark 1.5.2 (Scala 2.11).
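Note that mapred.output.compress is the legacy property name; newer Hadoop releases use the mapreduce.output.fileoutputformat.compress key instead, so setting both is a defensive sketch rather than a verified requirement:

// Cover both the legacy and the current Hadoop property names.
sc.hadoopConfiguration.set("mapred.output.compress", "false")
sc.hadoopConfiguration.set("mapreduce.output.fileoutputformat.compress", "false")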