I am doing some testing for Spark using Scala. We usually read JSON files that need to be manipulated, as in the following example:
test.json:
{"a":1,"b":[2,3]}
val test = sqlContext.read.json("test.json")
How can I convert it to the following format:
{"a":1,"b":2} {"a":1,"b":3}
flatten – creates a single array from an array of arrays (a nested array). If the structure of nested arrays is deeper than two levels, only one level of nesting is removed. If you want to flatten such arrays, use the flatten function, which converts an array-of-arrays column into a single-array column on a DataFrame.
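A minimal sketch of flatten (available from Spark 2.4 onward; this assumes a SparkSession named spark and a made-up input file, neither of which is in the original question):

import org.apache.spark.sql.functions.flatten
import spark.implicits._

// Hypothetical input where column "b" is an array of arrays.
val nested = spark.read.json(Seq("""{"a":1,"b":[[2,3],[4]]}""").toDS)
nested.withColumn("b", flatten($"b")).show()
// +---+---------+
// |  a|        b|
// +---+---------+
// |  1|[2, 3, 4]|
// +---+---------+

Note that flatten collapses nesting but still keeps one row per record; it is explode, described next, that turns array elements into separate rows, which is what the question asks for.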
explode(col) – returns a new row for each element in the given array or map. Uses the default column name col for elements in the array, and key and value for elements in the map, unless specified otherwise. The Spark SQL explode function splits array or map DataFrame columns into rows. Spark defines several flavors of this function: explode_outer, which handles null and empty arrays; posexplode, which also returns the position of each element; and posexplode_outer, which combines both behaviors.
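A hedged sketch of two of these variants (assuming Spark 2.2+ and a SparkSession named spark; the sample rows are hypothetical):

import org.apache.spark.sql.functions.{explode_outer, posexplode}
import spark.implicits._

val df = spark.read.json(Seq("""{"a":1,"b":[2,3]}""", """{"a":4,"b":null}""").toDS)

// explode_outer keeps the a=4 row even though its array is null.
df.select($"a", explode_outer($"b").as("b")).show()
// +---+----+
// |  a|   b|
// +---+----+
// |  1|   2|
// |  1|   3|
// |  4|null|
// +---+----+

// posexplode also returns each element's position within the array
// (the a=4 row is dropped because its array is null).
df.select($"a", posexplode($"b")).show()
// +---+---+---+
// |  a|pos|col|
// +---+---+---+
// |  1|  0|  2|
// |  1|  1|  3|
// +---+---+---+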
You can use the explode function:
scala> import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.functions.explode

scala> val test = sqlContext.read.json(sc.parallelize(Seq("""{"a":1,"b":[2,3]}""")))
test: org.apache.spark.sql.DataFrame = [a: bigint, b: array<bigint>]

scala> test.printSchema
root
 |-- a: long (nullable = true)
 |-- b: array (nullable = true)
 |    |-- element: long (containsNull = true)

scala> val flattened = test.withColumn("b", explode($"b"))
flattened: org.apache.spark.sql.DataFrame = [a: bigint, b: bigint]

scala> flattened.printSchema
root
 |-- a: long (nullable = true)
 |-- b: long (nullable = true)

scala> flattened.show
+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  1|  3|
+---+---+