
Parsing JSON file and extracting keys and values using Spark

I'm new to Spark. I tried to parse the JSON file below using Spark SQL, but it didn't work. Can someone please help me resolve this?

Input JSON:

[{"num":"1234","Projections":[{"Transactions":[{"14:45":0,"15:00":0}]}]}]

Expected output:

1234 14:45 0
1234 15:00 0

I tried the code below, but it did not work:

val sqlContext = new SQLContext(sc)
val df = sqlContext.read.json("hdfs:/user/aswin/test.json").toDF();
val sql_output = sqlContext.sql("SELECT num, Projections.Transactions FROM df group by Projections.TotalTransactions ")
sql_output.collect.foreach(println)

Output:

[01532,WrappedArray(WrappedArray([0,0]))]
asked Jan 19 '26 18:01 by user3792699



1 Answer

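When Spark infers the schema, it treats your {"14:45":0,"15:00":0} object as a struct with one field per key rather than as a map, so the times become column names instead of data. Probably the simplest way to read your data is to specify the schema manually and declare Transactions as an array of maps: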

>>> from pyspark.sql.types import *
>>> schema = StructType([StructField('num', StringType()), StructField('Projections', ArrayType(StructType([StructField('Transactions', ArrayType(MapType(StringType(), IntegerType())))])))])

Then you can register the data as a temporary table and query it with nested explode calls, one per level of nesting:

>>> sqlContext.read.json('sample.json', schema=schema).registerTempTable('df')
>>> sqlContext.sql("select num, explode(col) from (select explode(col.Transactions), num from (select explode(Projections), num from df))").show()
+----+-----+-----+
| num|  key|value|
+----+-----+-----+
|1234|14:45|    0|
|1234|15:00|    0|
+----+-----+-----+
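
For intuition, the nested explode calls in the query behave like nested loops over Projections and Transactions, followed by iteration over the entries of each map. Here is a minimal plain-Python sketch of that logic, with no Spark involved; the function name flatten and the inlined raw string are illustrative assumptions and not part of the Spark API:

```python
import json

# Same document as in the question, inlined so the sketch is self-contained.
raw = '[{"num":"1234","Projections":[{"Transactions":[{"14:45":0,"15:00":0}]}]}]'


def flatten(records):
    """Yield one (num, key, value) tuple per entry of every Transactions map."""
    for record in records:                         # one top-level JSON object
        for projection in record["Projections"]:   # like explode(Projections)
            for tx in projection["Transactions"]:  # like explode(col.Transactions)
                for key, value in tx.items():      # like explode(col) on a map
                    yield record["num"], key, value


rows = list(flatten(json.loads(raw)))
for num, key, value in rows:
    print(num, key, value)
```

This only shows what the query computes row by row; for data that does not fit on one machine, the Spark query above is still the tool to use.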
answered Jan 22 '26 10:01 by Mariusz



