
How to read a JSON file with a specific format with Spark Scala?

I'm trying to read a JSON file that looks like this:

[ 
{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"31","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"5","up":null,"Crate":"2"} 
,{"MLrate":"34","Nrout":"0","up":null,"Crate":"4"} 
,{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"} 
]} 
,{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"0"} 
,{"MLrate":"35","Nrout":"1","up":null,"Crate":"5"} 
,{"MLrate":"30","Nrout":"6","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"38","Nrout":"8","up":null,"Crate":"1"} 
]} 
,...
] 

I've tried the command:

    val df = sqlContext.read.json("namefile") 
    df.show() 

But this does not work: my columns are not recognized.

SparkUser asked Nov 27 '25 10:11


1 Answer

If you want to use read.json on a file path, you need a single JSON document per line. If your file contains a valid JSON array of documents spread over several lines, it won't be parsed as expected. For example, with your sample data the input file should be formatted like this:

{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}, {"MLrate":"31","Nrout":"0","up":null,"Crate":"2"}, {"MLrate":"30","Nrout":"5","up":null,"Crate":"2"} ,{"MLrate":"34","Nrout":"0","up":null,"Crate":"4"} ,{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"} ]}
{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"0"} ,{"MLrate":"35","Nrout":"1","up":null,"Crate":"5"} ,{"MLrate":"30","Nrout":"6","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"38","Nrout":"8","up":null,"Crate":"1"} ]}

If you use read.json on the structure above, you'll see it is parsed as expected:

scala> sqlContext.read.json("namefile").printSchema
root
 |-- COL: long (nullable = true)
 |-- DATA: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Crate: string (nullable = true)
 |    |    |-- MLrate: string (nullable = true)
 |    |    |-- Nrout: string (nullable = true)
 |    |    |-- up: string (nullable = true)
 |-- IFAM: string (nullable = true)
 |-- KTM: long (nullable = true)
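
If reformatting the file is not practical, a possible alternative is to let Spark parse each file as a whole instead of line by line. The sketch below assumes Spark 2.2 or later, where the JSON reader accepts a multiLine option. The object name and the local master setting are placeholders for illustration, and "namefile" is the path from the question. With this option, a top-level array of objects should yield one row per array element.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, only here to make the sketch self-contained.
object ReadJsonArray {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.2+ with SparkSession available; local master is for testing only.
    val spark = SparkSession.builder()
      .appName("read-json-array")
      .master("local[*]")
      .getOrCreate()

    // multiLine tells the reader to treat each file as one JSON value
    // rather than expecting one document per line.
    val df = spark.read
      .option("multiLine", "true")
      .json("namefile")

    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

Keep in mind that a multi-line JSON file cannot be split, so each file is read by a single task; this suits files that fit comfortably in memory. On older versions that only provide sqlContext, something along the lines of sqlContext.read.json(sc.wholeTextFiles("namefile").values) may work in a similar way, since it hands each file to the parser as a single string, but treat that as an assumption to check against your Spark version.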

zero323 answered Nov 30 '25 06:11


