
JSON Struct to Map[String,String] using sqlContext

I am trying to read JSON data in a Spark Streaming job. By default, sqlContext.read.json(rdd) infers nested JSON objects as struct types instead of map types.

|-- legal_name: struct (nullable = true)
 |    |-- first_name: string (nullable = true)
 |    |-- last_name: string (nullable = true)
 |    |-- middle_name: string (nullable = true)

But when I read from a Hive table using sqlContext

val a = sqlContext.sql("select * from student_record")

the schema is as follows:

|-- leagalname: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

Is there any way to read the data using read.json(rdd) and get a map data type?

Is there any option like spark.sql.schema.convertStructToMap?

Any help is appreciated.

yoga asked Oct 30 '22 13:10


1 Answer

You need to explicitly define your schema when calling read.json.

You can read about the details in the "Programmatically Specifying the Schema" section of the Spark SQL documentation.

For example, in your specific case it would be:

import org.apache.spark.sql.types._
val schema = StructType(List(StructField("legal_name",MapType(StringType,StringType,true))))

This defines a single column, legal_name, whose type is a map from string to string.

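Once you have defined your schema, you can call sqlContext.read.schema(schema).json(rdd) to create your data frame from your JSON dataset with the desired schema. The schema is set on the reader before json is called, since read.json(rdd) has no overload that takes a schema argument.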
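
Putting the pieces together, here is a minimal sketch of the whole flow. It assumes a spark-shell session in which sc and sqlContext already exist; the sample record and its values are made up for illustration and are not from the question.

```scala
import org.apache.spark.sql.types._

// Assumption: running in spark-shell, so `sc` and `sqlContext` are predefined.
// Hypothetical sample record shaped like the legal_name field in the question.
val rdd = sc.parallelize(Seq(
  """{"legal_name": {"first_name": "Ann", "middle_name": "B", "last_name": "Lee"}}"""
))

// Declare legal_name as map<string,string> instead of letting Spark infer a struct.
val schema = StructType(List(
  StructField("legal_name", MapType(StringType, StringType, valueContainsNull = true))
))

// Set the schema on the reader, then parse the RDD of JSON strings.
val df = sqlContext.read.schema(schema).json(rdd)

// Should report legal_name as a map rather than a struct.
df.printSchema()

// Map entries can be looked up by key.
df.select(df("legal_name").getItem("first_name")).show()
```

Because the schema is supplied up front, Spark skips inference for this read, so any field not listed in the schema is left out of the resulting data frame.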

Cedrik Neumann answered Nov 15 '22 08:11