I have code that creates a data frame, and it works fine as long as there is no array in my input data.
I tried it with JSON data that doesn't contain an array and it runs successfully. My code is:
// StructType, StringType, etc. live in this package
import org.apache.spark.sql.types._

val vals = sc.parallelize(
  """{"id":"1","name":"alex"}""" ::
  Nil
)
val schema = (new StructType)
  .add("id", StringType)
  .add("name", StringType)
sqlContext.read.schema(schema).json(vals).select($"*").printSchema()
My question is: if the input data contains an array, like below, how do I create the schema?
val vals = sc.parallelize(
  """{"id":"1","name":"alex","score":[{"keyword":"read","point":10}]}""" ::
  Nil
)
val schema = (new StructType)
  .add("id", StringType)
  .add("name", StringType)
Thanks.
Okay, I found a solution myself.
To declare an array in a Spark data frame schema, wrap the element type in ArrayType. Here the elements are structs, so the array holds a nested StructType:
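// StructType, StructField, ArrayType, etc. live in this package
import org.apache.spark.sql.types._

val vals = sc.parallelize(
  """{"id":"1","name":"alex","score":[{"keyword":"read","point":10}]}""" ::
  Nil
)
val schema = StructType(
  Array(
    StructField("id", StringType),
    StructField("name", StringType),
    // "score" is an array whose elements are structs with two fields
    StructField("score", ArrayType(StructType(Array(
      StructField("keyword", StringType),
      StructField("point", IntegerType)
    ))))
  )
)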
Then you can print the schema:
sqlContext.read.schema(schema).json(vals).select($"*").printSchema()
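Once the schema is applied, the array entries can also be read back as ordinary columns. The following is only a minimal sketch, assuming it is run in spark-shell on Spark 1.x, where sc and sqlContext are already defined as in the code above. The names df and flat and the alias s are made up for illustration. The schema is written in the .add(...) builder style from the question, which should describe the same structure as the StructField version.

```scala
// Assumption: running inside spark-shell (Spark 1.x), so sc and sqlContext exist.
import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.types._
import sqlContext.implicits._

val vals = sc.parallelize(
  """{"id":"1","name":"alex","score":[{"keyword":"read","point":10}]}""" :: Nil
)

// Same structure as the answer's schema, in the builder style from the question.
val schema = new StructType()
  .add("id", StringType)
  .add("name", StringType)
  .add("score", ArrayType(new StructType()
    .add("keyword", StringType)
    .add("point", IntegerType)))

val df = sqlContext.read.schema(schema).json(vals)

// explode produces one row per element of the score array;
// the struct fields are then reachable with dot notation.
val flat = df
  .select($"id", $"name", explode($"score").as("s"))
  .select($"id", $"name", $"s.keyword", $"s.point")

flat.show()
```

Each input record yields one output row per entry in score, so a record with an empty array gives no rows with this approach.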
Thanks, this is resolved.