
How to create a schema with an array column for a DataFrame in Spark

I have code that creates a DataFrame, and it works fine as long as my input data contains no array.

I tried it with JSON data that doesn't contain an array, and it runs successfully. My code is:

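// Assumes spark-shell (Spark 1.x), which predefines sc and sqlContext
import org.apache.spark.sql.types._  // StructType, StringType
import sqlContext.implicits._        // enables the $"..." column syntax

val vals = sc.parallelize(
  """{"id":"1","name":"alex"}""" ::
  Nil
)

val schema = (new StructType)
  .add("id", StringType)
  .add("name", StringType)

sqlContext.read.schema(schema).json(vals).select($"*").printSchema()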
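For readers on Spark 2.2 or later, here is a minimal sketch of the same flat example using `SparkSession`. In those versions `read.json` accepts a `Dataset[String]`, and the `RDD[String]` overload used above is deprecated. The object name `FlatSchemaSketch`, the app name, and the `local[*]` master are illustrative assumptions, not part of the original code.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical object name; local master chosen only for illustration
object FlatSchemaSketch {
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("flat-schema-sketch")
    .getOrCreate()
  import spark.implicits._

  // Same flat schema as in the question
  val schema: StructType = new StructType()
    .add("id", StringType)
    .add("name", StringType)

  // Spark 2.2+: pass a Dataset[String] instead of an RDD[String]
  val df: DataFrame = spark.read
    .schema(schema)
    .json(Seq("""{"id":"1","name":"alex"}""").toDS())

  // Collected eagerly so the values stay available after the session stops
  val rows: Array[Row] = df.collect()

  def main(args: Array[String]): Unit = {
    df.printSchema()
    rows.foreach(println)
    spark.stop()
  }
}
```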

My question is: if my input data contains an array, like the one below, how do I create the schema?

val vals = sc.parallelize(
  """{"id":"1","name":"alex","score":[{"keyword":"read","point":10}]}""" ::
  Nil
)

val schema = (new StructType)
  .add("id", StringType)
  .add("name", StringType)
  // how do I declare the "score" array here?

Thanks.

asked Sep 14 '16 08:09 by RJK



1 Answer

OK, I found a solution in my code.

To give a DataFrame schema an array column, nest a StructType describing each element inside an ArrayType:

import org.apache.spark.sql.types._  // StructType, StructField, ArrayType, ...

val vals = sc.parallelize(
  """{"id":"1","name":"alex","score":[{"keyword":"read","point":10}]}""" ::
  Nil
)

// "score" is an array whose elements are structs with "keyword" and "point"
val schema = StructType(
  Array(
    StructField("id", StringType),
    StructField("name", StringType),
    StructField("score", ArrayType(StructType(Array(
      StructField("keyword", StringType),
      StructField("point", IntegerType)
    ))))
  )
)
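The same nested schema can also be written as a DDL string. Below is a minimal sketch, assuming Spark 2.2 or later, where `StructType.fromDDL` is available. The object name `DdlSchemaSketch` is hypothetical, and the comparison only checks that both forms describe the same structure.

```scala
import org.apache.spark.sql.types._

// Hypothetical object name, used only to group the two schema forms
object DdlSchemaSketch {
  // Programmatic form, as in the answer
  val programmatic: StructType = StructType(Array(
    StructField("id", StringType),
    StructField("name", StringType),
    StructField("score", ArrayType(StructType(Array(
      StructField("keyword", StringType),
      StructField("point", IntegerType)
    ))))
  ))

  // Equivalent DDL form: ARRAY<STRUCT<...>> maps to ArrayType(StructType(...))
  val fromDdl: StructType = StructType.fromDDL(
    "id STRING, name STRING, score ARRAY<STRUCT<keyword: STRING, point: INT>>")

  def main(args: Array[String]): Unit = {
    println(fromDdl.treeString)
    println(s"same structure: ${fromDdl == programmatic}")
  }
}
```

The DDL form is shorter for deeply nested data, while the programmatic form makes it easier to build a schema dynamically.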

Then print the schema:

sqlContext.read.schema(schema).json(vals).select($"*").printSchema()
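After the data is loaded with this schema, one way to reach the nested values is `explode`, which emits one row per element of the `score` array. The struct fields are then addressable as `s.keyword` and `s.point`. This is a minimal sketch, assuming Spark 2.2+ with a local `SparkSession`. `ArraySchemaSketch` and the app name are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.types._

// Hypothetical object name; local master chosen only for illustration
object ArraySchemaSketch {
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("array-schema-sketch")
    .getOrCreate()
  import spark.implicits._

  // Same nested schema as in the answer: "score" is an array of structs
  val schema: StructType = StructType(Array(
    StructField("id", StringType),
    StructField("name", StringType),
    StructField("score", ArrayType(StructType(Array(
      StructField("keyword", StringType),
      StructField("point", IntegerType)
    ))))
  ))

  val df: DataFrame = spark.read
    .schema(schema)
    .json(Seq("""{"id":"1","name":"alex","score":[{"keyword":"read","point":10}]}""").toDS())

  // explode() yields one row per array element; its struct fields become columns
  val scores: DataFrame = df
    .select($"name", explode($"score").as("s"))
    .select($"name", $"s.keyword", $"s.point")

  // Collected eagerly so the values stay available after the session stops
  val rows: Array[Row] = scores.collect()

  def main(args: Array[String]): Unit = {
    df.printSchema()
    scores.show()
    spark.stop()
  }
}
```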

Thanks, this is resolved.

answered Oct 25 '22 17:10 by RJK
