Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Access names of fields in struct Spark SQL

I am trying to 'lift' the fields of a struct to the top level in a dataframe, as illustrated by this example:

case class A(a1: String, a2: String)
case class B(b1: String, b2: A)

val df = Seq(B("X",A("Y","Z"))).toDF

df.show    
+---+-----+
| b1|   b2|
+---+-----+
|  X|[Y,Z]|
+---+-----+

df.printSchema
root
 |-- b1: string (nullable = true)
 |-- b2: struct (nullable = true)
 |    |-- a1: string (nullable = true)
 |    |-- a2: string (nullable = true)

val lifted = df.withColumn("a1", $"b2.a1").withColumn("a2", $"b2.a2").drop("b2")

lifted.show
+---+---+---+
| b1| a1| a2|
+---+---+---+
|  X|  Y|  Z|
+---+---+---+

lifted.printSchema
 root
 |-- b1: string (nullable = true)
 |-- a1: string (nullable = true)
 |-- a2: string (nullable = true)

This works. I would like to create a little utility method which does this for me, probably through pimping DataFrame to enable something like df.lift("b2").

To do this, I think I want a way of obtaining a list of all fields within a Struct. E.g. given "b2" as input, return ["a1","a2"]. How do I do this?

like image 677
user2682459 Avatar asked May 27 '17 09:05

user2682459


2 Answers

If I understand your question correctly, you want to be able to list the nested fields of column b2.

So you would need to filter on b2, access the StructType of b2 and then map the names of the columns from within the fields (StructField):

import org.apache.spark.sql.types.StructType

val nested_fields = df.schema
                   .filter(c => c.name == "b2")
                   .flatMap(_.dataType.asInstanceOf[StructType].fields)
                   .map(_.name)

// nested_fields: Seq[String] = List(a1, a2)
like image 173
eliasah Avatar answered Oct 19 '22 11:10

eliasah


Actually you can use ".fieldNames.toList".

val nested_fields = df.schema("b2").fieldNames.toList

It returns a list of String. If you want a list of columns make a map.

I hope it helps.

like image 20
Sophie D. Avatar answered Oct 19 '22 09:10

Sophie D.