 

Get elements of struct type from a Row by name in Spark (Scala)

In a DataFrame object in Apache Spark (I'm using the Scala interface), if I'm iterating over its Row objects, is there any way to extract structure values by name?

I am using the code below to extract values by name, but I am not sure how to read the struct value.

If the values had been of type String, we could have done this:

    val resultDF = joinedDF.rdd.map { row =>
      val id     = row.getAs[Long]("id")
      val values = row.getAs[String]("slotSize")
      val fields = row.getAs[String](values)
      (id, values, fields)
    }.toDF("id", "values", "fields")

But in my case, values has the schema below:

v1: struct (nullable = true)
 |-- level1: string (nullable = true)
 |-- level2: string (nullable = true)
 |-- level3: string (nullable = true)
 |-- level4: string (nullable = true)
 |-- level5: string (nullable = true)

What should I replace the following line with to make the code work, given that values has the structure above?

  row.getAs[String](values)
Asked Nov 10 '16 by satyambansal117

People also ask

How do I get specific rows in Spark DataFrame?

One approach is select() combined with collect(): select() restricts the DataFrame to the columns you want for each row, and collect() returns those rows to the driver. (This answer describes the PySpark API; the Scala API is analogous.)
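As a sketch, the Scala equivalent looks like this; the data and column names here are illustrative, and a local SparkSession is assumed:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative local session and data; requires Spark on the classpath.
val spark = SparkSession.builder().master("local[*]").appName("select-collect").getOrCreate()
import spark.implicits._

val df = Seq((1L, "small"), (2L, "large")).toDF("id", "slotSize")

// collect() returns the selected rows to the driver as an Array[Row]
val rows = df.select("id", "slotSize").collect()
rows.foreach(println)

spark.stop()
```

Note that collect() materializes every selected row on the driver, so it is only appropriate for small results.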

What is StructType and StructField in Spark?

Spark SQL's StructType and StructField classes are used to programmatically specify the schema of a DataFrame and to create complex columns such as nested struct, array, and map columns. A StructType is a collection of StructFields.
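For instance, the nested schema from the question could be declared programmatically like this (the level field names follow the schema shown in the question; the outer id field is an assumption for illustration):

```scala
import org.apache.spark.sql.types._

// Five string fields, level1..level5, as in the question's schema
val levels = StructType(
  (1 to 5).map(i => StructField(s"level$i", StringType, nullable = true))
)

// Top-level schema: an id column plus the nested struct column v1
val schema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("v1", levels, nullable = true)
))
```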

How do I select a struct column in Spark?

If you have a struct (StructType) column on a PySpark DataFrame, you need to use an explicit column qualifier in order to select the nested struct columns.
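The same dotted qualifier works in Scala. A minimal sketch, where the struct column is built inline to mimic the question's v1 (all data and names are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

val spark = SparkSession.builder().master("local[*]").appName("nested-select").getOrCreate()
import spark.implicits._

// Build a DataFrame with a struct column named v1, like the question's schema
val df = Seq((1L, "a", "b")).toDF("id", "x", "y")
  .select($"id", struct($"x".as("level1"), $"y".as("level2")).as("v1"))

// "v1.level1" reaches inside the struct column; the result column is named "level1"
val flat = df.select($"id", $"v1.level1", $"v1.level2")
val flatCols = flat.columns.toSeq
flat.show()

spark.stop()
```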

How do I see the structure schema of the DataFrame in Spark SQL?

To get the schema of a Spark DataFrame, call printSchema() on the DataFrame object. printSchema() prints the schema tree to the console (stdout), while show() displays the DataFrame's contents.
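A small sketch with illustrative data:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("print-schema").getOrCreate()
import spark.implicits._

val df = Seq((1L, "small")).toDF("id", "slotSize")

df.printSchema()   // prints the schema tree to stdout
df.show()          // prints the first rows as a table

// The same tree is also available as a string, without printing
val schemaString = df.schema.treeString

spark.stop()
```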


1 Answer

You can access the struct elements by first extracting another Row from the top-level Row (structs are modeled as nested Rows in Spark), like this:

Scala Implementation

val level1 = row.getAs[Row]("struct").getAs[String]("level1")

Java Implementation

 String level1 = row.<Row>getAs("struct").getAs("level1").toString();
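Slotting this back into the question's map, a self-contained sketch could look like the following; joinedDF is a stand-in built inline, and all data and column names besides the getAs calls are illustrative:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.struct

val spark = SparkSession.builder().master("local[*]").appName("struct-by-name").getOrCreate()
import spark.implicits._

// Stand-in for the question's joinedDF, with a struct column named v1
val joinedDF = Seq((1L, "a", "b")).toDF("id", "c1", "c2")
  .select($"id", struct($"c1".as("level1"), $"c2".as("level2")).as("v1"))

val resultDF = joinedDF.rdd.map { row =>
  val id     = row.getAs[Long]("id")
  val nested = row.getAs[Row]("v1")            // the struct arrives as a nested Row
  (id, nested.getAs[String]("level1"))
}.toDF("id", "level1")

val first = resultDF.collect().head
spark.stop()
```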
Answered Oct 19 '22 by Raphael Roth