
Spark Scala, how to check if nested column is present in dataframe

I'm reading a DataFrame from a Parquet file that has nested columns (structs). How can I check whether a nested column is present?

The data might look like this:

+----------------------+
| column1              |
+----------------------+
|{a_id:[1], b_id:[1,2]}|
+----------------------+

or like this:

+---------------------+
| column1             |
+---------------------+
|{a_id:[3,5]}         |
+---------------------+

I know how to check whether a top-level column is present, as answered in "How do I detect if a Spark DataFrame has a column":

df.schema.fieldNames.contains("column_name")

But how can I check for a nested column?

statanly asked Mar 14 '19 13:03


1 Answer

You can get the schema of the nested field as a StructType and then check whether your field appears among its field names:

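import org.apache.spark.sql.types.StructType

// Look up column1 in the schema and treat its data type as a struct
val index = df.schema.fieldIndex("column1")
val is_b_id_present = df.schema(index).dataType.asInstanceOf[StructType]
                          .fieldNames.contains("b_id")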
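
Note that fieldIndex throws if column1 is missing, and the cast fails if column1 is not a struct. A more defensive variant might walk a dot-separated path through the schema and return false instead of throwing. The following is only a minimal sketch: the object NestedColumns and the helper hasNestedField are hypothetical names, not part of Spark, and the example schema assumes the IDs are arrays of longs, which the question does not state.

```scala
import org.apache.spark.sql.types.{ArrayType, LongType, StructField, StructType}

// Hypothetical helper object (not a Spark API).
object NestedColumns {
  // Returns true when every segment of a dot-separated path names a field,
  // descending only through struct-typed fields. Never throws on a miss.
  def hasNestedField(schema: StructType, path: String): Boolean = {
    @annotation.tailrec
    def loop(current: StructType, names: List[String]): Boolean = names match {
      case Nil => true
      case name :: rest =>
        current.fields.find(_.name == name) match {
          case None                    => false // segment not found
          case Some(_) if rest.isEmpty => true  // last segment found
          case Some(StructField(_, nested: StructType, _, _)) =>
            loop(nested, rest)                  // descend into the struct
          case Some(_)                 => false // cannot descend into a non-struct
        }
    }
    loop(schema, path.split('.').toList)
  }
}

// Assumed schema mirroring the first example in the question.
val exampleSchema = StructType(Seq(
  StructField("column1", StructType(Seq(
    StructField("a_id", ArrayType(LongType)),
    StructField("b_id", ArrayType(LongType))
  )))
))

val bIdPresent = NestedColumns.hasNestedField(exampleSchema, "column1.b_id")
val cIdPresent = NestedColumns.hasNestedField(exampleSchema, "column1.c_id")
```

With a real DataFrame, the call would be NestedColumns.hasNestedField(df.schema, "column1.b_id"). This sketch does not look inside arrays or maps of structs, and it treats a field name that itself contains a dot as a path separator.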
Viacheslav Shalamov answered Oct 20 '22 04:10