I'm reading a DataFrame from a Parquet file that has nested columns (structs).
How can I check whether a nested column is present?
The data might look like this:
+----------------------+
| column1              |
+----------------------+
|{a_id:[1], b_id:[1,2]}|
+----------------------+
or like this, with b_id missing:
+---------------------+
| column1             |
+---------------------+
|{a_id:[3,5]}         |
+---------------------+
I know how to check whether a top-level column is present, as answered in "How do I detect if a Spark DataFrame has a column":
df.schema.fieldNames.contains("column_name")
But how can I check for a nested column?
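You can get the schema of the nested field as a StructType and then check whether your field is among its field names. Note that this assumes column1 exists and is a struct: fieldIndex throws an IllegalArgumentException for an unknown column, and the cast fails if the column is not a struct.
import org.apache.spark.sql.types.StructType

// Find column1 in the top-level schema, then inspect its struct fields.
val index = df.schema.fieldIndex("column1")
val is_b_id_present = df.schema(index).dataType.asInstanceOf[StructType]
                          .fieldNames.contains("b_id")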
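If the parent column itself may be missing, or the nesting goes deeper than one level, the same idea can be wrapped in a small helper that walks the schema and returns false instead of throwing. This is only a sketch: hasNestedColumn is a hypothetical name, not part of Spark, and it assumes field names contain no dots and that matching is case-sensitive.

```scala
import org.apache.spark.sql.types.{ArrayType, DataType, StructType}

// Hypothetical helper (not a Spark API): returns true if a dot-separated
// path such as "column1.b_id" resolves to a field in the given schema.
def hasNestedColumn(schema: StructType, path: String): Boolean = {
  def loop(dt: DataType, parts: List[String]): Boolean = (dt, parts) match {
    // Every path segment has been matched.
    case (_, Nil) => true
    // Descend into the struct field named by the next segment, if any.
    case (s: StructType, head :: tail) =>
      s.fields.find(_.name == head) match {
        case Some(field) => loop(field.dataType, tail)
        case None        => false
      }
    // Look through arrays so that arrays of structs can be traversed too.
    case (ArrayType(elementType, _), _) => loop(elementType, parts)
    // A non-struct type with path segments left over: no such field.
    case _ => false
  }
  loop(schema, path.split('.').toList)
}
```

With the DataFrame from the question it would be called as hasNestedColumn(df.schema, "column1.b_id"). It never casts, so a missing column1 or a non-struct column1 simply yields false.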