I have a data frame with a structure like this:
root
 |-- npaDetails: struct (nullable = true)
 |    |-- additionalInformation: struct (nullable = true)
 |    |-- npaStatus: struct (nullable = true)
 |    |-- npaDetails: struct (nullable = true)
 |-- npaHeaderData: struct (nullable = true)
 |    |-- npaNumber: string (nullable = true)
 |    |-- npaDownloadDate: string (nullable = true)
 |    |-- npaDownloadTime: string (nullable = true)
I want to retrieve the npaNumber value from every row in the data frame.
My approach was to iterate over all rows and, for each one, extract the value stored in the npaNumber field of the npaHeaderData column. So I wrote the following lines:
parquetFileDF.foreach { newRow =>
  // To retrieve the second column
  val column = newRow.get(1)
  // The following line is not allowed
  // val npaNumber = column.getAs[String]("npaNumber")
  println(column)
}
The content of column printed in each iteration looks like:
[207400956,27FEB17,09.30.00]
But column is of type Any, so I am not able to extract any of its fields. Can anyone tell me what I am doing wrong, or what approach I should follow instead?
Thanks
If you only need npaNumber, you can select the nested field directly:
parquetFileDF.select($"npaHeaderData.npaNumber".as("npaNumber"))
This gives you a data frame with a single column, npaNumber.
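To make this concrete, here is a minimal sketch that can be run locally. The case classes and the sample row are assumptions made up for illustration (the real npaDetails struct is replaced by a hypothetical placeholder field); only the npaHeaderData field names and the sample values come from the question. It shows the select approach, and also why the row-wise attempt failed: Row.get returns Any, whereas Row.getStruct returns the nested struct as a Row, on which getAs works.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-ins for the real schema. NpaDetails is a placeholder;
// the NpaHeaderData field names are taken from the question.
case class NpaDetails(placeholder: String)
case class NpaHeaderData(npaNumber: String, npaDownloadDate: String, npaDownloadTime: String)
case class Record(npaDetails: NpaDetails, npaHeaderData: NpaHeaderData)

object NpaNumberExample {
  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in a real job the session already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("npa-number-example")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the data frame read from Parquet in the question.
    val parquetFileDF = Seq(
      Record(NpaDetails("n/a"), NpaHeaderData("207400956", "27FEB17", "09.30.00"))
    ).toDF()

    // Approach 1: select the nested field with dot notation.
    val npaNumbers = parquetFileDF.select($"npaHeaderData.npaNumber".as("npaNumber"))
    npaNumbers.show()

    // If the values are needed as local Scala strings, convert and collect.
    val localNumbers: Array[String] = npaNumbers.as[String].collect()
    localNumbers.foreach(println)

    // Approach 2: row-wise access. getStruct returns the nested struct as a Row
    // (get returns Any), so getAs can look the field up by name.
    parquetFileDF.collect().foreach { newRow =>
      val header = newRow.getStruct(1)
      val npaNumber = header.getAs[String]("npaNumber")
      println(npaNumber)
    }

    spark.stop()
  }
}
```

The select version is usually preferable, since the work stays inside Spark and only the one column is read. The row-wise version collects every row to the driver, so it is only reasonable for small data frames.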