I have a Spark Dataset similar to the example below:
   0         1                2            3
+------+------------+--------------------+---+
|ItemID|Manufacturer|            Category|UPC|
+------+------------+--------------------+---+
|   804|         ael|Brush & Broom Han...|123|
|   805|         ael|Wheel Brush Parts...|124|
+------+------------+--------------------+---+
I need to find the position of a column by searching the column header.
For example:
int position = getColumnPosition("Category");
This should return 2.
Is there any Spark function supported on the Dataset<Row> datatype to find the column index, or any Java function that can run on a Spark Dataset?
You need to access the schema and read the field index as follows:
int position = df.schema().fieldIndex("Category");
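Below is a minimal, self-contained sketch of how this could look end to end, wrapped in a getColumnPosition() helper matching the name used in the question. The class name, the local[*] master, and the inline sample data are assumptions added purely for the demonstration; only schema().fieldIndex() is the actual Spark API being shown.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ColumnPositionExample {

    // Helper with the signature used in the question.
    // StructType.fieldIndex() throws IllegalArgumentException if the column
    // name is not present in the schema.
    static int getColumnPosition(Dataset<Row> df, String columnName) {
        return df.schema().fieldIndex(columnName);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ColumnPositionExample")
                .master("local[*]")   // local mode, for this demo only
                .getOrCreate();

        // Rebuild the small DataFrame from the question with an explicit
        // schema so the column order matches the example.
        StructType schema = new StructType()
                .add("ItemID", DataTypes.IntegerType)
                .add("Manufacturer", DataTypes.StringType)
                .add("Category", DataTypes.StringType)
                .add("UPC", DataTypes.StringType);

        List<Row> rows = Arrays.asList(
                RowFactory.create(804, "ael", "Brush & Broom Handles", "123"),
                RowFactory.create(805, "ael", "Wheel Brush Parts", "124"));

        Dataset<Row> df = spark.createDataFrame(rows, schema);

        int position = getColumnPosition(df, "Category");
        System.out.println(position);   // prints 2

        spark.stop();
    }
}

Because fieldIndex() throws if the column is missing, you may want to check df.schema().getFieldIndex(columnName).isDefined() (or Arrays.asList(df.columns()).contains(columnName)) first when the column name comes from user input.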