I am using the Python API of Spark version 1.4.1.
My row object looks like this:
row_info = Row(name="Tim", age=5, is_subscribed=False)
How can I get a list of the object's attribute names as a result?
Something like: ["name", "age", "is_subscribed"]
You can find all column names and data types (DataType) of a PySpark DataFrame by using df.dtypes and df.schema, and you can also retrieve the data type of a specific column with df.schema["name"].
The col() function from the pyspark.sql.functions module can be used to refer to particular columns.
A PySpark Row is a class that represents a single record of a DataFrame. Row objects can be created with keyword arguments; since Row extends tuple, variable arguments are accepted when constructing one, and the data can be retrieved from the resulting Row object.
If you don't care about the order, you can simply extract the names from a dict:
list(row_info.asDict())
Otherwise the only option I am aware of is using __fields__ directly:
row_info.__fields__