 

How can I get from 'pyspark.sql.types.Row' all the columns/attributes name?

I am using the Python API of Spark version 1.4.1.

My row object looks like this:

row_info = Row(name="Tim", age=5, is_subscribed=False)

How can I get a list of the object's attribute names as a result? Something like: ["name", "age", "is_subscribed"]

asked Jan 28 '16 by dng

People also ask

How do I get all the column names in PySpark?

You can find all column names and data types (DataType) of a PySpark DataFrame by using df.dtypes and df.schema, and you can retrieve the data type of a specific column using df.schema["name"].

How do I get column values in PySpark?

We can use the col() function from the pyspark.sql.functions module to refer to particular columns.

What is PySpark SQL types row?

PySpark's Row is a class that represents a record of a DataFrame. Row objects can be created in PySpark with keyword arguments. The Row class extends tuple, so it accepts a variable number of arguments. You can create a Row object and retrieve its data by field name or position.


1 Answer

If you don't care about the order, you can simply extract these from a dict:

list(row_info.asDict())

Otherwise the only option I am aware of is using __fields__ directly:

row_info.__fields__
answered by zero323