My initial data set is:
{'ID': [Row(userid=17562323, gross_merchandise_value=6072210944, country=u'ID'), Row(userid=29989283, gross_merchandise_value=4931252224, country=u'ID')]}
The dict value is a list whose elements are of type pyspark.sql.types.Row.
How do I convert this dict into a list of userids, like below?
[17562323, 29989283]
I just want the list of userid values.
pyspark.sql.Row is a class that represents a single record of a DataFrame. Row objects can be created by passing named parameters. Row extends tuple, so it accepts a variable number of arguments, and the stored values can be retrieved from the Row by field name or by position.
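For reference, a minimal sketch of how a Row behaves (the field names here just mirror the question's data):

    from pyspark.sql import Row

    # A Row built from keyword arguments keeps its field names.
    r = Row(userid=17562323, gross_merchandise_value=6072210944, country='ID')

    print(r.userid)      # attribute access -> 17562323
    print(r['country'])  # item access      -> 'ID'
    print(r.asDict())    # convert the Row to a plain dict of field -> value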
All of the data types listed in the PySpark SQL documentation are supported. DataType is the base class for all PySpark SQL types. Some types, such as IntegerType, DecimalType, and ByteType, are subclasses of NumericType, which is itself a subclass of DataType.
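That hierarchy can be checked directly with issubclass; a small sketch:

    from pyspark.sql.types import DataType, NumericType, IntegerType, DecimalType, ByteType

    # IntegerType, DecimalType and ByteType all descend from NumericType,
    # and NumericType descends from DataType.
    print(issubclass(IntegerType, NumericType))  # True
    print(issubclass(DecimalType, NumericType))  # True
    print(issubclass(ByteType, DataType))        # True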
A list of Row objects can also be used to build a DataFrame: put the list elements into Row values and pass the list to the DataFrame creation call. This is one way to turn a Python list into a DataFrame in PySpark, as sketched below.
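A sketch of that approach, assuming a running SparkSession and reusing the rows from the question:

    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.getOrCreate()

    rows = [
        Row(userid=17562323, gross_merchandise_value=6072210944, country='ID'),
        Row(userid=29989283, gross_merchandise_value=4931252224, country='ID'),
    ]

    # createDataFrame infers the schema from the Row field names and values.
    df = spark.createDataFrame(rows)
    df.show()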
Thank you all, the problem is solved. I used row_ele.asDict()['userid'] for each row_ele in old_row_list to build new_userid_list.
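Spelled out as a runnable sketch (the variable name data is assumed here; old_row_list corresponds to data['ID']):

    from pyspark.sql import Row

    data = {'ID': [Row(userid=17562323, gross_merchandise_value=6072210944, country='ID'),
                   Row(userid=29989283, gross_merchandise_value=4931252224, country='ID')]}

    # Convert each Row to a dict and pull out the userid field.
    new_userid_list = [row_ele.asDict()['userid'] for row_ele in data['ID']]
    print(new_userid_list)  # [17562323, 29989283]

    # Equivalent and slightly shorter: read the field directly off the Row.
    new_userid_list = [row_ele.userid for row_ele in data['ID']]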