I have the following list of Rows that I want to convert to a PySpark DataFrame:
data= [Row(id=u'1', probability=0.0, thresh=10, prob_opt=0.45),
Row(id=u'2', probability=0.4444444444444444, thresh=60, prob_opt=0.45),
Row(id=u'3', probability=0.0, thresh=10, prob_opt=0.45),
Row(id=u'80000000808', probability=0.0, thresh=100, prob_opt=0.45)]
I need to convert it to a PySpark DF.
I have tried data.toDF(), but it raises:
AttributeError: 'list' object has no attribute 'toDF'
To append a row to an existing DataFrame, the collect() method can also be used: collect() converts the DataFrame into a local list of Rows, you can append data to that list, and then convert the list back into a DataFrame.
This seems to work:
spark.createDataFrame(data)
Test results:
from pyspark.sql import SparkSession, Row
spark = SparkSession.builder.getOrCreate()
data = [Row(id=u'1', probability=0.0, thresh=10, prob_opt=0.45),
Row(id=u'2', probability=0.4444444444444444, thresh=60, prob_opt=0.45),
Row(id=u'3', probability=0.0, thresh=10, prob_opt=0.45),
Row(id=u'80000000808', probability=0.0, thresh=100, prob_opt=0.45)]
df = spark.createDataFrame(data)
df.show()
# +-----------+------------------+------+--------+
# | id| probability|thresh|prob_opt|
# +-----------+------------------+------+--------+
# | 1| 0.0| 10| 0.45|
# | 2|0.4444444444444444| 60| 0.45|
# | 3| 0.0| 10| 0.45|
# |80000000808| 0.0| 100| 0.45|
# +-----------+------------------+------+--------+