
Converting a list of rows to a PySpark dataframe

I have the following list of rows that I want to convert to a PySpark df:

data = [Row(id=u'1', probability=0.0, thresh=10, prob_opt=0.45),
 Row(id=u'2', probability=0.4444444444444444, thresh=60, prob_opt=0.45),
 Row(id=u'3', probability=0.0, thresh=10, prob_opt=0.45),
 Row(id=u'80000000808', probability=0.0, thresh=100, prob_opt=0.45)]

I need to convert it to a PySpark DF.

I have tried data.toDF(), but it fails with:

AttributeError: 'list' object has no attribute 'toDF'

Marcela Bejarano asked Aug 19 '19

People also ask

How do I add rows to a DataFrame in PySpark?

To append a row to a DataFrame, one can also use the collect() method: collect() converts the DataFrame to a Python list of Rows, so you can append data to that list and then convert the list back to a DataFrame.


1 Answer

This seems to work:

spark.createDataFrame(data)

Test results:

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

data = [Row(id=u'1', probability=0.0, thresh=10, prob_opt=0.45),
        Row(id=u'2', probability=0.4444444444444444, thresh=60, prob_opt=0.45),
        Row(id=u'3', probability=0.0, thresh=10, prob_opt=0.45),
        Row(id=u'80000000808', probability=0.0, thresh=100, prob_opt=0.45)]

df = spark.createDataFrame(data)
df.show()
#  +-----------+------------------+------+--------+
#  |         id|       probability|thresh|prob_opt|
#  +-----------+------------------+------+--------+
#  |          1|               0.0|    10|    0.45|
#  |          2|0.4444444444444444|    60|    0.45|
#  |          3|               0.0|    10|    0.45|
#  |80000000808|               0.0|   100|    0.45|
#  +-----------+------------------+------+--------+

ZygD answered Oct 05 '22