I have a list of lists:
[[1, 2, 3], ['A', 'B', 'C'], ['aa', 'bb', 'cc']]
Each inner list contains the values of one attribute: 'A1', 'A2', and 'A3', respectively.
I want to build the following DataFrame:
+----------+----------+----------+
| A1 | A2 | A3 |
+----------+----------+----------+
| 1 | A | aa |
+----------+----------+----------+
| 2 | B | bb |
+----------+----------+----------+
| 3 | C | cc |
+----------+----------+----------+
How can I do it?
You can create a Row class with the headers as fields, then use zip to transpose the lists
row-wise and construct a Row object for each record:
from pyspark.sql import Row

lst = [[1, 2, 3], ['A', 'B', 'C'], ['aa', 'bb', 'cc']]

# Row class whose fields are the column names
R = Row("A1", "A2", "A3")

# zip(*lst) transposes the column lists into rows; build one Row per record
sc.parallelize([R(*r) for r in zip(*lst)]).toDF().show()
+---+---+---+
| A1| A2| A3|
+---+---+---+
| 1| A| aa|
| 2| B| bb|
| 3| C| cc|
+---+---+---+
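The key step here is the transposition: zip(*lst) turns the three column lists into row tuples. You can check this in plain Python, no Spark needed:

```python
lst = [[1, 2, 3], ['A', 'B', 'C'], ['aa', 'bb', 'cc']]

# unpack the column lists and zip them element-wise to get row tuples
rows = list(zip(*lst))
# rows == [(1, 'A', 'aa'), (2, 'B', 'bb'), (3, 'C', 'cc')]
```

Each tuple then supplies the positional arguments for one Row.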
Or, if you have pandas installed, create a pandas DataFrame first; you can then build a Spark DataFrame from it directly with spark.createDataFrame:
import pandas as pd

headers = ['A1', 'A2', 'A3']

# map each header to its column of values, then build the pandas DataFrame
pdf = pd.DataFrame.from_dict(dict(zip(headers, lst)))
spark.createDataFrame(pdf).show()
+---+---+---+
| A1| A2| A3|
+---+---+---+
| 1| A| aa|
| 2| B| bb|
| 3| C| cc|
+---+---+---+
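In the pandas route, zip is used the other way around: it pairs each header with its whole column list, producing the column-oriented dict that DataFrame.from_dict expects. The intermediate dict looks like this (plain Python, no Spark needed):

```python
lst = [[1, 2, 3], ['A', 'B', 'C'], ['aa', 'bb', 'cc']]
headers = ['A1', 'A2', 'A3']

# pair each header with its column list
d = dict(zip(headers, lst))
# d == {'A1': [1, 2, 3], 'A2': ['A', 'B', 'C'], 'A3': ['aa', 'bb', 'cc']}
```

So the first approach transposes columns into rows, while this one keeps the data column-oriented and lets pandas do the assembly.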