 

Convert a standard Python key-value dictionary list to a PySpark DataFrame

Consider I have a list of Python dictionaries, where each key corresponds to a column name of a table. For the list below, how can I convert it into a PySpark DataFrame with two columns, arg1 and arg2?

 [{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""}]

How can I use the following construct to do it?

df = sc.parallelize([
    ...
]).toDF()

Where do arg1 and arg2 go in the above code (the ...)?

asked Jun 02 '16 by stackit

People also ask

How do I convert a dictionary to a DataFrame in PySpark?

To do this, the spark.createDataFrame() method is used. It takes two arguments, data and columns: the data argument holds the rows, and the columns argument holds the list of column names.

How do I turn a list into a DataFrame in PySpark?

To do this, first create a list of data and a list of column names, then pass the zipped data and the column names to the spark.createDataFrame() method, which creates the DataFrame.

Can I convert a dictionary to a DataFrame?

We can convert a dictionary to a pandas DataFrame by using the pd.DataFrame.from_dict() class method.
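For comparison, a short sketch of the pandas route; the dictionary here maps each column name to its values, which is the layout from_dict expects by default:

```python
import pandas as pd

# Hypothetical sample data: one key per column, one list per column's values.
d = {"arg1": ["a", "b", "c"], "arg2": ["x", "y", "z"]}

pdf = pd.DataFrame.from_dict(d)
print(pdf.columns.tolist())  # → ['arg1', 'arg2']
```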


1 Answer

Old way:

sc.parallelize([{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""}]).toDF()

New way:

from pyspark.sql import Row
from collections import OrderedDict

def convert_to_row(d: dict) -> Row:
    # Sort the keys so every Row has the same field order,
    # then unpack the key-value pairs as Row keyword arguments.
    return Row(**OrderedDict(sorted(d.items())))

sc.parallelize([{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""}]) \
    .map(convert_to_row) \
    .toDF()
answered Oct 13 '22 by 652bb3ca