Say I have a list of Python dictionaries, where each key corresponds to a column name of a table. For the list below, how do I convert it into a PySpark DataFrame with two columns, arg1 and arg2?
[{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""}]
How can I use the following construct to do it?
df = sc.parallelize([
    ...
]).toDF()
Where do arg1 and arg2 go in the above code (the ... part)?
To do this, the spark.createDataFrame() method is used. It takes two arguments, data and columns: data holds the rows and columns holds the list of column names. So first build a list of row values and a list of column names, then pass the zipped row data together with the column names to spark.createDataFrame().
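A minimal sketch of that approach, assuming a SparkSession is available as spark (created here via getOrCreate()):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

dicts = [{"arg1": "", "arg2": ""}, {"arg1": "", "arg2": ""}, {"arg1": "", "arg2": ""}]

# Pull each column's values out of the dicts, then zip them into row tuples
arg1_values = [d["arg1"] for d in dicts]
arg2_values = [d["arg2"] for d in dicts]
data = list(zip(arg1_values, arg2_values))  # one tuple per row

df = spark.createDataFrame(data, ["arg1", "arg2"])
df.show()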
We can also convert the dictionaries to a pandas DataFrame first, using the pd.DataFrame.from_dict() class method (or, for a list of dicts like this one, the plain pd.DataFrame() constructor), and then hand the result to spark.createDataFrame().
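A sketch of that route, assuming pandas is installed alongside PySpark:

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

dicts = [{"arg1": "", "arg2": ""}, {"arg1": "", "arg2": ""}, {"arg1": "", "arg2": ""}]

# A list of dicts goes straight into the pandas constructor;
# from_dict() is the route when you start from a single dict of column lists
pdf = pd.DataFrame(dicts)

df = spark.createDataFrame(pdf)  # Spark reuses the pandas column names
df.show()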
Old way (still works, but Spark warns that inferring the schema from a dict is deprecated):

df = sc.parallelize([
    {"arg1": "", "arg2": ""},
    {"arg1": "", "arg2": ""},
    {"arg1": "", "arg2": ""}
]).toDF()
New way:
from pyspark.sql import Row
from collections import OrderedDict
def convert_to_row(d: dict) -> Row:
    # Sort the keys so every Row carries its fields in the same order,
    # then unpack them into a Row
    return Row(**OrderedDict(sorted(d.items())))

df = sc.parallelize([{"arg1": "", "arg2": ""}, {"arg1": "", "arg2": ""}, {"arg1": "", "arg2": ""}]) \
    .map(convert_to_row) \
    .toDF()
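Either way the result has the two expected columns. A quick check, assuming df holds the result of one of the snippets above:

df.printSchema()
# root
#  |-- arg1: string (nullable = true)
#  |-- arg2: string (nullable = true)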