 

How to convert list of dictionaries into Pyspark DataFrame


I want to convert my list of dictionaries into DataFrame. This is the list:

mylist = [
    {"type_activity_id": 1, "type_activity_name": "xxx"},
    {"type_activity_id": 2, "type_activity_name": "yyy"},
    {"type_activity_id": 3, "type_activity_name": "zzz"}
]

This is my code:

from pyspark.sql.types import StringType

df = spark.createDataFrame(mylist, StringType())
df.show(2, False)

+-----------------------------------------+
|                                    value|
+-----------------------------------------+
|{type_activity_id=1,type_activity_id=xxx}|
|{type_activity_id=2,type_activity_id=yyy}|
|{type_activity_id=3,type_activity_id=zzz}|
+-----------------------------------------+

I assume that I should provide some mapping and types for each column, but I don't know how to do it.

Update:

I also tried this:

from pyspark.sql.functions import from_json
from pyspark.sql.types import ArrayType, StructType, StructField, IntegerType, StringType

schema = ArrayType(
    StructType([StructField("type_activity_id", IntegerType()),
                StructField("type_activity_name", StringType())
                ]))
df = spark.createDataFrame(mylist, StringType())
df = df.withColumn("value", from_json(df.value, schema))

But then I get null values:

+-----+
|value|
+-----+
| null|
| null|
+-----+
Asked Sep 08 '18 by Markus




2 Answers

In the past, you were able to simply pass a dictionary to spark.createDataFrame(), but this is now deprecated:

mylist = [
    {"type_activity_id": 1, "type_activity_name": "xxx"},
    {"type_activity_id": 2, "type_activity_name": "yyy"},
    {"type_activity_id": 3, "type_activity_name": "zzz"}
]
df = spark.createDataFrame(mylist)
#UserWarning: inferring schema from dict is deprecated, please use pyspark.sql.Row instead
#  warnings.warn("inferring schema from dict is deprecated,"

As this warning message says, you should use pyspark.sql.Row instead.

from pyspark.sql import Row

spark.createDataFrame(Row(**x) for x in mylist).show(truncate=False)
#+----------------+------------------+
#|type_activity_id|type_activity_name|
#+----------------+------------------+
#|1               |xxx               |
#|2               |yyy               |
#|3               |zzz               |
#+----------------+------------------+

Here I used ** (keyword argument unpacking) to pass the dictionaries to the Row constructor.
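If you prefer not to rely on schema inference at all, you can also supply the column types explicitly. A minimal sketch, assuming the same mylist and an active SparkSession named spark:

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Explicit schema matching the keys in mylist
schema = StructType([
    StructField("type_activity_id", IntegerType(), True),
    StructField("type_activity_name", StringType(), True)
])

# Build tuples in the same field order as the schema, then apply it
rows = [(d["type_activity_id"], d["type_activity_name"]) for d in mylist]
df = spark.createDataFrame(rows, schema)
df.printSchema()
df.show(truncate=False)

This avoids the deprecation warning entirely and guarantees that type_activity_id comes out as an integer column rather than whatever inference decides.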

Answered Sep 22 '22 by pault


You can also do it like this; you will get a DataFrame with two columns.

mylist = [
    {"type_activity_id": 1, "type_activity_name": "xxx"},
    {"type_activity_id": 2, "type_activity_name": "yyy"},
    {"type_activity_id": 3, "type_activity_name": "zzz"}
]

myJson = sc.parallelize(mylist)
myDf = sqlContext.read.json(myJson)

Output:

+----------------+------------------+
|type_activity_id|type_activity_name|
+----------------+------------------+
|               1|               xxx|
|               2|               yyy|
|               3|               zzz|
+----------------+------------------+
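Note that read.json here appears to work because Spark stringifies the Python dicts and its JSON reader tolerates single quotes by default. If you would rather hand it proper JSON strings, a minimal sketch, assuming an active SparkSession named spark:

import json

# Serialize each dict to a real JSON string before handing it to the reader
jsonRdd = spark.sparkContext.parallelize([json.dumps(d) for d in mylist])
myDf = spark.read.json(jsonRdd)
myDf.show(truncate=False)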
Answered Sep 25 '22 by pissall