I have the following Python code that uses Spark:
from pyspark.sql import Row

def simulate(a, b, c):
    dict = Row(a=a, b=b, c=c)
    df = sqlContext.createDataFrame(dict)
    return df

df = simulate("a", "b", 10)
df.collect()
I am creating a Row object and I want to save it as a DataFrame.
However, I am getting this error:
TypeError: Can not infer schema for type: <type 'str'>
It occurs on this line:
df = sqlContext.createDataFrame(dict)
What am I doing wrong?
We can create a DataFrame programmatically using the following three steps:
1. Create an RDD of Rows from the original RDD.
2. Create the schema, represented by a StructType, matching the structure of the Rows in the RDD created in step 1.
3. Apply the schema to the RDD of Rows via the createDataFrame method provided by SQLContext.
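A minimal sketch of those three steps, assuming a SparkContext sc and the SQLContext sqlContext from the question are already available:

from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Step 1: an RDD of Rows
rdd = sc.parallelize([Row(a="a", b="b", c=10)])

# Step 2: a schema matching the structure of the Rows
schema = StructType([
    StructField("a", StringType(), True),
    StructField("b", StringType(), True),
    StructField("c", LongType(), True),
])

# Step 3: apply the schema via createDataFrame
df = sqlContext.createDataFrame(rdd, schema)
df.show()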
Alternatively, to convert a PySpark RDD to a DataFrame, the toDF() function of the RDD can be used. Converting an RDD to a DataFrame is usually worthwhile, since a DataFrame offers advantages over a plain RDD.
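For example, a short sketch, again assuming sc and an active SQLContext or SparkSession (toDF() requires one to exist):

from pyspark.sql import Row

# toDF() infers the schema from the Row fields
rdd = sc.parallelize([Row(a="a", b="b", c=10)])
df = rdd.toDF()
df.printSchema()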
It is pointless to create a single-element DataFrame, but if you want to make it work despite that, wrap the Row in a list:

df = sqlContext.createDataFrame([dict])
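Applied to the function from the question (still relying on the sqlContext used in the original code), the fix would look roughly like this; renaming the variable also avoids shadowing the built-in dict:

from pyspark.sql import Row

def simulate(a, b, c):
    # Wrap the single Row in a list so createDataFrame can infer the schema
    row = Row(a=a, b=b, c=c)
    return sqlContext.createDataFrame([row])

df = simulate("a", "b", 10)
df.collect()  # [Row(a='a', b='b', c=10)]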