
Can not infer schema for type: <type 'str'>

I have the following Python code that uses Spark:

from pyspark.sql import Row

def simulate(a, b, c):
  dict = Row(a=a, b=b, c=c)
  df = sqlContext.createDataFrame(dict)
  return df

df = simulate("a","b",10)
df.collect()

I am creating a Row object and I want to save it as a DataFrame.

However, I am getting this error:

TypeError: Can not infer schema for type: <type 'str'>

It occurs on this line:

df = sqlContext.createDataFrame(dict)

What am I doing wrong?

octavian asked Jul 05 '16 14:07



1 Answer

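Creating a single-element DataFrame is rarely worthwhile. If you want it to work anyway, pass a list. createDataFrame expects a collection of rows, and a bare Row is itself a tuple, so Spark iterates over its fields and tries to infer a schema from the first one, the string "a". Wrapping the Row in a list fixes that:

df = sqlContext.createDataFrame([dict])

As a side note, the name dict shadows Python's built-in dict type, so something like row is a safer variable name.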
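Why the list matters can be sketched without Spark at all. The snippet below is only an illustration under assumptions: Record is a namedtuple standing in for pyspark.sql.Row (which is also a tuple subclass), and first_element_type is a hypothetical helper that mimics, rather than reproduces, the step where the first element of the input is inspected for schema inference.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row (assumption: like Row, it is a tuple subclass).
Record = namedtuple("Record", ["a", "b", "c"])


def first_element_type(data):
    """Hypothetical helper: return the type of the first element that a
    schema-inference step would look at after materializing the input."""
    if not isinstance(data, list):
        data = list(data)  # a bare tuple-like row is flattened into its fields
    return type(data[0])


row = Record(a="a", b="b", c=10)

bare = first_element_type(row)       # inspects the first field of the row
wrapped = first_element_type([row])  # inspects the row itself

print(bare.__name__, wrapped.__name__)
```

With the bare row, the element being inspected is the string field, which matches the "Can not infer schema for type: <type 'str'>" message in the question. With the row wrapped in a list, the element being inspected is the whole record.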

user6022341 answered Oct 01 '22 05:10