I did my research but didn't find anything on this. I want to convert a simple pandas.DataFrame to a Spark DataFrame, like this:
import pandas as pd
df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]})
sc_sql.createDataFrame(df, schema=df.columns.tolist())
The error I get is:
TypeError: Can not infer schema for type: <class 'str'>
I tried something even simpler:
df = pd.DataFrame([1, 2, 3])
sc_sql.createDataFrame(df)
And I get:
TypeError: Can not infer schema for type: <class 'numpy.int64'>
Any help? Do I need to specify a schema manually?
sc_sql is a pyspark.sql.SQLContext; I am in a Jupyter notebook on Python 3.4 and Spark 1.6.
Thanks!
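This is related to your Spark version: newer Spark releases infer types from a pandas DataFrame more reliably. On Spark 1.6 you can fix it by passing an explicit schema instead of a list of column names:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
# One StructField per column: name, Spark type, nullable
mySchema = StructType([StructField("col1", StringType(), True), StructField("col2", IntegerType(), True)])
sc_sql.createDataFrame(df, schema=mySchema)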
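Another option, assuming the failure comes from numpy scalar types such as numpy.int64 (the type named in the second error), is to convert the rows to plain Python values yourself before handing them to Spark. This is a minimal sketch; to_python_rows is a hypothetical helper, not part of pandas or PySpark. It relies only on DataFrame.itertuples and on numpy scalars exposing .item(), which returns the equivalent built-in Python value.

```python
import numpy as np
import pandas as pd


def to_python_rows(pdf):
    """Hypothetical helper: return the rows of a pandas DataFrame as tuples
    of built-in Python values.

    Any numpy scalar (e.g. numpy.int64) is unwrapped with .item(), so that
    schema inference only sees types like str, int, float and bool.
    """
    rows = []
    for record in pdf.itertuples(index=False, name=None):
        rows.append(tuple(v.item() if isinstance(v, np.generic) else v for v in record))
    return rows


df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]})
rows = to_python_rows(df)  # a list of plain tuples: one per row
```

The resulting list of tuples can then be passed together with the column names, e.g. sc_sql.createDataFrame(rows, schema=df.columns.tolist()), since createDataFrame accepts either a StructType or a list of column names as its schema.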